SNMP problems with Zabbix

  • Steveo
    Member
    • Jun 2013
    • 31

    #16
    Originally posted by tchjts1
    That's interesting. Housekeeper is a resource hog when it runs. Do you have housekeeper set at the default of 1 hour?

    You are going to run into issues though if you leave it disabled. Your DB will grow exponentially huge. (Unless you have it partitioned and are managing the data that way)
    I had it set to run every hour. I would rather use the housekeeper feature than have to partition tables. Is it possible to have it run from the DB server instead of the application server?


    • Steveo
      Member
      • Jun 2013
      • 31

      #17
      Well, after going through my logs from last night and watching them this morning, it seems my issue is not resolved. I still see the SNMP errors, followed immediately by the connection restored message, but it appears to happen far less often and affects far fewer hosts. Disabling housekeeping helped, but it didn't completely resolve the issue.


      • Steveo
        Member
        • Jun 2013
        • 31

        #18
        I have re-enabled housekeeping, and it seems to be more stable now. I am still getting "SNMP item failed" followed by "connection restored" for 8-10 random hosts every so often. I am closer, but something still isn't right.

        SNMP item [ifAdminStatus[IP11]] on host [ENODEDC601] failed: first network error, wait for 15 seconds
        resuming SNMP checks on host [ENODEDC601]: connection restored
        SNMP item [ifOperStatus[TenGigabitEthernet 0/32]] on host [EPCOTECT-FT01-01] failed: first network error, wait for 15 seconds
        SNMP item [ifInOctets[TenGigabitEthernet 0/26]] on host [DDSMWDC6-FT01-01] failed: first network error, wait for 15 seconds
        resuming SNMP checks on host [EPCOTECT-FT01-01]: connection restored
        resuming SNMP checks on host [DDSMWDC6-FT01-01]: connection restored
        SNMP item [ifOutErrors[Port5-GigabitEthernet]] on host [ENODE1180CP01] failed: first network error, wait for 15 seconds
        SNMP item [ifAdminStatus[Ciena CN 3920 10/100/G 11]] on host [TEAMDISN-CN01-01] failed: first network error, wait for 15 seconds
        SNMP item [ifInErrors[Port5-GigabitEthernet]] on host [ENODE1180CP01] failed: first network error, wait for 15 seconds
        SNMP item [ifOutErrors[Port3-GigabitEthernet]] on host [ENODEPCT01] failed: first network error, wait for 15 seconds
        SNMP item [ifInErrors[Port5-GigabitEthernet]] on host [ENODEDTDMANAQUIN01] failed: first network error, wait for 15 seconds
        SNMP item [ifOutOctets[IP0]] on host [ENODEDAAR01] failed: first network error, wait for 15 seconds
        SNMP item [ifInErrors[TenGigabitEthernet 0/47]] on host [CELHUB01-FT01-01] failed: first network error, wait for 15 seconds
        SNMP item [ifInOctets[IP0]] on host [ENODEOSCLIBRARY01] failed: first network error, wait for 15 seconds
        SNMP item [ifOutOctets[Port3-GigabitEthernet]] on host [ENODECSBC01] failed: first network error, wait for 15 seconds
        SNMP item [ifOutOctets[IP0]] on host [ENODE1170CELBV] failed: first network error, wait for 15 seconds
        SNMP item [ifNumber] on host [ENODEC202] failed: first network error, wait for 15 seconds
        SNMP item [ifOutOctets[Port3-GigabitEthernet]] on host [ENODEPERIDOT01] failed: first network error, wait for 15 seconds
        SNMP item [ifOutErrors[remote]] on host [POPCENTU-CN01-01] failed: first network error, wait for 15 seconds
        SNMP item [ifInOctets[TenGigabitEthernet 0/40]] on host [WCCSCTCO-FT01-01] failed: first network error, wait for 15 seconds
        resuming SNMP checks on host [CELHUB01-FT01-01]: connection restored
        resuming SNMP checks on host [ENODEDAAR01]: connection restored
        resuming SNMP checks on host [TEAMDISN-CN01-01]: connection restored
        resuming SNMP checks on host [ENODE1180CP01]: connection restored
        resuming SNMP checks on host [ENODEDTDMANAQUIN01]: connection restored
        resuming SNMP checks on host [ENODECSBC01]: connection restored
        resuming SNMP checks on host [ENODE1170CELBV]: connection restored
        resuming SNMP checks on host [ENODEC202]: connection restored
        resuming SNMP checks on host [ENODEOSCLIBRARY01]: connection restored
        resuming SNMP checks on host [ENODEPCT01]: connection restored
        resuming SNMP checks on host [ENODEPERIDOT01]: connection restored
        resuming SNMP checks on host [WCCSCTCO-FT01-01]: connection restored
        resuming SNMP checks on host [POPCENTU-CN01-01]: connection restored
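
For anyone trying to tell whether the same hosts keep flapping, here is a minimal sketch that tallies the "network error" lines per host and lists hosts that have not yet logged a "connection restored" line. The regular expressions are based only on the message format quoted in this post; the function name `summarize` is made up for illustration, and other Zabbix versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns derived from the log lines quoted in this thread (an assumption:
# other Zabbix versions may phrase these messages differently).
FAILED = re.compile(
    r"item \[(?P<item>.+)\] on host \[(?P<host>[^\]]+)\] failed: "
    r"(?:first|another) network error"
)
RESTORED = re.compile(
    r"resuming SNMP checks on host \[(?P<host>[^\]]+)\]: connection restored"
)


def summarize(lines):
    """Return (failures per host, hosts still waiting for 'connection restored')."""
    failures = Counter()
    pending = set()
    for line in lines:
        match = FAILED.search(line)
        if match:
            failures[match.group("host")] += 1
            pending.add(match.group("host"))
            continue
        match = RESTORED.search(line)
        if match:
            pending.discard(match.group("host"))
    return failures, pending


if __name__ == "__main__":
    sample = [
        "SNMP item [ifAdminStatus[IP11]] on host [ENODEDC601] failed: first network error, wait for 15 seconds",
        "resuming SNMP checks on host [ENODEDC601]: connection restored",
        "SNMP item [ifOutErrors[Port5-GigabitEthernet]] on host [ENODE1180CP01] failed: first network error, wait for 15 seconds",
    ]
    counts, still_down = summarize(sample)
    for host, count in counts.most_common():
        print(host, count, "pending" if host in still_down else "restored")
```

Feeding it the server log (for example `summarize(open(path))`, where `path` is wherever your server log lives) gives a quick view of whether the errors cluster on a few hosts or are spread randomly.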


        • Mark T
          Junior Member
          • Nov 2011
          • 6

          #19
          Having this exact same issue. Note that this is happening in a freshly installed Zabbix 2.0.6, running the latest Debian on kernel 3.2.whatever. Using a fresh, clean MySQL database and only monitoring 2 hosts so far.

          Code:
          7391:20130626:150744.333 enabling SNMP checks on host [MyDumbHostname]: host became available
          snmp_build: unknown failure  7386:20130626:150844.349 SNMP item [0] on host [MyDumbHostname] failed: first network error, wait for 15 seconds
          snmp_build: unknown failure  7391:20130626:150859.345 SNMP item [0] on host [MyDumbHostname] failed: another network error, wait for 15 seconds
          snmp_build: unknown failure  7391:20130626:150914.351 SNMP item [0] on host [MyDumbHostname] failed: another network error, wait for 15 seconds
          snmp_build: unknown failure  7391:20130626:150929.352 SNMP item [0] on host [MyDumbHostname] failed: another network error, wait for 15 seconds
          snmp_build: unknown failure  7391:20130626:150944.356 temporarily disabling SNMP checks on host [MyDumbHostname]: host unavailable
          7391:20130626:151144.373 enabling SNMP checks on host [MyDumbHostname]: host became available
          And it keeps looping like that, for all monitored hosts. Now, it doesn't seem to affect anything - data keeps coming in just fine. But it probably means SOMETHING is wrong, and I'd like to keep this log as clean as possible.


          • Steveo
            Member
            • Jun 2013
            • 31

            #20
            I finally got everything working. I edited the template to increase the intervals for alias, description, operational status, etc. I left the traffic items at 60 seconds. I also made sure to start an unreachable poller for every poller I started. I have housekeeper running every hour. I also increased the DB syncers. I am no longer getting the SNMP errors in my logs unless there is actually an issue.


            • tchjts1
              Senior Member
              • May 2008
              • 1605

              #21
              Originally posted by Steveo
              I also increased the DB syncers.
              What did you set that value to? It is not one that you want to go overboard with!


              • Steveo
                Member
                • Jun 2013
                • 31

                #22
                Originally posted by tchjts1
                What did you set that value to? It is not one that you want to go overboard with!
                I set the DB syncers to 25.


                • tchjts1
                  Senior Member
                  • May 2008
                  • 1605

                  #23
                  Originally posted by Steveo
                  I set the DB syncers to 25.
                  Ooooo. I don't think I would do that. At one time I had Zabbix support review my settings and they strongly suggested I back down from having mine set at 16.

                  In fact, this is a quote from that review:

                  I think it's too many to have StartDBSyncers=16. The extra parallelism in DB writing depends on DB performance. It's rarely needed to increase the default StartDBSyncers=4, and only then in big environments, with very good DB performance. I don't remember cases where we needed to set up more than 8 syncers.
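
As a rough way to apply that advice, the sketch below reads the text of a server config file and flags a `StartDBSyncers` value above 8. The default of 4 and the ceiling of 8 come straight from the quote above; the helper names `read_server_conf` and `check_db_syncers` are invented for this example, and the parser assumes simple `Key=Value` lines with `#` comments.

```python
def read_server_conf(text):
    """Parse simple Key=Value lines, skipping blanks and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_db_syncers(settings, default=4, soft_limit=8):
    """Return a warning string if StartDBSyncers exceeds soft_limit, else None.

    default=4 and soft_limit=8 are taken from the support quote in this
    thread; they are guidance, not hard limits.
    """
    value = int(settings.get("StartDBSyncers", default))
    if value > soft_limit:
        return (
            f"StartDBSyncers={value} is above {soft_limit}; "
            "consider lowering it unless the DB is known to keep up"
        )
    return None


if __name__ == "__main__":
    conf_text = "# example only\nStartDBSyncers=25\n"
    print(check_db_syncers(read_server_conf(conf_text)))
```

With the value of 25 mentioned earlier in the thread this prints a warning; with the parameter absent it falls back to the default and stays silent.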

