Unreachable poller 100% / db working fine

  • chlehmann
    Junior Member
    • Mar 2019
    • 14

    #1

    Unreachable poller 100% / db working fine

    Hi all
    On my test system I'm seeing strange behaviour. My environment was working fine until I configured the wrong CMDB backend, more or less deleted all hosts, and re-imported them. Since then, the unreachable pollers are at a constant 100%, while all other pollers and internal processes are at healthy levels. The DB is working well. Doubling the pollers to 24 did not help.

    Code:
    select host, hostid, name, available, snmp_available, ipmi_available, jmx_available
    from hosts
    where status = 0
      and (available = 2 or snmp_available = 2 or ipmi_available = 2 or jmx_available = 2);
    This returned 91 hosts, but out of 2100 that's nothing.

    Does anyone have an idea what could have gone wrong? I assume there is garbage in my database, but I have not been able to find it yet. A vacuum did not help!

    Chris
  • ingus.vilnis
    Senior Member
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Mar 2014
    • 908

    #2
    Hi Chris,

    Check the output of "ps aux | grep unreachable" command on your Zabbix server. Are they all constantly busy?

    What values have you set for the following settings in zabbix_server.conf?
    Timeout
    UnreachablePeriod
    UnavailableDelay
    UnreachableDelay
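
A minimal sketch for pulling just these four values out of the config, assuming the file uses plain `Key=Value` lines with `#` comments. The helper name `read_settings` and the sample text are illustrative only, not part of Zabbix. To run it against a real server, read the file contents into `sample` instead (the path depends on your installation).

```python
# Settings asked about in this post.
WANTED = ("Timeout", "UnreachablePeriod", "UnavailableDelay", "UnreachableDelay")


def read_settings(conf_text, wanted=WANTED):
    """Return the requested Key=Value settings found in the config text.

    Hypothetical helper: skips blank lines and '#' comments, and ignores
    every key that is not listed in `wanted`.
    """
    found = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key in wanted:
            found[key] = value.strip()
    return found


# Sample text standing in for zabbix_server.conf (an assumption for the demo).
sample = """\
# Timeout=3
Timeout=15
UnreachablePeriod=45
UnavailableDelay=60
UnreachableDelay=15
DebugLevel=3
"""

settings = read_settings(sample)
for name in WANTED:
    # A setting missing from the file means the server default applies.
    print(name, "=", settings.get(name, "(not set, server default applies)"))
```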


    • chlehmann
      Junior Member
      • Mar 2019
      • 14

      #3
      Hi Ingus
      Thank you for your reply!
      Originally posted by ingus.vilnis
      Check the output of "ps aux | grep unreachable" command on your Zabbix server. Are they all constantly busy?
      Yes, unfortunately. Each of them mostly gets 1 value, rarely up to 5, always in 30 seconds.

      I deleted all unreachable devices and the load decreased immediately. But in our (big) setup, having 90 unavailable devices is quite "normal".


      Timeout=15
      UnreachablePeriod=45
      UnavailableDelay=60
      UnreachableDelay=15

      I wanted to go ahead and tweak these, but because I can only restart during maintenance windows, I'll have to prepare this a bit. I will have to add these unavailable devices to our test system in order to get a similar setup.
      Any hints on improving the situation?

      Chris


      • ingus.vilnis
        Senior Member
        Zabbix Certified Trainer
        Zabbix Certified Specialist
        Zabbix Certified Professional
        • Mar 2014
        • 908

        #4


        For every device that is off, an Unreachable Poller is locked for 15 seconds (the Timeout) while it waits for an answer. If my understanding of the subject is correct, you have 24 pollers, each of which can do 60 sec / 15 sec Timeout = 4 checks a minute.
        At 4 checks per minute each, 24 pollers can theoretically check 96 unreachable devices every minute with no delays, at 100% utilization.
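
The arithmetic above can be sketched as a small calculation. This is a simplified model, assuming every check against an offline device blocks its poller for the full Timeout; `checks_per_minute` is an illustrative name, not a Zabbix function.

```python
def checks_per_minute(pollers, timeout_s):
    """Upper bound on timed-out checks per minute in the simplified model.

    Each poller is assumed to be blocked for the whole timeout on every
    check, so it completes 60 // timeout_s checks per minute.
    """
    per_poller = 60 // timeout_s
    return pollers * per_poller


print(checks_per_minute(24, 15))  # 24 pollers * 4 checks = 96
print(checks_per_minute(24, 3))   # 24 pollers * 20 checks = 480
```

With roughly 90 offline devices and a 15 second Timeout, 96 checks a minute leaves almost no headroom, which matches the pollers sitting at 100%.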

        So, everything here works as configured. If you keep 15 second timeout, add more Unreachable Pollers to keep the graph readings happy.

        Have you got a good reason for having Timeout set to 15 seconds?
        The only reason I can imagine is that you have some long-running scripts checked from the Zabbix server side. If that's your case, then keep it. If you don't have any, lower the timeout to 3 (the default) or 4 seconds.

        And if you wish to keep the offline devices, it is enough to disable them in the web interface. There is no need to delete them entirely, in case you need them again.

        Best Regards,
        Ingus


        • chlehmann
          Junior Member
          • Mar 2019
          • 14

          #5
          Thank you Ingus, that helped me a lot! I honestly don't know why we set this to 15, but now at least I know the mechanics behind it!

          I'll try to estimate how many devices can go offline in a worst-case scenario and tweak the values accordingly!
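
A rough sketch of that estimate, using the same simplified model from this thread: each offline device costs one full Timeout per recheck, and every device is rechecked once per interval. The function name `pollers_needed` and the 60 second recheck interval (taken from UnavailableDelay=60 above) are assumptions for illustration; real load also depends on how many items each host has.

```python
import math


def pollers_needed(offline_devices, timeout_s, recheck_interval_s=60):
    """Estimate unreachable pollers needed to recheck every device in time.

    Simplified model: one check per offline device per recheck interval,
    and each check blocks a poller for the full timeout.
    """
    busy_seconds = offline_devices * timeout_s
    return math.ceil(busy_seconds / recheck_interval_s)


# 91 offline hosts (the count from post #1) at different Timeout values.
for timeout in (15, 4, 3):
    print(f"Timeout={timeout}s -> {pollers_needed(91, timeout)} pollers")
```

Under these assumptions, 91 offline hosts need about 23 pollers at Timeout=15 but only about 5 at Timeout=3, so lowering the Timeout has a much bigger effect than adding pollers.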

          Chris


          • ingus.vilnis
            Senior Member
            Zabbix Certified Trainer
            Zabbix Certified Specialist
            Zabbix Certified Professional
            • Mar 2014
            • 908

            #6
            In the worst-case scenario you'll have the pollers 100% busy anyway. Keep a normal count for daily usage, plus some extra in case a larger outage occurs.
