Zabbix unreachable poller processes more than 75% busy. How to solve this problem?

  • cesarsj
    Senior Member
    • Dec 2018
    • 154

    #1

    Zabbix unreachable poller processes more than 75% busy. How to solve this problem?

    First of all, I need to know the basics: what is a poller-type process?

    I saw that the key of the alerting item is zabbix[process,poller,avg,busy], which would be the average percentage of time that the poller processes are busy.
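For reference, Zabbix internal items for process load follow the pattern zabbix[process,<type>,<mode>,<state>]. A minimal sketch of building such keys, assuming a hypothetical helper named process_item_key (not part of Zabbix) and covering only the named modes, not the per-process-number mode:

```python
# Key pattern for process-load internal items:
#   zabbix[process,<type>,<mode>,<state>]
# The helper below is hypothetical and only formats the key string.
VALID_MODES = {"avg", "count", "max", "min"}
VALID_STATES = {"busy", "idle"}


def process_item_key(process_type: str, mode: str = "avg", state: str = "busy") -> str:
    """Return the internal-item key for the given process type."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    if state not in VALID_STATES:
        raise ValueError(f"unsupported state: {state!r}")
    return f"zabbix[process,{process_type},{mode},{state}]"


if __name__ == "__main__":
    # Regular pollers versus the unreachable pollers named in the alert.
    print(process_item_key("poller"))
    print(process_item_key("unreachable poller"))
```

Note that "poller" and "unreachable poller" are separate process types, so an alert about unreachable pollers would normally come from an item whose type field is "unreachable poller".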

    I saw in the Zabbix server log that there are several SNMP connection failures. Yesterday I solved some of them.

    In another link I saw that discovered interfaces that no longer exist can cause this problem. I do monitor a radio that uses interface discovery, and that radio often drops. Could that be a probable cause?
  • storyteller
    Junior Member
    • Oct 2019
    • 23

    #2
    Try to increase the StartPollersUnreachable in zabbix_server.conf
    Mine is:

    StartPollersUnreachable=80


    • cesarsj
      Senior Member
      • Dec 2018
      • 154

      #3
      Originally posted by storyteller
      Try to increase the StartPollersUnreachable in zabbix_server.conf
      Mine is:

      StartPollersUnreachable=80
      You increased to 80 based on what calculation?



      • storyteller
        storyteller commented
        Hello Cesarsj,

        Experimentally: when I receive this alert, I increase the parameter (e.g. from 8 to 12). I do this because my environment is not static; hosts are being added every day.
    • cesarsj
      Senior Member
      • Dec 2018
      • 154

      #4
      We also have a static environment, but I can't increase the value without my boss understanding why I increased it, based on the manual, calculations, etc. He is very systematic.


      • kloczek
        Senior Member
        • Jun 2006
        • 1771

        #5
        "Unreachable poller" processes are for hosts monitored with passive checks. If those hosts are monitored through the agent (that is, they have no SNMP/IPMI/ODBC/telnet agent/ssh agent items), you can:
        - disable hosts which are down
        - switch to active monitoring
        Then you would be able to use StartPollersUnreachable=0
        http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
        https://kloczek.wordpress.com/
        zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
        My zabbix templates https://github.com/kloczek/zabbix-templates


        • kloczek
          Senior Member
          • Jun 2006
          • 1771

          #6
          Hmm... I just found that the server cannot start with StartPollersUnreachable=0.
          Looks like a bug :/


        • ingus.vilnis
          Senior Member
          Zabbix Certified Trainer
          Zabbix Certified Specialist
          Zabbix Certified Professional
          • Mar 2014
          • 908

          #7
          Unreachable pollers handle checks for passively monitored hosts. Not all hosts can be switched to active monitoring, e.g. if you have SNMP polling or scheduling intervals (as suggested to you in the other thread).

          You tune the number based on the readings from the graphs, aiming to keep the processes 30-40% busy on average.
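One way to turn that rule of thumb into a number is a simple proportional estimate. This is only a sketch under a loud assumption: the total workload stays constant and spreads evenly across processes, so busy % scales inversely with the process count. The function name suggest_process_count is hypothetical, and the sample inputs (8 processes, 75% busy) are borrowed from earlier in this thread purely for illustration.

```python
import math


def suggest_process_count(current_count: int, busy_percent: float,
                          target_percent: float = 35.0) -> int:
    """Estimate how many processes bring the average busy % down to the target.

    Assumption: total work (count * busy %) is constant and evenly shared.
    A reading at or near 100% is saturated, so the real load may be higher
    than this estimate suggests.
    """
    if current_count < 1:
        raise ValueError("current_count must be at least 1")
    if not 0 <= busy_percent <= 100:
        raise ValueError("busy_percent must be between 0 and 100")
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in (0, 100]")
    total_load = current_count * busy_percent  # in "process-percent" units
    return max(1, math.ceil(total_load / target_percent))


if __name__ == "__main__":
    # 8 unreachable pollers at 75% busy, aiming for the upper bound of 40%.
    print(suggest_process_count(8, 75.0, target_percent=40.0))
```

The result is a starting value for StartPollersUnreachable, to be checked against the graphs after the server is restarted, not a guarantee.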

          Also note that if you later change your "Timeout" setting, for example, it will also affect the unreachable pollers. It is all connected, so just be aware of it.


          • cesarsj
            Senior Member
            • Dec 2018
            • 154

            #8
            ingus.vilnis Why on average 30-40%? Does the manual say that?

            What really happens when unreachable poller processes reach 100% busy? Are alerts no longer issued?

            I would also like to know what happens when housekeeper processes are 100% busy. Is cleanup no longer done during that cycle?


            • ingus.vilnis
              Senior Member
              Zabbix Certified Trainer
              Zabbix Certified Specialist
              Zabbix Certified Professional
              • Mar 2014
              • 908

              #9
              The 30-40% average comes from real-life experience: you don't want to start too many idle processes, yet you want to keep them busy enough while leaving some room for system growth (so that you don't need to restart your server every time you add a new host just because the processes get too busy).

              Some articles on this topic:

              https://blog.zabbix.com/monitoring-h...esses-are/457/ (old but still valid)

              When unreachable pollers are 100% busy you will still get alerts. What gets delayed is the checking of unavailable hosts, which are normally checked by passive pollers. That is not a big deal for a short period, but if you have many hosts it is better to have enough unreachable pollers to handle them all in a potential big outage.
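To get a rough feel for "enough unreachable pollers" in an outage, here is a back-of-the-envelope sketch. It rests on simplifying assumptions that are mine, not the manual's: each unreachable host is retried once every retry_delay_s seconds (compare UnreachableDelay in zabbix_server.conf), and every retry blocks one poller for the full timeout_s (compare Timeout). All function names and sample numbers are hypothetical.

```python
import math


def unreachable_poller_busy_fraction(unreachable_hosts: int, timeout_s: float,
                                     retry_delay_s: float, pollers: int) -> float:
    """Estimated busy fraction (0.0 to 1.0) of the unreachable pollers.

    Simplified model: every unreachable host costs timeout_s seconds of
    poller time once per retry_delay_s seconds.
    """
    if pollers < 1 or retry_delay_s <= 0:
        raise ValueError("pollers must be >= 1 and retry_delay_s > 0")
    work_per_second = unreachable_hosts * timeout_s / retry_delay_s
    return min(1.0, work_per_second / pollers)


def pollers_needed(unreachable_hosts: int, timeout_s: float,
                   retry_delay_s: float, target_busy: float = 0.4) -> int:
    """Pollers required to stay at or below target_busy in the same model."""
    if not 0 < target_busy <= 1 or retry_delay_s <= 0:
        raise ValueError("target_busy must be in (0, 1] and retry_delay_s > 0")
    work_per_second = unreachable_hosts * timeout_s / retry_delay_s
    return max(1, math.ceil(work_per_second / target_busy))


if __name__ == "__main__":
    # Illustrative outage: 30 hosts down, 3 s timeout, 15 s between retries.
    print(unreachable_poller_busy_fraction(30, 3, 15, pollers=12))
    print(pollers_needed(30, 3, 15, target_busy=0.5))
```

The model also shows why a larger Timeout raises the load on unreachable pollers: the time each failed check blocks a process grows linearly with it.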

              The housekeeper is a single process, and when it runs it is 100% busy throughout its run. The question is how long each cycle takes to complete. Up to 10 minutes per hour is acceptable for smaller instances, but in the long run you should look at partitioning for data removal.
