Zabbix Proxy and Device Polling Issues

  • Zabbrad
    Junior Member
    • Sep 2023
    • 8

    #1

    We currently have a new setup with the following:
    • 2x Zabbix Proxy servers (in different regions, each polling its own devices). Let's call these proxy01 and proxy02.
    • 1x Zabbix Server centralised to collect the values from both proxy servers.
    The Zabbix Proxy servers are virtual machines with 6 vCPUs and 16GB of RAM. The Zabbix Server is bare-metal with 16 cores and 196GB of RAM. The /etc/zabbix/zabbix_proxy.conf appears to be the standard one, with these additional lines of configuration:
    Code:
    StartPollersUnreachable=8
    StartPingers=8
    StartDiscoverers=8
    CacheSize=1G
    Timeout=30
    The problem we have is that the Zabbix values per second periodically take a nose-dive (we average around 1000, and this sometimes drops below 20). I've also noticed that the SNMP items' last checks don't seem to run as frequently as they should (proxy02 appears to be the most impacted).
    When I go to "Administration -> Queue -> Queue overview" I can see a large number of values (thousands) in the 1min, 5min and more than 10mins columns for SNMP agent. When I go to "Administration -> Queue -> Queue overview by proxy" I can see proxy02 has over 2,700 in more than 10mins.
    Any ideas on what might be happening? So far, the only devices and SNMP templates we're using on proxy02 are:
    • Cisco Catalyst 9500
      • Cisco BGP SNMP
      • Cisco IOS SNMP
    • Cisco Catalyst 9300
      • Cisco IOS SNMP
    • Cisco Catalyst 9200
      • Cisco IOS SNMP
    • Cisco Nexus 9K
      • Cisco Nexus 9000 Series SNMP
    • Cisco ASA
      • Cisco ASAv SNMP (not actually working atm)
    Other things to note:
    • Proxy01 and the Zabbix server are in the same data center.
    • Proxy02 and the Zabbix server are ~200ms apart from each other.
    Any ideas on what may be happening would be greatly appreciated.
  • cyber
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Dec 2006
    • 4807

    #2
    I have a gut feeling that it's more an issue of those devices not answering quickly enough than something between the server and the proxy...
    How many pollers do you run on the proxy? The default of 5? That is definitely too few... I run 45-60 pollers to keep up with a 400-600 NVPS proxy...

    You should get an overview of how busy those processes are if you install the proxy monitoring templates.
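
    As a rough back-of-envelope for the poller count discussed above, here is a minimal sketch. It assumes each poller handles one request at a time and applies Little's law (busy pollers = values per second × average seconds per value). The function names, the latency figures and the 75% utilization target are illustrative assumptions, not Zabbix settings, and bulk SNMP requests will make real numbers differ.

```python
import math


def max_nvps(pollers: int, avg_seconds_per_value: float) -> float:
    """Upper bound on values per second if every poller is always busy."""
    if pollers < 0 or avg_seconds_per_value <= 0:
        raise ValueError("pollers must be >= 0 and latency must be > 0")
    return pollers / avg_seconds_per_value


def estimate_pollers(nvps: float, avg_seconds_per_value: float,
                     target_utilization: float = 0.75) -> int:
    """Pollers needed to sustain `nvps` while staying under the target utilization."""
    if nvps < 0 or avg_seconds_per_value < 0:
        raise ValueError("nvps and latency must be non-negative")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    busy_pollers = nvps * avg_seconds_per_value  # Little's law
    return max(1, math.ceil(busy_pollers / target_utilization))


# Hypothetical figures: 8 pollers, devices answering in about 0.5 s per value.
print(max_nvps(8, 0.5))
# Hypothetical figures: 1000 values per second at about 0.05 s per value.
print(estimate_pollers(1000, 0.05))
```

    The point of the sketch is only that slow or unreachable devices raise the time per value, so the same number of pollers sustains far fewer values per second.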


    • Zabbrad
      Junior Member
      • Sep 2023
      • 8

      #3
      I've added the Zabbix Proxy to Zabbix for monitoring, via the proxy itself (as per the recommendation here: https://blog.zabbix.com/zabbix-proxy...hooting/14013/). There is a lot to digest, but nothing is really jumping out as problematic. Regarding your previous question:
      "How many pollers do you run on proxy... ? Default 5? That is definitely too few... I have 45-60 pollers to keep up 400-600 nvps proxy..."
      How do I check this? I currently have StartPollers=8 in our configuration file. Assuming this is the right setting, I can try increasing it to 100.


      • cyber
        Senior Member
        Zabbix Certified Specialist, Zabbix Certified Professional
        • Dec 2006
        • 4807

        #4
        Yes, that is the one.

        Double the amount and see if it helps; if not, double again, and so on: 8 -> 16 -> 32 -> 64. At some point you should have enough to process all queries. Your proxy monitoring should show poller utilization.
        Also, if you look on the command line with "watch 'ps -fu zabbix | grep "zabbix_proxy: poller" ' " and you see all those poller processes "getting values" and not idling much, then they are pretty busy and you should increase the number.
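
        The same check can be scripted instead of watched by eye. This is a minimal sketch that counts pollers whose process title contains "getting values"; the title format in the regular expression and the sample lines are assumptions based on the messages quoted in this thread, and `read_ps` and `summarize_pollers` are illustrative helper names.

```python
import re
import subprocess

# Assumed process-title shape: "zabbix_proxy: poller #N [status text]".
# "unreachable poller" titles do not match, mirroring the grep pattern above.
POLLER_RE = re.compile(r"zabbix_proxy: poller #(\d+) \[(.*)\]")


def read_ps(user: str = "zabbix") -> str:
    """Return the output of `ps -fu <user>` (not called in this example)."""
    result = subprocess.run(["ps", "-fu", user],
                            capture_output=True, text=True, check=False)
    return result.stdout


def summarize_pollers(ps_output: str) -> dict:
    """Split poller numbers into busy ("getting values") and idle lists."""
    busy, idle = [], []
    for line in ps_output.splitlines():
        match = POLLER_RE.search(line)
        if not match:
            continue
        number, status = int(match.group(1)), match.group(2)
        (busy if "getting values" in status else idle).append(number)
    return {"busy": busy, "idle": idle}


# Made-up sample lines for illustration only.
SAMPLE = """\
zabbix 1201 1100 0 10:00 ? 00:00:01 /usr/sbin/zabbix_proxy: poller #1 [got 0 values in 0.000008 sec, idle 1 sec]
zabbix 1202 1100 0 10:00 ? 00:00:09 /usr/sbin/zabbix_proxy: poller #2 [got 3 values in 0.412 sec, getting values]
zabbix 1203 1100 0 10:00 ? 00:00:00 /usr/sbin/zabbix_proxy: unreachable poller #1 [got 0 values in 0.000005 sec, idle 5 sec]
"""

print(summarize_pollers(SAMPLE))  # {'busy': [2], 'idle': [1]}
```

        On the proxy itself, `summarize_pollers(read_ps())` would give a one-off snapshot; if nearly every poller lands in the busy list across repeated runs, that matches the "increase the number" case described above.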


        • Zabbrad
          Junior Member
          • Sep 2023
          • 8

          #5
          I am seeing a lot of "got 0 values in $small-time-scale sec, idle 1 sec". On the graphs I'm seeing a lot of gaps in the data, but I did notice that "Zabbix proxy: Utilization of poller data collector processes, in %" was quite high at one point in the last couple of hours, reaching 95%, and that "Zabbix proxy: Utilization of data sender internal processes, in %" reached 70% before a gap occurred.
