Trouble after Zabbix proxies become unavailable

  • fotto
    Junior Member
    • Jul 2020
    • 3

    #1

    Dear Zabbix community,

    I'm looking for some help in tuning our configuration to deal with the following problem. Most of our data is collected by 9 Zabbix proxies (the Zabbix server itself only monitors the proxy nodes and a handful of other nodes), and today 3 of them went offline due to data centre issues. I created a maintenance entry turning off data collection for the hosts behind the proxies and for the proxy nodes themselves. Still, after a while the "utilization of poller data collector processes" crept up and went above 75%. Normally this utilization is below 1%.

    Then I set all the hosts behind these proxies to disabled, bumped the Zabbix server config to StartPollers=50 and StartPollersUnreachable=10, and restarted the Zabbix server. But the high poller utilization kept coming back. Eventually the pollers on the other 6 Zabbix proxies (which are otherwise fine) also reached a high utilization! So I bumped these proxies to StartPollers=10 and StartPollersUnreachable=5 as well (they had been fine with default settings before).

    This kept things okay for about one hour, though notably the "utilization of poller data collector processes" was still elevated (5-10%) and the "utilization of proxy poller data collector processes" sat at a roughly constant 30%. I had been running with StartProxyPollers=10, so it looks like each unavailable proxy is keeping one proxy data collector busy? But then the utilization suddenly shot up again, to 70% for proxy data collectors and to 50% and rising for data collectors, so I bumped to StartProxyPollers=25 and restarted the server.

    That's where I am now: the utilization of proxy data collectors is about 12% (3/25), and that of data collectors fluctuates around a few percent (still higher than normal) with occasional spikes above 10%. But I fear the performance will degrade again, as I don't understand the reason for the observed behaviour. I can't see anything suspicious in the log files.
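
The percentages reported above line up with a simple ratio. The following is only a rough sketch of that arithmetic, assuming (as the post suspects, but does not confirm) that each unreachable proxy keeps exactly one proxy poller busy; the function name is made up for illustration.

```python
def proxy_poller_utilization(unreachable_proxies: int, start_proxy_pollers: int) -> float:
    """Busy share in percent, ASSUMING one proxy poller is occupied per unreachable proxy."""
    if start_proxy_pollers <= 0:
        raise ValueError("start_proxy_pollers must be positive")
    # No more pollers can be busy than exist.
    busy = min(unreachable_proxies, start_proxy_pollers)
    return 100.0 * busy / start_proxy_pollers


# The two StartProxyPollers values mentioned in the post, with 3 proxies offline.
for pollers in (10, 25):
    share = proxy_poller_utilization(3, pollers)
    print(f"StartProxyPollers={pollers}: {share:.0f}% busy")
```

Under that assumption the ratio gives 30% for 10 proxy pollers and 12% for 25, matching the observed baseline. It does not account for the later climb to 70%.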

    Zabbix server is 4.0.23, database is PostgreSQL 9.6, system information:
    Code:
    Number of hosts (enabled/disabled/templates)          351     143 / 114 / 94
    Number of items (enabled/disabled/not supported)      11532   6781 / 4617 / 134
    Number of triggers (enabled/disabled [problem/ok])    5574    3152 / 2422 [5 / 3147]
    Proxies and agents are run in passive mode. Number of processed values per second is around 50, when all things are up.

    Relevant settings in zabbix_server.conf:
    Code:
    StartPollers=50
    StartIPMIPollers=3
    StartPollersUnreachable=10
    CacheSize=1G
    CacheUpdateFrequency=30
    HistoryCacheSize=256M
    HistoryIndexCacheSize=64M
    TrendCacheSize=64M
    ValueCacheSize=1G
    Timeout=20
    UnreachablePeriod=120
    UnreachableDelay=120
    LogSlowQueries=3000
    StartProxyPollers=25
    ProxyConfigFrequency=60
    ProxyDataFrequency=10
    Any suggestions on what I can tune? When all proxies are up the load is very low on all collector processes, and I don't understand why proxies being offline wreaks havoc like this. Let me know what other data I can provide to figure out this issue. Many thanks!
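
One way to reason about these settings is a back-of-the-envelope model of how long a proxy poller waits on a proxy that never answers. This is a simplified sketch, not a description of Zabbix internals: it assumes every data request (every ProxyDataFrequency seconds) and every config sync (every ProxyConfigFrequency seconds) to an unreachable passive proxy blocks for the full Timeout, and that one proxy cannot occupy more than one poller at a time. The function name and the Timeout=3 comparison value are hypothetical.

```python
def busy_fraction(timeout_s: float, data_freq_s: float, config_freq_s: float) -> float:
    """Fraction of one proxy poller's time spent waiting on one unreachable proxy.

    Assumed model: each data request and each config sync blocks for the
    full Timeout, and a single proxy never occupies more than one poller.
    """
    demand = timeout_s / data_freq_s + timeout_s / config_freq_s
    return min(1.0, demand)


UNREACHABLE_PROXIES = 3

# Values taken from the zabbix_server.conf settings in the post.
current = busy_fraction(timeout_s=20, data_freq_s=10, config_freq_s=60)
# Timeout=3 is a hypothetical value for comparison only, not a recommendation.
shorter = busy_fraction(timeout_s=3, data_freq_s=10, config_freq_s=60)

print(f"Timeout=20: about {UNREACHABLE_PROXIES * current:.2f} pollers tied up")
print(f"Timeout=3:  about {UNREACHABLE_PROXIES * shorter:.2f} pollers tied up")
```

With Timeout=20 and ProxyDataFrequency=10 the model saturates at one poller per unreachable proxy, which is consistent with the 3/10 and 3/25 baselines. It says nothing about why the regular pollers creep up over time.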
    Last edited by fotto; 28-07-2020, 23:33.
  • fotto
    Junior Member
    • Jul 2020
    • 3

    #2
    Additional information: meanwhile I've noticed something rather curious. Looking at the "latest data" for any of the proxy nodes, I see that all the data items were last updated before the network went down in the affected data centre, except the items for Zabbix proxy, which continue to show updates! But the log files for the Zabbix server clearly say that it can't connect to these proxies! Also, manually connecting to port 10051 on the proxies from the Zabbix server machines results in a connection timeout, as it should, since I know that the network line is down. So I'm rather confused by this. How can the Zabbix proxy data items continue to receive updates while the proxy is down? I fear this might be a bug in Zabbix (or I completely misunderstand how Zabbix proxies work). Perhaps the proxy pollers are mixing up data? Is any issue like this known?

    • Hamardaban
      Senior Member
      Zabbix Certified Specialist, Zabbix Certified Professional
      • May 2019
      • 2713

      #3
      After the proxy connects to the server again it starts transmitting all the data accumulated during the period of unavailability causing a high load of pollers. This is normal. If you do not need data for the period of proxy unavailability, change the storage period and storage cache size in the proxy settings. For your information, in version 5, working with device unavailability for a proxy when it is unavailable has changed significantly for the better.
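
To get a feel for how much data such a catch-up could involve, here is a rough sketch. It assumes the "storage period" mentioned above corresponds to the proxy's ProxyOfflineBuffer setting (in hours), that data older than this buffer is discarded, and that the roughly 50 values per second from the first post are spread evenly over the 9 proxies. The even split and the 12-hour outage are illustrative assumptions, and the function name is made up.

```python
def backlog_values(values_per_second: float, offline_hours: float, offline_buffer_hours: float) -> int:
    """Values a proxy would still hold when it reconnects.

    ASSUMES anything older than the offline buffer has already been dropped.
    """
    retained_hours = min(offline_hours, offline_buffer_hours)
    return round(values_per_second * retained_hours * 3600)


# Assumption: ~50 values/s overall, split evenly across 9 proxies.
per_proxy_rate = 50 / 9

# Hypothetical 12-hour outage, compared for two buffer sizes (in hours).
for buffer_hours in (1, 24):
    held = backlog_values(per_proxy_rate, offline_hours=12, offline_buffer_hours=buffer_hours)
    print(f"buffer={buffer_hours}h: about {held} values to send on reconnect")
```

A smaller buffer caps the backlog, at the cost of losing the data collected during a longer outage.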

      • fotto
        Junior Member
        • Jul 2020
        • 3

        #4
        Originally posted by Hamardaban
        After the proxy connects to the server again it starts transmitting all the data accumulated during the period of unavailability causing a high load of pollers. This is normal. If you do not need data for the period of proxy unavailability, change the storage period and storage cache size in the proxy settings. For your information, in version 5, working with device unavailability for a proxy when it is unavailable has changed significantly for the better.
        Sorry if I wasn't clear enough. The problem doesn't happen after the proxies come back but while they are offline. Also on coming back they wouldn't send a lot of data in my case because everything behind these proxies is shut down.

        The 3 proxies in question are still offline. Overnight, the utilization of poller data collector processes has crept up to 65% and that of proxy poller data collector processes to 32% (after a restart it sits at 12% = 3/25; I had restarted at 11pm):

        [Figure: Utilization of data collectors creeping up when proxies are unavailable]
        Good to hear that there are improvements for handling unavailable proxies in Zabbix 5. I've checked the release notes but couldn't find many details about this point. Any pointers to more details? Many thanks!

      • khontrolirq
        Junior Member
        • Aug 2020
        • 1

        #5
        This is normal. If you do not need data for the period of proxy unavailability, change the storage period and storage cache size in the proxy settings. For your information, in version 5, working with device unavailability for a proxy when it is unavailable has changed significantly for the better.
        Last edited by khontrolirq; 19-08-2020, 09:30.
