YA "Zabbix Agent Unreachable" false alarm question

  • wsanders
    Junior Member
    • Feb 2014
    • 7

    #1

    YA "Zabbix Agent Unreachable" false alarm question

    Is it the configuration parameter "Timeout" or "UnreachablePeriod" that triggers the "no ping" condition that throws this alert?

    We have about 1 or 2 instances per day of "Zabbix Agent unreachable" alerts on random servers. During this time, Zabbix logs lines like this every two minutes:

    19636:20140219:154735.205 cannot send list of active checks to [10.0.4.172]: host [blahblah] not monitored

    No other log entries are produced from the host. The host is up but busy, perhaps with high CPU utilization (we can't tell, because Zabbix doesn't get data during this period, which can last up to 30 minutes). For some reason, we get the "unreachable" alert and the "OK" recovery at the same time, only after Zabbix is able to poll the host again, so it's hard to catch the server "in the act" with top or ps.
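    To line these log entries up against other events, it may help to pull the timestamp, IP, and host name out of each line. The following is only a sketch that assumes the log format matches the sample line above (PID, date as YYYYMMDD, time as HHMMSS.mmm); the function name parse_not_monitored is made up for illustration and is not part of Zabbix.

```python
import re
from datetime import datetime

# Pattern modelled on the sample line in this post, e.g.
# 19636:20140219:154735.205 cannot send list of active checks to [10.0.4.172]: host [blahblah] not monitored
LINE_RE = re.compile(
    r"^\s*(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6})\.(?P<ms>\d{3})\s+"
    r"cannot send list of active checks to \[(?P<ip>[^\]]+)\]: "
    r"host \[(?P<host>[^\]]+)\] not monitored"
)


def parse_not_monitored(line):
    """Return pid, timestamp, ip and host for a matching line, else None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # Combine the date and time fields, then add the millisecond part.
    ts = datetime.strptime(m.group("date") + m.group("time"), "%Y%m%d%H%M%S")
    ts = ts.replace(microsecond=int(m.group("ms")) * 1000)
    return {
        "pid": int(m.group("pid")),
        "time": ts,
        "ip": m.group("ip"),
        "host": m.group("host"),
    }
```

    Running this over the server log and grouping the results by host would show when each burst of "not monitored" lines starts and stops.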

    Both Timeout and UnreachablePeriod are set to their defaults: Timeout=3 and UnreachablePeriod=45. I think I will raise Timeout to 6 and UnreachablePeriod to 120. Does that sound like a good approach?

    Thanks
    w
  • wsanders
    Junior Member
    • Feb 2014
    • 7

    #2
    Solution: YA "Zabbix Agent Unreachable" false alarm question

    We found these false alarms were occurring at the end of maintenance periods during which data was not collected. Once we began collecting data during the maintenance period, the false alarms went away.

    We also increased the server-side Timeout parameter from 3 to 10, but the gap in data collection during maintenance seems to have been the main cause of our false alarms.
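    For anyone wanting to check whether their own false alarms follow the same pattern, a rough sketch is below. It assumes you already have the alert timestamps and a list of maintenance windows as (start, end) pairs; the function name fired_after_maintenance and the 5-minute slack are illustrative guesses, not anything Zabbix provides.

```python
from datetime import datetime, timedelta


def fired_after_maintenance(alert_time, windows, slack=timedelta(minutes=5)):
    """Return True if alert_time falls within `slack` after any window's end.

    `windows` is an iterable of (start, end) datetime pairs.
    """
    return any(end <= alert_time <= end + slack for _start, end in windows)


# Hypothetical example data, not taken from a real system.
windows = [(datetime(2014, 2, 19, 14, 0), datetime(2014, 2, 19, 15, 45))]
alert = datetime(2014, 2, 19, 15, 47, 35)
suspicious = fired_after_maintenance(alert, windows)
```

    If most "unreachable" alerts come back as suspicious, the maintenance settings are probably worth a look before tuning Timeout.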
