Heads up: The agent is NOT a reliable, robust way to check for availability

  • jherazob
    Junior Member
    • Sep 2011
    • 20

    #1


    After too many troubles this week, I have determined that using the Zabbix agent to check for host/service availability is a mistake.

    First, a server went down and agent.ping completely failed to alert anybody about it. It just stopped gathering information: the last value was at about 3am, and the trigger didn't fire at all. So I changed it to a simple check using icmpping.

    Last night the same server went into rescue mode at about 3am again. In rescue mode, everything goes down but the network and SSH. So the "Server is down!" trigger didn't fire, but all the service checks for HTTP and the like went silent just like agent.ping did. So, the server was effectively down while still technically up. And nobody knew about it until it was too late to prevent troubles. I have changed as many checks as possible to simple checks now.

    In sum, agent.ping is useless for this, because items using it will completely fail to alert you when the agent is not pinging. Be advised and use simple checks for that, especially when the service is vital.
  • ghoz
    Senior Member
    • May 2011
    • 204

    #2
    This is the documented functionality.
    You should use a 'nodata' trigger to tell you that the agent is not running (or at least not answering)...
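
To illustrate the idea behind a 'nodata' trigger, here is a minimal Python sketch of the logic: fire when no value has arrived within a time window, instead of waiting for a "down" value that a dead agent can never send. This is not Zabbix's implementation; the function name `nodata_fired` and the 300-second window are assumptions made for the example. In the older Zabbix trigger syntax the equivalent expression would look something like `{myhost:agent.ping.nodata(300)}=1`, where `myhost` and `300` are placeholders.

```python
def nodata_fired(last_value_ts, now, window_s):
    """Return True when no value arrived within the last window_s seconds.

    last_value_ts: Unix timestamp of the most recent value, or None if
                   nothing was ever received (hypothetical convention).
    now:           current Unix timestamp.
    window_s:      how long silence is tolerated, in seconds.
    """
    if last_value_ts is None:
        # Never heard from the agent at all: treat as no data.
        return True
    return (now - last_value_ts) > window_s


# Last agent.ping value arrived at t=1000; the window is 300 seconds.
print(nodata_fired(1000, 1200, 300))  # still within the window
print(nodata_fired(1000, 1400, 300))  # silent for too long
```

The point is that the alert condition is the absence of data, which is exactly the situation described in the first post: the item simply stopped collecting values.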

    I agree that the name agent.ping can be misleading...
    what it really does is ask the agent for its own presence... you can't get an agent answering "no no no, I'm not here" ...

    As for the pings, you found out that they only check for network presence...

    and your http simple checks won't tell you that your web app is busted and answering 500 errors, it'll just tell you that something is listening on that port (and maybe in next versions that it's a web server)
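
The difference between "something is listening on that port" and "the web app actually answers correctly" can be sketched in Python as follows. This is only an illustration using the standard library, not how Zabbix performs its checks; the helper names (`port_is_open`, `is_healthy_status`, `http_is_healthy`) and the rule that any 2xx or 3xx status counts as healthy are assumptions for the example.

```python
import socket
import urllib.error
import urllib.request


def port_is_open(host, port, timeout=5.0):
    """TCP-level check: only proves that something accepts connections."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def is_healthy_status(status):
    """Assumed rule for this sketch: 2xx and 3xx are fine, 4xx/5xx are not."""
    return 200 <= status < 400


def http_is_healthy(url, timeout=5.0):
    """Application-level check: request the page and inspect the status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy_status(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is on the error.
        return is_healthy_status(err.code)
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False
```

A server answering every request with a 500 error would pass `port_is_open` but fail `http_is_healthy`, which is the gap described above.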

    You ask if "everything is OK" but that's not easily translatable into checks...
