zabbix-agent stops responding intermittently

  • vic
    Member
    • Jul 2013
    • 58

    #1

    A little bit stumped here.

    Everything is updated to the latest released version: the Zabbix server and the agents on all the monitored servers. No proxy is being used. One of the monitored servers intermittently stops responding. It only seems to happen for a few minutes every day or two, so it is difficult to troubleshoot. The times are random, but so far always in the morning during business hours, Los Angeles time. I just get a notice that the server is offline, but it's not. When I look at the Zabbix graphs, all the other monitored parameters (CPU load etc.) are blank for that period as well. So basically the agent is not responding and not sending back any information. Then it comes back on its own, so I get the notice that the server is back online, and it usually doesn't happen again that day. Sometimes it does happen again within a few more minutes and again comes back. Twice within a 20-minute period is the most I have seen in any given day.

    Today it stopped responding for an extended period for a change, so I had some time to try some things. As soon as I restarted the agent it started working again, then a few minutes later it stopped. I then stopped the agent and started it again, and this time it still seems to be working after about half an hour.

    Before I restarted the agent I checked a few things. I ran netstat and tcpdump on port 10050 and didn't see anything unusual. No foreign IPs were trying to connect on that port. The Zabbix server is in a different data center than the monitored server. I have another monitored server in the same data center as the one that stops responding intermittently, but this other server never misses a beat. So it's not something with the routing or anything like that.
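
    For anyone chasing the same kind of short outage, a rough way to catch it in the act is to poll the agent port from the Zabbix server side and log every attempt with a timestamp. The sketch below is only an illustration: the function names, interval, and round count are made up for this example, and a successful TCP connect only shows that something is listening on 10050, not that the agent actually answers item requests.

```python
import socket
import time
from datetime import datetime


def probe(host: str, port: int = 10050, timeout: float = 3.0) -> tuple[bool, float]:
    """Attempt one TCP connect; return (succeeded, seconds taken)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        ok = False
    return ok, time.monotonic() - start


def watch(host: str, port: int = 10050, interval: float = 10.0, rounds: int = 6) -> list[str]:
    """Probe repeatedly and return one timestamped log line per attempt."""
    lines = []
    for i in range(rounds):
        ok, elapsed = probe(host, port)
        status = "OK" if ok else "FAIL"
        lines.append(f"{datetime.now().isoformat(timespec='seconds')} {status} {elapsed:.3f}s")
        if i < rounds - 1:
            time.sleep(interval)
    return lines
```

    Calling something like `watch("monitored.example.com")` (a placeholder hostname) from the Zabbix server and comparing the FAIL timestamps with the gaps in the graphs would at least show whether the port itself becomes unreachable during an outage.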

    I double-checked the agent config file and it's the same as on the other servers I am monitoring. The only thing I have set is the Zabbix server IP that is allowed to connect. Everything else is at defaults, so passive mode etc.

    Anyone have any ideas what this could possibly be?
    Last edited by vic; 10-10-2014, 21:30.
  • tchjts1
    Senior Member
    • May 2008
    • 1605

    #2
    A few things to look at.

    1. On your Zabbix server in zabbix_server.conf, what do you have Timeout= set to? If it is the default of 3, try raising it to 15 and restart your Zabbix server process. (Either add a new line with that value, or change the existing line and remove the comment symbol #.)

    2. You can make that same change on your agent in zabbix_agentd.conf and restart the agent.

    3. Have a look at the bottom of this post where I discuss the internal processes and graphs. Have a look at yours and see if possibly you need to allocate some more pollers/trappers/cache: https://www.zabbix.com/forum/showthread.php?t=41219
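
    If you are not sure which Timeout value is actually in effect, a small script can read it out of the config text. This is just a sketch: `effective_timeout` is a made-up helper, it treats any line starting with # as a comment, falls back to the default of 3 mentioned above, and lets the last uncommented Timeout line win, which is an assumption of this example rather than documented Zabbix behaviour.

```python
def effective_timeout(conf_text: str, default: int = 3) -> int:
    """Return the Timeout value from config text, or the default if unset."""
    value = default
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and commented-out settings such as "# Timeout=3".
        if not line or line.startswith("#"):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "Timeout":
            value = int(val.strip())
    return value
```

    You would pass it the contents of zabbix_server.conf or zabbix_agentd.conf, for example `effective_timeout(open(path).read())`, where `path` is wherever your install keeps the file.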

    • vic
      Member
      • Jul 2013
      • 58

      #3
      Thank you for the suggestions.

      I have increased timeout to 15 on server and agent. It was set to 10 on server but at the default 3 on agent.

      Zabbix server performance data looks ok so I don't think I am running out of resources on the server.

      • Nimue
        Junior Member
        • Jul 2021
        • 2

        #4
        I realise this post is quite old, but I am facing a similar if not the same problem. Zabbix loses connection with the agent on an hourly basis. The Zabbix agent also consumes a lot of CPU (increasing over time until it reaches 100% CPU usage). Restarting the agent solves the problem for a few days until it starts again. I set the debug level to the highest, but the agent log says everything is OK. I also increased the timeout, but it didn't help either. Is there something else I could check?
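
        One thing that might help narrow down the CPU growth is recording how much CPU time each agent process has accumulated, so the climbing one can be identified. The sketch below assumes a Linux host and reads the utime and stime fields (fields 14 and 15) from a /proc/<pid>/stat line; the function names are made up for this example.

```python
import os


def cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The process name sits in parentheses and may contain spaces,
    # so split after the last ')' before counting fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11] and rest[12].
    return int(rest[11]) + int(rest[12])


def cpu_seconds(pid: int) -> float:
    """Return total CPU seconds used so far by the given process (Linux only)."""
    with open(f"/proc/{pid}/stat") as fh:
        ticks = cpu_ticks(fh.read())
    return ticks / os.sysconf("SC_CLK_TCK")
```

        Sampling `cpu_seconds(pid)` for each agent PID every minute or so and comparing consecutive values should show which process is burning CPU and roughly when it starts.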
