All zabbix agents on a machine have stopped responding

  • free.tim
    Junior Member
    • Sep 2012
    • 3

    #1

    All zabbix agents on a machine have stopped responding

    SYSTEM DESCRIPTION
    ----------------------
    Zabbix Server 1.8.11 is installed on Ubuntu Linux 12.04.

    A Windows Server 2008 R2 server has the following agents:
    1) Standard windows zabbix agent
    2) Two "Jabcat" agents embedded in two Java applications

    There are many servers on the network, and each of them has a similar setup (the windows zabbix agent and one or more Jabcat agents).

    PROBLEM DESCRIPTION
    ----------------------
    Everything has been working fine for several months. However, all of the Zabbix agents on one of the Windows servers have suddenly begun behaving strangely.

    The Zabbix Server log complains about "network errors": the server waits for 15 seconds and sometimes reconnects, but soon fails again. Sometimes it cannot connect for so long that the agents' triggers start going off. Every single Zabbix agent on this one Windows machine is failing in this way, and no other machine is having this issue.

    TROUBLESHOOTING PERFORMED
    -------------------------------
    1) Ping from Zabbix Server -> Windows machine lost no packets while running for 15 minutes.
    2) Ping from Windows machine -> Zabbix Server lost no packets while running for 15 minutes.
    3) CPU / Memory usage looks fine on both the Zabbix Server and Windows Machine.
    4) Windows system logs show no errors during the time preceding this behavior's beginning.
    5) No errors in Zabbix Server's system logs, etc.
    6) On the Windows system, I have many network-dependent processes running, and none of them are having network connection issues. Only the Zabbix agents (all 3 of them) are having an issue.
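
A ping only exercises ICMP, so steps 1 and 2 do not show whether the agents' TCP ports accept connections. A minimal sketch of a repeated TCP connect test follows, assuming the standard agent listens on the default passive port 10050; the function name `probe_tcp` and the host name in the usage comment are hypothetical, and the Jabcat agents would need whatever ports they were configured with.

```python
import socket
import time


def probe_tcp(host, port=10050, attempts=5, timeout=3.0, delay=1.0):
    """Open a TCP connection several times and record each outcome.

    Returns a list of (ok, detail) tuples: detail is the connect time in
    seconds on success, or the error text on failure.
    """
    results = []
    for i in range(attempts):
        start = time.monotonic()
        try:
            # create_connection resolves the host and performs the TCP handshake
            with socket.create_connection((host, port), timeout=timeout):
                results.append((True, time.monotonic() - start))
        except OSError as exc:  # refused, timed out, unreachable, DNS failure
            results.append((False, str(exc)))
        if i < attempts - 1:
            time.sleep(delay)
    return results


# Example usage (hypothetical host name):
# for ok, detail in probe_tcp("windows-host.example", 10050):
#     print("ok" if ok else "FAIL", detail)
```

Running this from the Zabbix Server while the errors are occurring would show whether connects are refused, time out, or succeed slowly, which ping alone cannot distinguish.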

    Any ideas, guys? I have spent a day and a half tracking this down, and I am officially at a loss.
  • free.tim
    Junior Member
    • Sep 2012
    • 3

    #2
    Any ideas?

    Sorry there are multiples of this post. They didn't show up on the forum for a few days, so I tried posting 4 times.


    • tchjts1
      Senior Member
      • May 2008
      • 1605

      #3
      For the items on that monitored host, do you have them set as Zabbix agent or Zabbix agent (active)?

      If you can make them active agents, you may see more success. For that to work, the Hostname= value in the host's zabbix_agentd.conf file must exactly match (case and spelling) the host name you have in the Zabbix server frontend.

      If you have to use the passive Zabbix agent, you might try spinning up some more agent processes to handle the load. In the zabbix_agentd.conf file, find the field StartAgents= (by default this value is 3). You could try gradually bumping it up to see if that helps. I would start at 5.

      If you make any changes to the conf on the host side, be sure to restart the agent.
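
The two settings mentioned above can be checked mechanically. This is a rough sketch only, assuming the conf file uses plain `Key=Value` lines with `#` comments; the function names, the sample host name, and the `min_agents` threshold are illustrative assumptions, and the fallback of 3 for StartAgents is the default quoted in this post.

```python
def read_agent_conf(text):
    """Parse 'Key=Value' lines, skipping blank lines and '#' comments.

    Later duplicates overwrite earlier ones, which is enough for the
    single-valued settings checked here.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_agent_conf(text, frontend_host_name, min_agents=5):
    """Return a list of human-readable problems found in the conf text."""
    settings = read_agent_conf(text)
    problems = []

    hostname = settings.get("Hostname")
    if hostname is None:
        problems.append("Hostname is not set in the conf file")
    elif hostname != frontend_host_name:  # comparison is case-sensitive on purpose
        problems.append(
            f"Hostname {hostname!r} does not exactly match {frontend_host_name!r}"
        )

    start_agents = int(settings.get("StartAgents", 3))
    if start_agents < min_agents:
        problems.append(f"StartAgents is {start_agents}, below {min_agents}")
    return problems


# Example usage with a hypothetical conf fragment:
sample = "# StartAgents=3\nHostname=WinSrv01\n"
for problem in check_agent_conf(sample, "winsrv01"):
    print(problem)
```

The case-sensitive comparison mirrors the "exact match" requirement for active checks; remember that the agent still has to be restarted after any edit.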


      • free.tim
        Junior Member
        • Sep 2012
        • 3

        #4
        Rule 1 in IT - "Have you tried turning it off and on again?"

        I rebooted the server, and whatever was wrong is no longer occurring. Fingers crossed that the problem just goes away.
