Unavailable Host stays unavailable for 3 Days

  • salzkrebs
    Junior Member
    • Nov 2013
    • 14

    #1

    Good Day,

    Not long ago I had a problem with a host in Zabbix. After a test that stopped a service and took down its IP, the host became unavailable in Zabbix and stayed that way for 3 days.

    System:
    I have two sites, each with a cluster of three servers (HP). On one server in each cluster I run a Zabbix server (2.2.3); both monitor exactly the same items (for redundancy).
    All servers, Cisco switches and routers are monitored via SNMPv2 and icmpping. Some of the items on the servers are monitored via a Zabbix agent.

    Scenario:
    On one site a service was killed, so its IP address was no longer reachable. After a short time the service was started again and the IP became reachable again.

    Result:
    The outage of this service's host was detected correctly, but on one of the Zabbix servers the unavailable state was not cleared after the service was started again.
    In the Zabbix server log I see the following entries:

    Code:
    2017-06-02T10:06:54.176+02:00 Management_Server zabbix_server[47643] SNMP agent item "Rejected Reg" on host "Service X" failed: first network error, wait for 15 seconds
    2017-06-02T10:07:12.158+02:00 Management_Server zabbix_server[47651] SNMP agent item "Status" on host "Service X" failed: another network error, wait for 15 seconds
    2017-06-02T10:07:30.163+02:00 Management_Server zabbix_server[47651] SNMP agent item "Request Sum" on host "Service X" failed: another network error, wait for 15 seconds
    2017-06-02T10:07:48.168+02:00 Management_Server zabbix_server[47651] temporarily disabling SNMP agent checks on host "Service X": host unavailable
    The unavailable state lasted for 3 days:

    Code:
    2017-06-05T11:29:03.160+02:00 Management_Server zabbix_server[47648] enabling SNMP agent checks on host "Service X": host became available
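For reference, the exact length of the outage can be worked out from the two log timestamps above (the "temporarily disabling" line and the "enabling" line). This is only a minimal sketch; the helper name `outage_duration` is made up for illustration and is not part of Zabbix.

```python
from datetime import datetime, timedelta

# Timestamps copied verbatim from the two log lines above.
disabled_at = datetime.fromisoformat("2017-06-02T10:07:48.168+02:00")
enabled_at = datetime.fromisoformat("2017-06-05T11:29:03.160+02:00")


def outage_duration(start: datetime, end: datetime) -> timedelta:
    """Return how long the host was marked unavailable (hypothetical helper)."""
    return end - start


duration = outage_duration(disabled_at, enabled_at)
print(duration)  # prints "3 days, 1:21:14.992000"
```

So the host stayed unavailable for a little over 3 days and 1 hour on this Zabbix server.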
    While the host was marked unavailable, I am 100% sure that the service was reachable, and the second Zabbix server reported it as working.
    I didn't see any log entries pointing to this service during that time, only an average load of 1% on the unreachable poller.

    I didn't change any of the timeout settings in zabbix_server.conf. The only change is some additional pollers (5 unreachable pollers).
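As a rough cross-check that the stock settings were in effect, the spacing of the errors in the first log excerpt can be compared with what the defaults would suggest. The values below (UnreachableDelay=15, Timeout=3, UnreachablePeriod=45) are assumed to be the stock defaults of zabbix_server.conf and were not read from this installation; the idea that each retry takes roughly UnreachableDelay plus Timeout is also only an assumption.

```python
from datetime import datetime

# Assumed stock defaults of zabbix_server.conf (not confirmed for this setup).
UNREACHABLE_DELAY = 15   # seconds between retries while the host is unreachable
TIMEOUT = 3              # seconds until a single check gives up
UNREACHABLE_PERIOD = 45  # seconds of unreachability before "host unavailable"

# Timestamps of the four log lines in the first excerpt.
error_times = [
    datetime.fromisoformat(ts)
    for ts in (
        "2017-06-02T10:06:54.176+02:00",  # first network error
        "2017-06-02T10:07:12.158+02:00",  # another network error
        "2017-06-02T10:07:30.163+02:00",  # another network error
        "2017-06-02T10:07:48.168+02:00",  # temporarily disabling SNMP checks
    )
]

# Seconds between consecutive log lines.
gaps = [(b - a).total_seconds() for a, b in zip(error_times, error_times[1:])]
# Seconds from the first error to the "host unavailable" line.
elapsed = (error_times[-1] - error_times[0]).total_seconds()

# Assumption: each retry costs roughly the retry delay plus one check timeout.
expected_gap = UNREACHABLE_DELAY + TIMEOUT
gaps_match = all(abs(gap - expected_gap) < 0.5 for gap in gaps)
period_exceeded = elapsed >= UNREACHABLE_PERIOD

print(gaps, elapsed, gaps_match, period_exceeded)
```

Under these assumptions the log is consistent with default settings: the lines are about 18 seconds apart, and the host was marked unavailable roughly 54 seconds after the first error, once the 45-second period had passed. This does not explain why the host stayed unavailable afterwards.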

    After a short search I found some tickets, but most of them relate to disabling/enabling the host in Zabbix.

    Can anyone help or point me in the right direction?

    Br Manuel