**gofree** · 03-12-2019, 10:35

Just a wild guess: could it be the housekeeper? It runs every hour by default...

**bdaniel** · 03-12-2019, 10:46

Yeah I read something about Housekeeper running hourly but didn't think it would interact with the actual scanning of network devices - it seemed to be more of a db cleanup.

Can I safely disable it as a test?

**gofree** · 03-12-2019, 10:55

I guess for test purposes it will do no harm if you disable it for a while. Or change its interval in the Zabbix server conf file (the `HousekeepingFrequency` parameter) and you'll see the results.

**bdaniel** · 04-12-2019, 01:08

I have changed the interval to 6h and rebooted the server. A bunch of the UPS's just reported the same latency issue at 5min past the hour.

Does anyone else have any suggestions of where to look to solve this issue?

EDIT: Still getting the exact same symptoms every hour, a couple minutes past the hour.

**dimir** · 04-12-2019, 11:49

Could it be the Network Discovery?

**bdaniel** · 05-12-2019, 00:20

I don't think it is the Network Discovery rules. For a few days I had them (and Actions) completely disabled and this issue was still occurring. I also had them set to 24h with the same result. I have just set all 3 of them to 6h and will see if this makes a difference for some reason...

**bdaniel** · 05-12-2019, 00:33

But I just found this (see screenshot) under one of the affected hosts. Could one of these Discovery Rules be putting too much load on the low end network card in the UPS's?

Attached Files

**dimir** · 05-12-2019, 15:31

Well, according to the image you have 2 discovery rules that attempt to discover the network interfaces of the UPS every hour over SNMP, so that very possibly is it. One thing to find out is at which time of the hour those rules fire. Unfortunately this information is not available in the history tables, but you could use a traffic analyzer, e.g. tcpdump, to see at what time of the hour SNMP traffic goes between the Zabbix server and this UPS.
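
To answer the "which time of the hour" question from a capture, one option is to bucket packet timestamps by minute of the hour. Below is a minimal sketch, assuming the capture was saved as text from `tcpdump -tttt` (so each line starts with a `YYYY-MM-DD HH:MM:SS` timestamp). The function name `packets_per_minute`, the addresses and the sample lines are made up for illustration and do not come from the actual capture.

```python
import re
from collections import Counter

# Timestamp that `tcpdump -tttt` prints at the start of each line;
# the capture group is the minute field.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:(\d{2}):\d{2}")

def packets_per_minute(lines):
    """Count captured packets per minute of the hour (0-59)."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:  # skip continuation lines without a timestamp
            counts[int(match.group(1))] += 1
    return counts

# Hypothetical sample lines, only to show the expected input shape.
sample = [
    "2019-12-09 01:04:58.120001 IP 10.0.0.5.48213 > 10.0.0.20.161: GetRequest(28)",
    "2019-12-09 01:05:01.310442 IP 10.0.0.5.48213 > 10.0.0.20.161: GetBulk(31)",
    "2019-12-09 01:05:01.402117 IP 10.0.0.20.161 > 10.0.0.5.48213: GetResponse(512)",
    "2019-12-09 02:05:02.007731 IP 10.0.0.5.48213 > 10.0.0.20.161: GetBulk(31)",
    "2019-12-09 02:06:10.551200 IP 10.0.0.5.48213 > 10.0.0.20.161: GetRequest(28)",
]

counts = packets_per_minute(sample)
for minute, n in sorted(counts.items()):
    print(f"minute {minute:02d}: {n} packets")
```

Running this over a capture that spans several hours should make any minute with unusually heavy SNMP traffic stand out, which can then be compared with the time the latency alerts fire.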

**bdaniel** · 09-12-2019, 01:04

So, I have disabled each of those discovery rules independently and left for a 24 hour period, I still have the same results each hour on the hour.

Dimir: When I run Wireshark and watch the traffic, there is lots of SNMP get-request and get-response traffic all the time, all appears to be as usual. Around the time the issue happens there is some SNMP getBulkRequest traffic and then we see the issue on the devices. Does this give you any clues?

If I stop the Zabbix-Server service the issues completely stop.

**Mike2K** · 09-12-2019, 12:40

> Originally posted by bdaniel
>
> So, I have disabled each of those discovery rules independently and left for a 24 hour period, I still have the same results each hour on the hour.
>
> Dimir: When I run Wireshark and watch the traffic, there is lots of SNMP get-request and get-response traffic all the time, all appears to be as usual. Around the time the issue happens there is some SNMP getBulkRequest traffic and then we see the issue on the devices. Does this give you any clues?
>
> If I stop the Zabbix-Server service the issues completely stop.

Zabbix uses SNMP GetBulk requests to fetch all SNMP data from the device, instead of individual SNMP Get requests, which cause a lot of network traffic. Could it be that the UPS cannot handle GetBulk requests properly, causing the network stack on the UPS to crash?

Edit:
Could you please try to disable the bulk requests? You can do this in the host configuration.
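
To put rough numbers on the Get versus GetBulk trade-off, here is a back-of-the-envelope sketch of how many request PDUs are needed to read a run of consecutive values. It is a simplification under stated assumptions: it ignores the extra request that detects the end of a table, ignores response size limits, and says nothing about how Zabbix actually schedules its polls. The function name `requests_needed` and the example figures are hypothetical.

```python
import math

def requests_needed(num_values, max_repetitions=None):
    """Request PDUs needed to read num_values consecutive values.

    max_repetitions=None models one value per request (Get/GetNext).
    Otherwise it models GetBulk, where each request may return up to
    max_repetitions values.
    """
    if max_repetitions is None:
        return num_values
    return math.ceil(num_values / max_repetitions)

# Hypothetical example: 100 values, one per request vs. 10 per GetBulk.
print(requests_needed(100))       # one request per value
print(requests_needed(100, 10))   # ten values per request
```

Fewer requests means less round-trip traffic, but each GetBulk response is larger and has to be assembled by the agent in one go, which is the part a low-end management card might struggle with.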

**Mike2K** · 09-12-2019, 12:43

Could you try disabling the bulk requests? You can do this in the host configuration...

Attached Files

**bdaniel** · 10-12-2019, 07:30

Since disabling the "use bulk requests" checkbox for each UPS I have not noticed any issues. I will continue to monitor this for the next 24 hours.

If this is indeed the solution, is there any way to disable this setting per host group or per VLAN? Possibly as part of the discovery/action process?

**bdaniel** · 10-12-2019, 23:44

> Originally posted by Mike2K
>
> Zabbix uses SNMP GetBulk requests to fetch all SNMP data from the device, instead of individual SNMP Get requests, which cause a lot of network traffic. Could it be that the UPS cannot handle GetBulk requests properly, causing the network stack on the UPS to crash?
>
> Edit:
> Could you please try to disable the bulk requests? You can do this in the host configuration.

The system seemed to behave well during the day yesterday after making the change. However, overnight the same issues have been observed on multiple UPS's.

**bdaniel** · 12-12-2019, 00:10

Disabling "use bulk requests" actually seems to have made the problem worse. All UPS's are now reporting latency spikes over 200ms at random times.

Does anyone have a suggestion for me? Surely I am not the only person SNMP monitoring Eaton UPS's using Zabbix...

Zabbix Server Causing Network Device Latency Hourly
