Hello,
First of all, I'm quite new to Zabbix, so please excuse me if my questions are not very smart.
I've been using Zabbix (2.2.7) for a month and I've noticed some gaps in the graphs.
I have a centralized deployment (zabbix_server, Zabbix frontend, MySQL, no proxies) on a single VM (Debian 8, 64-bit, with 8 GB of RAM, 4 CPU cores and a 100 GB HDD).
I'm monitoring 18 hosts (17 SNMPv3 network devices and the Zabbix server itself via agent) with a total of 2974 items (most have an interval of 300 seconds; the most aggressive are at 180 seconds) and 894 triggers. The maximum number of processed values per second is 48.
All my network devices use SNMPv3 with MD5 authentication and AES encryption.
In zabbix_server.log there are many entries (515 in the last day) like:
57228:20150825:003925.909 SNMP agent item "ITEM" on host "HOST" failed: first network error, wait for 15 seconds
57236:20150825:003940.927 resuming SNMP agent checks on host "HOST": connection restored
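To see whether these errors cluster on particular hosts or at particular hours, the log could be summarized with something like the following. This is only a minimal Python sketch: the regular expression is written against the two sample lines above, and the function name `count_network_errors` is my own, not part of Zabbix.

```python
import re
from collections import Counter

# Matches lines such as:
# 57228:20150825:003925.909 SNMP agent item "ITEM" on host "HOST" failed: first network error, wait for 15 seconds
# The pattern is an assumption based on the sample lines only.
ERROR_RE = re.compile(
    r'^\d+:(?P<date>\d{8}):(?P<time>\d{6})\.\d{3} '
    r'SNMP agent item "(?P<item>.*)" on host "(?P<host>[^"]*)" '
    r'failed: first network error'
)


def count_network_errors(lines):
    """Count 'first network error' entries per (host, hour of day)."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.match(line)
        if match:
            hour = match.group("time")[:2]  # HH part of HHMMSS
            counts[(match.group("host"), hour)] += 1
    return counts
```

It can be fed an open log file (for example `count_network_errors(open(path))`, where `path` is wherever zabbix_server.log lives on the system), and `most_common()` on the result lists the worst host/hour pairs first.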
However, when I poll the device manually with snmpwalk, it always works.
I've read that this could be because snmpwalk by default retries 5 times before giving up, unlike Zabbix, which doesn't retry at all.
Do you believe the gaps are caused by these SNMP errors?
Is there a simple workaround for enabling SNMP retries? (I've found a patch, but it's quite old and I'm not sure whether it is still valid: https://support.zabbix.com/browse/ZBXNEXT-1096)
Or maybe there is another performance issue causing these gaps?
My queue looks green and I've done some tuning on zabbix_server and on mysql server.
I've checked the MySQL history table for the particular item and there are no entries for the time window in which the gap occurs, meaning that the problem is not with visualization but with the collection of the values.
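The check can be repeated for other items by exporting the `clock` values (Unix timestamps) of the history rows and looking for consecutive samples that are further apart than the item interval. A rough Python sketch, where `find_gaps` and the `slack` factor are my own choices for illustration:

```python
def find_gaps(clocks, interval, slack=1.5):
    """Return (previous, next) timestamp pairs that are more than
    interval * slack seconds apart, i.e. where samples are missing.

    clocks   -- iterable of Unix timestamps for one item
    interval -- configured item interval in seconds (e.g. 300)
    slack    -- tolerance factor for normal polling jitter (an assumption)
    """
    ordered = sorted(clocks)
    gaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > interval * slack:
            gaps.append((prev, curr))
    return gaps
```

For an item with a 300-second interval, `find_gaps(clocks, 300)` reports every window longer than 450 seconds between two stored values, which can then be compared with the timestamps of the network errors in the server log.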
Do you have any ideas what could be wrong with my setup?
Thanks!