Hi all,
I've been using the same Zabbix server for almost 2 years. I believe it started off as a 1.6.something and is now at 1.8.1.
Mostly it’s been smooth sailing.
95% of my hosts are at remote sites, so I use active agent items to get my data. I use a nodata(300) trigger on one of my items to determine whether each host is online.
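For reference, the trigger expression is along these lines (the host name and the agent.ping item key here are just placeholders; my real item is another active agent item):

    {remote-host:agent.ping.nodata(300)}=1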
Zabbix is running on Debian Etch with MySQL.
Here is the current Zabbix status:
Number of hosts (monitored/not monitored/templates): 206 (95 / 45 / 66)
Number of items (monitored/disabled/not supported): 8650 (6225 / 2168 / 257)
Number of triggers (enabled/disabled) [true/unknown/false]: 3901 (3117 / 784) [123 / 24 / 2970]
Required server performance, new values per second: 21.5125
Occasionally in the past it would 'go nuts' and we'd get flooded with nodata alerts; the trigger would flap on and off over time. If we waited a short period, 10 minutes or so, it would sort itself out. Previously this was a rare occurrence, but the more hosts I add, the more often it happens. Lately I've had to turn the trigger off completely (we got 1,500 nodata alerts from 95 hosts in 30 minutes...).
I can see that the data does actually come into Zabbix fine. The connection isn't severed, and clients can communicate with the server. There are no gaps in the monitoring data; it all appears to get in there eventually.
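One thing I'm planning to try is graphing the server's own queue, to see whether values are merely arriving late rather than going missing. If I have the 1.8 internal check right, it would be an item with this key on the Zabbix server host itself:

    zabbix[queue]

which should return the number of items whose data is overdue; a spike there lining up with the alert storms would confirm the server is falling behind.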
So, the question is: where do I start looking for the actual cause of this problem? What metrics should I be measuring and comparing?
Could my server's disk I/O be too poor for the number of hosts/items I have? Is the CPU under spec? How could I go about verifying that?
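The only idea I've had so far is to watch the disk and CPU while it's happening. I assume something like the following (iostat from the sysstat package, plus vmstat) is the right starting point, though I'm not sure what thresholds to compare against:

    # sample extended disk stats every 5 seconds; sustained %util near 100
    # or a large await would suggest the disk can't keep up
    iostat -x 5

    # high 'wa' (I/O wait) with an otherwise idle CPU also points at the disk;
    # high 'us'+'sy' with little 'wa' would point at the CPU instead
    vmstat 5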
Thanks!