Hi,
We have a fairly large and busy Zabbix monitoring setup (around 400 hosts, 15,000 items and 12,000 triggers) and are experiencing a peculiar problem.
At times, the important internal Zabbix processes all reach 100% busy at the same time:
- housekeeper
- poller
- http poller
- history syncer
- icmp pinger
Three of these correspond to our most active check types (icmp, poller (snmp) and http). They all hit 100% busy at random intervals. It usually resolves itself after an hour or so, but in the meantime some items end up with missing data.
Host performance remains OK throughout the day and there's no extra CPU load. About 50% of CPU time is still free, so the processes are not fighting for CPU cycles.
The zabbix-server config has parameters such as "StartPollers", "StartTrappers", ..., but I'm confused as to whether these auto-grow. If more processes are needed to execute the requested checks, would the server continue spawning more processes?
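For reference, these are the kind of settings I mean. This is only a sketch: the parameter names are the ones I understand to be in the 1.8 sample zabbix_server.conf, and the values are illustrative, not our actual configuration.

```
# zabbix_server.conf (illustrative values only, not our real settings)
StartPollers=5        # processes for passive agent/SNMP checks ("poller")
StartPingers=1        # processes for ICMP checks ("icmp pinger")
StartHTTPPollers=1    # processes for web checks ("http poller")
StartTrappers=5       # processes for incoming active/trapper data
StartDBSyncers=4      # processes writing history to the DB ("history syncer")
```

My question is whether these numbers are fixed at startup or only a starting point.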
How would you go about debugging what may be the cause of it, and what possible solutions could I expect?
PS: we're running the latest 1.8.5 server.
So, I stopped monitoring it.