We have a half-dozen calculated checks for each of our 150 or so hosts that average performance values (CPU load, free RAM, web server connections, etc.) over long intervals (8 days at present, but I would like to extend this to 30 days). We would like the checks to run once a day. I quickly found out that calculated items can put an enormous load on Zabbix (iowait) as the system reads values from the DB and performs the calculation for each host. This sometimes causes the web GUI to become unavailable for several minutes. I've put quite a bit of effort into optimizing our system for performance: we use DB partitioning, I've tuned MariaDB, we cache values, etc. Unfortunately, the one thing I cannot do is swap our RAID 10 SATA array for SSDs, as they are prohibitively expensive at our current data center host. Other than these I/O-expensive checks, everything is working fantastically.
Currently I'm using macros to assign flexible scheduling for each item, so that checks are broken up over the course of the day. I've created 4 periods of 4 hours each, scheduled so that any 4-hour period has only 1 or 2 items to check. The macros look like this:
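The macro values themselves are not reproduced here. As a rough illustration of the bucket scheme described above (not the actual macros), the sketch below counts how many calculated checks fall into each 4-hour period when six items are dealt out over four periods for 150 hosts. The item names and the round-robin assignment are assumptions made for the example.

```python
from collections import Counter

HOSTS = 150    # "150 or so hosts" from the post
PERIODS = 4    # four periods of four hours each
# Hypothetical item names; the post only says "a half-dozen" checks.
ITEMS = ["cpu_load", "free_ram", "web_conns", "item4", "item5", "item6"]


def period_for(item_index, periods=PERIODS):
    """Assign an item to a period round-robin (an assumed scheme)."""
    return item_index % periods


def checks_per_period(hosts, items, periods=PERIODS):
    """Count how many host/item checks land in each period."""
    counts = Counter()
    for index, _name in enumerate(items):
        # Every host runs this item in the same period, so the whole
        # fleet hits the database together.
        counts[period_for(index, periods)] += hosts
    return counts


if __name__ == "__main__":
    for period, total in sorted(checks_per_period(HOSTS, ITEMS).items()):
        print(f"period {period}: {total} calculated checks")
```

With these assumptions, two periods carry 300 checks and two carry 150, and because the schedule comes from the template, all hosts in a period share it. That is the clustering that would show up as peaks.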

The resulting CPU graph looks like this. Each peak represents one set of checks:

This is working, but it is not a happy situation, and I'm afraid I can't go beyond the 8-day averaging. How can I get away from the clustering of these checks so that they put less of a load on the server? I could go into each and every host and customize the schedule intervals so that there are more unique schedules, but I want to retain the management ease of assigning the check interval via template.
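One way to get more unique schedules without hand-editing every host might be to derive each host's time slot from a hash of its name. The following is only a sketch under stated assumptions: the function name `daily_slot`, the 16-hour window, and the idea of storing the result in a per-host macro are all hypothetical, and the output follows the Zabbix scheduling-interval form `h<hour>m<minute>` as I understand it.

```python
import hashlib


def daily_slot(host, item_key, start_hour=0, window_minutes=16 * 60):
    """Return a once-a-day slot such as 'h3m27' for a host/item pair.

    The slot is derived from a SHA-256 hash, so it is stable across runs
    and roughly uniform across the window. The window defaults to
    16 hours to mirror the four 4-hour periods (an assumption).
    """
    digest = hashlib.sha256(f"{host}|{item_key}".encode("utf-8")).digest()
    offset = int.from_bytes(digest[:4], "big") % window_minutes
    hour, minute = divmod(start_hour * 60 + offset, 60)
    return f"h{hour % 24}m{minute}"


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    for name in ("web01", "web02", "db01"):
        print(name, daily_slot(name, "cpu_load"))
```

The resulting string could be written to a host-level user macro (say, a hypothetical `{$AVG_SCHED}`) that overrides a template default, so the item definition itself would stay in the template. Whether this actually lowers the I/O peaks would need testing.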
My question for you Zabbix experts is: Am I missing something? Is there a scheduling strategy I can adopt so that checks are better spread over a particular time period of each day?
Thanks beforehand for any tips!
and failed again and again (basically there is not enough time to schedule them all reasonably, and there are too many devices that will eventually cause those I/O locks). Put simply, this is not the way it should be handled. The question I had to ask myself (maybe you should too): what's the benefit of those 30d (8d) calculated checks?