Remember how Zabbix 1.8 was supposed to improve performance?
That's a nice promise, but what are users seeing in real production environments? Luckily, we now know. Zabbix user verwilst has shared some graphs from before and after an upgrade from 1.6 (and he has a big monitor). Rough facts about the environment:
This installation also runs the Zabbix server and the database on separate hosts.
And now for the shiny part. Here's a graph of SQL access by the Zabbix server, split up by selects, inserts, updates and deletes. On the left-hand side we can see Zabbix 1.6 operating, then there's a small gap during the upgrade, and then Zabbix 1.8.1 gets to work.
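For readers wondering how such a graph can be produced: databases typically expose cumulative statement counters, and a per-second rate is just the difference between two samples divided by the time between them. The sketch below assumes counters of that kind (MySQL's `Com_select`, `Com_insert`, `Com_update` and `Com_delete` status variables are one example; the post does not say which database verwilst runs). The function name and the sample numbers are made up for illustration.

```python
def per_second_rates(earlier, later, elapsed_seconds):
    """Turn two snapshots of cumulative counters into per-second rates."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return {
        name: (later[name] - earlier[name]) / elapsed_seconds
        for name in earlier
    }


# Hypothetical counter snapshots taken 60 seconds apart (not real data).
first_sample = {"select": 120_000, "insert": 48_000, "update": 30_000, "delete": 600}
second_sample = {"select": 150_000, "insert": 96_000, "update": 42_000, "delete": 660}

rates = per_second_rates(first_sample, second_sample, 60)
for statement, rate in rates.items():
    print(f"{statement}: {rate:.2f} per second")
```

Sampling at a fixed interval and plotting each rate as its own line gives a graph of the same general shape as the one above.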
So what's the difference? As we can see, all kinds of database access have dropped notably. Selects, for example, dropped by more than a factor of two, and updates by a bit less than that. Most significantly, the number of inserts dropped from a hefty 800 per second to pretty much nothing during a normal run (the last value is 7.71), with small, insignificant peaks (all below 500). The number of deletes does not seem to have changed much.
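To put the insert numbers in perspective, here is a quick back-of-the-envelope calculation. It takes the roughly 800 inserts per second read off the 1.6 side of the graph and the last value of 7.71 under 1.8.1. Both are single readings rather than averages, so treat the result as a rough figure; the helper name is made up for this sketch.

```python
def reduction_percent(before, after):
    """Return how much smaller `after` is than `before`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


# Approximate insert rates (per second) taken from the text above.
insert_drop = reduction_percent(800, 7.71)
print(f"Inserts dropped by about {insert_drop:.1f}%")
```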
Looking at the graph, we can of course appreciate the improvements in the Zabbix server. We can also spot other things happening. The red spikes are quite clearly housekeeper runs, where old data is removed. One run happens when the Zabbix server starts, and then it runs once every hour - which is the default housekeeper interval.
There are also smaller hourly bumps in inserts. These are not aligned with the housekeeper runs, though; they happen on the full hour. These are trend calculations and the resulting inserts into the database. At the same time, updates decrease slightly because the Zabbix server cache is busy with the trends.
So this single graph both confirms that Zabbix server 1.8 is much more efficient and gives some insight into its daily (or rather hourly, in this case) operations. But verwilst was kind enough to share some more graphs showing the impact of the upgrade.
As a result of the reduced query count, the actual load on the servers dropped as well. Here we can see how CPU load on the database host stabilises at a lower level after the upgrade.
And here's the CPU load change on the Zabbix server - excellent, that one is also lower with 1.8.
With all the load reduction, there might be a more “production-like” metric we could look at to determine what effect all this had on the efficiency of Zabbix. For that we have the Zabbix server queue size - the number of items that are being worked on at any given moment. So here's the graph.
Zabbix 1.6 had a lot of items to work on, and a notable backlog. Excluding some larger spikes, the queue seemed to fluctuate around 7 thousand items. As this value was updated less frequently, the Zabbix graph fills the upgrade gap with a straight line - it does not know what caused the missing data, and the number of missing values is too small to be considered a gap. The upgrade period is marked on the graph for clarity.
Hey, what's that? Is there some problem with Zabbix after the upgrade? The line seems to go too low… Actually, this is a testament to all the technical improvements we looked at: the Zabbix queue dropped from ~7000 to… 19.
So there. Zabbix 1.8 is better, faster, and it might even feed your dog. Confirmed by users.