After some perusal of the code (disclaimer: I'm no coder, but I can follow code after a fashion), it becomes apparent that Zabbix 1.8 uses a single DB syncer process. That's probably no problem when MySQL is set up as a local database, but what happens when the database is remote, over a network?
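As a back-of-the-envelope illustration of why remoteness matters (hypothetical latency figures, not measured on our setup): a single synchronous writer can complete at most one statement per round trip, so its throughput ceiling is roughly rows-per-statement divided by round-trip time.

```python
# Rough ceiling for a single synchronous DB writer: one statement per
# round trip, so throughput = rows_per_statement / round_trip_time.
# The latency values below are illustrative, not measured.
def max_rows_per_sec(rtt_ms, rows_per_statement=1):
    return rows_per_statement * 1000.0 / rtt_ms

local   = max_rows_per_sec(0.1)                          # ~0.1 ms local socket
remote  = max_rows_per_sec(1.0)                          # ~1 ms across a LAN
batched = max_rows_per_sec(1.0, rows_per_statement=100)  # batching over a LAN

print(f"local, row-at-a-time:  {local:.0f} rows/s")
print(f"remote, row-at-a-time: {remote:.0f} rows/s")
print(f"remote, 100-row batch: {batched:.0f} rows/s")
```

The point is only that a 10x increase in round-trip time cuts a single row-at-a-time writer's ceiling by 10x, and batching buys it back.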
At any rate, we've finally added more hosts than Zabbix can handle. Our database is capable of much more, but apparently Zabbix coupled with the SPARC architecture (running in 32 bits, no less) and an Oracle backend is not going to happen.
Number of items (monitored / disabled / not supported): 15178 (15084 / 1 / 93)
In other words, we're underwater. The queue starts backing up within minutes of start-up, the history write cache fills quickly, and the DB syncer seems unable to push data to our database fast enough to keep us afloat.
Questions from my team include: "Why is this a single thread / single process? Why isn't it parallelized?" I understand the problems with data concurrency, but they have a valid point. Is there any way we can speed this along, or spawn multiple parallel processes? What can be done?
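For what it's worth, the concurrency problem doesn't seem insurmountable: one common approach is to shard the pending rows by item ID, so all values for a given item always go through the same writer and per-item ordering is preserved. A minimal sketch of the idea (this is not Zabbix's actual code; `n_syncers` is a hypothetical knob):

```python
# Hypothetical sketch: shard pending history rows across N writer
# processes by itemid, so no two writers ever handle the same item
# and per-item value ordering is preserved. Not actual Zabbix code.
def partition(rows, n_syncers):
    """rows: iterable of (itemid, clock, value) tuples."""
    buckets = [[] for _ in range(n_syncers)]
    for row in rows:
        itemid = row[0]
        buckets[itemid % n_syncers].append(row)
    return buckets

rows = [(101, 0, 1.0), (205, 0, 2.0), (101, 1, 1.5), (304, 0, 9.0)]
for i, bucket in enumerate(partition(rows, 4)):
    print(f"syncer {i}: {bucket}")
```

Each bucket could then be flushed by its own process with one bulk INSERT per batch, with no lock contention between writers on the same item's rows.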
I understand that MySQL is the preferred backend, but why wouldn't it have just as hard a time? We're about to attempt a move to a Linux/Intel front end, as we suspect the slower per-thread (but heavily multi-threaded) design of the SPARC boxes has something to do with this.
Our problem really does seem to be that Zabbix can't write to our DB fast enough to keep up with the stream of data we are trying to monitor, at least with Oracle on the backend and a Sun T5120 on the server side (8 cores, 16 GB of RAM).
Any suggestions?
Code:
29188:20100408:155639.349 DB syncer spent 0.000124 second while processing 0 items. Next sync after 5 sec.
29188:20100408:155644.353 DB syncer spent 0.000087 second while processing 0 items. Next sync after 5 sec.
29188:20100408:155649.354 DB syncer spent 0.000078 second while processing 0 items. Next sync after 5 sec.
29188:20100408:155658.826 DB syncer spent 4.472079 second while processing 186 items. Next sync after 5 sec.
29188:20100408:155729.525 DB syncer spent 25.698532 second while processing 1000 items. Next sync after 5 sec.
29188:20100408:155827.152 DB syncer spent 52.626974 second while processing 2000 items. Next sync after 5 sec.
29188:20100408:160041.239 DB syncer spent 129.086294 second while processing 7000 items. Next sync after 5 sec.
29188:20100408:160427.822 DB syncer spent 221.582604 second while processing 11000 items. Next sync after 4 sec.
29188:20100408:161048.093 DB syncer spent 376.270834 second while processing 21000 items. Next sync after 4 sec.
29188:20100408:161820.495 DB syncer spent 448.400870 second while processing 29000 items. Next sync after 4 sec.
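Working the numbers from the log above (a quick parse, nothing Zabbix-specific): throughput rises somewhat with batch size but tops out around 65 items/second, which is nowhere near our incoming data rate.

```python
import re

# Throughput per sync cycle, computed from three lines of the log excerpt.
log = """\
29188:20100408:155729.525 DB syncer spent 25.698532 second while processing 1000 items.
29188:20100408:160427.822 DB syncer spent 221.582604 second while processing 11000 items.
29188:20100408:161820.495 DB syncer spent 448.400870 second while processing 29000 items."""

for line in log.splitlines():
    m = re.search(r"spent ([\d.]+) second while processing (\d+) items", line)
    secs, items = float(m.group(1)), int(m.group(2))
    print(f"{items:>6} items / {secs:8.1f} s = {items / secs:5.1f} items/s")
```

So the syncer sustains roughly 39, 50, and 65 items/second as the batches grow, while the backlog keeps climbing.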
An incomplete list of planned performance-related improvements can be found here: