Hello, we are in the final stages of getting Zabbix deployed for our business but are noticing that we regularly have items in the queue for longer than 10 minutes. Looking at the output of iotop on the Zabbix server, MySQL seems to be writing constantly at 3-5 MBps. We've tried upping its buffer pool to 2GB per the docs (https://www.zabbix.com/documentation...n/requirements) but that only helped momentarily. We're monitoring about 263 nodes and Zabbix says it's getting around 120 new values per second. Has anyone else run into this? Do we just need to get more RAM for the server and increase the MySQL buffer pool further? Thanks.
Zabbix Constant High I/O / Queue Wait Time
-
It is hard to say that just adding more RAM will take care of the issue. It is probably a little more complex than that, since there are a fair number of settings that can be tweaked between MySQL and the Zabbix server.
What version of Zabbix are you using?
Take a look at this post... the last half of it anyway where the graphs are. If you can screenshot those, we can probably start tuning some settings for you.
And since you are mentioning high IO wait times, you can also take a look at this post:
-
Thanks, we're using 2.0.8. Here are the graphs and some more things I think may be useful:
zabbix cache usage: i42.tinypic.com/2j1uc8w[dot]jpg
zabbix internal processes: i39.tinypic.com/1tkdgj[dot]jpg
zabbix data gathering: i41.tinypic[dot]com/30mvlhw.jpg
zabbix queue: i43.tinypic[dot]com/21zi9t.jpg
iotop:i39.tinypic[dot]com/20u6ghx.jpg
top: i44.tinypic[dot]com/1z6f7nl.jpg
Let me know if you need anything else, thanks.
Last edited by jmusbach; 10-10-2013, 00:49.
-
Ah thanks, I tried uploading and it said I didn't have enough space allotted to my quota to attach all the images. And when I tried making direct links to the images I got an error saying I had too much live content.
Anyway, please let me know if you come up with anything we can tweak. Thanks!
-
So what I see in your graphs: your unreachable pollers were being hammered for about 5 hours straight. Were your servers actually unreachable during that time? That would explain your queue being very high.
Otherwise, your settings are not that bad. These settings are all in zabbix_server.conf. Your cache settings look good. If it were me, I would increase StartPollers= and StartPollersUnreachable= a little bit, maybe add 20 and 5 respectively. I would also bump Timeout= up to 10, if you still have it at the default of 3.
Restart your Zabbix server process after you make any changes.
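As a rough sketch of what those changes could look like in zabbix_server.conf: the values below assume you are starting from the stock defaults (StartPollers=5 and StartPollersUnreachable=1, which is how the unreachable-poller setting is spelled in the config file), so adjust from whatever your file currently has rather than copying these numbers.

```ini
# zabbix_server.conf (illustrative values only, assuming stock defaults)

# Regular pollers: assumed default 5, plus 20
StartPollers=25

# Pollers for unreachable hosts: assumed default 1, plus 5
StartPollersUnreachable=6

# Seconds to wait for agent, SNMP and external checks: default 3, raised to 10
Timeout=10
```

Each extra poller holds its own database connection, so keep an eye on the MySQL connection limit if you raise these much further.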
What kind of infrastructure are you on? Standalone or VM servers?
I am on VM and I run my App server on a different VM than my DB server.
Outside of that, I would question why the unreachable pollers were being hammered. Were you having any network issues?
-
Check out MWSnap sometime. It makes the picture size pretty compact while preserving clarity. I will also check and see if we can increase available space for uploads.
Last edited by tchjts1; 10-10-2013, 21:12.
-
At that point I think we were having some network latency issues, but in general things are fine availability-wise. We still get a lot of items listed as waiting >10 minutes in the queue, and when I view the details they're items set to run on the Zabbix server's own agent: its own stats, but also some scripts we've set the agent to run. While this is happening I am able to run the queued scripts by hand just fine, so I'm not sure of the cause. If you look at the iotop screenshot you can see MySQL frequently causing IO of 3+ MBps. Could this be a cause of the queueing? Thanks again.
-
Check my other link regarding high IO wait?
I would look at the swappiness setting at the OS layer. If it is at the default of 60, I would set it to 0 on the Zabbix DB server. Especially if you are seeing high IO wait coupled with abnormal swap usage.
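For reference, a minimal sketch of making that setting persistent on a typical Linux box, assuming your distribution reads /etc/sysctl.conf at boot (some use files under /etc/sysctl.d/ instead):

```ini
# /etc/sysctl.conf
# Tell the kernel to avoid swapping application memory unless it has to
vm.swappiness = 0
```

You can check the current value with `cat /proc/sys/vm/swappiness`, and load the file without a reboot by running `sysctl -p` as root.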
-
Thanks, things seem to be better now that I've changed the swappiness on the server to 0, so that it only swaps as a last resort, and applied the configuration tweaks you suggested. However, one problem still remains: if you look at my attached screenshot you'll see that some graphs are randomly getting breaks in them. The graphs use data obtained from scripts the server is set to run as items, and the scripts should be fine. The graphs weren't an issue until we finished adding all our servers and monitoring items; now it seems like the server is too busy to maintain a steady stream of the data the graphs depend on. Is there anything to tweak to make the data come in more reliably? Thanks.