Scaling the web front end
jcesario
16-03-2010, 02:30
I'm just curious: how do people deal with the slowness of the web front end in larger deployments?
For example, we monitor close to 16k unique items (these items cannot be broken down or partitioned any further than they already are).
A typical host for us has about 1k items associated with it.
What I've found is that even trivial tasks in Zabbix become unbearable: 5 or 6 minutes just to access an items page.
We are already using PHP in FastCGI mode on Lighttpd with APC.
So just curious, how are others scaling their web interface?
untergeek
16-03-2010, 18:16
What's your frontend hardware? What's the DB on the backend?
jcesario
16-03-2010, 18:24
DB:
MySQL 5.0 running the InnoDB plugin
8 core i7 L5520 (16 cpus)
48GB memory (36GB of this allocated to the innodb buffer pool)
32 x 15K drives in a RAID-10 array
Frontend
8 core i7 L5520 (16 cpus)
24GB memory
Lighttpd running PHP 5.2 with FastCGI
Xcache opcode cacher.
untergeek
16-03-2010, 19:01
Whoa... That's some serious hardware!
I guess you're hitting a rendering wall. PHP can query pretty fast, but if you're trying to put over 1K items per server, that's a large number. That's up to 20 pages of information in the configuration pages at the default 50 lines per page!
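To put a number on that, here is a quick sketch of the arithmetic. The 1K items per host and the 50-line default come from this thread; the function name is only illustrative.

```python
import math


def config_pages(item_count: int, rows_per_page: int = 50) -> int:
    """Number of pages needed to list item_count rows."""
    return math.ceil(item_count / rows_per_page)


# 1K items on a typical host at the default 50 lines per page.
print(config_pages(1000))  # 20
```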
My recommendation would be to slim it down, make "sub" hosts or something, e.g. run separate instances of zabbix_agent on different ports so you can offload some of that immense number of items per server.
jcesario
16-03-2010, 19:19
Yup, pretty much already did all that. We haven't done separate zabbix_agent instances, though.
jcesario
16-03-2010, 19:20
I've tried partitioning the items into the smallest "host groups" possible. It just doesn't work anymore.
jcesario
16-03-2010, 19:21
BTW, the agent and poller are absolutely stress free. The DB is nearly idle as well.
The only part choking is the PHP frontend. In fact, I've started resorting to hacking together some sprocs to manipulate everything I need right on the DB.
untergeek
16-03-2010, 20:05
Hmmm. How many connections does the php script open? As many as it wants? As many as it can? Only a few?
We've noticed some improvement with simultaneous viewing of highly detailed screens by increasing the OCI concurrent connections in php.ini (we're on oracle). But, I wonder if the script could be optimized to run these sorts of queries in parallel. It would make situations like yours (and ours) run much faster, I would think.
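As a rough illustration of why running independent queries in parallel could help, here is a minimal Python sketch. It is not how the Zabbix frontend works (PHP 5.2 has no comparable thread pool), and `run_query`, the query names, and the 50 ms latency are all made-up stand-ins for blocking database calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(name: str, seconds: float = 0.05) -> str:
    """Hypothetical stand-in for one blocking DB query."""
    time.sleep(seconds)
    return f"{name}: done"


# Hypothetical: one query per graph on a detailed screen.
QUERIES = [f"graph_{i}" for i in range(8)]


def fetch_serial(names):
    """Run the queries one after another, as a single script would."""
    return [run_query(n) for n in names]


def fetch_parallel(names, max_connections=4):
    """Run the queries concurrently, capped at max_connections workers."""
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(run_query, names))


start = time.perf_counter()
fetch_serial(QUERIES)
print(f"serial:   {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
fetch_parallel(QUERIES)
print(f"parallel: {time.perf_counter() - start:.2f}s")
```

The worker cap plays the role of the concurrent connection limit: raising it only helps while the database itself still has headroom.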
jcesario
16-03-2010, 20:21
We're on MySQL. I honestly think that whatever the web frontend is doing, it probably spends more time building and tearing down connections to the DB than the queries actually take.
The nice thing about the DB right now is that history_uint and items are both still so small that we can keep our entire data set in memory.
We also mount a nice 8GB tmpfs partition on the MySQL tmpdir to ensure that any tmp tables or massive sorts still end up being done in memory.
The bottleneck I see is this: when you go to Latest Data, for example, you'll see the DB go (for the most part) idle, while 2 PHP-CGI processes are pegged at 100% CPU utilization. We still have plenty of CPU left over, but I think this is more of a limitation of PHP than anything else.
I do agree that further parallelization would be a benefit. I also think some pretty severe pagination would be in order. I need to crack open the web frontend to confirm, but I have a feeling it's building some nasty arrays/objects in there while creating the page.
I would be very interested in this as well. We also have high-end hardware, with the DB residing on the SAN and using file-per-table partitioning on MySQL.
One particular screen of graphs we have contains about 80 graphs. When you open the screen, go fix your coffee and let it draw. Change the time period you want to view and let it redraw yet again...
jcesario
16-03-2010, 21:07
Well, it seems like we have some very high-end Zabbix people in this thread. I'm assuming we all have support contracts, so perhaps we should be voicing this to our reps.
Or even better, figuring out how to patch it and committing it back?
What version of Zabbix are you using?
My current production version is 1.6.6
We are waiting for the release of 1.8.2 to upgrade prod.
1.8 introduces a paged view; the new item filter and global search will also be very helpful.
You shouldn't hit any performance problems when viewing your items in 1.8.
BTW nice hardware out there, jcesario :)
jcesario
17-03-2010, 15:28
Thanks :-). We're taking Zabbix pretty seriously over here.
We're waiting for 1.8.2 as well. We did a POC of moving to 1.8.1 and it didn't turn out well :-(
jcesario
17-03-2010, 17:45
Well, after some extensive tuning of lighttpd I got the frontend to at least be reasonably responsive again. However, the nonresponsive Zabbix problem has now crept up in our 1.6.8 instance.
Aly, it's not just viewing items that times out. Once we hit some magical critical mass point, Zabbix simply stops responding to modifications/additions/removals of items AT ALL.
We cannot add a single item to a host, nor can we remove or modify any items. This basically renders Zabbix completely useless.
Whenever we try to add or remove templates from hosts, it simply spins and spins, then eventually stops processing and does nothing. No exhausted memory (I cranked my PHP max memory up to 2GB), no max execution time hit (set to 6 hours).
Big fail.
We're currently downgraded to 1.6.8, but this is also the issue we have with 1.8.1. Aly, can you confirm that behavior like this is not going to be present in 1.8.2?
jcesario
17-03-2010, 21:37
For S&G I flipped over to pg 8.4 to see if InnoDB's shoddy concurrency was causing issues.
I replicated the exact same issue. Whenever it deals with a large number of hosts or items (hosts in the hundreds and items in the tens of thousands), the Zabbix front end likes to just sit there and spin its wheels for a while and then quit without completing the action.
I'm able to replicate it by doing any of the following:
Adding 200 hosts to a template with 996 items in it.
Adding 200 hosts to a specific host group.
Removing or linking a template against 5/10/100/200 hosts when the template has multiple thousands of items in it.
Just FYI all of our items are SNMPv2
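For a sense of scale on the first case in that list, a back-of-the-envelope sketch. It assumes that linking a template creates one item row on every host for each template item, which nobody in this thread has confirmed; the function name is illustrative.

```python
def item_rows_to_create(hosts: int, items_in_template: int) -> int:
    """Rows created if every template item is copied onto every host (assumption)."""
    return hosts * items_in_template


# 200 hosts linked to a template with 996 items.
print(item_rows_to_create(200, 996))  # 199200
```

Under that assumption, a single click in the frontend turns into roughly 200,000 inserts plus whatever bookkeeping each one needs.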
jcesario
19-03-2010, 19:03
Bump. Bueller ?
We are about to release 1.8.2. Jcesario, may I ask you to download and try the latest nightly build and tell us about your experience?
It is here: http://www.zabbix.com/developers.php
Note that you may install the newest front-end in a different location, so there is no need to break your existing 1.8.1. I would be very interested in your feedback.
The hardware you use is capable of monitoring thousands of devices without breaking a sweat. It must be some minor issue, misconfiguration or inefficiency that makes the front-end very slow in your environment. It would be nice to have it resolved prior to 1.8.2.
jcesario
22-03-2010, 15:45
Will do, Alexei. I'll deploy the nightlies sometime today and get back to the forums.
Assuming I still have the same issue, what kind of information would you like me to gather for diagnosis?
Let's wait for performance results. Actually some stats from your system would be handy: total number of hosts and average number of items/triggers/applications per host, also number of values per second from the Dashboard.
jcesario
22-03-2010, 17:46
Setting up the 1.8.2 nightly (10966) right now. I'll let you know how it goes in a few hours.
jcesario
22-03-2010, 18:08
When doing an import of my xml templates:
Mar 22 11:34:13 p3plmysqlweb02 php-cgi: PHP Fatal error: require_once() [<a href='function.require'>function.require</a>]: Failed opening required 'include/classes/class.domdocument.php' (include_path='.:/usr/share/pear:/usr/share/php') in /var/www/html/zabbix182/include/config.inc.php on line 62
jcesario
22-03-2010, 18:43
just for reference:
                          Current value      Required  Recommended
PHP version               5.2.6              5.0       5.3.0        Ok
PHP memory limit          2048M              128M      256M         Ok
PHP post max size         1024M              16M       32M          Ok
PHP upload max filesize   1024M              2M        16M          Ok
PHP max execution time    1800               300       600          Ok
PHP max input time        600                300       600          Ok
PHP timezone              America/Denver                            Ok
PHP databases support     MySQL, PostgreSQL                         Ok
PHP BC math               yes                                       Ok
PHP MB string             yes                                       Ok
PHP Sockets               yes                                       Ok
PHP GD                    2.0.34             2.0       2.0.34       Ok
GD PNG Support            yes                                       Ok
libxml module             2.6.26             2.6.15    2.7.6        Ok
ctype module              yes                                       Ok
# grep -P "^error_reporting" /etc/php.ini
error_reporting = E_ALL
jcesario
22-03-2010, 18:46
BTW, the require_once() fatal error on class.domdocument.php during the XML template import was my bad. I forgot to install the PHP XML extension, but shouldn't the installer pick that up now?
jcesario
22-03-2010, 19:21
Same behavior as before. When trying to add or manipulate a large number of items against hosts the frontend simply stops processing after a few moments and nothing is changed.
Attempted to add a single template (made up of 12 other templates) totalling 2000 SNMPv2 items.
Attempted to add this template to a host group of 163 hosts.
There are no PHP or MySQL errors returned from this, as well as nothing displayed on the frontend.
Number of hosts (monitored/not monitored/templates)        691  449 / 1 / 241
Number of items (monitored/disabled/not supported)         0    0 / 0 / 0
Number of triggers (enabled/disabled)[true/unknown/false]  0    0 / 0 [0 / 0 / 0]
Number of users (online)                                   2    1
Required server performance, new values per second         0    -
jcesario
23-03-2010, 20:27
Same behavior on latest nightly.
There are no PHP or MySQL errors returned from this, as well as nothing displayed on the frontend.
It looks like the PHP memory limit in php.ini is too low for your setup. Try increasing it to 512MB.
jcesario
24-03-2010, 15:08
It's currently at 2GB????
Also, if I were hitting the max memory limit for PHP, there would be memory exhaustion errors thrown in the error_log. I have PHP error_reporting set to E_ALL.
nelsonab
24-03-2010, 18:45
I wish there was some functionality in the frontend for debugging/profiling log files. It would be nice if you could at least say "this was the last function called" to help track down bugs like this.
jcesario
24-03-2010, 18:47
Yah, I was thinking the exact same thing actually.... It's far more a limitation of PHP than anything Zabbix is responsible for, though.
jcesario
24-03-2010, 18:51
Perhaps I could set up APD, maybe hack it into hosts.php or something? http://pecl.php.net/package/apd
How about this for php debugging/profiling?
http://xdebug.org/index.php
I haven't used it personally so I can't say if its output would be useful here.
nelsonab
24-03-2010, 19:53
Yah, I was thinking the exact same thing actually.... It's far more a limitation of PHP than anything Zabbix is responsible for, though.
True, but the Zabbix frontend has no framework for writing debug output to disk in situations like this.
jcesario
24-03-2010, 20:00
So this is interesting.... For S&G I let my PHP threads (we use php-fcgi) spin out of control forever after one of my front end timeouts.
Watching the DB, I noticed that I was basically experiencing what people have been referring to as the "DOS" of profileid. Only this time it's on the auditlog table...
I see basically 2 types of queries just spammed over and over. They've been running for over 3 hours now...
SELECT nextid FROM ids WHERE nodeid=0 AND table_name='auditlog' AND field_name='auditid'
UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='auditlog' AND field_name='auditid'
EDIT: This is on the latest nightly
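A minimal sketch of why that SELECT/UPDATE pair gets spammed, assuming the frontend allocates one id per row it writes (an assumption, not something confirmed from the Zabbix source). It uses an in-memory SQLite table with the column names from the two queries above, treats nextid as the last id handed out, and compares per-id allocation with reserving a whole block in one go. The function names are hypothetical.

```python
import sqlite3

WHERE = "WHERE nodeid=0 AND table_name='auditlog' AND field_name='auditid'"


def make_db():
    """In-memory stand-in for the ids table (columns taken from the queries)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE ids (nodeid INTEGER, table_name TEXT, "
        "field_name TEXT, nextid INTEGER)"
    )
    db.execute("INSERT INTO ids VALUES (0, 'auditlog', 'auditid', 0)")
    return db


def allocate_one_by_one(db, count):
    """Repeat the SELECT/UPDATE pair once per id, as seen in the process list."""
    ids, statements = [], 0
    for _ in range(count):
        (nextid,) = db.execute(f"SELECT nextid FROM ids {WHERE}").fetchone()
        db.execute(f"UPDATE ids SET nextid=nextid+1 {WHERE}")
        statements += 2
        ids.append(nextid + 1)
    return ids, statements


def allocate_block(db, count):
    """Reserve count ids with a single SELECT/UPDATE pair."""
    (nextid,) = db.execute(f"SELECT nextid FROM ids {WHERE}").fetchone()
    db.execute(f"UPDATE ids SET nextid=nextid+? {WHERE}", (count,))
    return list(range(nextid + 1, nextid + count + 1)), 2


one_ids, one_statements = allocate_one_by_one(make_db(), 1000)
block_ids, block_statements = allocate_block(make_db(), 1000)
print(one_statements, block_statements)  # 2000 2
```

Both approaches hand out the same ids; the difference is only in how many round trips to the ids table it takes, which is where a bulk template link would hurt.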
nelsonab
24-03-2010, 20:33
Yep... I saw the same symptoms on my box. It has since gone away but when I did see it it was usually when a new user would begin looking at Zabbix.
jcesario
24-03-2010, 20:38
I get this when trying to link templates against hosts... or at least I think. It could be totally unrelated.
jcesario
25-03-2010, 14:02
Bump. I really don't want to let this thread sink. This basically renders my Zabbix setup useless.
nelsonab
25-03-2010, 14:22
Have you done much looking into getting xdebug installed? It looks like the only way to get some really good data is going to be to add a profiler of some kind.
You could try enabling debug mode for one specific user group in the user group configuration. You will then have a Debug link in the top header of all front-end screens.
If you see that a page is slow or very slow, click on the link to see why it happened.
jcesario
25-03-2010, 16:00
Thanks Alexei. I will give that a shot and report back what I find.
nelsonab
25-03-2010, 16:01
If you're getting a page timeout, the front end debug is pretty much useless. The only way to see the debug information is if you can actually get a full page render. The problem I had showed very similar symptoms, and all I would ever see was a menu bar. Debug information was only available *after* the page rendered, which *never* happened. What would be nice is the ability to turn on debugging to a file. I don't care if turning it on requires digging into the source (in fact I would prefer that), but something that says "this function is called with these values" and so on would do wonders (kind of like what I have done with Zabcon).
I realize this may mean huge data dumps. Perhaps you could make it more granular, either with levels or by only logging when someone passes a debug GET variable after having enabled debug in the source (required to prevent a DOS of the system by default).
Otherwise, for problems like this the only real option is to learn XDebug, or to purchase Zend and use its profiler.
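To make the idea concrete, here is a minimal Python sketch of the kind of file-based call tracing described above. Nothing here exists in the Zabbix frontend: the source-level switch, the log path, the `request_debug` flag (standing in for a debug GET variable) and `link_template` are all hypothetical.

```python
import functools
import logging
import os
import tempfile

# Hypothetical switch that has to be flipped by editing the source.
DEBUG_ENABLED_IN_SOURCE = True
# Hypothetical log location.
LOG_PATH = os.path.join(tempfile.gettempdir(), "frontend_debug.log")

logger = logging.getLogger("frontend_debug")
logger.setLevel(logging.DEBUG)
logger.propagate = False
# FileHandler flushes after every record, so entries survive a later hang.
logger.addHandler(logging.FileHandler(LOG_PATH, mode="w"))


def traced(func):
    """Log 'this function is called with these values' before running it."""
    @functools.wraps(func)
    def wrapper(*args, request_debug=False, **kwargs):
        # Both gates must be open: the source switch and the per-request flag.
        if DEBUG_ENABLED_IN_SOURCE and request_debug:
            logger.debug("call %s args=%r kwargs=%r", func.__name__, args, kwargs)
        return func(*args, **kwargs)
    return wrapper


@traced
def link_template(template_id, host_ids):
    """Hypothetical slow operation; here it only counts the hosts."""
    return len(host_ids)


link_template(10001, [1, 2, 3], request_debug=True)
```

Because the entry is written before the function body runs, the last line in the file shows the last function entered, even if the page never finishes rendering.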
jcesario
25-03-2010, 16:11
Um, yah. That did nothing.
I replicated the problem, waited for the page to time out after a few moments (far less than the PHP max execution time), then hit Debug and.... nothing. No response, no POST or GET, nada.
jcesario
25-03-2010, 16:11
nelsonab: Exactly. That's exactly the type of problem I'm having.
jcesario
25-03-2010, 16:22
I will be looking into estimating the level of effort (LOE) for getting some kind of profiling in there. However, that's not something I have time for right now. It's basically a whole project unto itself.
I'm also looking into grabbing a copy of Zend Profiler so I can just use that instead.
nelsonab
26-03-2010, 09:12
Cross referencing another thread with very similar symptoms, my hunch is it may be related.
http://www.zabbix.com/forum/showthread.php?t=15161
I've suspected for a while that the items query pulls in all items for a host even when paging. I think this requires a large number of calls for each row, even when most of those rows aren't displayed.
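A small sketch of the difference that suspicion implies, using an in-memory SQLite table. The table name comes from this thread, but the columns, the 1000-row host and both function names are illustrative; this is not the frontend's actual query.

```python
import sqlite3

PAGE_SIZE = 50

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE items (itemid INTEGER PRIMARY KEY, hostid INTEGER, "
    "description TEXT)"
)
db.executemany(
    "INSERT INTO items (hostid, description) VALUES (?, ?)",
    [(1, f"item {i}") for i in range(1000)],
)


def page_in_app(hostid, page):
    """Fetch every item for the host, then slice out one page in memory."""
    rows = db.execute(
        "SELECT itemid, description FROM items WHERE hostid=? ORDER BY itemid",
        (hostid,),
    ).fetchall()
    start = page * PAGE_SIZE
    return rows[start:start + PAGE_SIZE], len(rows)


def page_in_db(hostid, page):
    """Let the database return only the rows for the requested page."""
    rows = db.execute(
        "SELECT itemid, description FROM items WHERE hostid=? "
        "ORDER BY itemid LIMIT ? OFFSET ?",
        (hostid, PAGE_SIZE, page * PAGE_SIZE),
    ).fetchall()
    return rows, len(rows)


rows_app, fetched_app = page_in_app(1, 3)
rows_db, fetched_db = page_in_db(1, 3)
print(fetched_app, fetched_db)  # 1000 50
```

Both return the same page, but the first variant pays for every row on the host, plus any per-row follow-up calls, on every page view.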
-Paul