Zabbix Internals: processes 100% busy

  • Mojah
    Member
    • Apr 2010
    • 60

    #1

    Hi,

    We have a fairly large and busy Zabbix monitoring setup (around 400 hosts, 15,000 items and 12,000 triggers) and are experiencing a "peculiar" problem.

    At times, the important internal zabbix processes all reach 100% busy at the same time:
    - housekeeper
    - poller
    - http poller
    - history syncer
    - icmp pinger

    These are of course our most active checks (icmp, poller (snmp) & http), but they occasionally reach a 100% busy state at random intervals. It usually resolves on its own after an hour or so, but it leaves some items with missing data.

    Host performance remains OK throughout the day, with no extra CPU load. About 50% of CPU time stays free, so the processes are not fighting for CPU cycles.

    The zabbix-server config has parameters for "StartPollers", "StartTrappers", ..., but I'm unsure whether these grow automatically. If more processes are needed to execute the requested checks, would the server keep spawning more of them?

    How would you go about debugging what may be the cause of it, and what possible solutions could I expect?

    PS: we're running the latest server release, 1.8.5.
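
A rough way to reason about the question above is to compare the work the checks generate with the number of processes configured. The sketch below is only back-of-the-envelope arithmetic, not how Zabbix itself computes its busy percentages. The 15,000 item count comes from the post; the 60-second interval, the 0.25-second average check time and the process counts are hypothetical numbers chosen for illustration, and the function names are made up.

```python
import math


def estimate_busy_fraction(items, interval_s, avg_check_s, processes):
    """Estimate the fraction of time each process spends executing checks.

    Assumes every item is checked once per interval and the work is
    spread evenly over all processes (a simplification).
    """
    checks_per_second = items / interval_s
    work_seconds_per_second = checks_per_second * avg_check_s
    return work_seconds_per_second / processes


def processes_needed(items, interval_s, avg_check_s, target_busy=0.5):
    """Estimate how many processes keep the busy fraction at or below target."""
    work_seconds_per_second = (items / interval_s) * avg_check_s
    return math.ceil(work_seconds_per_second / target_busy)


# Hypothetical workload: 15,000 items, 60 s interval, 0.25 s per check.
print(estimate_busy_fraction(15000, 60, 0.25, processes=100))  # 0.625
print(processes_needed(15000, 60, 0.25, target_busy=0.5))      # 125
```

If the estimated busy fraction is already close to 1.0 with the configured process count, slow checks (timeouts, unreachable hosts) would be enough to push the processes to 100% busy without any visible CPU load.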
  • Mojah
    Member
    • Apr 2010
    • 60

    #2
    To make this topic visually more appealing, and perhaps trigger some responses, I've attached a few zabbix graphs.

    The second graph shows that the "history syncer" process starts to occupy more resources (its busy state rises gradually to 100%), after which all other processes reach 100% busy as well.

    Both the Zabbix server and the MySQL server (two separate machines) behave normally, with no increased load. In debug mode, the Zabbix server doesn't tell me much (except, with our number of items and triggers, a sh*tload of warnings).

    A push in the right direction would be helpful. :-)

    • jroberson
      Senior Member
      • May 2008
      • 124

      #3
      Not sure if you are still experiencing the same problems, but I'm now experiencing this too. I'm trying to diagnose another problem, so I set DebugLevel=4 in my config file and set my max log size to 10M. I'm noticing that every time my log rotates (which is pretty often now) I see a spike in the History Syncer process. Eventually it caps at 100% and then I start losing data (gaps in graphs).

      I started to monitor the history and text caches as well and noticed that when the History Syncer process went to 100%, the caches started to fill up. It makes sense, of course.

      Then I did a bit of thinking about the issue and remembered that I was monitoring the Zabbix log as an active check. That means the log file would start to grow even faster, because it would have started to contain itself. Wild, man! So I stopped monitoring it.

      Then I realized that I would never be able to catch the problem I was trying to find if my log file rotated every 10 seconds or so. Therefore, what I tried is:
      1. Set my log size to 6M
        - Because I figured the logic required in the active check is to compare the last check to this check; so a bigger file means more processing.
      2. Change the check interval for my log to every 4 seconds
        - Otherwise I would probably lose data.


      Then ... Failure! It seemed to be working just fine and then BAM 100% History Syncers.

      So I disabled the log check and waited for it to go sane. After it cleaned out all of the history and text buffers, it started responding again.

      Therefore, it seems that log files might be the problem. It looks like I'm out of luck with my log file and will just have to move it back to debug level 3 for now and hope that will give me the info I need.

      Anybody have any suggestions?
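
The "log that contains itself" effect described above can be illustrated with a toy feedback model. This is only a sketch under assumptions that are not from the post: each interval the server writes a fixed number of its own lines (`base_lines`), and every monitored line from the previous interval causes `amplification` further debug lines to be written. Both parameters and the function name are hypothetical.

```python
def simulate_log_growth(base_lines, amplification, steps):
    """Return the number of log lines written in each interval.

    Toy model: lines_n = base_lines + amplification * lines_(n-1).
    With amplification < 1 the volume levels off; with amplification >= 1
    it keeps growing, which matches a log that rotates ever more often.
    """
    written = []
    previous = 0.0
    for _ in range(steps):
        current = base_lines + amplification * previous
        written.append(current)
        previous = current
    return written


# Each monitored line produces half a debug line: volume levels off.
print(simulate_log_growth(100, 0.5, 4))  # [100.0, 150.0, 175.0, 187.5]

# Each monitored line produces two debug lines: volume runs away.
print(simulate_log_growth(100, 2, 4))    # [100.0, 300.0, 700.0, 1500.0]
```

Under this model, changing the log size or the check interval does not change the amplification factor, which would be consistent with the failure described above once the log check was re-enabled.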

      • Mojah
        Member
        • Apr 2010
        • 60

        #4
        Originally posted by jroberson
        Not sure if you are still experiencing the same problems
        I am, and so far there's no solution. I'm going to try what you did with the log files and see if that makes a difference.

        Thanks for the feedback, hopefully someone else also experiences this. :-)

        • jroberson
          Senior Member
          • May 2008
          • 124

          #5
          Did you ever increase your DBSyncers value in the config file? I've upped it to 2 for now to see if it makes any difference. Still testing, though.

          You could also try lowering the number of Pollers and Trappers that you have running until you see stuff start piling up in the queue or until you see the processes go up to 100% all the time. If you can get the number of these just right (as low as possible), you can reduce the overhead required and possibly improve Zabbix performance.

          I don't think Zabbix will automatically spawn new processes if it needs them, though that would be a pretty neat feature.
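
When tuning these counts by hand, it can help to list what is currently configured before changing anything. The sketch below reads `Start*` settings from config-style text. The function name and the sample values are hypothetical, and `StartDBSyncers` is assumed to be the full name of the DBSyncers setting mentioned above; check the parameter names against your own zabbix_server.conf.

```python
def read_start_parameters(conf_text):
    """Collect Start* settings from zabbix_server.conf-style text.

    Blank lines and lines starting with '#' are skipped; only keys that
    begin with 'Start' are returned, with their values parsed as integers.
    """
    params = {}
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key.startswith("Start"):
            params[key] = int(value.strip())
    return params


# Hypothetical values for illustration only.
SAMPLE_CONF = """\
# StartPollers=5
StartPollers=30
StartTrappers=5
StartDBSyncers=2
DebugLevel=3
"""

print(read_start_parameters(SAMPLE_CONF))
# {'StartPollers': 30, 'StartTrappers': 5, 'StartDBSyncers': 2}
```

To use it on a real server, pass the contents of the config file instead of `SAMPLE_CONF`, then compare the counts with the busy percentages in the internal process graphs before and after each change.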
