Zabbix Server DB commits under heavy load/big queue

  • mushero (Senior Member; joined May 2010; 101 posts)

    #1

    We have about 25,000 active items and need about 60 checks/second to keep up, which normally works fine. But when we have connectivity problems, as we did today, the server can't reach the hosts and the queue climbs, of course; in this case to 18,000.
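As a rough sanity check on those figures: if each item with an update interval of i seconds needs 1/i checks per second, then 25,000 items at about 60 checks/second implies a (harmonic) mean interval of roughly 417 seconds. A minimal sketch, assuming that simple 1/interval model; the helper names and the interval mix below are hypothetical, not taken from Zabbix:

```python
def required_checks_per_second(intervals_s):
    """Checks per second needed so that every item is polled on schedule.

    Simplified model: an item with an update interval of i seconds
    contributes 1/i checks per second.
    """
    return sum(1.0 / i for i in intervals_s)


def implied_mean_interval(item_count, checks_per_second):
    """Harmonic-mean update interval implied by an item count and a check rate."""
    return item_count / checks_per_second


if __name__ == "__main__":
    # Figures from the post: ~25,000 active items needing ~60 checks/second.
    print(round(implied_mean_interval(25_000, 60)))  # 417 (seconds)
    # Hypothetical mix: 20,000 items every 600 s plus 5,000 items every 300 s.
    mix = [600] * 20_000 + [300] * 5_000
    print(round(required_checks_per_second(mix), 2))  # 50.0
```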

    But when connectivity came back, the queue didn't go down, even though the DB showed over 675 updates/second and 1,400 reads/second: the DB was I/O-saturated on log syncs. Turning off syncing on every commit (MySQL sync on commit set to 0) raised the transaction rate to 5,000/second, and the queue drained in less than a minute.
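Why log syncs cap throughput can be seen with back-of-envelope arithmetic: if every commit forces a fixed number of synchronous flushes and the disk can only sustain so many flushes per second, then rows per second is bounded by (flushes/s ÷ flushes per commit) × rows per commit. A minimal sketch under that simplified model; the 1,350 flushes/s disk capacity and the two flushes per commit (log plus binlog) are assumptions picked to reproduce the ~675 updates/s above, not measurements:

```python
def max_rows_per_second(syncs_per_second, syncs_per_txn, rows_per_txn):
    """Upper bound on rows written per second when disk syncs are the bottleneck.

    Simplified model: every commit costs `syncs_per_txn` synchronous flushes,
    the disk sustains at most `syncs_per_second` of them, and nothing else
    (CPU, locks, network) limits throughput.
    """
    commits_per_second = syncs_per_second / syncs_per_txn
    return commits_per_second * rows_per_txn


if __name__ == "__main__":
    # Assumed: ~1,350 syncs/s of disk capacity, 2 syncs per commit (log + binlog).
    print(max_rows_per_second(1350, 2, 1))   # 675.0 with one row per transaction
    print(max_rows_per_second(1350, 2, 50))  # 33750.0 with 50 rows per transaction
```

In this model, turning off the per-commit flush removes the disk-sync bound entirely, which is consistent with (though it does not explain the exact size of) the jump to about 5,000 transactions/second.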

    My question is two-fold:

    1) The queue is a bit fuzzy to me, but I assume it is the number of items past their scheduled check time. If so, and the server is checking more than the 60/second needed to keep up (it was doing 1,200/sec), why did the queue keep rising? I assume that when an item is checked, its next check time is set relative to the current time (now + interval), not to the last scheduled check time plus the interval.

    2) One of the great 1.8 performance improvements seems to be batched updates: for 60 updates/second I normally see only 1-2 log syncs, indicating that Zabbix does many updates/inserts and then commits once. But during this high-queue period, 675 updates/sec produced 675 DB syncs/sec (actually about 1,300, counting both the log and the binlog). Does the server run out of RAM, or hit some other load factor, that makes it switch to one update/insert per transaction?

    If so, we lose the performance benefit and a large queue can never be worked off, since the system gets much slower when the server does one update per transaction instead of dozens.
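The reasoning in question 1 can be checked with a toy scheduler. A minimal sketch, assuming (hypothetically, this is not Zabbix's actual scheduler) that all items share one interval, that the server completes at most `rate` checks per second on the most overdue items first, and that a checked item is rescheduled to now + interval as the post assumes. Under those assumptions the queue shrinks whenever completed checks outpace items coming due, so a queue that keeps rising points to checks (or their writes) not actually completing at the observed rate:

```python
import heapq


def simulate_queue(next_checks, interval, rate, seconds):
    """Return the queue length (items past their check time) at the end of each second.

    Hypothetical model, not Zabbix's actual scheduler:
    - every item shares one update interval (seconds);
    - the server completes at most `rate` checks per second, most overdue first;
    - a checked item is rescheduled to now + interval.
    """
    heap = list(next_checks)
    heapq.heapify(heap)
    queue_lengths = []
    for now in range(seconds):
        done = 0
        while heap and heap[0] <= now and done < rate:
            # Check the most overdue item and reschedule it relative to "now".
            heapq.heapreplace(heap, now + interval)
            done += 1
        queue_lengths.append(sum(1 for t in heap if t <= now))
    return queue_lengths


if __name__ == "__main__":
    # 1,000 items all overdue after an outage, 100 s interval, 50 checks/s.
    lengths = simulate_queue([0] * 1_000, interval=100, rate=50, seconds=25)
    print(lengths[0], lengths[19])  # 950 0
```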

    Hope this is clear. We are a heavy Zabbix user on our way to being one of the world's largest installations, at 10x and then 100x our current size, so we really need to understand these dynamics for when we're at thousands of updates/second and 10,000 hosts.
  • untergeek (Senior Member, Zabbix Certified Specialist; joined Jun 2009; 512 posts)

    #2
    Wow! That's a huge install!

    Best of luck with that. We're an Oracle shop, so I can't really help with MySQL. I believe, however, that some of the performance tweaks being added in 1.8.3 will help, especially the code that allows multiple parallel dbsyncers. Having only one dbsyncer may be the reason you've seen slower performance and an inability to keep up, and the fix in the pipeline should help with your large deployment.

    • dalle (Senior Member, Zabbix Certified Specialist; joined Mar 2009; 402 posts)

      #3
      Here we also have trouble with database response time on MySQL, but our monitoring system is:
      Number of hosts (monitored/not monitored/templates) 1261
      Number of items (monitored/disabled/not supported) 52909
      Number of triggers (enabled/disabled)[true/unknown/false] 7991
      It runs on an 8-core x64 machine with 16 GB of RAM, with the MySQL storage on Fibre Channel to an IBM storage 8000.
      We are thinking of migrating to Oracle because MySQL sometimes uses ALL 8 cores...
      Andrea Dalle Vacche
      website: http://www.smartmarmot.com/
      Author of: Mastering Zabbix (second edition) and Zabbix Network Monitoring Essentials

      • untergeek (Senior Member, Zabbix Certified Specialist; joined Jun 2009; 512 posts)

        #4
        I wouldn't bother until 1.8.3, when you can use parallel dbsyncers officially. Oracle is rather bottlenecked with only one dbsyncer (the default in 1.8.0 through 1.8.2). It's much better for us with the "unofficial" 12 dbsyncers we have compiled in.
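To illustrate why several syncers can help when each commit is latency-bound, here is a minimal sketch of splitting a backlog of pending values across N workers. The routing rule (itemid modulo the number of syncers) is a hypothetical choice for illustration, not necessarily how Zabbix 1.8.3 assigns work: it keeps all values of one item on one worker, so that item's values stay in the order they were received, while different items can be committed in parallel:

```python
def partition_by_item(values, n_syncers):
    """Split pending (itemid, clock, value) tuples across n_syncers workers.

    Hypothetical routing rule: itemid % n_syncers. All values for one item land
    on the same worker in the order they were received, while different items
    can be committed by different workers in parallel.
    """
    if n_syncers < 1:
        raise ValueError("need at least one syncer")
    batches = [[] for _ in range(n_syncers)]
    for itemid, clock, value in values:
        batches[itemid % n_syncers].append((itemid, clock, value))
    return batches


if __name__ == "__main__":
    pending = [(1, 100, 0.5), (2, 100, 7.0), (1, 160, 0.7), (3, 100, 42.0)]
    for worker, batch in enumerate(partition_by_item(pending, 2)):
        # Each batch would be written by its own worker, ideally in one transaction.
        print(worker, batch)
```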

        • fjrial (Senior Member; joined Feb 2010; 140 posts)

          #5
          host down -> big queue

          Hi,

          I'm in the same situation.

          About 200 hosts
          About 9000 items (99% of items are SNMP v2 items)

          Normally, none of the items reaches the 30-second queue. But if a small Cisco device with 100 items goes down, the queue quickly fills with items, and they go past even the 5-minute queue. I don't want to think about what will happen if a Juniper router with more than 600 items goes down...

          Will parallel dbsyncers help with this?

          For reference, in zabbix_server.conf:
          StartTrappers=10
          StartPollers=10
          Timeout=3
          UnreachablePeriod=45
          UnavailableDelay=120
          UnreachableDelay=60
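A rough way to see why one dead device can back up the whole queue: if every check against it waits the full Timeout, it ties up poller time that healthy items would otherwise use. A minimal worst-case sketch using StartPollers=10 and Timeout=3 from the config above; it assumes no retries, one check at a time per poller, and ignores how the server's unreachable-host handling (the Unreachable* parameters) would shorten the stall, so treat the numbers as an upper-bound illustration only:

```python
def poller_seconds_lost(dead_items, timeout_s):
    """Poller time spent waiting on checks against an unreachable device."""
    return dead_items * timeout_s


def stall_seconds(dead_items, timeout_s, pollers):
    """Wall-clock time during which every poller is busy waiting on timeouts.

    Worst-case sketch: each check waits the full timeout, there are no retries,
    each poller handles one check at a time, and the timed-out checks are spread
    evenly across pollers.
    """
    return poller_seconds_lost(dead_items, timeout_s) / pollers


if __name__ == "__main__":
    # StartPollers=10, Timeout=3 as in the zabbix_server.conf values above.
    print(stall_seconds(100, 3, 10))  # 30.0 s for the 100-item Cisco
    print(stall_seconds(600, 3, 10))  # 180.0 s for a 600-item Juniper
```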
