BIG numbers in delay queues

  • cioris
    Member
    • Oct 2008
    • 30

    #1

    BIG numbers in delay queues

    Hi,

    I have a problem with my Zabbix 1.6. It looks like everything is delayed for a long time (more than 10 min). Most of my agents are configured for active checks.

    Here is the summary of queues for Active checks:
    5 seconds: 0
    10 seconds: 1
    30 seconds: 0
    1 minute: 0
    5 minutes: 0
    10 minutes: 1434

    Any idea how I can debug this problem? I cannot tell whether the problem is the network connection (all agents are at remote sites, so the reporting goes over an Internet connection) or my database.

    There are 153 remote agents sending data to the server, but the network activity is significantly below the actual bandwidth available. Could it be a problem with the number of connections? How can I check it?

    Any suggestion is really appreciated.
  • Alexei
    Founder, CEO
    Zabbix Certified Trainer
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Sep 2004
    • 5654

    #2
    These numbers are not correct for active checks. This is already fixed in pre 1.6.1, please wait for the official release.
    Alexei Vladishev
    Creator of Zabbix, Product manager
    New York | Tokyo | Riga

    • cioris
      Member
      • Oct 2008
      • 30

      #3
      Thanks..

      I will wait for that. Any estimate when this is supposed to be available?
      BTW, I was trying to get in touch with you to discuss the possibility of buying some "commercial" support, but the address [email protected] is not functional. Can you give me some hints about prices?

      Thanks again for your answer and congrats for the great job!!!


      • cioris
        Member
        • Oct 2008
        • 30

        #4
        again about queues...

        I understand that the numbers are not valid for active checks, but I have the same problem for "passive" checks:

        5 seconds: 0
        10 seconds: 0
        30 seconds: 0
        1 minute: 0
        5 minutes: 0
        10 minutes: 37

        The only items monitored in passive mode are the ones for Zabbix server itself and for one zabbix proxy I have in a remote location. So, I think my question is still valid. Any way to check why everything is delayed?

        Thanks


        • Alexei
          Founder, CEO
          Zabbix Certified Trainer
          Zabbix Certified Specialist, Zabbix Certified Professional
          • Sep 2004
          • 5654

          #5
          Originally posted by cioris
          BTW, I was trying to get in touch with you to discuss the possibility of buying some "commercial" support, but the address [email protected] is not functional.
          Please use the contact form http://www.zabbix.com/contact.php to give us more details about this email problem. I would be interested to know your email address, so we could check our email server logs. Thank you!

          • cioris
            Member
            • Oct 2008
            • 30

            #6
            still big numbers in queue

            I installed the new 1.6.1, but I still see big numbers in the delayed queues:

            5 seconds: 0
            10 seconds: 0
            30 seconds: 0
            1 minute: 69
            5 minutes: 557
            10 minutes: 1406

            All are Zabbix Agent Active. Any idea how I can check if there is something wrong with my server? I'm running MySQL and Zabbix server on the same machine.

            thanks


            • drose12
              Junior Member
              • Apr 2007
              • 27

              #7
              Zabbix 1.6.1

              I am seeing graphing anomalies and our queues are way longer than I've ever seen. We came from 1.4.3 and never noticed the queues getting long, but on this new version they are:

              Zabbix Agent:

              5 seconds: 83
              10 seconds: 296
              30 seconds: 395
              1 minute: 340

              How does one trace this?


              • nelsonab
                Senior Member
                Zabbix Certified Specialist, Zabbix Certified Professional
                • Sep 2006
                • 1233

                #8
                This has been a transient bug for a long time.

                My guess is that one of the server threads has become hung up and is blocking other threads from retrieving data. I may be wrong, but the behavior over the versions is somewhat consistent with this.

                So that helps you little in fixing it; let's see what we can do.

                First off, what is the server load on your MySQL box? Run top and have a look at the percentage next to "wa". Is this high? What is the overall server load? Is that high?

                If either one of those is high, look into tuning your MySQL installation and/or server. Check programs like CPUSpeed (it's been known to slow down a system big time).
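For anyone who would rather log the "wa" figure than watch top, here is a minimal Python sketch. It assumes a Linux box where /proc/stat is available; the function names are made up for illustration, and it only approximates what top displays by comparing two samples of the aggregate cpu line.

```python
import time


def parse_cpu_line(line: str) -> list[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into a list of counters."""
    fields = line.split()
    if len(fields) < 6 or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(value) for value in fields[1:]]


def iowait_percent(before: str, after: str) -> float:
    """Share of CPU time spent waiting on I/O between two samples, in percent."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    # Counters are user, nice, system, idle, iowait, irq, softirq, steal, ...
    # Guest time is already counted inside user/nice, so only the first 8 are summed.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def sample_iowait(interval: float = 5.0) -> float:
    """Take two samples of /proc/stat 'interval' seconds apart (Linux only)."""
    def read_cpu_line() -> str:
        with open("/proc/stat") as handle:
            return handle.readline()

    first = read_cpu_line()
    time.sleep(interval)
    return iowait_percent(first, read_cpu_line())
```

Calling `sample_iowait()` on the database host a few times while the queue is long should show whether the disks are the bottleneck. What counts as "high" is a judgment call that depends on the hardware.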

                To reset the queue just restart the server. Usually this is when I have found the queue will reset and the items "stuck" in the queue will start working again. Why? Likely related to the above theory.

                Also check the clients and make sure they are responding appropriately. Telnet to them directly from the server and request a key. They should respond in a fairly quick manner. If they do not on a consistent basis that might be a factor in the queue overload.
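The telnet test can also be scripted so the response time gets measured. This is a rough Python sketch, not an official client: it assumes the agent listens on the default port 10050, accepts a plain-text key terminated by a newline (as the telnet test implies), and may prefix its reply with a "ZBXD" marker byte sequence followed by an 8-byte length. The header layout and the function names are assumptions, so check them against your agent version.

```python
import socket
import time

HEADER = b"ZBXD\x01"  # assumed reply marker; some agents may answer in plain text


def strip_header(raw: bytes) -> str:
    """Drop the assumed 13-byte header (marker plus 8-byte length) if present."""
    if raw.startswith(HEADER):
        raw = raw[len(HEADER) + 8:]
    return raw.decode("utf-8", errors="replace").strip()


def query_agent(host: str, key: str, port: int = 10050, timeout: float = 5.0):
    """Send one item key to a passive agent and return (value, seconds elapsed)."""
    started = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(key.encode("utf-8") + b"\n")
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # the agent closed the connection, reply is complete
                break
            chunks.append(chunk)
    return strip_header(b"".join(chunks)), time.monotonic() - started
```

For example, `query_agent("192.0.2.10", "agent.ping")` (the address is a placeholder) returns the value and how long the agent took to answer. Running it from the Zabbix server against a few hosts with queued items should show whether slow agents are part of the problem.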

                Hopefully this helps a little.
                RHCE, author of zbxapi
                Ansible, the missing piece (Zabconf 2017): https://www.youtube.com/watch?v=R5T9NidjjDE
                Zabbix and SNMP on Linux (Zabconf 2015): https://www.youtube.com/watch?v=98PEHpLFVHM


                • bee
                  Senior Member
                  • Jun 2007
                  • 133

                  #9
                  Hi all,
                  I have a similar issue with ZABBIX 1.6.1. The queue numbers seem unbelievable, and I don't have any idea how to solve it. I restarted the ZABBIX server, but it took a while until all the queue numbers were back to "normal".

                  My server performance seems fine; 'top' output below:
                  Code:
                  last pid: 86517;  load averages:  0.35,  1.62,  4.80  up 74+16:36:25    17:21:34
                  213 processes: 1 running, 212 sleeping
                  Mem: 131M Active, 1528M Inact, 197M Wired, 45M Cache, 112M Buf, 101M Free
                  Swap: 4096M Total, 284K Used, 4096M Free
                  Also, I tried increasing the pollers and trappers to 30:
                  Code:
                  StartPollers = 30
                  StartTrappers = 30
                  I hope this will be solved in the next release of ZABBIX.

                  Thanks,
                  BEE
                  Last edited by bee; 23-12-2008, 13:48. Reason: add QUEUE screeshot


                  • nelsonab
                    Senior Member
                    Zabbix Certified Specialist, Zabbix Certified Professional
                    • Sep 2006
                    • 1233

                    #10
                    I forget where it is, but there's a screen that will tell you the required performance. What is that value? If that is below the number of pollers you have then try upping the pollers so there is at least a 20% overhead.

                    Also what were you doing to cause the 15 minute average to climb to 4? How many cores does the processor have?
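To put the load question in numbers, here is a small Python sketch that pulls the 1-, 5- and 15-minute averages out of a top header line like the one posted above and divides them by the core count. The function names are illustrative, and the core count is something you pass in yourself.

```python
import re


def parse_load_averages(top_header: str) -> tuple[float, ...]:
    """Extract the 1-, 5- and 15-minute load averages from a top header line."""
    match = re.search(
        r"load averages?:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", top_header
    )
    if match is None:
        raise ValueError("no load averages found in the given line")
    return tuple(float(group) for group in match.groups())


def load_per_core(load: float, cores: int) -> float:
    """Normalize a load average by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores


header = "last pid: 86517;  load averages:  0.35,  1.62,  4.80  up 74+16:36:25    17:21:34"
one, five, fifteen = parse_load_averages(header)
# The 4 cores here are only an example value; use the real count of the box.
print(f"15-minute load per core: {load_per_core(fifteen, 4):.2f}")
```

A sustained per-core figure above roughly 1 suggests processes were waiting for CPU or I/O during that window, though that threshold is a rule of thumb rather than a hard limit.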

                    I would also suggest having a look into the Database performance tuning posts that are scattered around. That will help some with this.

                    • bee
                      Senior Member
                      • Jun 2007
                      • 133

                      #11
                      Originally posted by nelsonab
                      I forget where it is, but there's a screen that will tell you the required performance. What is that value? If that is below the number of pollers you have then try upping the pollers so there is at least a 20% overhead.
                      The value shows 117.32, while the pollers are set to 30. I cannot set the poller value higher than the current value or ZABBIX will crash.

                      Originally posted by nelsonab
                      Also what were you doing to cause the 15 minute average to climb to 4? How many cores does the processor have?
                      I have a quad-core processor. Thank you for pointing this out. It happened after the upgrade from 1.4.6 to 1.6.1: all active checks (Agent Active) suddenly stopped sending values. As a result I changed all active checks to Zabbix agent. I am still trying to figure out why this happened.

                      Originally posted by nelsonab
                      I would also suggest having a look into the Database performance tuning posts that are scattered around. That will help some with this.
                      Thank you, starting to "dig" now.

                      Cheers,
                      BEE


                      • nelsonab
                        Senior Member
                        Zabbix Certified Specialist, Zabbix Certified Professional
                        • Sep 2006
                        • 1233

                        #12
                        Sounds like we might be getting somewhere. First see how many database connections you are configured for. Also, how many trappers and how many pollers are you configured for? Let's hold off on upping the db stuff until we have to; first try the suggestion below.

                        Do you know what your polled to trapped ratio is? That required performance seems a little excessive.

                        You might also want to take a look at your templates (I hope you're using templates) and make sure you're polling things at a sane interval. CPU, for instance, I poll every 30-60 sec at most, depending on how important that load is. Network interfaces once a minute, disks once every 5 minutes and so on. This will reduce the required performance and in turn reduce your queue length. The first items I would look at are the CPU related ones. The default template has some very aggressive CPU polling values which are not usually needed.
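To see how much the intervals matter, here is a back-of-the-envelope Python sketch. It assumes the "required performance" figure is roughly the number of new values per second, that is, the sum of 1/interval over all enabled items. That is an approximation made for illustration, and the host and item counts below are made up.

```python
def required_values_per_second(items):
    """Estimate new values per second from (item_count, interval_seconds) pairs."""
    total = 0.0
    for count, interval in items:
        if interval <= 0:
            raise ValueError("interval must be positive")
        total += count / interval
    return total


# Hypothetical setup: 150 hosts, each with 5 CPU items.
cpu_items = 150 * 5
aggressive = required_values_per_second([(cpu_items, 5)])   # polled every 5 s
relaxed = required_values_per_second([(cpu_items, 30)])     # polled every 30 s
print(f"5 s interval:  {aggressive:.1f} values/s")
print(f"30 s interval: {relaxed:.1f} values/s")
```

With these made-up numbers, moving the CPU items from 5 seconds to 30 seconds cuts their share of the load to one sixth, which is the kind of reduction the template cleanup above is aiming for.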
                        Last edited by nelsonab; 23-12-2008, 22:05.

                        • MrKen
                          Senior Member
                          • Oct 2008
                          • 652

                          #13
                          Just my 2 cents worth.

                          I had a similar problem a couple of weeks back. Long queues, and the housekeeper was taking about 45 - 50 minutes to delete 80,000 records.

                          I installed mysqltuner, followed the recommended alterations to my.cnf and haven't had any problems since [touch wood].

                          MrKen

                          P.S. This is a great post regarding managing your data: http://www.zabbix.com/forum/showpost...79&postcount=2
                          Disclaimer: All of the above is pure speculation.


                          • svenw
                            Junior Member
                            • May 2008
                            • 26

                            #14


                            I had a similar problem, maybe yours is the same?
