Queue Delays

  • RaymondH
    Junior Member
    Zabbix Certified Specialist
    • Dec 2010
    • 24

    #1

    Queue Delays

    I've done a few days worth of searching and tweaking in an attempt to clear out my Zabbix Queue, but for the life of me it's just not working correctly.

    [Attached image: queue.jpg]

    Most of the items that were stuck for a few days have processed or dropped, but I'm still seeing items in the queue. I've rebooted the server, the services, the agents, some of the remote servers, and the DB, and disabled/removed/re-added the hosts, and it's just not going away. How can I clear these out so that it stops giving me false positives with other triggers? What am I missing?

    Running Zabbix Appliance 1.8.3 (Yes I know it's not for Production)

    Parameter                                                   Value   Details
    Zabbix server is running                                    Yes     -
    Number of hosts (monitored/not monitored/templates)         132     85 / 1 / 46
    Number of items (monitored/disabled/not supported)          5705    2509 / 462 / 2734
    Number of triggers (enabled/disabled)[problem/unknown/ok]   1601    1464 / 137 [28 / 601 / 835]
    Number of users (online)                                    2       2
    Required server performance, new values per second          19.24   -


    Thanks in advance

    Raymond
  • untergeek
    Senior Member
    Zabbix Certified Specialist
    • Jun 2009
    • 512

    #2
    Questions to help in troubleshooting:

    What data type are these? e.g., Zabbix agent, Zabbix agent (Active), Zabbix trapper, simple, etc.

    Have you tried running zabbix_get -s servername.example.com -k item.key against the server/agent to see if you can even get values (if they're agent based)?

    Perhaps that's a good first place to check, whether the values are even collectable manually.
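For anyone unsure how to run that check, here is a minimal sketch that polls several keys in one go. The function name check_keys and the example host are placeholders, not anything from the thread, and it assumes the binary is on the PATH as zabbix_get (some packages may install it under a slightly different name).

```shell
#!/bin/sh
# check_keys HOST KEY [KEY...]
# Ask the agent on HOST for each KEY once and print what comes back.
# check_keys is a hypothetical helper name used only for this sketch.
check_keys() {
    host=$1
    shift
    for key in "$@"; do
        # -s agent host, -p agent port (10050 is the default), -k item key
        if value=$(zabbix_get -s "$host" -p 10050 -k "$key" 2>/dev/null); then
            echo "$key: $value"
        else
            echo "$key: no response"
        fi
    done
}
```

It could be called as check_keys servername.example.com agent.ping system.uptime. Keys that contain brackets should be wrapped in single quotes so the shell leaves them alone. A reply only shows that the agent answered; a reply such as ZBX_NOTSUPPORTED would still mean the key itself is not usable on that host.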


    • RaymondH
      Junior Member
      Zabbix Certified Specialist
      • Dec 2010
      • 24

      #3
      Thanks for the reply -

      Since my last post it's a lot worse, so I'm not exactly sure what is going on.

      [Attached image: queue2.jpg]

      I'll try the command you listed (if I can figure it out) and will let you know what happens. Kinda sad, it was running good for about 3 days after I fixed the last issue I had.

      I don't know if it's because I'm using the appliance or it's just my overall lack of experience w/ Zabbix and Linux, but I'm just struggling to keep this running smoothly. It sure beats Nagios, I just wish I could keep it running for more than a few days.


      • untergeek
        Senior Member
        Zabbix Certified Specialist
        • Jun 2009
        • 512

        #4
        You'll do far better with an actual, compiled install. The appliance is great, but it's not for production (as they said).


        • RaymondH
          Junior Member
          Zabbix Certified Specialist
          • Dec 2010
          • 24

          #5
          Yeah, that's my understanding. Except I'm too dumb to figure out how to compile the source even following directions. I get weird results, and then I can't figure out what to do after I "finish" (if I'm doing it right) the make install.

          I've tried reading the directions in the Manual, some "how-to" guides from Google, 2 different versions of a Linux install, and I just can't figure it out. So it was a case of: use the appliance to play around w/ Zabbix, or just forget about it and look for something else.

          The biggest challenge I'm having w/ this tool is that it's geared for people that know what they are doing in Linux/DBs. It makes a lot of "we assume you know how to do X/Y/Z before you actually get to this step in the instructions" assumptions. Which is fine, I understand, but it makes it very hard for a Windows guy to read between the lines and get it running/installed/configured.

          Heck, it took me 2 days to figure out I needed to convert the appliance before I could use it on my ESXi server.

          Love this tool, don't get me wrong, it's awesome. Just seems to have a steep learning curve if you don't have Linux/DB experience.

          Will get back to you on the other results - again, thanks for the reply/feedback.


          • RaymondH
            Junior Member
            Zabbix Certified Specialist
            • Dec 2010
            • 24

            #6
            From my Zabbix server, it looks like I can use the zabbix-get command to pull the data.

            hqzabbix:/usr/bin # ./zabbix-get -sx.x.x.x -p10050 -k"vfs.dev.read[sda,sectors]"
            3364662

            That being said - shouldn't it be working or am I missing something?

            Thanks,

            R


            • untergeek
              Senior Member
              Zabbix Certified Specialist
              • Jun 2009
              • 512

              #7
              In theory. Performance implications are involved. If there are IO problems, you can have hang-ups. I just don't know enough about the appliance to know if you'd be suffering from this or not. It's complicated.

              What is the interval between checks for these items? You could try decreasing the interval to be shorter.

              Are these items Zabbix Agent or Zabbix Agent (Active)?


              • RaymondH
                Junior Member
                Zabbix Certified Specialist
                • Dec 2010
                • 24

                #8
                Fair enough -

                Let me do some more housekeeping and figure out what I want my check intervals to be; up to this point I've just left everything at its default template value.

                Will report back w/ my findings so that hopefully others who run into this can learn something.

                Thanks again!

                R


                • RaymondH
                  Junior Member
                  Zabbix Certified Specialist
                  • Dec 2010
                  • 24

                  #9
                  :: Update ::

                  Whew....OK!

                  So for the better part of today, I went through one device at a time and disabled all UNSUPPORTED Items/Triggers. (Is there an easy way to do this other than 1 at a time?)

                  This seems to have cleared up my issue: about halfway through the 89ish nodes I have monitored, the queue started to clear up, and now I'm sitting at 0.

                  On the recommendation of Untergeek I also went through and adjusted my item timings.

                  Doing the above 2 things seems to have done the trick....for now (dun dun dunnnnn).

                  Thanks for all the help, hopefully this post can help some other person who is experiencing the same issue!


                  • alixen
                    Senior Member
                    • Apr 2006
                    • 474

                    #10
                    Hi,

                    Originally posted by RaymondH
                    So for the better part of today, I went through one device at a time and disabled all UNSUPPORTED Items/Triggers. (Is there an easy way to do this other than 1 at a time?)
                    For Items:
                    In the menu: Configuration -> Hosts -> Items
                    In Filter, select Status: Not supported
                    Click on Filter.

                    In the drop-down at the bottom of the page, select "Mass update" and click on Go (nnn)
                    In the mass update page:
                    Check the Status checkbox
                    Set the Status value to Disabled
                    Click on Update

                    For Triggers, you can also use mass update but there is no filter, so you'll have to select all triggers by hand.

                    That's it

                    Regards,
                    Alixen
                    http://www.alixen.fr/zabbix.html


                    • RaymondH
                      Junior Member
                      Zabbix Certified Specialist
                      • Dec 2010
                      • 24

                      #11
                      Oh - you're awesome.

                      Thanks for the reply!

                      R


                      • qix
                        Senior Member
                        Zabbix Certified Specialist, Zabbix Certified Professional
                        • Oct 2006
                        • 423

                        #12
                        Hi Raymond,

                        I've also come across some large queue problems in the past.
                        In our case, it had to do with IO problems. The disks just couldn't keep up with the MySQL queries. Since you are on a VM Appliance, I suspect this is your problem as well.

                        Like untergeek replied earlier, setting a higher polling interval on your items will significantly decrease the IO load. (sorry for the late reply, doh.)
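To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes every monitored item shares one average interval, which is a simplification: the real figure depends on each item's own interval. The helper name nvps is made up for this example; the item count of 2509 comes from the status table in the first post.

```shell
#!/bin/sh
# nvps ITEMS INTERVAL_SECONDS
# Rough "new values per second" if ITEMS items are all polled every
# INTERVAL_SECONDS (assumption: one shared average interval).
nvps() {
    awk -v items="$1" -v interval="$2" 'BEGIN { printf "%.2f\n", items / interval }'
}

nvps 2509 130   # prints 19.30, close to the 19.24 reported in the first post
nvps 2509 260   # prints 9.65: doubling the interval halves the write rate
```

Fewer new values per second means fewer inserts for the database to keep up with, which is where the I/O relief would come from.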

                        If you are new to Linux, just a few quick tips on seeing if you have an IO problem on your box:
                        • Install atop. It is a top-like application with a few bonuses: if your terminal supports it, it will color-code lines that indicate a performance issue (like I/O and swap-in/-out).
                        • You could also use iotop; it is like top, but for I/O.
                        • If those are not available, iostat and/or vmstat might be able to help you identify whether you are experiencing I/O-related trouble.

                        Just to be thorough: you need to install these tools in your VM appliance, not on your ESXi server.
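As a starting point, the last tip can be sketched as a small helper that takes a few samples with whichever tool is present. The name io_snapshot and its defaults (3 samples, 5 seconds apart) are arbitrary choices for this example, and it assumes a Linux box where iostat comes from the sysstat package and vmstat from procps.

```shell
#!/bin/sh
# io_snapshot [INTERVAL_SECONDS] [COUNT]
# Print a short I/O snapshot using iostat if available, else vmstat.
# io_snapshot is a hypothetical helper name used only for this sketch.
io_snapshot() {
    interval=${1:-5}   # seconds between samples
    count=${2:-3}      # number of samples
    if command -v iostat >/dev/null 2>&1; then
        # Extended per-device stats; watch the %util and await columns.
        iostat -x "$interval" "$count"
    elif command -v vmstat >/dev/null 2>&1; then
        # Watch "wa" (CPU time waiting on I/O) and si/so (swap in/out).
        vmstat "$interval" "$count"
    else
        echo "neither iostat nor vmstat found; try the sysstat or procps package" >&2
        return 1
    fi
}
```

Run io_snapshot inside the appliance while the queue is building up. Consistently high I/O wait or device utilization during that window would point toward the disk bottleneck described above.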

                        Just my 2cts worth, hope it helps,
                        With kind regards,

                        Raymond
