Zabbix Queues

  • Ash
    Junior Member
    • Sep 2004
    • 6

    #1

    Zabbix Queues

    Alexei,

    I screwed up entering the monitoring port for a number of new servers I recently added to Zabbix. Between adding the new servers and discovering my typo, the Zabbix server tried to poll them on the port I had erroneously entered, and now I have 298 tasks in the task queue scheduled to occur on 01/01/1970.

    I have since corrected the problem and Zabbix is monitoring the servers on the correct port; however, I would like to remove the backlog from the queue.

    How do I clear these?

    Regards,
    Ash
  • Alexei
    Founder, CEO
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Sep 2004
    • 5654

    #2
    Hi Ash,

    Are you sure zabbix_suckerd is running? Are all elements of the queue ICMP-ping related?

    There is no such thing as a backlog in ZABBIX. The queue is just a list of items of monitored hosts that have to be updated immediately.

    In the normal case, when no performance problems exist, the queue stays empty or nearly empty. The only exception could be ICMP-related (icmpping, icmppingsec) items. Such items may stay in the queue longer, up to 30 seconds, if you ping your hosts every 30 seconds.
    Alexei Vladishev
    Creator of Zabbix, Product manager
    New York | Tokyo | Riga
    My Twitter

    • Ash
      Junior Member
      • Sep 2004
      • 6

      #3
      Re: Queue

      Alexei,

      Yes, confirmed zabbix_suckerd is running.
      No, not all elements of the queue are ICMP ping related; some are disk space checks, some are CPU checks (all are ones that normally work on other hosts).

      I know the problem you are referring to with ICMP ping, as I ran into it earlier when fping wasn't SETUID. This has a similar symptom (for example, all items in the queue have the same date that the fping problem produced, 01/01/1970).

      Normally the queue on this server is empty. It's performing all the same tests on other servers that it is performing on these servers. It's just, like I said, that when I created this new lot of hosts I screwed up the port for Zabbix to poll, and that appears to have caused the queue entries. As to why it defaults to 01/01/1970, I have no idea.

      All of my other servers are working as desired but these new ones aren't, even after fixing the agent port number problem that I created earlier. For example, if I go into Latest values for any operational host (other than these newly created ones) I get the current values of every monitored parameter, but on these newly created hosts all fields are blank.

      I've scheduled a bounce of the Zabbix server this evening but, failing that, is there anything else I could check?

      Regards,
      Ash

      • Alexei
        Founder, CEO
        Zabbix Certified Trainer
        Zabbix Certified Specialist
        Zabbix Certified Professional
        • Sep 2004
        • 5654

        #4
        Hmm... I cannot understand how initially setting an incorrect port could screw something up.

        Please do the following:

        - select an item in question from the queue
        - check status of corresponding host

        The status of the host must be Monitored.

        Also, I would suggest checking zabbix_suckerd's LogFile.

        That's all I can help with for now. No more ideas so far.

        • Ash
          Junior Member
          • Sep 2004
          • 6

          #5
          Re:

          Yep. For some reason the status had changed to 'not monitored' on those hosts (probably because I originally screwed up the port to check them on, so they switched to not monitored).

          • charles
            Member
            • Sep 2004
            • 54

            #6
            Originally posted by Ash
            Yep. For some reason the status had changed to 'not monitored' on those hosts (probably because I originally screwed up the port to check them on, so they switched to not monitored).
            Is there any way to configure the rules used to decide when to stop trying to monitor a host?

            thanks
            charles

            • Alexei
              Founder, CEO
              Zabbix Certified Trainer
              Zabbix Certified Specialist
              Zabbix Certified Professional
              • Sep 2004
              • 5654

              #7
              Hi, Charles,

              Originally posted by charles
              Is there any way to configure the rules used to decide when to stop trying to monitor a host?
              Currently, ZABBIX stops monitoring a host for 60 seconds if three (3) network errors occur. For example: the host is unreachable, the host name cannot be resolved (if no IP is used), etc.

              Every 60 seconds, ZABBIX will try to restart monitoring of the unreachable host.

              When a host is not monitored, the status of all triggers (except triggers related to the status of the host) is UNKNOWN.

              The logic is hardcoded.

              What would you like to change in the logic? Why? What could be configurable?
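For readers who want to reason about this behaviour, here is a minimal Python sketch of the rule as described in the post: three network errors pause polling of a host for 60 seconds, after which a retry is allowed. The numbers come from the post; the class name `HostState` and its methods are hypothetical and are not taken from the ZABBIX source, and the assumption that a successful poll resets the error counter is mine.

```python
from dataclasses import dataclass

# Values taken from the post above; all names here are hypothetical.
MAX_NETWORK_ERRORS = 3
RETRY_DELAY_SECONDS = 60


@dataclass
class HostState:
    errors: int = 0
    paused_until: float = 0.0  # Unix time before which the host is skipped

    def should_poll(self, now: float) -> bool:
        """Poll unless the host is inside its 60-second pause window."""
        return now >= self.paused_until

    def record_result(self, ok: bool, now: float) -> None:
        """Update the counters after one poll attempt."""
        if ok:
            # Assumption: one successful poll clears the error state.
            self.errors = 0
            self.paused_until = 0.0
            return
        self.errors += 1
        if self.errors >= MAX_NETWORK_ERRORS:
            # Stop monitoring for 60 seconds, then allow another attempt.
            self.paused_until = now + RETRY_DELAY_SECONDS


host = HostState()
for t in (0.0, 1.0, 2.0):            # three consecutive network errors
    host.record_result(ok=False, now=t)
print(host.should_poll(now=30.0))     # inside the pause window
print(host.should_poll(now=62.0))     # pause expired, a retry is allowed
```

If the retry at 62 seconds fails as well, the pause is simply extended by another 60 seconds, which matches the "every 60 seconds" retry described above.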

              • charles
                Member
                • Sep 2004
                • 54

                #8
                Originally posted by Alexei
                What would you like to change in the logic? Why? What could be configurable?
                Hi Alexei

                Thanks for the explanation. In my case I have had monitoring stop for two reasons.

                1. There was a problem with a slow user parameter.
                2. The box (being monitored) crashed and because zabbix_agentd was killed suddenly and the pid file existed, it never started at boot.

                So, I think there needs to be a nicer solution to #2, but in either case it would be nice if I could configure it to retry once a day or so, in case it can start monitoring again without me waiting a week to realize it has stopped. There have been times where I realized zabbix_agentd was down and started it, but didn't put the host back into the monitored state, so no data was collected for a while longer.

                Am I missing something?

                charles
                p.s. I am starting to wonder if I am remembering the missing parameter issue properly. It may just stop monitoring that one user parameter, but I am not sure anymore.
                p.p.s. The site has been faster for me all week, but right now it is unbearably slow: about 10 seconds to load this page.

                • Alexei
                  Founder, CEO
                  Zabbix Certified Trainer
                  Zabbix Certified Specialist
                  Zabbix Certified Professional
                  • Sep 2004
                  • 5654

                  #9
                  Originally posted by charles
                  2. The box (being monitored) crashed and because zabbix_agentd was killed suddenly and the pid file existed, it never started at boot.

                  ....

                  Am I missing something?
                  I would suggest monitoring the availability of all hosts using a trigger expression similar to:

                  #Server {HOSTNAME} is unreachable
                  {host:status.last(0)}=2

                  In this case, if for some reason an agent is not running, you'll get a message.

                  Originally posted by charles
                  p.p.s. The site has been faster for me all week, but right now it is unbearably slow: about 10 seconds to load this page.
                  This is because of network speed. www.zabbix.com itself is working perfectly, at least for me.

                  • charles
                    Member
                    • Sep 2004
                    • 54

                    #10
                    Originally posted by Alexei
                    I would suggest monitoring the availability of all hosts using a trigger expression similar to:

                    #Server {HOSTNAME} is unreachable
                    {host:status.last(0)}=2

                    In this case, if for some reason an agent is not running, you'll get a message.
                    Yes, I think the problem is that our agent-based template must have a different check that is either not configured properly or not set up to notify. A lot of our machines are standalone and those are fine.

                    Looking forward to 1.1! I need to set up dependencies properly when it comes out.

                    Originally posted by Alexei
                    This is because of network speed. www.zabbix.com itself is working perfectly, at least for me.
                    I know

                    • charles
                      Member
                      • Sep 2004
                      • 54

                      #11
                      Originally posted by Alexei
                      Hi Ash,

                      Are you sure zabbix_suckerd is running? Are all elements of the queue ICMP-ping related?

                      There is no such thing as a backlog in ZABBIX. The queue is just a list of items of monitored hosts that have to be updated immediately.

                      In the normal case, when no performance problems exist, the queue stays empty or nearly empty. The only exception could be ICMP-related (icmpping, icmppingsec) items. Such items may stay in the queue longer, up to 30 seconds, if you ping your hosts every 30 seconds.
                      Hi Alexei

                      I have reopened this thread since I have this exact problem and I can't resolve it (although in my case I have 4682 items in my queue).

                      Suckerd is running and checking a few hosts that have entries in the queue with today's date.

                      The queue items are a mixture of different checks, not just ICMP-related ones.

                      Some have the date 12.31.1969 19:00:00, some are on other days, but the vast majority are from the 11th of this month.

                      I have spot-checked hosts, and they are all in state "Monitored" and the checks are "Active".

                      How can I clear the queue and get Zabbix monitoring again?

                      thanks
                      charles
                      p.s. This is v1.0beta14, and the status check took a very long time to run:
                      Is zabbix_suckerd running ? Yes
                      Is zabbix_trapperd running ? Yes
                      Number of values stored 114580174
                      Number of trends stored 24283583
                      Number of alarms 21469
                      Number of alerts 769
                      Number of triggers (enabled/disabled) 3502(3501/1)
                      Number of items (active/trapper/not active/not supported) 6434(5699/0/85/650)
                      Number of users 16
                      Number of hosts (monitored/not monitored) 297(270/2)

                      • Alexei
                        Founder, CEO
                        Zabbix Certified Trainer
                        Zabbix Certified Specialist
                        Zabbix Certified Professional
                        • Sep 2004
                        • 5654

                        #12
                        Hi Charles,

                        Originally posted by charles
                        Some have the date 12.31.1969 19:00:00, some on other days, but the vast majority are from the 11th of this month.

                        I have spot checked hosts, and they are all in state "Monitored" and the checks are "Active".

                        How can I clear the queue and get zabbix monitoring again?

                        ...

                        p.s, this is v1.0beta14 and the status check took a very long time to run..
                        ZABBIX selects all items from the queue whose next check date is in the past. Could you please check whether the items with the date 12.31.1969 (the default date) have type 'ZABBIX agent'? Is there anything special about those items?

                        The status check took a very long time because of the use of InnoDB. In this case, MySQL does a sequential scan of all data in a table to get the result of "select count(*) from <table>".
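A short Python sketch may help explain both the queue rule and the odd dates. It assumes, as the phrase "default date" suggests, that an item that has never been scheduled carries a next-check timestamp of 0 (the Unix epoch), and that the web frontend renders timestamps in local time. The `Item` type, the `nextcheck` field name as used here, and both helper functions are illustrative and are not taken from the ZABBIX source.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class Item(NamedTuple):
    host: str
    description: str
    nextcheck: int  # Unix time of the next scheduled check (0 = assumed default)


def overdue(items: list[Item], now: int) -> list[Item]:
    """Return the items whose next check time is already in the past."""
    return [item for item in items if item.nextcheck < now]


def show(ts: int, utc_offset_hours: int) -> str:
    """Format a Unix timestamp for a fixed UTC offset, queue-page style."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(ts, tz).strftime("%m.%d.%Y %H:%M:%S")


items = [
    Item("VZ4", "ICMP Ping Seconds", 0),            # never scheduled
    Item("kt4c", "Processor load", 1_108_099_781),  # a real past timestamp
]
now = 1_108_200_000  # an arbitrary moment later than both items

for item in overdue(items, now):
    print(show(item.nextcheck, -5), item.host, item.description)

print(show(0, 0))   # 01.01.1970 00:00:00, the date Ash saw
print(show(0, -5))  # 12.31.1969 19:00:00, the date charles saw
```

Under these assumptions, 01/01/1970 and 12.31.1969 19:00:00 are the same instant, timestamp 0, displayed at UTC and at UTC-5 respectively. Such an item is always "in the past", so it stays in the queue until a poller actually processes it.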

                        • charles
                          Member
                          • Sep 2004
                          • 54

                          #13
                          Originally posted by Alexei
                          Hi Charles,


                          ZABBIX selects all items from the queue whose next check date is in the past. Could you please check whether the items with the date 12.31.1969 (the default date) have type 'ZABBIX agent'? Is there anything special about those items?
                          All the 12.31.1969 ones are ICMP. A sample...
                          12.31.1969 19:00:00 VZD2 ICMP Ping Seconds
                          12.31.1969 19:00:00 Power_D3-11 ICMP Ping
                          12.31.1969 19:00:00 VZ4 ICMP Ping Seconds
                          12.31.1969 19:00:00 ALER-MIKROTIK ICMP Ping Seconds

                          I then have some oddballs:

                          05.10.2004 10:49:36 DTG162 ICMP Ping Seconds
                          05.10.2004 17:08:03 DTG162 ICMP Ping
                          09.03.2004 03:41:58 DTG102 ICMP Ping Seconds
                          09.07.2004 18:52:15 DTG97 ICMP Ping Seconds
                          09.07.2004 18:52:15 DTG100 ICMP Ping Seconds
                          09.07.2004 18:52:16 DTG99 ICMP Ping Seconds
                          09.07.2004 18:52:16 DTG98 ICMP Ping Seconds
                          11.04.2004 10:48:23 DTG133 ICMP Ping Seconds
                          11.04.2004 10:48:24 DTG31 ICMP Ping Seconds
                          11.04.2004 10:51:15 DTG197 ICMP Ping Seconds
                          12.06.2004 12:19:05 NLAY-EQCHI-1 ICMP Ping Seconds
                          12.30.2004 16:22:35 DTG185 ICMP Ping Seconds
                          01.20.2005 21:46:34 DTG195 ICMP Ping Seconds
                          02.03.2005 10:55:27 VZArray1 ICMP Ping Seconds

                          But then the rest are on 02.11.2005. They appear to be of type ZABBIX agent, but cover all kinds of checks.

                          02.11.2005 00:29:02 kt4c SSH server is running
                          02.11.2005 00:29:02 kt4c Free number of inodes on /usr
                          02.11.2005 00:29:41 kt4c Incoming traffic on interface eth0 (1min)
                          02.11.2005 00:29:41 kt4c Outgoing traffic on interface eth1 (1min)
                          02.11.2005 00:29:41 kt4c Processor load
                          02.11.2005 00:29:41 kt4c Incoming traffic on interface eth1 (1min)
                          02.11.2005 00:29:41 kt4c Outgoing traffic on interface lo (1min)
                          02.11.2005 00:29:41 kt4c Incoming traffic on interface lo (1min)
                          02.11.2005 00:29:41 kt4c Outgoing traffic on interface eth0 (1min)
                          .....

                          Zabbix does not appear to be collecting agent data for most hosts, and up/down alerts etc. are not working either. This is why I want to get the queue cleared, so I can see what's getting done. Only a very few hosts appear to be getting checked right now, and they all seem to be ICMP checks.
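With several thousand entries, it can help to summarize the queue per day before digging into individual items. The following is a rough Python sketch that counts entries by date; it assumes the queue page has been copied into plain text with one entry per line in the same "date time host description" layout as the listing above. The sample lines are copied from that listing, and `count_by_date` is a hypothetical helper.

```python
from collections import Counter

# A few lines copied from the listing above: date, time, host, item description.
QUEUE_DUMP = """\
12.31.1969 19:00:00 VZD2 ICMP Ping Seconds
12.31.1969 19:00:00 Power_D3-11 ICMP Ping
09.07.2004 18:52:15 DTG97 ICMP Ping Seconds
02.11.2005 00:29:02 kt4c SSH server is running
02.11.2005 00:29:41 kt4c Processor load
02.11.2005 00:29:41 kt4c Incoming traffic on interface eth1 (1min)
"""


def count_by_date(dump: str) -> Counter:
    """Count queue entries per date (the first whitespace-separated field)."""
    dates = (
        line.split(maxsplit=1)[0]
        for line in dump.splitlines()
        if line.strip()
    )
    return Counter(dates)


# Print the busiest dates first.
for date, count in count_by_date(QUEUE_DUMP).most_common():
    print(date, count)
```

A single date dominating the summary, as 02.11.2005 does here, points at the moment polling stalled rather than at individual misbehaving items.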

                          Originally posted by Alexei
                          The status check took a very long time because of the use of InnoDB. In this case, MySQL does a sequential scan of all data in a table to get the result of "select count(*) from <table>".
                          Mine are all MyISAM, though.

                          charles

                          • charles
                            Member
                            • Sep 2004
                            • 54

                            #14
                            Alexei, do you have any suggestions on how to recover from this? Zabbix is effectively down for me and I need to get it working again. Nothing is getting monitored and nothing is graphing.

                            Short of blowing away all my data and starting clean, what can I do to fix this?

                            thanks
                            charles

                            • Alexei
                              Founder, CEO
                              Zabbix Certified Trainer
                              Zabbix Certified Specialist
                              Zabbix Certified Professional
                              • Sep 2004
                              • 5654

                              #15
                              I really have no idea what happened. If it worked before and then suddenly stopped working, then something has obviously changed. I'd check:

                              1. Available disk space for ZABBIX database (who knows?)
                              2. LogFile of both zabbix_suckerd and zabbix_trapperd
                              3. If you see "insert ... failed" in a LogFile, it means your database is corrupted. Do 'repair table ...'.

                              Let me know if it helped.
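The first two checks in the list can be scripted. Below is a minimal Python sketch that reports free space on the filesystem holding the database and scans the daemon log files for lines mentioning a failed insert. The directory and log paths are hypothetical placeholders (use your database data directory and the LogFile settings from your configuration), and the match on the words "insert" and "failed" is only an assumption based on the "insert ... failed" wording above, not a documented log format.

```python
import shutil
from pathlib import Path

# Hypothetical locations; adjust to your DB data directory and LogFile settings.
DB_DIR = "/var/lib/mysql"
LOG_FILES = ["/tmp/zabbix_suckerd.log", "/tmp/zabbix_trapperd.log"]


def free_space_percent(path: str) -> float:
    """Percentage of free space on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def failed_inserts(log_path: str) -> list[str]:
    """Return the log lines that mention both an insert and a failure."""
    hits = []
    with open(log_path, errors="replace") as log:
        for line in log:
            lowered = line.lower()
            if "insert" in lowered and "failed" in lowered:
                hits.append(line.rstrip("\n"))
    return hits


if Path(DB_DIR).exists():
    print(f"free space under {DB_DIR}: {free_space_percent(DB_DIR):.1f}%")
for name in LOG_FILES:
    if Path(name).exists():
        for line in failed_inserts(name):
            print(f"{name}: {line}")
```

If any failed inserts show up, the table named in the failing statement is the one to pass to 'repair table ...' as suggested in step 3.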