Prevent Zabbix from hitting triggers after a restart of zabbix_server daemon

  • Sjeik
    Junior Member
    • Sep 2007
    • 19

    #1

    Hi all,

    I'm using the 'nodata' function on TCP ping to determine if a server is up or down. Of course I don't want to receive mails or sms messages when a host is down for two seconds, since the problem is most likely to be a small network outage in that case. So this is the expression I use in a trigger:
    Code:
    {Template_Linux:agent.ping.nodata(300)}=1
    An agent that doesn't ping for 300 seconds will be marked as "unreachable". Works like a charm, however... When the zabbix_server daemon (not the agent!) has been down for maintenance for longer than 300 seconds, all triggers start to issue alarms on startup and my phone is hit by over 50 text messages.

    The nature of the "ping" command prevents me from using the "last(...)" function on it, so I really need to use the "nodata" function. Does anyone have any idea how to fix this? Maybe Zabbix needs some kind of no-trigger or no-actions mode for a minute or two after the daemon has started?
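
To make the failure mode concrete, here is a rough Python model of the behaviour described above. It is only a toy: the function name, arguments and timestamps are made up for illustration, and it is not how Zabbix implements `nodata` internally.

```python
def nodata(last_value_time, now, period):
    """Toy model of nodata(period): 1 if no value arrived in the last `period` seconds, else 0."""
    return 1 if now - last_value_time > period else 0


# Hypothetical timeline: the last ping was stored at t=1000, just before the
# server was stopped. Nothing is stored while the server is down.
last_ping = 1000

print(nodata(last_ping, now=1200, period=300))  # 0: only 200 s without data
print(nodata(last_ping, now=1600, period=300))  # 1: server back after 600 s, trigger fires
```

In this model the agent never stopped answering; the gap exists only because the server was not there to store the values.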

    Thanks for your ideas!
  • johnnyirons
    Junior Member
    • Dec 2007
    • 16

    #2
    what do you mean by 'down for maintenance'?
    if you mean you stop your zabbix_server processes (and the host itself), try adding your expected values in that period.
    Code:
    eg. if you poll every 60secs 
    {Template_Linux:agent.ping.min(#5)}#1
    as 300/60 = #5

    otherwise you can also check the zabbix server status, for valid values and for whether there has been any data in the last 5 mins, like:
    Code:
    {Template_Linux:agent.ping.nodata(300)}=1&({ZabbixServer:proc.num[zabbix_server].min(300)}>0&{ZabbixServer:proc.num[zabbix_server].nodata(300)}#1)
    hope this helps.

    • Sjeik
      Junior Member
      • Sep 2007
      • 19

      #3
      I don't think I understand your syntax. What exactly do you mean with "#1" in "ping.min(#5)#1"?

      The last suggestion sounds better, just check if the Zabbix server has been down. I'll try that!

      • johnnyirons
        Junior Member
        • Dec 2007
        • 16

        #4
        #1 means 'not equal to 1'.
        since 1 is expected as a successful ping, it literally means: if the minimum of the values in the last 5 polls (60*5 = 300 seconds) is not 1, i.e. at least one of those polls did not return 1, fire the trigger.
        since icmpping spits out only two values, 0 or 1, this is the same as the minimum being equal to 0.
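
a rough Python model of that comparison, in case it helps. The function name and sample data are made up for illustration; it only mimics what `min(#5)` followed by `#1` computes and is not Zabbix code.

```python
def min_last_n_not_one(values, n=5):
    """Toy model of {host:item.min(#n)}#1: True when the minimum of the last n values is not 1."""
    recent = values[-n:]
    if not recent:
        # No values collected yet: nothing to compare, so do not fire.
        return False
    return min(recent) != 1


print(min_last_n_not_one([1, 1, 1, 1, 1]))  # False: every poll succeeded
print(min_last_n_not_one([1, 1, 0, 1, 1]))  # True: one failed poll is enough
print(min_last_n_not_one([0, 0, 0, 0, 0]))  # True: all polls failed
```

note that in this model a single 0 among the last 5 polls already makes the expression true.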

        understood?

        • Sjeik
          Junior Member
          • Sep 2007
          • 19

          #5
          Originally posted by Sjeik
          I don't think I understand your syntax. What exactly do you mean with "#1" in "ping.min(#5)#1"?

          The last suggestion sounds better, just check if the Zabbix server has been down. I'll try that!
          Hmmm, Zabbix doesn't accept the expression:
          Incorrect trigger expression. You can't use template hosts in mixed expressions.
          This is what I'm trying to do (my Zabbix server is called "zabbix"):
          Code:
          {Template_Linux:agent.ping.nodata(300)}=1&({zabbix:proc.num[zabbix_server].min(300)}>0&{zabbix:proc.num[zabbix_server].nodata(300)}=0)

          • johnnyirons
            Junior Member
            • Dec 2007
            • 16

            #6
            that's correct (my fault)..
            that error says you cannot mix items from normal hosts and templates in one trigger expression.
            what about moving the item out of Template_Linux and onto your host?

            just copy it from the template in the 'items' configuration page.

            btw, this is exactly the reason why i prefer to use a number of polls rather than seconds in trigger expressions, so i can change the polling intervals for items (eg. at night i poll some items less often, to not disturb them and to keep the database from growing too much)

            • Sjeik
              Junior Member
              • Sep 2007
              • 19

              #7
              You suggest that I move the trigger out of the template? That's exactly what I _do not_ want, since templating is a very powerful feature of Zabbix.

              • johnnyirons
                Junior Member
                • Dec 2007
                • 16

                #8
                so you should use the first solution i proposed.
                or, you can set up a trigger with that expression, give it low priority (so you do not receive an sms) and make your old trigger depend on it.

                • Sjeik
                  Junior Member
                  • Sep 2007
                  • 19

                  #9
                  Originally posted by johnnyirons
                  so you should use the first solution i proposed.
                  or, you can set up a trigger with that expression, give it low priority (so you do not receive an sms) and make your old trigger depend on it.
                  You're right, but I didn't mention that the Zabbix server is sometimes down for half an hour. Your latest suggestion might be the solution, but I'm afraid it's possible that Zabbix will fire the "host down" trigger before it checks its dependencies. Correct me if I'm wrong...

                  • johnnyirons
                    Junior Member
                    • Dec 2007
                    • 16

                    #10
                    dependencies are checked before firing an action.. is that what you mean?
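
here is a rough Python sketch of the idea, just to show the order of the checks. The trigger names, the `states` and `depends_on` dictionaries and the function itself are hypothetical; this is not how Zabbix is implemented internally.

```python
PROBLEM, OK = "PROBLEM", "OK"


def should_fire_action(trigger, states, depends_on):
    """Fire only if the trigger is in PROBLEM and none of its direct dependencies are."""
    if states[trigger] != PROBLEM:
        return False
    # Look at the dependencies first; any dependency in PROBLEM suppresses the action.
    return all(states[dep] != PROBLEM for dep in depends_on.get(trigger, []))


# Hypothetical setup: "host unreachable" depends on a low-priority server-side trigger.
depends_on = {"host unreachable": ["zabbix server was down"]}

states = {"host unreachable": PROBLEM, "zabbix server was down": PROBLEM}
print(should_fire_action("host unreachable", states, depends_on))  # False: suppressed

states["zabbix server was down"] = OK
print(should_fire_action("host unreachable", states, depends_on))  # True: real outage
```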

                    • Sjeik
                      Junior Member
                      • Sep 2007
                      • 19

                      #11
                      Originally posted by johnnyirons
                      dependencies are checked before firing an action.. is that what you mean?
                      Yeah, exactly. But in that case Zabbix might hang itself when a cyclic dependency occurs? I'm testing it right now, I'll let you know if it works. Thanks again!

                      • Sjeik
                        Junior Member
                        • Sep 2007
                        • 19

                        #12
                        Solved!

                        Sadly your solution doesn't work. I think the problem lies within the "proc.num[zabbix_server].nodata(...)" expression. Once this dependency is checked, the nodata function obviously returns "0" (there is data available).

                        I tried to think out of the box, and found the following solution:
                        - there's a zabbix agent script running on the zabbix server which tells me the age of the oldest zabbix_server daemon process
                        - this "item" is incorporated in a trigger "Zabbix server daemon has just started" which activates when the process is less than 300 seconds old
                        - my "host unreachable" checks depend on this trigger
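
A minimal sketch of what such an agent script could look like, in Python. This is not the exact script I run: it assumes a procps-ng `ps` that supports the `etimes` field (elapsed seconds since process start), and all function names and the 300-second threshold parameter are only illustrative.

```python
import subprocess


def parse_ps_output(text):
    """Parse whitespace-separated elapsed-seconds values as printed by `ps -o etimes=`."""
    return [int(token) for token in text.split()]


def oldest_age(elapsed_seconds):
    """Age in seconds of the oldest process, or 0 if no process was found."""
    return max(elapsed_seconds, default=0)


def just_started(age, threshold=300):
    """Mirror of the "daemon has just started" trigger: True while the process is younger than threshold."""
    return age < threshold


def zabbix_server_age(name="zabbix_server"):
    # Assumption: `ps -C <name> -o etimes=` prints one elapsed-seconds value per process.
    try:
        result = subprocess.run(
            ["ps", "-C", name, "-o", "etimes="],
            capture_output=True,
            text=True,
        )
    except OSError:
        # `ps` is not available: report 0, which reads as "just started".
        return 0
    return oldest_age(parse_ps_output(result.stdout))


if __name__ == "__main__":
    # The agent item simply reports this number.
    print(zabbix_server_age())
```

The script's output can be exposed as an agent item through a `UserParameter` line in zabbix_agentd.conf, with whatever key name you like, and the "just started" trigger then compares the item's last value against 300.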

                        Now it works! Thanks for all your help!

                        • johnnyirons
                          Junior Member
                          • Dec 2007
                          • 16

                          #13
                          glad at least we found a solution

                          • Justin Freeman
                            Junior Member
                            • Jan 2009
                            • 18

                            #14
                            Similar problem & my solution

                            Using Zabbix 1.6.2, I had a similar problem, where I wanted a trigger to cause an action that sent out an email when the backup process completed successfully.

                            Here is the original trigger I used:
                            {Template_Backup:backup.status.last(0)}=1 & {Template_Backup:backup.status.nodata(300)}=0

                            My logic: If backup completed in last 5 minutes then trigger and send the email alert. So people know to change tapes etc.

                            This also allowed the trigger to fire for a 5 minute period before resetting itself back to OK status, therefore keeping the Dashboard clear.
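
To spell out that logic, here is a small Python model of the expression. It is only an illustration under my own assumptions: the function and argument names are invented, and it treats `nodata(300)=0` as "a value arrived within the last 300 seconds".

```python
def backup_trigger(last_status, seconds_since_last_value, window=300):
    """Toy model of last(0)=1 & nodata(window)=0."""
    completed = last_status == 1                          # last(0)=1
    recent_data = seconds_since_last_value <= window      # nodata(window)=0
    return completed and recent_data


print(backup_trigger(1, 120))  # True: backup reported success 2 minutes ago
print(backup_trigger(1, 400))  # False: more than 5 minutes ago, trigger resets
print(backup_trigger(0, 10))   # False: latest status is not "completed"
```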

                            My backup trigger was working great except when the Zabbix server was restarted, all triggers were set off sending false positives about the backup process, emails were going off everywhere! PANIC!!!

                            After a lot of testing & changes to the trigger function, I finally came to the revelation that when the Zabbix server restarts, it only fires the triggers with a Trigger Value of OK and not PROBLEM.

                            So the solution for me was simply to change the Action to only apply to those triggers having a Trigger Value of PROBLEM.

                            Now whenever the Zabbix server restarts (or crashes), the triggers still fire but no emails are sent.
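
As a rough illustration of what the action condition does, here is a Python sketch. The event dictionaries and the function name are invented for the example; in Zabbix itself this is just a condition on the Action, not code.

```python
def emails_to_send(events, required_value="PROBLEM"):
    """Keep only the events whose trigger value matches the action condition."""
    return [event for event in events if event["value"] == required_value]


# Hypothetical events: one produced by a server restart, one by a real backup completion.
events = [
    {"trigger": "Backup completed", "value": "OK"},
    {"trigger": "Backup completed", "value": "PROBLEM"},
]

for event in emails_to_send(events):
    print("send email for:", event["trigger"], event["value"])
```

Only the event with a Trigger Value of PROBLEM gets through; the one with a value of OK is dropped without sending anything.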

                            Hope this helps someone else out there in Zabbix-land.
