Alarm when many hosts go down

  • Blinkiz
    Junior Member
    • Mar 2011
    • 27

    #1

    Hi,
    I'm trying out Zabbix for the first time.
    Is it possible to define a trigger that goes off if more than 5 hosts go down?


    Use case:
    50 switches.
    A ping item exists for each switch, pinging it every 60 seconds.
    A per-switch trigger goes off if the switch has not responded to the last 5 pings (5 minutes of downtime).
    If more than 5 switches fail to respond within the same 60 seconds, I want an alarm immediately, after one minute.
  • JBo
    Senior Member
    • Jan 2011
    • 310

    #2
    Hi,

    Originally posted by Blinkiz
    Hi,
    I'm trying out Zabbix for the first time.
    Is it possible to define a trigger that goes off if more than 5 hosts go down?
    Yes.

    Originally posted by Blinkiz
    Use case:
    50 switches.
    A ping item exists for each switch, pinging it every 60 seconds.
    A per-switch trigger goes off if the switch has not responded to the last 5 pings (5 minutes of downtime).
    If more than 5 switches fail to respond within the same 60 seconds, I want an alarm immediately, after one minute.
    Create a host group containing all the switches.
    Create an aggregated item that counts the number of switches that are down.
    Set a trigger on this item.

    Hope this helps
    JBo

    • untergeek
      Senior Member
      Zabbix Certified Specialist
      • Jun 2009
      • 512

      #3
      Hmmm. One possibility that strikes me immediately would be a Zabbix Aggregate item or a calculated item.

      You'd have to add all of the potential items together, then do the math against it in the trigger, e.g.
      {uber_host:calculated_item.last(0)}<(total number of hosts - 5)

      Or something of that nature.

      I agree, it is not an elegant solution. It would work, though.

      • kewan
        Member
        Zabbix Certified Specialist
        • Apr 2011
        • 33

        #4
        or you could try something like (building on untergeek's aggregated item):
        {uber_host:calculated_item.nodata(60)}>5
        cheers,

        Stefano

        • untergeek
          Senior Member
          Zabbix Certified Specialist
          • Jun 2009
          • 512

          #5
          I thought nodata(60) would only return a 1 or a 0…

          1 (True) if there was no data,
          0 (False) if data was received.

          Would that actually work with an aggregated item? I wouldn't think it would.

          That's why, if all the items share the same key (matching OIDs or something similar), you could use grpsum and then check whether the result is less than the total minus 5. Otherwise, it's a MASSIVE calculated item.

          • JBo
            Senior Member
            • Jan 2011
            • 310

            #6
            Hi,

            OP said that he uses ping with a 60 s. period, so nodata doesn't apply.

            I assume that he uses icmpping.
            The aggregated item could be:
            Code:
            grpsum["group","icmpping","last","0"]
            and the trigger:
            Code:
            {host:grpsum["group","icmpping","last","0"].last(0)}<N
            where N is (number of hosts - 5)
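
To make the arithmetic concrete, here is a rough Python sketch of what the aggregated item and trigger above compute. It is not Zabbix code, just a simulation: it assumes icmpping returns 1 for a responding host and 0 otherwise, and the function names (`up_count`, `group_alarm`) and the 50-switch figure from the original question are only for illustration.

```python
def up_count(ping_values):
    """Stand-in for the grpsum item: sum of the latest icmpping values.

    Assumes each value is 1 (host responded) or 0 (host did not respond).
    """
    return sum(ping_values)


def group_alarm(ping_values, max_down=5):
    """Stand-in for the trigger: fire when the sum drops below N.

    N is (number of hosts - max_down), so the alarm fires only when
    MORE than max_down hosts are down.
    """
    n = len(ping_values) - max_down
    return up_count(ping_values) < n


TOTAL_SWITCHES = 50  # figure taken from the original question

# Six switches down: sum is 44, N is 45, so the alarm fires.
six_down = [0] * 6 + [1] * (TOTAL_SWITCHES - 6)

# Five switches down: sum is 45, N is 45, so the alarm stays quiet.
five_down = [0] * 5 + [1] * (TOTAL_SWITCHES - 5)

if __name__ == "__main__":
    print(group_alarm(six_down))
    print(group_alarm(five_down))
```

Note that with a strict "less than", exactly 5 switches down does not raise the alarm, which matches the "more than 5" requirement.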

            Regards,
            JBo

            • kewan
              Member
              Zabbix Certified Specialist
              • Apr 2011
              • 33

              #7
              right, my bad, I was thinking of doing something like this:

              aggregated item:
              grpsum["host group", "ping", "nodata", "300"]

              that would give you the number of unresponsive switches in the last five minutes

              and the trigger:
              {uber_host:aggregated_item.last(0)}>5
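
The idea here (count the unresponsive switches directly and compare against 5) can be sketched in plain Python. This only illustrates the logic; it does not confirm that the aggregate item accepts "nodata" as a function. The data layout (a dict mapping each switch to the timestamp of its last reply) and the function names are assumptions made up for the example.

```python
def count_unresponsive(last_reply, now, window=300):
    """Count switches with no reply inside the last `window` seconds.

    last_reply maps a switch name to the Unix timestamp of its last
    successful ping, or None if it has never replied (hypothetical layout).
    """
    return sum(
        1
        for ts in last_reply.values()
        if ts is None or now - ts > window
    )


def group_alarm_by_down_count(last_reply, now, window=300, max_down=5):
    """Fire when more than max_down switches have been silent for the window."""
    return count_unresponsive(last_reply, now, window) > max_down


NOW = 1000  # arbitrary "current time" in seconds for the example

# 50 switches that all replied 10 seconds ago.
switches = {f"switch{i:02d}": NOW - 10 for i in range(50)}

# Six of them last replied 400 seconds ago, outside the 300 s window.
six_silent = dict(switches)
for i in range(6):
    six_silent[f"switch{i:02d}"] = NOW - 400

if __name__ == "__main__":
    print(count_unresponsive(six_silent, NOW))
    print(group_alarm_by_down_count(six_silent, NOW))
```

Counting the down hosts directly has the advantage that the threshold (5) does not need to change when switches are added to the group.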

              but my brain kind of mixed up my idea with the other one

              anyway, I think you could do it one way or the other and it would be fine

              cheers,

              Stefano
              Last edited by kewan; 20-04-2011, 16:56. Reason: nodata with a 5 minutes downtime instead of 60 seconds
