Setting Triggers

  • ccoggins
    Junior Member
    • Jan 2019
    • 8

    #1

    Setting Triggers

    Good afternoon,

    I am new to Zabbix (3.2.11), working from the browser admin panel. I am trying to support something I did not set up, and I am not quite grasping how it works.
    I have 3 triggers set up for a service that is running on servers, and also an alert that checks if something is running/listening on a specific port.

    If this service and program are not running, I get emails and text messages letting me know it is down. There is also an action attached to the triggers: if the trigger is true, restart the service and program.
    So a few minutes later I get an OK email and text.

    This is happening far too often lately so I want to change the triggers that are already here so they do not email/text. However, I want to create 2 new triggers that will email/text me if they have been down for a longer interval than what they are currently set to.

    I was hoping someone could help me decipher how these work, how to create a new trigger, and what expressions I should use for longer intervals. I do not understand the timing on these: how often Zabbix checks in, or rather how long these have to be "down" before Zabbix sends out the alert.

    *DISASTER {Template App 2016 DBSync Service:net.tcp.service[tcp,,2507].sum(#5)}=0
    *WARNING {Template App 2016 DBSync Service:net.tcp.service[tcp,,2507].sum(#3)}=0
    *DISASTER {Template App 2016 DBSync Service:service.info[TTDBSyncServer, state].sum(#3)}>12

    I want to keep these alerts as they are, since they are set up to auto-restart the service and application when triggered.
    However, I want to create new triggers that send out an email and text only after 6 minutes of being "down" (or, if it is based on intervals, after a longer interval range).

    I tried .sum(300)}=0 for a 5-minute interval, but as soon as I did this I got flooded with alerts saying nothing was working when in fact everything was.

    Thanks for any help you can provide.
    Last edited by ccoggins; 03-09-2019, 21:40.
  • brunohl
    Senior Member
    Zabbix Certified Specialist
    • Mar 2019
    • 215

    #2
    How are your items configured? How often do they get updated?

    You could also use action escalations: your first action would be to try to restart the service, and after some time your next action would be to send you a message.


    • ccoggins
      Junior Member
      • Jan 2019
      • 8

      #3
      Originally posted by brunohl
      How are your items configured? How often do they get updated?

      You could also use action escalations: your first action would be to try to restart the service, and after some time your next action would be to send you a message.
      The 2 items for the triggers look like this:

      net.tcp.service[tcp,,2507]
      service.info[TTDBSyncServer, state]

      Both have their update intervals set to 60 seconds.



      Originally posted by splitek
      (Zabbix 3.2 is unsupported - suggesting to upgrade)

      From doc:
      You may use the prefix # to specify that a parameter has a different meaning:
      sum(600) Sum of all values in no more than the latest 600 seconds
      sum(#5) Sum of all values in no more than the last 5 values
      So you can't simply replace sum(#5) with sum(300).

      You can try to build a more complicated trigger made of two conditions.
      First condition is your old trigger.
      Second condition:
      Let's say your interval is set to 1 minute: every minute you gather a value for the key net.tcp.service, and the trigger is {Template App 2016 DBSync Service:net.tcp.service[tcp,,2507].sum(#5)}=0
      The second condition checks the data collected over the last 6 minutes and counts how many of those values are 0. In 6 minutes we should get 6 values:
      {Template App 2016 DBSync Service:net.tcp.service[tcp,,2507].count(6m,0)}>5

      Our trigger:
      {Template App 2016 DBSync Service:net.tcp.service[tcp,,2507].sum(#5)}=0 and {Template App 2016 DBSync Service:net.tcp.service[tcp,,2507].count(6m,0)}>5

      BTW, I am not sure whether it should be ">5", ">=6", or maybe ">=5".
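
The two-condition trigger quoted above can be sketched as a small simulation. This is only an illustration, not how Zabbix evaluates triggers internally: the helper names are hypothetical, the sample history is made up, and the time window is assumed to cover values newer than "now minus 360 seconds" up to and including "now".

```python
# Hypothetical history: one (timestamp_seconds, value) sample every 60 s.
# The port answers (1) at t=0 and t=60, then stops answering (0) from t=120 on.
history = [(t, 1 if t < 120 else 0) for t in range(0, 421, 60)]

def sum_last_n(samples, n):
    """Rough analogue of sum(#n): add up the n most recent values."""
    return sum(value for _, value in samples[-n:])

def count_equal_in_window(samples, now, seconds, pattern):
    """Rough analogue of count(<seconds>,<pattern>): values equal to pattern
    within the assumed window (now - seconds, now]."""
    return sum(1 for t, value in samples
               if now - seconds < t <= now and value == pattern)

def combined_trigger(samples, now):
    """sum(#5)=0 and count(6m,0)>5, evaluated on the samples seen so far."""
    return (sum_last_n(samples, 5) == 0
            and count_equal_in_window(samples, now, 360, 0) > 5)

# Evaluate the trigger each time a new sample arrives.
fires_at = [now for now, _ in history
            if combined_trigger([s for s in history if s[0] <= now], now)]
print(fires_at)
```

Under these assumptions the trigger first fires at t=420, when the sixth consecutive 0 arrives. With ">=5" the second condition would already hold at t=360, so it would add nothing beyond sum(#5)=0; ">5" and ">=6" behave the same for whole-number counts.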
      I could give that option a try as well.

      I guess what I am failing to understand is what .sum(#5)}=0 or .sum(#3)}>12 actually means when broken down.
      I think if I could understand what each part means I could build a trigger to meet my needs.
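
As a rough breakdown of the three expressions from the first post, here is a minimal sketch. It assumes the value meanings given in the Zabbix agent documentation as I understand them (net.tcp.service returns 1 when the port answers and 0 when it does not; service.info with the state parameter returns 0 for running and 6 for stopped). The histories are invented for illustration.

```python
def sum_last_n(values, n):
    """Rough analogue of sum(#n): add up the n most recent collected values."""
    return sum(values[-n:])

# Assumed meanings: net.tcp.service -> 1 = up, 0 = down
port_history = [1, 1, 0, 0, 0, 0, 0]

# Assumed meanings: service.info[..., state] -> 0 = running, 6 = stopped
state_running = [0, 0, 0]
state_two_stopped = [0, 6, 6]
state_three_stopped = [6, 6, 6]

# DISASTER: sum(#5)=0 -> the last five port checks all failed
disaster_port = sum_last_n(port_history, 5) == 0
# WARNING: sum(#3)=0 -> the last three port checks all failed
warning_port = sum_last_n(port_history, 3) == 0
# DISASTER: sum(#3)>12 -> the last three state values add up to more than 12
fires_running = sum_last_n(state_running, 3) > 12
fires_two_stopped = sum_last_n(state_two_stopped, 3) > 12      # 12 is not > 12
fires_three_stopped = sum_last_n(state_three_stopped, 3) > 12  # 18 > 12

print(disaster_port, warning_port)
print(fires_running, fires_two_stopped, fires_three_stopped)
```

Read this way, with a 60-second interval the "#N" triggers need roughly N minutes of consecutive bad values, so something like sum(#6)=0 on the port item would wait for about six failed checks. Note also that on the state item a sum of 0 means the service is running, so an "=0" comparison there would fire while everything is healthy.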


      • ccoggins
        Junior Member
        • Jan 2019
        • 8

        #4
        Originally posted by splitek
        .sum(#5) takes the last five values and sums them. Remember that it takes the last 5 values no matter what. Let's say we collected this data:
        (the first column is the time of collection in "h:mm" format, the second is the collected value)
        1:00 - 1
        1:01 - 1
        1:02 - 1
        At 1:03 we have problems (e.g. host not responding, broken network, and so on) and no data is collected at all. After 12 minutes, data collection starts working again:
        1:15 - 1
        1:16 - 1

        sum(#5) is 5, because we sum the last five values we have, regardless of the gap between 1:03 and 1:15.
        sum(5m) is 2, because we sum all values from the last 5 minutes, and only two fall in that range (the last 5-minute range is 1:11 to 1:16).
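
The example above can be reproduced with a short sketch. The helper names are hypothetical, times are expressed as minutes since midnight, and the time window is assumed to include values newer than "now minus 5 minutes" up to and including "now".

```python
# Samples from the example: (minutes since midnight, value).
# 1:00, 1:01, 1:02, then a gap, then 1:15 and 1:16.
samples = [(60, 1), (61, 1), (62, 1), (75, 1), (76, 1)]

def sum_last_n(history, n):
    """Rough analogue of sum(#n): add up the n most recent values, ignoring gaps."""
    return sum(value for _, value in history[-n:])

def sum_last_minutes(history, now, minutes):
    """Rough analogue of sum(<minutes>m): add up values in (now - minutes, now]."""
    return sum(value for t, value in history if now - minutes < t <= now)

now = 76  # 1:16
count_based = sum_last_n(samples, 5)           # all five values, gap or not
time_based = sum_last_minutes(samples, now, 5)  # only 1:15 and 1:16 qualify
print(count_based, time_based)  # prints: 5 2
```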
        Thank you for the help and explanation.
        I was tinkering with this yesterday and I think I got it how I want it now.
