How to change failed duration for problem generation from 60 seconds to 180 seconds?

  • mindspray
    Junior Member
    • Apr 2021
    • 3

    #1

    How do I change the configuration so that instead of generating a problem after 60 seconds of a failed state I generate a problem after 180 seconds?

    I am experimenting with and learning Zabbix, and I've got monitoring and alerting setup for a pfSense instance. I am receiving email alerts such as "IPsec Tunnel 1 (test-tunnel-1) Not Connected" after Zabbix detects that the tunnel is not connected for 60 seconds.

    How can I modify the configuration of this so that a problem is not generated until the tunnel is not connected for 180 seconds?
  • mindspray
    Junior Member
    • Apr 2021
    • 3

    #2
    Thank you, cyber, for the tip. I found the trigger prototype expression, and it looks like this:
    Code:
    {Template pfSense Active IPsec:pfsense.value[ipsec_ph1,{#IKEID},disabled].last()}=0 and {Template pfSense Active IPsec:pfsense.value[ipsec_ph1,{#IKEID},status].last()}<>1 and {Template pfSense Active IPsec:pfsense.value[ipsec_ph1,{#IKEID},status].last()}<10
    If I understand this correctly, the expression checks the following (all three conditions must be true):
    If the value for disabled is 0 (i.e. the IPsec phase 1 entry is not disabled in pfSense)
    If the last status does not equal 1 (phase 1 is not Established)
    If the last status is less than 10 (phase 1 is Down (0), Established (1), or Connecting (2) per the value maps; 10 = Down on CARP Secondary).
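A minimal Python sketch of how that expression behaves (this is not Zabbix code). It assumes each item's history is a list ordered oldest to newest, so last() is simply the final element; the helper names `last` and `original_problem` are hypothetical.

```python
from typing import Sequence


def last(values: Sequence[int], n: int = 1) -> int:
    """Mimic last(#n): the n-th most recent value (n=1 is the newest)."""
    return values[-n]


def original_problem(disabled: Sequence[int], status: Sequence[int]) -> bool:
    """Mirror of the original trigger: only the newest sample of each item counts."""
    return (
        last(disabled) == 0    # phase 1 is not disabled in pfSense
        and last(status) != 1  # phase 1 is not Established
        and last(status) < 10  # ignore CARP-secondary states (10 and up)
    )


# A single failed poll (status 0 = Down) is enough to fire the trigger.
print(original_problem(disabled=[0, 0, 0], status=[1, 1, 0]))  # True
print(original_problem(disabled=[0, 0, 0], status=[1, 1, 1]))  # False
```

Because only the newest value is inspected, the trigger fires on the first poll that sees the tunnel down, which matches the roughly 60-second behavior described above.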

    Nothing I am seeing seems to answer "Once the Phase 1 status is not Established, how long should Zabbix wait before changing the status and showing an alert?"

    Am I looking at the right thing? Or maybe reading this incorrectly?


    • tim.mooney
      Senior Member
      • Dec 2012
      • 1427

      #3
      There are three things you want to think about, to help you better understand what's going on.
      1. How frequently item data is collected. That's set at the item level.
      2. What trigger functions you use in your problem and/or recovery expression, AND how many data points those functions evaluate to decide whether there is a problem. As soon as your problem expression evaluates to true, Zabbix will generate a PROBLEM event.
      3. How your actions are written (including whether they have escalations and "steps") determines when Zabbix tries to alert you or remediate the problem directly. Note that the trigger expression might detect the problem immediately, but if your actions are set up to wait a certain period of time before they're executed, you might not know about it right away.
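A back-of-the-envelope Python sketch of how those three factors add up. It assumes polls happen exactly every `interval_s` seconds, that the trigger needs `samples_needed` consecutive failed values, and that the action adds a fixed escalation delay; the function names are hypothetical and the numbers are illustrative only.

```python
from typing import Tuple


def detection_window(interval_s: int, samples_needed: int) -> Tuple[int, int]:
    """Earliest and latest seconds from the start of a failure until the
    trigger can fire, assuming polls every interval_s seconds and a trigger
    that needs samples_needed consecutive failed values."""
    earliest = (samples_needed - 1) * interval_s  # failure starts just before a poll
    latest = samples_needed * interval_s          # failure starts just after a poll
    return earliest, latest


def latest_notification(interval_s: int, samples_needed: int, action_delay_s: int = 0) -> int:
    """Worst-case seconds until an alert, adding a fixed (hypothetical) action delay."""
    return detection_window(interval_s, samples_needed)[1] + action_delay_s


print(detection_window(60, 1))          # (0, 60): one-sample last() trigger
print(detection_window(60, 3))          # (120, 180): three-sample trigger
print(latest_notification(60, 3, 120))  # 300: plus a 2-minute escalation step
```

With 60-second polling, a three-sample trigger fires somewhere between roughly 120 and 180 seconds after the tunnel drops, depending on where the failure falls relative to the poll schedule.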

      You asked to "change the configuration so that instead of generating a problem after 60 seconds of a failed state I generate a problem after 180 seconds". To do that, you have to make changes to your trigger expression.

      Your existing trigger expression is using the last() function only, which means that it's deciding whether there is a problem or not based upon just the last (most recent) data point for 'pfsense.value[ipsec_ph1,{#IKEID},disabled]' and 'pfsense.value[ipsec_ph1,{#IKEID},status]'.

      You're probably collecting the item data every 60 seconds. Collecting it every 60 seconds and only looking at the most immediate data point to decide if there's a problem matches your description.

      Hopefully the direction cyber was pointing you in is now clear: if you collect the data every 60 seconds but don't want Zabbix to treat it as a problem until the failed state has lasted 180 seconds, you need to modify your trigger expression so that it looks not just at the most recent data point, but at the 3 most recent data points OR (using a different syntax) the last 180 seconds.
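A minimal Python sketch of the two approaches just described: one checks the three most recent samples, the other checks every sample whose timestamp falls within the last 180 seconds. The samples are hypothetical (timestamp, status) pairs, oldest first, and this only illustrates the idea; it is not how the Zabbix server implements either check.

```python
from typing import Sequence, Tuple

Sample = Tuple[int, int]  # (timestamp in seconds, status value), oldest first


def failed_last_n(samples: Sequence[Sample], n: int = 3) -> bool:
    """Count-based window: the n most recent statuses are all != 1."""
    if len(samples) < n:
        return False  # not enough history yet
    return all(status != 1 for _, status in samples[-n:])


def failed_last_seconds(samples: Sequence[Sample], now: int, window: int = 180) -> bool:
    """Time-based window: every status collected in (now - window, now] is != 1."""
    recent = [status for ts, status in samples if now - window < ts <= now]
    return bool(recent) and all(status != 1 for status in recent)


polls = [(0, 1), (60, 0), (120, 0), (180, 0)]  # hypothetical 60 s polling
print(failed_last_n(polls))             # True: last three statuses are 0, 0, 0
print(failed_last_seconds(polls, 180))  # True: samples at 60, 120 and 180 are all 0
```

With a steady 60-second interval both checks look at the same three samples; they diverge only if polls are late, missed, or the interval changes.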

      Take a look at the documentation for triggers, and pay careful attention to the parameters of each function. Many of them support either a "number of data points" (such as min(#3)) or a "time period" (like min(180)). The last() function works differently, though: last(#3) returns only the third most recent value, so you can't make last() look at all of the last 3 values. Since you're showing the older trigger function syntax, I'll assume you're on 5.0 or earlier, and link the 5.0 docs:
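A small Python illustration, assuming the status values are 0 (Down), 1 (Established) and 2 (Connecting) as in the value map quoted earlier in the thread. It shows that last(#3) yields a single value, and that a min-based check is a poor fit for this item because 1 sits between the other two states. The helpers `zbx_last`, `zbx_min` and `none_established` are hypothetical stand-ins, not Zabbix functions; in 5.0 a count-style check would use the count() history function, so confirm its parameters in the docs before relying on it.

```python
from typing import Sequence


def zbx_last(values: Sequence[int], n: int = 1) -> int:
    """Like last(#n): only the single n-th most recent value, not n values."""
    return values[-n]


def zbx_min(values: Sequence[int], n: int) -> int:
    """Like min(#n): the smallest of the n most recent values."""
    return min(values[-n:])


def none_established(values: Sequence[int], n: int) -> bool:
    """Count-style check: no value among the n most recent equals 1."""
    return sum(1 for v in values[-n:] if v == 1) == 0


history = [0, 1, 2]  # Down, Established, Connecting (oldest first)
print(zbx_last(history, 3))          # 0: last(#3) is just the oldest of the three
print(zbx_min(history, 3) != 1)      # True: min(#3)<>1 would fire despite a 1
print(none_established(history, 3))  # False: a count-based check would not fire
```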


      • cyber
        Senior Member
        Zabbix Certified Specialist, Zabbix Certified Professional
        • Dec 2006
        • 4807

        #4
        Look at the trigger expression, try to analyze what it does, and then change it according to your needs.


        • mindspray
          Junior Member
          • Apr 2021
          • 3

          #5
          I appreciate the help and pointers here. Got sidetracked but came back to this and rewrote the trigger prototype to get what I wanted. Here is the original expression:
          Code:
          {Template pfSense Active IPsec:pfsense.value[ipsec_ph1,{#IKEID},disabled].last()}=0 and {Template pfSense Active IPsec:pfsense.value[ipsec_ph1,{#IKEID},status].last()}<>1 and {Template pfSense Active IPsec:pfsense.value[ipsec_ph1,{#IKEID},status].last()}<10
          I used the https://www.zabbix.com/documentation...ctions/history documentation to rewrite it like this:
          Code:
          {Template pfSense Active IPsec:pfsense.value[ipsec_ph1,{#IKEID},disabled].last()}<>1 and {Template pfSense Active IPsec:pfsense.value[ipsec_ph1,{#IKEID},status].last()}<>1 and {Template pfSense Active IPsec:pfsense.value[ipsec_ph1,{#IKEID},status].last(#2)}<>1 and {Template pfSense Active IPsec:pfsense.value[ipsec_ph1,{#IKEID},status].last(#3)}<>1
          I tested this by administratively downing one of the tunnels for 90 seconds (long enough to guarantee at least one evaluation and brief enough to prevent three evaluations) and though I could see the status as down in Zabbix, no alert was generated because the change did not persist across 3 consecutive evaluations (180 seconds). Thanks again for the assistance!
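The test above can be replayed in a short Python simulation. It assumes the status item is polled every 60 seconds, that samples read 0 while the tunnel is down and 1 otherwise, and that missing history counts as "no problem" (a simplification; Zabbix itself may handle insufficient history differently). `rewritten_problem` mirrors the rewritten expression; `poll_history` and `ever_fires` are hypothetical helpers.

```python
from typing import List, Sequence


def rewritten_problem(disabled: Sequence[int], status: Sequence[int]) -> bool:
    """Mirror of the rewritten trigger: newest 'disabled' value is not 1 and
    last(), last(#2) and last(#3) of the status item are all != 1."""
    if len(status) < 3:
        return False  # simplification: too little history is treated as "no problem"
    return disabled[-1] != 1 and all(status[-n] != 1 for n in (1, 2, 3))


def poll_history(outage_start: int, outage_len: int, interval: int, polls: int) -> List[int]:
    """Hypothetical status samples taken at t = 0, interval, 2*interval, ...:
    0 (Down) while the tunnel is down, 1 (Established) otherwise."""
    outage_end = outage_start + outage_len
    return [
        0 if outage_start <= t < outage_end else 1
        for t in range(0, polls * interval, interval)
    ]


def ever_fires(status: Sequence[int]) -> bool:
    """Evaluate the trigger after every new sample, as each poll arrives."""
    disabled = [0] * len(status)
    return any(
        rewritten_problem(disabled[: i + 1], status[: i + 1]) for i in range(len(status))
    )


# 90 s outage starting at t=50, polled every 60 s for 10 minutes.
status = poll_history(outage_start=50, outage_len=90, interval=60, polls=10)
print(status)              # [1, 0, 0, 1, 1, 1, 1, 1, 1, 1]
print(ever_fires(status))  # False: only two consecutive failed samples
```

Stretching the same outage to 200 seconds yields four consecutive failed samples, and the check fires.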
          Last edited by mindspray; 04-07-2022, 18:14.
