Trigger-based event correlation

  • xenograft
    Junior Member
    • May 2021
    • 10

    #1

    Trigger-based event correlation

    Hello all,

    I've been attempting to create a trigger that would warn me when a host loses connection to a SQL database and would also automatically resolve when the connection is restored.
    After reading the documentation, the following article seems to cover exactly what I need: 1 Trigger-based event correlation [Zabbix Documentation 5.0]
    However, despite following the example specified in the article quite accurately, I'm unable to get these problems to auto-resolve.

    Here's the example in the article:


    Here's the trigger I've created:
    [Screenshot: trigger.png]

    Tags:
    [Screenshot: tags.png]

    Unfortunately, while I can quite easily get the problem to trigger, it never resolves after the "A connection has been established" message appears in the event log.
    I can confirm that the item that monitors the event log for Information messages definitely received the message successfully and the timestamp is newer than the error message.
    Shouldn't the recovery expression auto-resolve the problem as soon as the expression conditions are met?
    Should I be using the same item for both the Problem and the Recovery expression for this to work?

    I'm on Zabbix server 5.0 LTS with Zabbix agent 3.0.0 running on the host, if that makes a difference.

    Thank you in advance for the advice.

    UPDATE: I've tried to use the same item for both the Problem and the Recovery expression and the issue remains:
    [Screenshot: triggernew.png]

    Here is the item history:
    [Screenshot: itemhistory.png]
    Last edited by xenograft; 10-11-2021, 21:12.
  • ISiroshtan
    Senior Member
    • Nov 2019
    • 324

    #2
    Could you please clarify whether the tags are properly populated when the problem is raised? And does it properly close the alarm if you do not use tags for closing while using the same item in the problem and recovery expressions?

    I did something similar with tags in 4.x and it works fine.

    • xenograft
      Junior Member
      • May 2021
      • 10

      #3
      Originally posted by ISiroshtan
      Could you please clarify whether the tags are properly populated when the problem is raised? And does it properly close the alarm if you do not use tags for closing while using the same item in the problem and recovery expressions?

      I did something similar with tags in 4.x and it works fine.
      Thank you for responding!

      The tags are definitely populated correctly - I get a "SQL" tag and a tag with the database server name next to the problem.
      The event that *should* trigger the recovery expression also contains the database server name, so it should have no problem generating a matching tag (I checked with regex to be sure).

      Removing the tag matching from the trigger unfortunately did not work either...
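
For anyone who wants to rule out a tag mismatch outside of Zabbix, here is a minimal Python sketch of that regex check. It assumes the tag value is extracted from the log line with a regular expression; the pattern, the server name and the problem message below are all hypothetical placeholders (the real ones are only visible in the screenshots), so substitute your own.

```python
import re

# Hypothetical pattern: captures the quoted server name after the word "server".
SERVER_RE = re.compile(r"server\s+'([\w.-]+)'", re.IGNORECASE)

def extract_tag(message):
    """Return the captured server name, or None if the pattern does not match."""
    match = SERVER_RE.search(message)
    return match.group(1) if match else None

# Hypothetical messages; replace them with the real lines from the item history.
problem_msg = "Error: lost connection to server 'DBSRV01'"
recovery_msg = "A connection has been established to server 'DBSRV01'"

if __name__ == "__main__":
    # Both lines must yield the same value for tag-based closing to match.
    print(extract_tag(problem_msg), extract_tag(recovery_msg))
```

If the two extracted values differ, or one of them comes back as None, the recovery event would not carry a tag value that matches the open problem.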

      • ISiroshtan
        Senior Member
        • Nov 2019
        • 324

        #4
        Interestingly enough, the trigger worked exactly as you want in my lab on a fresh Zabbix 5.0.

        I don't have a Windows host to pull logs from, so I had to replace the source data with fake SNMP traps that I fed it instead. Here is the data on which the problem was opened and closed:

        [Screenshot: Problem.png]
        [Screenshot: Recovery.png]
        Trigger configuration:
        [Screenshot: Trigger.png]
        Problem details after it was automatically resolved:
        [Screenshot: Problem details.png]


        At the same time, I don't see any obvious mistake in your screenshots. I'm not sure what exactly the issue is that you are facing.

        • xenograft
          Junior Member
          • May 2021
          • 10

          #5
          Interesting, but very frustrating!
          It seems that the only difference is that you're capturing data from an SNMP trap, so I can maybe start from there.

          One thing that I was missing from your screenshots was how you triggered the recovery expression.
          Did you just use the same method as for the problem trigger and write "A connection has been established" instead?

          Additionally, does your setup still work in PROBLEM event generation mode "Single"?
          SQL server connection drops create a new error in the event log every time the connection is retried, so I don't want my dashboard to light up like a Christmas tree.

          Thanks again for your help!

          • ISiroshtan
            Senior Member
            • Nov 2019
            • 324

            #6
            Totally agree that it's frustrating. That is the reason why I spun up a fresh Zabbix to test. I used the same logic for a different task on Zabbix 4.4, and it works just fine. But I saw no flaw in your screenshots, so I wanted to make sure it works in 5.x as well.

            In regards to recovery - oooooooh, I posted 2 screenshots of the problem message, not one problem and one recovery message. Silly me. It is pretty much as you stated: I just replace the string inside the SNMP trap to represent either a PROBLEM or an OK message. I took the exact text from the item history you showed in your original post.

            [Screenshot: Recovery.png]

            In single problem generation mode it works just fine as well. Also, instead of using single mode event generation, which would fail to create a second problem for you if two or more servers have an issue at the same time, you can add a global event correlation rule on top of the trigger setup to clear duplicates (although first we need to figure out why the trigger fails).
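
To illustrate that trade-off, here is a toy Python model of the two event generation modes combined with tag-based closing. It is a deliberate simplification for discussion, not Zabbix's actual logic: the class, its methods and the server names are all hypothetical.

```python
class Trigger:
    """Toy model of a trigger with tag-based closing; not Zabbix's real logic."""

    def __init__(self, multiple):
        self.multiple = multiple   # True = "Multiple" mode, False = "Single" mode
        self.open_problems = []    # tag values of the currently open problems

    def problem(self, tag):
        """Try to open a problem; return True if a new problem was created."""
        # In Single mode, no new problem is opened while one is already open.
        if not self.multiple and self.open_problems:
            return False
        self.open_problems.append(tag)
        return True

    def recover(self, tag):
        """Close the problems whose tag value matches; return how many closed."""
        before = len(self.open_problems)
        self.open_problems = [t for t in self.open_problems if t != tag]
        return before - len(self.open_problems)


single = Trigger(multiple=False)
single.problem("DB1")
second_opened = single.problem("DB2")   # False: DB2's outage is never shown

multi = Trigger(multiple=True)
for name in ("DB1", "DB1", "DB2"):      # DB1 retries produce a duplicate
    multi.problem(name)
closed = multi.recover("DB1")           # closes both DB1 problems, DB2 stays open
```

In this model, Single mode hides the second server's outage, while Multiple mode shows it at the cost of duplicates for repeated retries, which is the gap a duplicate-clearing correlation rule would be meant to fill.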

            If you can, please share the original recovery log message from the event log. In my tests I manually typed both the regexp in Zabbix and the text in the SNMP trap. In your case they come from different sources, so maybe there are some minor differences. I'm really at a loss over your problem at the moment.
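
One way to hunt for such minor differences is to compare the text typed into the expression against the raw line from the item history, character by character. Below is a minimal Python sketch of that idea; the trailing carriage return in `from_log` is a made-up example of an invisible difference, not something observed in this thread.

```python
def reveal(text):
    """Escape non-printable and non-ASCII characters so they become visible."""
    return text.encode("unicode_escape").decode("ascii")

def first_difference(a, b):
    """Return the index of the first differing character, or None if equal."""
    for i, (char_a, char_b) in enumerate(zip(a, b)):
        if char_a != char_b:
            return i
    if len(a) != len(b):
        # One string is a prefix of the other; they differ where the shorter ends.
        return min(len(a), len(b))
    return None

typed = "A connection has been established"
# Hypothetical raw value with an invisible trailing carriage return.
from_log = "A connection has been established\r"

if __name__ == "__main__":
    print(first_difference(typed, from_log))
    print(reveal(from_log))
```

Pasting the real log line into `from_log` would show whether stray whitespace, line endings or look-alike characters keep the recovery pattern from matching.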
