Zabbix delayed trigger when there is a problem with a HA Proxy backend

  • maherenthusiast
    Junior Member
    • May 2025
    • 6

    #1

    Zabbix delayed trigger when there is a problem with a HA Proxy backend

    Dear Support Community,

    At our organisation we're using Zabbix (5.2) to monitor multiple environments, both frontends and backends. We are currently struggling with Zabbix monitoring of HA Proxy, for which we use the Zabbix HA Proxy templates.

    For some reason, when there is now a problem on one of the HA Proxy backends, the problem only gets created after 40+ minutes. In the HA Proxy discovery and latest data we can see that the status of this backend is in fact DOWN, so this part seems to work as expected.

    The interesting part is that once the problem has been created after the 40+ minutes, it does not take long to resolve once the backend is up again: a couple of minutes, 5 at most.

    This is obviously concerning, as we won't know that a backend is DOWN and the Problem notification will be delayed by 40+ minutes. We noticed this because one of our dashboards stopped reporting this problem directly with a trigger label icon.

    This is how the trigger is set up:

    Trigger should initiate once 5 checks have been performed where the server is down. The check is decided by the parent, which is HAProxy by HTTP.

    The preprocessing step looks like this:
    [Image: image.png]

    Trigger prototype below; it is set to "create enabled" and discovery is also set to yes.



    "HAProxy by HTTP: Get stats" is set to run its check at a one-minute interval:


    In Latest data we can also see this, and the backend at the top, A_APP-11-NAME-WIS-Out, is also detected every minute.

    However, for some reason the trigger either does not work or is heavily delayed. As far as we know nothing has changed recently. We are planning an upgrade, but as it is right now it is not working as expected and we're unable to figure out why. Other triggers for the HA Proxy seem to work fine; we are unsure whether they're delayed, but we do get notified when certificates are about to expire, for example.

    Any help or direction is much appreciated.

    Many thanks.
    Maher
    Last edited by maherenthusiast; 28-05-2025, 14:35.
  • Answer selected by maherenthusiast at 30-05-2025, 13:00.
    cyber
    Senior Member
    Zabbix Certified SpecialistZabbix Certified Professional
    • Dec 2006
    • 4806

    If you remove "discard unchanged" you will get that thing triggered in five minutes. If you leave that in place and change trigger to fire after 2 fails instead of 5, it will inform you in 10 minutes instead of 40...


    • cyber
      Senior Member
      Zabbix Certified SpecialistZabbix Certified Professional
      • Dec 2006
      • 4806

      #2
      Originally posted by maherenthusiast
      Trigger should initiate once 5 checks have been performed where the server is down.
      "Discard unchanged with heartbeat" -> 10 minutes...
      First failure... 10m break... second failure... 10m break... third failure... 10m break... fourth failure... 10m break... fifth failure... let's count those 10m breaks now... 4 of them... 40 minutes...
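The counting above can be checked with a small simulation. This is only a sketch under stated assumptions, not Zabbix code: the function name `minutes_until_trigger` is made up for illustration, the backend is assumed to go DOWN at minute 0 and stay DOWN, the item is polled every minute, and a value is stored only when it changes or when the 10-minute heartbeat has elapsed since the last stored value.

```python
def minutes_until_trigger(poll_interval, heartbeat, required):
    """Simulate 'discard unchanged with heartbeat' for a backend that stays DOWN.

    Returns (minute of the last required stored value, list of stored minutes).
    Hypothetical helper for illustration; all times are in minutes.
    """
    stored_at = []
    last_stored = None
    t = 0
    while len(stored_at) < required:
        # The first DOWN value differs from the previous UP value, so it is kept.
        first_failure = last_stored is None
        # Later DOWN values are unchanged and only kept once the heartbeat elapses.
        heartbeat_elapsed = last_stored is not None and t - last_stored >= heartbeat
        if first_failure or heartbeat_elapsed:
            stored_at.append(t)
            last_stored = t
        t += poll_interval
    return stored_at[-1], stored_at


delay, stored_at = minutes_until_trigger(poll_interval=1, heartbeat=10, required=5)
print(stored_at)  # [0, 10, 20, 30, 40]
print(delay)      # 40
```

The item is still checked every minute, which is why Latest data shows DOWN straight away, but the trigger only sees the five stored values.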


      • maherenthusiast
        Junior Member
        • May 2025
        • 6

        #3
        Hi cyber

        Thank you for your response. Which failure are you referring to when you say first fail? Do you mean the first fail of the preprocessing?

        From what I find here it is part of the default Zabbix haproxy template. Do you recommend we change the value in this case? What should it be changed to?

        As mentioned in my post, the "latest data" does show that the backend is down.
        Zabbix Official Templates. Contribute to fernandobrunelli/Zabbix-Templates development by creating an account on GitHub.
        Last edited by maherenthusiast; 30-05-2025, 10:45.


        • cyber
          Senior Member
          Zabbix Certified SpecialistZabbix Certified Professional
          • Dec 2006
          • 4806

          #4
          Failure as in a failed check... I assume the check is performed at a 1m interval... at one moment it fails and records it, then keeps checking and finds things still failed, but as "discard unchanged" is involved, it gets recorded again only after 10 minutes... as you say "Trigger should initiate once 5 checks have been performed where the server is down", it takes a while to gather those 5 recorded failures...


          • maherenthusiast
            Junior Member
            • May 2025
            • 6

            #5
            Originally posted by cyber
            Failure as in a failed check... I assume the check is performed at a 1m interval... at one moment it fails and records it, then keeps checking and finds things still failed, but as "discard unchanged" is involved, it gets recorded again only after 10 minutes... as you say "Trigger should initiate once 5 checks have been performed where the server is down", it takes a while to gather those 5 recorded failures...
            Thanks for clarifying!

            Yes, it shows every minute that the backend is down. What do you recommend we do in this case? What is the best approach to get it to work and create a problem?


            • cyber
              Senior Member
              Zabbix Certified SpecialistZabbix Certified Professional
              • Dec 2006
              • 4806

              #6
              If you remove "discard unchanged" you will get that thing triggered in five minutes. If you leave that in place and change trigger to fire after 2 fails instead of 5, it will inform you in 10 minutes instead of 40...
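The options above can be compared with simple arithmetic: the delay is the number of gaps between the required stored values, times the spacing between stored values. The sketch below is illustrative only; `alert_delay_minutes` is a hypothetical helper, and it assumes a 1-minute check interval, a backend that stays DOWN, and a heartbeat that is a whole multiple of the check interval.

```python
def alert_delay_minutes(required_values, poll_interval=1, heartbeat=None):
    """Minutes from the first stored DOWN value to the Nth stored DOWN value.

    heartbeat=None models an item without "discard unchanged with heartbeat".
    Hypothetical helper for illustration; not part of Zabbix.
    """
    # Assumption: the heartbeat, when set, is a whole multiple of the poll interval.
    spacing = poll_interval if heartbeat is None else max(heartbeat, poll_interval)
    # N stored values are separated by N - 1 gaps.
    return (required_values - 1) * spacing


print(alert_delay_minutes(5, heartbeat=10))  # 40  (template as described in the thread)
print(alert_delay_minutes(5))                # 4   (discard step removed)
print(alert_delay_minutes(2, heartbeat=10))  # 10  (discard kept, trigger after 2 fails)
```

Without the discard step the fifth failed check arrives 4 minutes after the first one, which matches the "five minutes" estimate above once the minute before the first failed check is counted.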


              • maherenthusiast
                Junior Member
                • May 2025
                • 6

                #7
                Thank you for your help. We have changed this and now the problem gets created as expected. We are still unable to explain why this worked in the past but no longer seems to work with the template as it was created. But we're happy with things as they are now.
