Zabbix falsely reports that a problem is resolved

  • amak
    Junior Member
    • Mar 2023
    • 3

    #1

    Hi,
    I am using Zabbix server version 6.29 on RHEL 8.7.
    I set up an Ubuntu 20.04 host (client) using the Zabbix passive agent 4.0.17.

    I created a problem by filling the / partition to 97% on the Ubuntu client. I received an alert from the Zabbix server saying the filesystem was over 90% full. So far so good.
    I deliberately did not resolve the problem, and found that it was resolved automatically after 56m4s. An email was sent from the Zabbix server saying the problem was resolved. I checked the Ubuntu host again and can confirm the filesystem is still 97% full.
    In the Zabbix GUI, the "Current problems" window on the Dashboard has no entry for the problem, yet the problem still exists. Has anyone experienced the same?

    I am puzzled about what is causing this behaviour. Any advice on how to debug what is going on?

    Regards,
    Andrew Mak
  • cyber
    Senior Member
    Zabbix Certified SpecialistZabbix Certified Professional
    • Dec 2006
    • 4807

    #2
    The first thing to do would be to look at your trigger expression and try to understand how it works and how your data fits there...

    • amak
      Junior Member
      • Mar 2023
      • 3

      #3
      The host is monitored using the native template Linux by Zabbix agent.
      "Trigger action" --> "Send email" is defined with the following conditions:
      A and (B or C or D), where
      A. Severity is greater than or equal to High
      B. Host is in group Linux Servers
      C. Host is in group Management Servers
      D. Host is in group Production

      Interestingly, after the filesystem is filled to more than 90% and nothing further is done on the host, a Problem email is sent; then, 20 minutes or so later, the problem is resolved even though the filesystem is still more than 90% full. This is confirmed by a Resolved email, and the problem is marked as resolved in the GUI. Then the problem is raised again and another Problem email is sent, and it resolves again in less than 5 minutes. After that it remains quiet.

      Any comments on what caused the Resolved, Problem, Resolved emails to be sent even though the filesystem remained more than 90% full the whole time?

      Regards
      Andrew

      • cyber
        Senior Member
        Zabbix Certified SpecialistZabbix Certified Professional
        • Dec 2006
        • 4807

        #4
        You showed your action config here, but I suggested looking into your trigger config... The trigger config is what fires and also resolves your problem...
        So... try again...

        • amak
          Junior Member
          • Mar 2023
          • 3

          #5
          As mentioned, the host is monitored by the template "Linux by Zabbix agent". Filesystem usage is under "Discovery rules" --> "Mounted filesystem discovery", and the trigger prototype is:
          Name: {#FSNAME}: Disk space is critically low
          Operational data: Space used: {ITEM.LASTVALUE3} of {ITEM.LASTVALUE2} ({ITEM.LASTVALUE1})
          Expression:
          last(/Linux by Zabbix agent/vfs.fs.size[{#FSNAME},pused])>{$VFS.FS.PUSED.MAX.CRIT:"{#FSNAME}"} and ((last(/Linux by Zabbix agent/vfs.fs.size[{#FSNAME},total])-last(/Linux by Zabbix agent/vfs.fs.size[{#FSNAME},used]))<{$VFS.FS.FREE.MIN.CRIT:"{#FSNAME}"} or timeleft(/Linux by Zabbix agent/vfs.fs.size[{#FSNAME},pused],1h,100)<1d)

          I believe this is what raises and resolves the problem.

          • sanq
            Junior Member
            • Apr 2021
            • 15

            #6
            I'm working on the same problem. Is there any progress?

            • ISiroshtan
              Senior Member
              • Nov 2019
              • 324

              #7
              So that trigger has 2 ways to fire:

              1. Disk space usage % is higher than {$VFS.FS.PUSED.MAX.CRIT:"{#FSNAME}"} AND free disk space is below {$VFS.FS.FREE.MIN.CRIT:"{#FSNAME}"}.
              This can be tricky for big disks because the default value of '{$VFS.FS.FREE.MIN.CRIT}' is 5 GB. So as long as your disk has more than 5 GB of free space, this condition will not fire, no matter how low the free space goes percentage-wise.

              2. Zabbix calculates the rate of space usage over the last hour and estimates when it will reach 100% usage. If Zabbix calculates that the disk will be full in less than 1 day (and disk space usage % is also above {$VFS.FS.PUSED.MAX.CRIT:"{#FSNAME}"}, as the expression requires), it will fire the alert.
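
The two ways to fire can be sketched as plain boolean logic in Python. This is only an illustration of the expression's structure, not Zabbix code: the function `trigger_fires`, its parameter names, and the 500 GiB and 20 GiB disk sizes are made up for the example, and the defaults (90% and 5 GB, read here as 5 GiB) are assumed from the values mentioned in this thread.

```python
GIB = 1024 ** 3
ONE_DAY = 24 * 3600  # seconds, the "<1d" part of the expression


def trigger_fires(pused, total_bytes, used_bytes, timeleft_seconds,
                  pused_max_crit=90.0, free_min_crit=5 * GIB):
    """Hypothetical mirror of: pused > MAX and (free < MIN or timeleft < 1d)."""
    over_threshold = pused > pused_max_crit
    low_free_space = (total_bytes - used_bytes) < free_min_crit
    filling_fast = timeleft_seconds < ONE_DAY
    return over_threshold and (low_free_space or filling_fast)


# Hypothetical large disk at 97%: about 15 GiB free, so only the
# timeleft branch can fire the trigger.
big_total = 500 * GIB
big_used = int(big_total * 0.97)
print(trigger_fires(97.0, big_total, big_used, 3600))          # still filling
print(trigger_fires(97.0, big_total, big_used, float("inf")))  # growth stopped

# Hypothetical small disk at 97%: under 5 GiB free, so the trigger
# stays in problem state even when nothing is being written.
small_total = 20 * GIB
small_used = int(small_total * 0.97)
print(trigger_fires(97.0, small_total, small_used, float("inf")))
```

On the large disk the result flips from True to False as soon as the projected time to full is no longer under a day, even though usage stays at 97%.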

              So I'm pretty sure that during the tests more than 5 GB of space was left, but the disk was filled pretty fast, triggering the 2nd condition to fire the alert. Then the disk was left with nothing written to it -> disk fill rate dropped to (or close to) 0 -> Zabbix calculated that condition #2 was no longer valid based on the 1h disk usage trend -> alert was closed.
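
A rough way to see that chain of events is to fit a straight line through the last hour of pused samples and ask when it reaches 100%. The sketch below assumes a simple least-squares linear fit and a made-up fill profile (50% to 97% over 30 minutes, then flat, one sample per minute). The names `timeleft_linear`, `pused_at`, and `window` are hypothetical helpers, and this is not how Zabbix implements `timeleft()` internally.

```python
INF = float("inf")


def timeleft_linear(samples, threshold=100.0):
    """Seconds from the last sample until a least-squares line hits threshold."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return INF  # a single point in time gives no trend
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope <= 0:
        return INF  # flat or shrinking usage never reaches the threshold
    intercept = mean_v - slope * mean_t
    t_hit = (threshold - intercept) / slope
    return max(t_hit - samples[-1][0], 0.0)


def pused_at(t):
    """Hypothetical test profile: 50% -> 97% during the first 1800 s, then flat."""
    return 50 + 47 * min(t, 1800) / 1800


def window(now, period=3600, step=60):
    """Samples from the last `period` seconds, one every `step` seconds."""
    return [(t, pused_at(t)) for t in range(max(0, now - period), now + 1, step)]


ONE_DAY = 24 * 3600
# Right after the fill: the steep trend projects 100% within minutes.
print(timeleft_linear(window(1800)) < ONE_DAY)
# Two hours in: the whole 1h window is flat at 97%, so no time-to-full exists.
print(timeleft_linear(window(7200)) < ONE_DAY)
```

Under these assumptions the second check turns false only once the fill has mostly scrolled out of the 1h window, which would fit a problem that resolves by itself within about an hour of the test.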
