Zabbix 5.0. Problem resolved but it's not true

  • Sara.Art
    Member
    • Jun 2020
    • 52

    #1

    Hi! I recently upgraded from Zabbix 4.4 to 5.0, and triggers seem to behave strangely. For example, I have servers with very little free disk space (below 10%, sometimes 5%). Zabbix 5.0 detects the problem, but after about 3 minutes it reports it as resolved, so the problem is no longer shown even though it is still present (Latest data shows the correct readings).
    Does anyone have the same issue? How can I fix it? Thank you all in advance. Have a nice day, Sara
  • tim.mooney
    Senior Member
    • Dec 2012
    • 1427

    #2
    That sounds like a problem with how the trigger expression or perhaps recovery expression is configured.

    Can you post the expression that's being used for the trigger in question, and, if there's a recovery expression, the recovery expression too?


    • Sara.Art
      Member
      • Jun 2020
      • 52

      #3
      Hi Tim, first of all, thank you for your reply. The trigger expression in Zabbix 5 is:

      Code:
      {Template OS Windows by Zabbix agent:vfs.fs.size[{#FSNAME},pused].last()}>{$VFS.FS.PUSED.MAX.WARN:"{#FSNAME}"} and (({Template OS Windows by Zabbix agent:vfs.fs.size[{#FSNAME},total].last()}-{Template OS Windows by Zabbix agent:vfs.fs.size[{#FSNAME},used].last()})<10G or {Template OS Windows by Zabbix agent:vfs.fs.size[{#FSNAME},pused].timeleft(1h,,100)}<1d)
      In version 4.4 the expression was different (and worked fine), so I tried using the old one instead (e.g. {Test Template OS Windows by Zabbix agent:vfs.fs.size[{#FSNAME},pfree].last(0)}<10), and the problem is correctly shown (and not resolved). Could you tell me why? Thanks! Sara
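To see why the new template trigger and the old pfree trigger can disagree, here is a rough Python sketch (not Zabbix code) that evaluates both conditions quoted above on one set of values. The function names, the 500 GiB volume and its readings are hypothetical; it assumes Zabbix's 1024-based "G" suffix and the template default of 80 for {$VFS.FS.PUSED.MAX.WARN}. It highlights that the new trigger also requires either less than 10G free or a prediction of filling within a day, so a large volume at 92% used can satisfy the old trigger but not the new one.

```python
# Hypothetical sketch, not Zabbix code: re-evaluates the two disk-space trigger
# conditions quoted in this thread for a single set of readings.
GiB = 1024 ** 3  # assumption: Zabbix's "G" suffix is 1024-based
ONE_DAY = 86400  # "1d" in seconds


def new_template_trigger(pused, total, used, timeleft_s, warn_pct=80.0):
    """pused.last() > {$VFS.FS.PUSED.MAX.WARN} and
    ((total.last() - used.last()) < 10G or pused.timeleft(1h,,100) < 1d)"""
    free = total - used
    return pused > warn_pct and (free < 10 * GiB or timeleft_s < ONE_DAY)


def old_trigger(pfree):
    """pfree.last(0) < 10"""
    return pfree < 10


# Hypothetical 500 GiB volume with 460 GiB used: 92% used, 8% (40 GiB) free.
total, used = 500 * GiB, 460 * GiB
pused = 100 * used / total
pfree = 100 - pused

# float("inf") stands in for "threshold never reached" (assumption; Zabbix
# uses its own sentinel value for that case).
print(old_trigger(pfree))                                      # True
print(new_template_trigger(pused, total, used, float("inf")))  # False: flat usage
print(new_template_trigger(pused, total, used, 5 * 3600))      # True: full in 5h
```

On such a volume, whether the new trigger is in a problem state depends entirely on the short-term prediction, which is exactly the part Tim suspects below.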


      • tim.mooney
        Senior Member
        • Dec 2012
        • 1427

        #4
        Thanks for providing information on what triggers you're using. Have you set a custom value for the {$VFS.FS.PUSED.MAX.WARN} macro for this particular volume, or is it at the template default of 80 (percent)?

        To know for certain why it's alerting and then immediately clearing, it would also be necessary to see a set of readings for % used (pused) and % free (pfree) for that volume during a period when it alerted and then cleared 3 minutes later. Even without those values, the first thing I would suspect is the last part of the new trigger:

        Code:
         or {Template OS Windows by Zabbix agent:vfs.fs.size[{#FSNAME},pused].timeleft(1h,,100)}<1d)
        That's a predictive trigger: https://www.zabbix.com/documentation...ers/prediction

        It's looking at the % used for that volume (pused) over the last hour (1h) and trying to calculate how long it would take for the % used to reach 100. If timeleft() predicts that the volume will reach 100% used in less than 1 day (<1d), then it's a problem. The predictive triggers in Zabbix are a really nice feature, but for volumes with unusual usage patterns the predictions can sometimes be wrong. For example, if some process periodically writes a large file (or several smaller files) to the volume and something else later cleans those files up, the 1-hour period of historical growth that the trigger uses for its prediction may not be enough to predict growth accurately. My site has volumes like that: a volume's % used is relatively constant most of the day, but shortly after midnight an SQL export writes a lot of data to the volume in a short period of time. If a timeleft() predictive trigger looks only at the hour during which the SQL export is running, the % used is growing very quickly, so the prediction would be that the volume will fill very soon. Once the export finishes and usage levels off, the prediction no longer says the volume will fill within a day, so the problem would clear.
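The effect described above can be illustrated with a small least-squares extrapolation. This is a sketch assuming a simple linear fit over the window; it is not Zabbix's implementation, and the readings are made up: one hour of 1-minute samples during a hypothetical export where % used climbs from 70 to 80, followed by a flat hour after the export ends.

```python
import math


def linear_timeleft(samples, threshold):
    """Seconds from the last sample until a least-squares line through
    samples [(t_seconds, value), ...] reaches threshold; math.inf if the
    fitted line is flat or falling (a stand-in for "never", an assumption)."""
    n = len(samples)
    if n < 2:
        return math.inf
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    if sxx == 0:
        return math.inf  # all samples share one timestamp: no slope to fit
    slope = sxy / sxx
    if slope <= 0:
        return math.inf  # not growing, so the threshold is never reached
    intercept = mean_v - slope * mean_t
    t_hit = (threshold - intercept) / slope
    return max(0.0, t_hit - samples[-1][0])


# Hypothetical 1h windows sampled every 60 s (61 points each).
burst = [(t, 70 + 10 * t / 3600) for t in range(0, 3601, 60)]  # export running
flat = [(t, 80.0) for t in range(3600, 7201, 60)]              # export finished

print(linear_timeleft(burst, 100) / 3600)  # about 2 hours -> "< 1d", problem
print(linear_timeleft(flat, 100))          # inf -> not "< 1d", problem clears
```

The same 80% volume flips from "full in about 2 hours" to "never full" purely because of which hour the window covers. A longer window (for example 1d instead of 1h) would likely average a short burst over more history, at the cost of reacting more slowly to genuine growth.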

        Another possibility, completely unrelated to the predictive trigger, is that you're getting "flapping" ( https://blog.zabbix.com/no-more-flap...mart-way/1488/ ) because there's almost exactly 10G available on that volume and utilization periodically goes above and then below that value. That would cause this part of the trigger to fire and then clear:

        Code:
        ({Template OS Windows by Zabbix agent:vfs.fs.size[{#FSNAME},total].last()}-{Template OS Windows by Zabbix agent:vfs.fs.size[{#FSNAME},used].last()})<10G
        Is the volume in question at approximately 10G free?

        I'm a little surprised that this template doesn't include a recovery expression to prevent flapping when the volume is very near 10 Gig free.
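The flapping described above, and what a recovery expression changes, can be simulated with a simple hysteresis counter. This sketch is hypothetical: the free-space readings hovering around 10G and the 12G recovery threshold are made-up values. In Zabbix the equivalent would be setting the trigger's "OK event generation" to "Recovery expression" with a free-space limit somewhat above the problem threshold.

```python
GiB = 1024 ** 3


def count_problem_events(free_bytes, problem_below, recover_above=None):
    """Count PROBLEM events for a 'free space < problem_below' trigger.
    With recover_above=None the problem clears as soon as the condition is
    false (no recovery expression); otherwise it stays open until free space
    reaches recover_above (hysteresis)."""
    if recover_above is None:
        recover_above = problem_below
    in_problem = False
    events = 0
    for free in free_bytes:
        if not in_problem and free < problem_below:
            in_problem = True
            events += 1
        elif in_problem and free >= recover_above:
            in_problem = False
    return events


# Hypothetical readings hovering around 10 GiB free.
readings = [g * GiB for g in (10.2, 9.9, 10.1, 9.8, 10.3, 9.9, 10.0)]

print(count_problem_events(readings, 10 * GiB))            # 3 (flapping)
print(count_problem_events(readings, 10 * GiB, 12 * GiB))  # 1
```

Without the gap between the two thresholds, every small swing across 10G opens and closes a problem; with it, one problem stays open until there is clearly enough free space again.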

        The latest Zabbix templates are in general huge improvements over the old templates, and they use macros in powerful ways that make it easy to customize some things without having to modify the template itself. They are relatively new, though, and it's possible that this template was not tested in an environment that has a usage pattern just like your environment.


        • Sara.Art
          Member
          • Jun 2020
          • 52

          #5
          Hi Tim, I forgot to mention that the agent installed on the host was 4.4. I upgraded it to 5.0.1 and now it works fine with the Zabbix 5 server. Unfortunately, for every monitored host I'll have to uninstall Zabbix agent 4.4 first, then install Zabbix agent 5 from the command line (and edit the conf file again), and it'll be a long job. Will an MSI installer be released soon? (I'm hoping to upgrade the agent without uninstalling and reconfiguring it.)
          Thank you very much for your time and your replies :-) I also noticed that with agent 4.4 on the host, Zabbix 5 seems to mark only the "C:" disks as problems (and, rightly, does not resolve them).
          Have a nice day! Sara


          • tim.mooney
            Senior Member
            • Dec 2012
            • 1427

            #6
            I don't know about the MSI. I thought there already was an MSI for some older agent versions, but perhaps I'm mistaken.

            My site uses a configuration management system on Linux to handle deploying packages, local configuration, and service restarts for the agent whenever there's an updated version we want to deploy. Perhaps something similar (Desired State Configuration, or something like it) would be useful for your Windows clients.


            • Sara.Art
              Member
              • Jun 2020
              • 52

              #7
              Hi Tim, there are some MSIs for older agents (e.g. 4.4).
              Indeed, a configuration management system would be really useful. What would you suggest? Thank you very much, Sara


              • tim.mooney
                Senior Member
                • Dec 2012
                • 1427

                #8
                Configuration management is a huge topic, and different organizations have very different needs. The best advice I can give is to research them thoroughly and once you've identified one or two that look like the best possible candidates for your environment and needs, evaluate the software carefully before you commit to it. Just like any other enterprise product, once you have it in place and it's widely used in your environment, it can be very difficult to switch to some other product, even if the other product meets your needs better.

                Good luck!


                • Sara.Art
                  Member
                  • Jun 2020
                  • 52

                  #9
                  :-) Thanks!
