prediction trend forecast in V3 is it ready for prime time

  • cevyne
    Junior Member
    • Jul 2016
    • 2

    #1

    I recently read http://blog.zabbix.com/staying-ahead...ediction/4534/

    suggesting I can alert on a trend, such as a disk filling up. I've been away from implementing Zabbix for a couple of years, so I'm a little rusty.
    But I cannot find enough details to actually make this work. Does someone have a good step-by-step example so I can get my hands around this?
  • akincer
    Junior Member
    • Jul 2016
    • 11

    #2
    I've been working on this and reading documentation too. To say the documentation is thin and ambiguous would be generous. No offense to whoever is writing it, but come on -- that's pretty uninspiring to say the least.

    I took this from the PDF that digs into the details to try to create a trigger. I'm trying to work out what limit should be. I also changed the window to 2 hours instead of 1 hour:

    Code:
    ({host:item.forecast(1h,1h,1h,,avg)} - {host:item.avg(1h)}) *
    ({host:item.forecast(1h,1h,1h,,avg)} - {host:item.avg(1h)}) /
    {host:item.avg(1h)} / {host:item.avg(1h)} < 0.01 and
    {host:item.forecast(1h,,1h)} > limit
    I replaced "host" with the name of a defined host, replaced "item" with "vfs.fs.size[C:,free]", and changed the third parameter of "forecast" from 1h to 2h. I think this pushes the warning window out to 2 hours. What I do not understand is "limit". I can't find any reasonable documentation on what forecast() returns. The documentation literally says nothing about the return value, which is pretty strange for the documentation of a function.

    I think I might have something sort of working if I could figure out what "limit" should be.
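
A minimal sketch of what a linear forecast amounts to, assuming the function reference's one-line description of forecast() (the future value, max, min, delta or avg of the item) is accurate. If so, the result is in the item's own units, and "limit" is just a threshold in those units (bytes for vfs.fs.size[C:,free]). The sample data, the helper names and the threshold below are hypothetical, and this is not Zabbix's actual implementation. Note that for a free-space item the comparison would presumably be "< limit" rather than "> limit".

```python
def linear_fit(points):
    """Least-squares line through (t, value) points; returns (slope, intercept)."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    cov = sum((t - mean_t) * (v - mean_v) for t, v in points)
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


def forecast(points, now, horizon):
    """Value of the fitted line at now + horizon (both in seconds)."""
    slope, intercept = linear_fit(points)
    return slope * (now + horizon) + intercept


# Hypothetical history: free space sampled once a minute for one hour,
# starting at 10 GB and shrinking by exactly 1 MB per minute.
MB = 1024 ** 2
samples = [(60 * i, 10 * 1024 * MB - i * MB) for i in range(61)]
now = samples[-1][0]

predicted = forecast(samples, now, 2 * 3600)  # bytes expected two hours from now
limit = 10100 * MB                            # hypothetical threshold, also in bytes
print(round(predicted / MB), "MB predicted; below limit:", predicted < limit)
```

The point of the sketch is only that the forecast and the threshold are compared in the same units as the item itself.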

    If you're wondering how to apply this to a host, click Configuration -> Hosts and then click on the host you wish to create a predictive trigger for. Then click Triggers -> Create Trigger (top right). Modify the above and use some number for "limit" that makes sense (good luck with that) and paste the code into the Expression field. Name it, decide if you want to receive multiple alerts and assign it a severity. Save it and then you should be good. The trick is:
    1. Figuring out how to adjust the window to something that makes sense for your environment.
    2. Figuring out what limit is and should be.
    3. Figuring out how to create some generic trigger to apply to multiple hosts in a template.


    I haven't gotten to #3 yet because I can't figure out what to do with "limit" yet nor am I sure about how to adjust the window properly. Good luck.

    • cevyne
      Junior Member
      • Jul 2016
      • 2

      #3
      I considered this an important reason to upgrade. Now not so much.

      • akincer
        Junior Member
        • Jul 2016
        • 11

        #4
        I don't think it's necessarily the feature that's lacking, but rather the documentation. Although realistically I'm guessing it should be possible to have a few canned formulas with settings that could be modified to fit your desired window of prediction.

        Hopefully they flesh out the documentation a little better. There is one example I might give a go in the documentation to see how that works.

        • jan.garaj
          Senior Member
          Zabbix Certified Specialist
          • Jan 2010
          • 506

          #5
          Prediction is not a free lunch. You will need some minimal math background. Zabbix has published a nice paper about it: http://zabbix.org/mw/images/1/18/Prediction_docs.pdf
          Devops Monitoring Expert advice: Dockerize/automate/monitor all the things.
          My DevOps stack: Docker / Kubernetes / Mesos / ECS / Terraform / Elasticsearch / Zabbix / Grafana / Puppet / Ansible / Vagrant

          • akincer
            Junior Member
            • Jul 2016
            • 11

            #6
            It's not a matter of being a "free lunch" but rather lacking documentation on how to make it work at all. If you spend even a minute thinking about it, you know that there simply wouldn't be a "one size fits all" accurate formula for trend prediction.

            But even getting something working at all, however imperfect it might be, is about as clear as mud. I've read the documentation several times and tried to work out how to apply it, and it is not at all obvious how to do so for someone who doesn't know this product inside and out. IMHO, that shouldn't be a necessary threshold for someone to make at least minimal use of a product feature.

            I should, at a minimum, be able to read the documentation, configure a linear prediction trend alert, and have it accurately extrapolate a disk full event. I tested exactly that by having a PowerShell script create text files at varying rates over various times that would guarantee a disk full event within a 24-hour period. My best attempts to follow the documentation did not produce anything that was remotely useful at capturing this.

            TL;DR: I'm fine with heavy math being required to get super accurate trend prediction, but the problem is that even with basic use cases the documentation fails to make it clear how to implement this feature.

            • glebs.ivanovskis
              Senior Member
              • Jul 2015
              • 237

              #7
              What's wrong with the first example here for a start?

              • akincer
                Junior Member
                • Jul 2016
                • 11

                #8
                I'll go back and look again, but I'm pretty sure I tried that and it didn't work.

                • Abyss
                  Junior Member
                  • Oct 2016
                  • 9

                  #9
                  Hello, has anyone found an answer to this yet??

                  I'm running a Zabbix 3.2 server and configuring the predictive triggers as follows:

                  {hostname:vfs.fs.size[/,free].timeleft(7d,,102745398)}<1h
                  {hostname:vfs.fs.size[/,free].forecast(7d,,1h)}<102745398
                  From what I understood of the confusing documentation of these triggers, they're supposed to work like this:

                  The "timeleft" trigger should use the last 7 days of historic data to predict the free space and warn me 1h before the threshold is gonna be reached (102745398Kb in this case, which corresponds to 10% of free space).

                  I don't think I understand the "forecast" trigger at all, but I configured it anyway to see how it behaves.

                  Initially I had configured both triggers with "pfree" instead of "free", like this:

                  {hostname:vfs.fs.size[/,pfree].timeleft(7d,,10)}<1h
                  {hostname:vfs.fs.size[/,pfree].forecast(7d,,1h)}<10
                  It didn't work either.

                  Am I doing something wrong in the triggers, or do they need some specific configuration beforehand to work? All I did was configure the triggers for the item in Zabbix and nothing else. Also, I'm not very confident in my understanding of them; could someone explain them to me a little better?

                  Thanks!

                  • glebs.ivanovskis
                    Senior Member
                    • Jul 2015
                    • 237

                    #10
                    Originally posted by Abyss
                    It didn't work either.
                    How do you expect them to "work"?

                    Your configuration looks good to me. Can you show the graph of your vfs.fs.size[] for 7 days?

                    Also, to learn how predictive functions behave it is worth using them in calculated items for a start. So you can see the numbers in latest data and on the graphs.

                    • Abyss
                      Junior Member
                      • Oct 2016
                      • 9

                      #11
                      How do you expect them to "work"?
                      I expected the trigger below to warn me 1h before the disk goes below 10% free space, but it does virtually nothing: it doesn't appear in "Last issues" on Zabbix's Dashboard, nor anywhere else I looked:

                      Code:
                      {hostname:vfs.fs.size[/,pfree].timeleft(7d,,10)}<1h
                      The host I'm using for testing punctually goes below 10% space every night at 03:00AM due to a routine that takes some disk space and frees it after it is run.

                      Here it is, a 7d graph of the disk space: http://puu.sh/rAlqj/96614a2719.png

                      As you can see, this is the perfect scenario for testing since disk usage on this machine is very predictable.

                      Also, to learn how predictive functions behave it is worth using them in calculated items for a start. So you can see the numbers in latest data and on the graphs
                      Calculated items? The thing I'm mostly interested in is seeing the predictive data in the graphs, so I can have an idea of when our servers are running out of space. Could you explain to me what's wrong with my triggers or how to write one properly?

                      • glebs.ivanovskis
                        Senior Member
                        • Jul 2015
                        • 237

                        #12
                        It is predictable for a human.

                        So, with the trigger you have, you tell Zabbix to take the data for the last 7 days and draw a straight line (the default fit parameter is linear) through all 7 days. Let's be honest, a straight line is a very poor approximation for your "predictable" data. From the looks of your graph I would say that the "best straight line possible" is very close to a horizontal line. And a horizontal line can't go below 10%, or at least it is very unlikely to go below 10% within one hour.

                        To "catch" sudden dips at 3 a.m. you should use a shorter interval, say, 1h or 2h or 30m.
                        Code:
                        {hostname:vfs.fs.size[/,pfree].timeleft(1h,,10)}<1h
                        In a calculated item syntax it will be:
                        Code:
                        timeleft("hostname:vfs.fs.size[/,pfree]",7d,,10)
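
The effect of the window length can be illustrated with a small sketch. It assumes a plain least-squares line and a made-up nightly pattern (40% free, sliding to 5% between 02:00 and 03:00); the function names and the data are hypothetical, and the real timeleft() may treat edge cases differently. Fitted over seven days the line is almost flat, so the 10% threshold is never reached, whereas a one-hour window taken at 02:45 sees the slide and reports a crossing within minutes.

```python
import math


def linear_fit(points):
    """Least-squares line through (t, value) points; returns (slope, intercept)."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    cov = sum((t - mean_t) * (v - mean_v) for t, v in points)
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


def timeleft(points, threshold):
    """Seconds from the last sample until the fitted line reaches threshold.

    Simplification: returns infinity if the line never reaches the
    threshold in the future.
    """
    slope, intercept = linear_fit(points)
    now = points[-1][0]
    if slope == 0:
        return math.inf
    crossing = (threshold - intercept) / slope
    return crossing - now if crossing > now else math.inf


def pfree(t):
    """Hypothetical percentage of free space at time t (seconds)."""
    tod = t % 86400                      # seconds since midnight
    if tod < 7200:                       # before 02:00
        return 40.0
    if tod < 10800:                      # 02:00 to 03:00: slide from 40% to 5%
        return 40.0 - 35.0 * (tod - 7200) / 3600
    if tod < 12600:                      # 03:00 to 03:30: stay at 5%
        return 5.0
    return 40.0                          # space is freed again


DAY = 86400
now = 7 * DAY + 9900                     # 02:45, during the nightly slide
history = [(t, pfree(t)) for t in range(0, now + 1, 300)]  # 5-minute samples


def window(seconds):
    """Samples from the last `seconds` seconds, ending at `now`."""
    return [(t, v) for t, v in history if t >= now - seconds]


short_window = timeleft(window(3600), 10)     # like timeleft(1h,,10)
long_window = timeleft(window(7 * DAY), 10)   # like timeleft(7d,,10)
print("1h window:", short_window, "s; 7d window:", long_window, "s")
```

With these made-up numbers, only the one-hour window yields a value below 3600 seconds, which is what a "<1h" trigger needs in order to fire.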

                        • Abyss
                          Junior Member
                          • Oct 2016
                          • 9

                          #13
                          Originally posted by glebs.ivanovskis
                          It is predictable for a human.
                          In a calculated item syntax it will be:
                          Code:
                          timeleft("hostname:vfs.fs.size[/,pfree]",7d,,10)
                          This code above is not accepted by Zabbix as a trigger when I try to add it.

                          I changed the other trigger I had to use a 1h interval, I'll see how it behaves throughout next night and then I'll give you feedback, thanks!

                          • glebs.ivanovskis
                            Senior Member
                            • Jul 2015
                            • 237

                            #14
                            Originally posted by Abyss
                            This code above is not accepted by Zabbix as a trigger when I try to add it.
                            It's not a trigger, it's a calculated item!

                            • Abyss
                              Junior Member
                              • Oct 2016
                              • 9

                              #15
                              Originally posted by glebs.ivanovskis
                              It's not a trigger, it's a calculated item!
                              Even so, when I use this as a Key when adding an item it gives me the following error:

                              Code:
                              Invalid key "timeleft("hostname:vfs.fs.size[/,pfree]",7d,,10)" for item "Calculated free disk space on $1 (percentage)" on "hostname": incorrect syntax near "("hostname:vfs.fs.size[/,pfree]",7d,,10)".
                              Could you provide a bit more insight on how to do this?

                              Thanks a lot!
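
The error message above suggests the formula was entered in the Key field. For a calculated item, the key is only a unique identifier, and the expression goes in the item's Formula field. As a hedged sketch of the same idea, this is roughly what an equivalent item.create request to the Zabbix API could look like, where the formula is carried in the "params" property. The host ID, item name, key and auth token are placeholders, and the property values (type 15 for a calculated item, value_type 0 for numeric float, delay in seconds) should be checked against the API reference for your version. The script only builds and prints the payload; it does not contact a server.

```python
import json

# Formula as given in the thread; it belongs in the Formula field, not the key.
formula = 'timeleft("hostname:vfs.fs.size[/,pfree]",7d,,10)'

request = {
    "jsonrpc": "2.0",
    "method": "item.create",
    "params": {
        "hostid": "10105",              # placeholder host ID
        "name": "Time left until / drops below 10% free",  # hypothetical name
        "key_": "disk.root.timeleft",   # hypothetical key: any unique identifier
        "type": 15,                     # calculated item
        "value_type": 0,                # numeric (float)
        "delay": 60,                    # recalculate every 60 seconds
        "units": "s",                   # timeleft is expressed in seconds
        "params": formula,              # the calculated item formula
    },
    "auth": "<session token>",          # placeholder
    "id": 1,
}

print(json.dumps(request, indent=2))
```

Once such an item collects data, its values show up in Latest data and can be graphed, which is the suggestion made earlier in the thread for seeing how the predictive functions behave.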
