Recovery checks interval

  • arikin
    Junior Member
    • Aug 2011
    • 27

    #1

    Recovery checks interval

    Sorry, I really did try to find the answer to this.

    Q: Why does the status change to OK BEFORE the web scenario's next update interval has elapsed?

    The update interval is set to 15 minutes, but the OK recovery notification arrives within 5-10 minutes of the original PROBLEM notification.

    Q: Does Zabbix poll more often (i.e. shorten the update interval) until it sees a recovery? If so, what is the time between checks?

    Using Zabbix 1.8.7

    Is this like the adaptive polling in OpenNMS?
    <downtime interval="30000" begin="0" end="300000"/> <!-- poll every 30s from 0 to 5m of downtime -->
    <downtime interval="300000" begin="300000" end="43200000"/> <!-- poll every 5m from 5m to 12h -->
    <downtime interval="600000" begin="43200000" end="432000000"/> <!-- poll every 10m from 12h to 5d -->
    <downtime begin="432000000" delete="true"/> <!-- after 5 days down, delete -->
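For readers who don't know OpenNMS, the idea behind those rules can be sketched as a lookup from "how long has the service been down" to "how often should it be polled". This is a simplified model for illustration only, not OpenNMS code: it assumes `begin` is inclusive and `end` is exclusive, and the names `DowntimeRule`, `SCHEDULE` and `poll_interval_ms` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class DowntimeRule(NamedTuple):
    """Hypothetical model of one <downtime> element (all values in ms)."""
    begin_ms: int
    end_ms: Optional[int]       # None means "no upper bound"
    interval_ms: Optional[int]  # None when the rule only deletes
    delete: bool = False


# Mirrors the four <downtime> elements quoted above.
SCHEDULE: List[DowntimeRule] = [
    DowntimeRule(0, 300_000, 30_000),                   # 30s polls for the first 5m
    DowntimeRule(300_000, 43_200_000, 300_000),         # 5m polls up to 12h
    DowntimeRule(43_200_000, 432_000_000, 600_000),     # 10m polls up to 5d
    DowntimeRule(432_000_000, None, None, delete=True), # after 5d: delete
]


def poll_interval_ms(downtime_ms: int,
                     schedule: List[DowntimeRule] = SCHEDULE) -> Optional[int]:
    """Return the polling interval for a service that has been down for
    downtime_ms, or None if the matching rule says to delete it."""
    for rule in schedule:
        starts = downtime_ms >= rule.begin_ms
        before_end = rule.end_ms is None or downtime_ms < rule.end_ms
        if starts and before_end:
            return None if rule.delete else rule.interval_ms
    raise ValueError("no downtime rule covers this duration")
```

With these rules, a service that has been down for 10 minutes (`poll_interval_ms(600_000)`) would be polled every 5 minutes, i.e. the check gets less frequent the longer the outage lasts.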

    So how can I delay that OK status change? Or is there a better way to do this altogether?
    Last edited by arikin; 05-12-2013, 10:44.
  • arikin

    #2
    Hmm...

    I am still looking for some good advice on this.

    But for now I will test escalations to "double-check" the status. If it fixes itself within 5 to 10 minutes, I guess I didn't want to hear about it at 3 AM.

    Using this tutorial:


    • arikin

      #3
      A different approach

      They say talking to yourself is bad only if you answer yourself. Guess that makes the whole world crazy and me the only sane one.

      Anyway, I decided to just check the last three values. If all three are bad, send a notification. Something like this expression:

      {host:web.test.fail[Web].last(#1)}=1&
      {host:web.test.fail[Web].last(#2)}=1&
      {host:web.test.fail[Web].last(#3)}=1
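The logic of that expression, outside Zabbix, is "the three most recent values are all failures". A minimal sketch, assuming values are ordered oldest to newest so that `values[-1]` plays the role of `last(#1)`; the function name `last_n_failed` is hypothetical, and treating too little history as "no problem" is an assumption of this sketch rather than Zabbix behaviour. Since `web.test.fail` reports the number of the failed step (0 when every step succeeds), a multi-step scenario might need `>0` rather than `=1`; the sketch mirrors the `=1` test as posted.

```python
from typing import Sequence


def last_n_failed(values: Sequence[int], n: int = 3, fail_value: int = 1) -> bool:
    """True when the n most recent values all equal fail_value.

    values is ordered oldest -> newest, so values[-1] corresponds to
    last(#1), values[-2] to last(#2), and so on.
    """
    if len(values) < n:
        return False  # not enough history yet (assumption: no alert)
    return all(v == fail_value for v in values[-n:])
```

For example, `last_n_failed([0, 1, 1, 1])` is true, while `last_n_failed([1, 1, 0])` is false because the newest check passed, which is exactly the "fixed itself" case the trigger is meant to ignore.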

      But I only wanted this during the early-morning maintenance window, so I tacked on some time constraints. That means two triggers: one that alerts normally, and one that uses the three-value check above.

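      Normal hours (08:00 to 03:00; this window wraps past midnight, and no time is both after 07:59:59 and before 03:00:00, so the two conditions must be joined with OR):
      &({host:web.test.fail[Web].time(0)}>075959|{host:web.test.fail[Web].time(0)}<030000)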

      Dead man hours (03:00 to 08:00):
      &{host:web.test.fail[Web].time(0)}>025959&{host:web.test.fail[Web].time(0)}<080000
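Why a window that crosses midnight needs OR while the 03:00 to 08:00 window can use AND is easier to see in a few lines of code. A minimal sketch, assuming times are HHMMSS integers like those Zabbix's `time()` function returns; `in_window`, `NORMAL` and `DEAD_MAN` are hypothetical names. Python doesn't allow leading zeros in integer literals, so 080000 is written 80000.

```python
def in_window(hhmmss: int, start: int, end: int) -> bool:
    """Is hhmmss inside [start, end)? All values are HHMMSS integers.
    Handles windows that wrap past midnight (start > end)."""
    if start <= end:
        return start <= hhmmss < end          # e.g. 03:00-08:00: AND works
    return hhmmss >= start or hhmmss < end    # e.g. 08:00-03:00: needs OR


NORMAL = (80000, 30000)    # 08:00:00 to 03:00:00, wraps past midnight
DEAD_MAN = (30000, 80000)  # 03:00:00 to 08:00:00

# The naive AND form of the normal-hours check is never true:
# no value is both above 75959 and below 30000.
naive_noon = 120000 > 75959 and 120000 < 30000
```

Every time of day falls in exactly one of the two windows, so exactly one of the two triggers is active at any moment.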

      The lesson is to drive yourself crazy first, revert to a simpler method, and cackle madly.

