Actions, configurations, notifications, maintenance

  • Mox
    Member
    • Sep 2009
    • 90

    #1

    Actions, configurations, notifications, maintenance

    Hi everyone!
    First, I'd like to talk about configuring actions in general, and how confusing it is.
    Second, about notifications after a maintenance period expires.

    o) Configuring Actions.

    In the current Zabbix version, 2.0, we have the 'Trigger value' condition, which can be set to OK, PROBLEM, or both. And we have the 'Recovery message' checkbox.
    At first glance I (and probably some of you) would say that the 'Recovery message' checkbox is a redundant option that works just like the 'Trigger value = OK' condition (or the other way around).

    Beware. Usually it does, but there is one difference: if you want to use escalations, only 'Trigger value = PROBLEM' + 'Recovery msg checkbox' works as expected.
    Surprised? I know, it's documented, but it drives me crazy wondering why the developers did that. Why did they make two things that do the same, but not quite the same? %)
    So we might think: if only this configuration works as expected, we could abolish the 'Trigger value' condition altogether! Why do we need two things that do the same?
    In other words, set the 'Trigger value' condition permanently to PROBLEM and hide it from the user. Then, if the user wants to receive OK notifications, let them tick the 'Recovery msg checkbox'. The 'Trigger value' condition would be gone.

    But I can imagine a situation where we want to run a 'Remote command' (instead of 'Send message') when the trigger switches to OK. Or we might want only the OK notification (quite strange), or something else.
    Another disadvantage of removing the 'Trigger value' condition: it would be very difficult to tell from the Actions list (Configuration -> Actions) which actions are configured to send OK notifications.
    For example, I have used Zabbix in production since 1.4 (now 1.8) and have ~70 actions (you have more, I know)... And I can still see at first glance which ones have the OK condition)

    So I tried every variant of action configuration. Here is the table with the test results (see the first page). As you can see, there are so many different possible configurations! It's quite confusing.

    Without escalation every variant works well, but with escalation in most cases the OK message is sent only after the step duration. Only 'Trigger value = PROBLEM' + 'Recovery msg checkbox' works perfectly.


    o) Notifications after or during maintenance period.

    How do notifications work in Zabbix 2.0 when we use a maintenance period? We just figured out that only 'Trigger value = PROBLEM' + 'Recovery msg checkbox' works well.
    I tried this variant and the "'Trigger value = PROBLEM' and 'Trigger value = OK'" variant. Here is another table (see the previous link, second page).
    The main point is that neither of these configurations works well.

    Let's consider a situation where you have one trigger on some host. You have configured an action that notifies you when this trigger switches to OK and when it switches to PROBLEM.
    Then you have configured a maintenance period with data collection for this host.
    Initially the trigger is in the OK state.
    The maintenance period starts.
    Then, during the maintenance period, the trigger switches to the PROBLEM state.
    Zabbix wouldn't send a notification, right? Yes, and that's good behavior.
    But then the maintenance period ends and our trigger is still in the PROBLEM state, i.e. trigger_state_before_mp != trigger_state_after_mp
    And in this situation Zabbix wouldn't notify you. Is that good behavior?
    I think not. I think it's a very annoying bug/feature. I think Zabbix should stay silent only if trigger_state_before_mp == trigger_state_after_mp, no matter whether the trigger switched during maintenance or not.
    You might think this behaviour is actually OK, because Zabbix does send notifications when there is a trigger status change.
    Not really. See generate_events(), which is called from update_maintenance_hosts().
    In the 1.8 branch, if you use 'Trigger value = PROBLEM' and 'Trigger value = OK' instead of the 'Recovery message' checkbox, it works the way I just described (but you have to forget about escalations).
    Here is the ticket that backs up my statement: ZBXNEXT-894.
    I posted a small patch in this ticket. The patch makes this behavior much more useful.

    It would be much more useful if Zabbix sent such notifications in the case described. Do you agree? <--- This is the main message!
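
    The rule proposed above (notify only if trigger_state_before_mp != trigger_state_after_mp) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Zabbix code and not the patch from the ticket: the names TriggerValue, value_at_maintenance_end and should_notify_after_maintenance are hypothetical.

```python
from enum import Enum


class TriggerValue(Enum):
    OK = 0
    PROBLEM = 1


def value_at_maintenance_end(before, changes):
    """Return the trigger value when maintenance ends.

    `before` is the value when maintenance started; `changes` is the
    ordered list of values the trigger switched to during maintenance.
    """
    return changes[-1] if changes else before


def should_notify_after_maintenance(before, changes):
    """Hypothetical rule: notify only if the value at the end of
    maintenance differs from the value at its start, regardless of
    how many times the trigger switched in between."""
    return value_at_maintenance_end(before, changes) != before


# Scenario from the post: OK before, switched to PROBLEM during maintenance.
print(should_notify_after_maintenance(TriggerValue.OK, [TriggerValue.PROBLEM]))
```

    With this rule a trigger that went to PROBLEM and back to OK during maintenance stays silent, while one that is still in PROBLEM when maintenance ends produces a notification.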

    This is the biggest reason why I still use the 1.8 branch in production:
    you can't switch off escalations in 2.0. The option is hidden from the user and always on.

    I've made a small patch for the 2.0 branch. Find it in the suggestions below!

    A real-life example: I work in an IT department (~30 admins) with a variety of hardware and servers. I'm the only one who supports our monitoring system.
    One of these admins set up a maintenance period (MP) for host1. When the MP started, the admin updated software on host1 and then rebooted it. When host1 came back up, one of the services that has a trigger didn't come up.
    Trigger Service1 switched to PROBLEM: no notification. Then the maintenance ended: no notification again. The admin went home without knowing that something was going wrong.



    o) Finally. As a result:
    - two things that do the same, but not quite the same ('Trigger value = OK' and 'Recovery msg checkbox'). A bit counter-intuitive.
    - Poor behavior of notifications with maintenance.

    My suggestions:
    1. Make 'Recovery msg checkbox' something like an alias for the 'Trigger value = OK' condition and keep that checkbox just for overriding the recovery message. Or at least make it visible in the action list, but the first option is much more welcome!
    2. Stop adding 'Trigger value = PROBLEM' by default. Leave just 'Maintenance status not in "maintenance"'. If we don't set any 'Trigger value' condition, then any trigger value will trigger the action (i.e. both OK and PROBLEM). That is the general logic of how any condition type works.
    3. Make 'Trigger value = OK' work with escalations in a smart way, i.e. make Zabbix consider the operation type.
    If 'operation type'=='send message', then consider the escalation step duration only when the trigger goes to PROBLEM, and ignore the step duration on OK. If 'operation type'=='remote command', then always consider the step duration.

    Or just introduce an 'Ignore step duration on OK' checkbox in the action configuration, which would be enabled by default!
    4. Implement the suggested behaviour of notifications with maintenance. See the patch suggested in the ticket (if it works with the fixes above). I mean: notify only if trigger_state_before_mp != trigger_state_after_mp.
    Here is the patch for the 2.0 branch. Check it out.
    http://pastebin.com/Ux61kZvb
    5. Revert trunk svn commit r32342 to bring back the good and correct comment for the generate_events() function.
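
    Suggestion 3 above can be written down as a small decision function. This is a hedged sketch of the proposed rule only, in Python, under the simplifying assumption that escalation step N normally starts (N - 1) step durations after the event. The function operation_delay and the constants are hypothetical names, not part of Zabbix.

```python
SEND_MESSAGE = "send message"
REMOTE_COMMAND = "remote command"


def operation_delay(operation_type, trigger_value, step, step_duration):
    """Seconds to wait before running the operation of escalation
    step `step` (1-based), according to the proposed rule.

    Assumption: step N normally runs (N - 1) * step_duration seconds
    after the event that started the escalation.
    """
    if trigger_value == "OK" and operation_type == SEND_MESSAGE:
        # Proposed: recovery messages ignore the step duration.
        return 0
    # PROBLEM events, and remote commands on any event, keep the delay.
    return (step - 1) * step_duration


print(operation_delay(SEND_MESSAGE, "PROBLEM", 2, 3600))
print(operation_delay(SEND_MESSAGE, "OK", 2, 3600))
print(operation_delay(REMOTE_COMMAND, "OK", 2, 3600))
```

    The alternative 'Ignore step duration on OK' checkbox would simply replace the operation type test with a per-action boolean.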

    Welcome to discuss!

    P.S. Sorry for my English! It was so hard to explain in a foreign language!

    UPD. Suggestions are reworked 3.0.
    UPD 2. Patch for 2.0 has been added!
    Last edited by Mox; 29-07-2013, 11:46. Reason: rework 4.0
  • Mox
    Member
    • Sep 2009
    • 90

    #2
    My colleague from another city is learning Zabbix right now.
    He asked me what was wrong with his Zabbix action configuration.
    He ticked the recovery message checkbox and rewrote the recovery message the way he wanted. But Zabbix always sent him the default message when the trigger went to OK (instead of his recovery message).
    The problem was that he deletes the 'Trigger value = PROBLEM' condition every time he creates a new action.
    IMO it happened because the 'Trigger value' condition and a mere checkbox live on different planes, and I think it's quite difficult to relate them logically.
    Last edited by Mox; 29-07-2013, 11:48.


    • zalex_ua
      Senior Member
      Zabbix Certified Trainer
      Zabbix Certified Specialist
      Zabbix Certified Professional
      • Oct 2009
      • 1286

      #3
      Did you know about ZBXNEXT-452?

      I'm still not very happy with how it has been resolved.


      • Mox
        Member
        • Sep 2009
        • 90

        #4
        Originally posted by zalex_ua
        Did you know about ZBXNEXT-452?

        I'm still not very happy with how it has been resolved.

        I completely agree with you! And that was quite a strange solution IMO.
        Last edited by Mox; 05-02-2013, 14:06.


        • Mox
          Member
          • Sep 2009
          • 90

          #5
          I've made a tiny patch for 2.0 and attached it.
          Try it!
          1. Apply the patch.
          2. Create a trigger and an action with these conditions:
          - maintenance status not in "maintenance".
          - you can set Trigger value = PROBLEM && Trigger value = OK, or leave it unset.
          - no Recovery message checkbox!
          3. Because the recovery msg checkbox is unset, forget about escalations.
          4. Set the notification contacts.
          5. Set any maintenance period you like.

          Then you can switch your trigger during the maintenance period and see what happens!
          Last edited by Mox; 06-02-2013, 11:55.


          • Mox
            Member
            • Sep 2009
            • 90

            #6
            Why don't I recommend trying the suggested patch with the recovery msg checkbox?
            See the second page of https://docs.google.com/file/d/0By9y...BNNVl5MW8/edit
            Because it notifies you on an OK event during the maintenance period!


            • Mox
              Member
              • Sep 2009
              • 90

              #7
              Clarification for the 3rd suggestion:
              Only the recovery msg checkbox works well with escalations, right?
              What does that mean?
              Notify on PROBLEM and then, using the step duration, generate further notifications step by step. But on OK it must ignore the step duration and notify all previously notified people at once.
              What's wrong with 'Trigger value = OK' then? -> It doesn't ignore the step duration when the trigger switches to OK.
              If you use 'Trigger value = OK' instead of the recovery msg checkbox: say we have 2 people who want to know about our trigger. The 1st wants to know immediately, the 2nd one hour after it went to PROBLEM. We set up an action with these escalation steps. The trigger goes to PROBLEM: the 1st person is notified. One hour later the 2nd person is notified. But when the trigger goes to OK, our action sends the OK only to the 1st person, then waits the step duration (1 hour, as I said), and only then sends the OK to the 2nd.
              I.e. the recovery msg checkbox ignores the step duration when the trigger switches to OK, but 'Trigger value = OK' doesn't.
              I can't imagine a situation where you (the 2nd person) need to be notified about the switch to OK only after the step duration. Can you? I can imagine only one hypothetical situation: running a remote command instead of sending a notification.
              That is the purpose of the 3rd suggestion.
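
              The two-person example above can be replayed as a tiny timeline in Python. This is a simplified model of the behaviour described in this post, not Zabbix code: the function ok_times, the recipient names and the timestamps are hypothetical, and times are seconds since the trigger went to PROBLEM.

```python
STEP_DURATION = 3600  # one hour, as in the example


def ok_times(recipients, recovered_at, ignore_step_duration,
             step_duration=STEP_DURATION):
    """Return {recipient: time the OK notification is sent}.

    `recipients` is ordered by escalation step (first entry = step 1).
    ignore_step_duration=True models the recovery msg checkbox;
    False models 'Trigger value = OK' as described in the post.
    """
    times = {}
    for step_index, name in enumerate(recipients):
        delay = 0 if ignore_step_duration else step_index * step_duration
        times[name] = recovered_at + delay
    return times


recipients = ["first_admin", "second_admin"]
recovered_at = 2 * 3600  # trigger returns to OK two hours after PROBLEM

print(ok_times(recipients, recovered_at, ignore_step_duration=True))
# {'first_admin': 7200, 'second_admin': 7200}
print(ok_times(recipients, recovered_at, ignore_step_duration=False))
# {'first_admin': 7200, 'second_admin': 10800}
```

              In the second case the 2nd person learns about the recovery a full hour late, which is the behaviour the 3rd suggestion tries to remove for 'send message' operations.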
              Last edited by Mox; 06-02-2013, 13:04.

