Monitored host maintenance mode patch (zabbix-1.6)

  • bcheese
    Junior Member
    • Jun 2006
    • 26

    #1

    Hi All,

    I have created a patch for a modification made to the 1.6 release tarball which provides me with the ability to disable alerting of hosts whilst I have them down for maintenance actions.

    How is this different from the maintenance mode already built into Zabbix? The built-in one is for the web interface, so that you can perform maintenance on the Zabbix server itself, whereas my work allows maintenance actions to be performed on a monitored host without being bombarded by notifications that don't really need to be sent.

    How does it work? The patch makes a few changes. First, it adds another integer column called maintenance to the hosts table in the database; this stores the host's maintenance state. Next, it adds some controls to the Configuration->Hosts screen in the GUI that let the administrator set and clear this column. Finally, it makes a number of similar changes to the Zabbix server process so that it retrieves and checks the flag stored in the DB. When the modified server retrieves monitored parameters from a host whose flag is set, it may skip processing those values, so the host's triggers are not re-evaluated.
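A minimal sketch of the logic described above, using Python's built-in sqlite3 module as a stand-in for the Zabbix database. The hosts table and the maintenance column come from the post; the hostid column, the sample hosts and the function names are hypothetical. The real patch is C code inside the server, so this only illustrates the idea, not the patch itself.

```python
import sqlite3

# Simplified stand-in schema: the patch adds an integer "maintenance"
# column to the hosts table (0 = normal, 1 = in maintenance).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (hostid INTEGER PRIMARY KEY, host TEXT)")
conn.execute(
    "ALTER TABLE hosts ADD COLUMN maintenance INTEGER NOT NULL DEFAULT 0")
conn.executemany("INSERT INTO hosts (hostid, host) VALUES (?, ?)",
                 [(1, "web01"), (2, "db01")])
# In the patch this is set from the Configuration->Hosts screen.
conn.execute("UPDATE hosts SET maintenance = 1 WHERE host = 'db01'")

evaluated = []  # records which values were passed on to trigger evaluation


def host_in_maintenance(conn, hostid):
    """Return True if the host's maintenance flag is set."""
    row = conn.execute("SELECT maintenance FROM hosts WHERE hostid = ?",
                       (hostid,)).fetchone()
    return row is not None and row[0] == 1


def process_value(conn, hostid, value):
    """Skip trigger re-evaluation for hosts in maintenance (the behaviour
    of the first version of the patch); otherwise hand the value on."""
    if host_in_maintenance(conn, hostid):
        return False
    evaluated.append((hostid, value))  # placeholder for trigger evaluation
    return True
```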

    I would like to propose that this patch be included in the current development effort for the next stable release. I would also be interested to hear from anyone who actually uses it or has suggestions for improvements I could make to this patch.

    I would also like to take this opportunity to congratulate Alexei and team on the quality of the internal API which has made this patch so easy to implement.

    Cheers,
    Brian.
    Attached Files
  • xs-
    Senior Member
    Zabbix Certified Specialist
    • Dec 2007
    • 393

    #2
    Nice

    I have one suggestion / request tho.
    Would it be possible (or would it fit your uses) to replace the current method with, or add, a date/time field (the type already used for, e.g., alerting) to configure automatic / periodic maintenance for a host?

    • bcheese
      Junior Member
      • Jun 2006
      • 26

      #3
      Originally posted by xs-
      Nice
      Thanks xs-

      Originally posted by xs-
      I have one suggestion / request tho.
      Would it be possible (or would it fit your uses) to replace the current method with, or add, a date/time field (the type already used for, e.g., alerting) to configure automatic / periodic maintenance for a host?
      Yes, it would be possible to use a date field instead to end the alert suppression.

      What if (and I am only thinking out loud here, and it may be what you are thinking of) it were changed so that all of the following are true:
      • The current "Maintenance" column on the Hosts screen becomes a menu to place the host into maintenance mode and also show the current state of any maintenance operations, e.g. "Operational", "Maint - Indefinite", "Maint - 1h 23m".
      • This menu has an option to cancel maintenance mode immediately for hosts that are in maintenance mode.
      • The duration times would be stored in a new table in the database allowing the user to configure the time frame options for the install. This would also require the creation of an additional Configuration or Administration page.
      • The menu would include an option for indefinite maintenance mode.
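The status strings suggested for the Maintenance column could be produced by something like the following. This is a hypothetical helper, assuming the remaining time of a window is known in seconds and that an indefinite window is represented by None.

```python
from typing import Optional


def maintenance_label(in_maintenance: bool, seconds_left: Optional[int]) -> str:
    """Text for the Maintenance column; seconds_left=None means indefinite."""
    if not in_maintenance:
        return "Operational"
    if seconds_left is None:
        return "Maint - Indefinite"
    # Split the remaining time into whole hours and whole minutes.
    hours, rest = divmod(max(seconds_left, 0), 3600)
    return f"Maint - {hours}h {rest // 60}m"


print(maintenance_label(True, 4980))  # Maint - 1h 23m
```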

      An additional mechanism would be needed to apply the above options to multiple hosts at once. Perhaps this could be placed where the existing buttons are, but again acting as a menu.

      Please let me know what you think, as I feel your idea has some advantages over my approach. I feel that what I am now thinking of would be a huge step forward and open enough to cover 99% of situations.

      Cheers,
      Brian.

      • xs-
        Senior Member
        Zabbix Certified Specialist
        • Dec 2007
        • 393

        #4
        Originally posted by bcheese
        The current "Maintenance" column on the Hosts screen becomes a menu to place the host into maintenance mode and also show the current state of any maintenance operations, e.g. "Operational", "Maint - Indefinite", "Maint - 1h 23m".
        I like the idea (as an extra), but the way the permissions system is set up now, only super admins could do this. I think this functionality should at least be available in the host configuration (perhaps with a popup).

        Originally posted by bcheese
        The duration times would be stored in a new table in the database allowing the user to configure the time frame options for the install. This would also require the creation of an additional Configuration or Administration page.
        You mean sort of predefined / templated maintenance 'plans'? I like your thinking

        Originally posted by bcheese
        The menu would include an option for indefinite maintenance mode.
        I assume you mean that a host can have a periodic maintenance but also be 'manually' set to maintenance mode (which would be indefinite)


        Basically this all would mean the following:
        • Maintenance mode boolean
          This is the on/off switch for alerting on this host. It's indefinite and overrules any periodic configuration.
        • Maintenance plan time(s)
          A repeatable date/time specification which, at the set times, effectively puts the host in maintenance mode. Outside these times, the host acts normally.
          These 'plans' could be preset / templated in a separate table, for easy selection in host configuration.
        • GUI changes
          Management of this functionality could be done in numerous ways. Most obvious are in host configuration (per host basis) and administration->maintenance (bulk?)
          Also management for the plan table must be implemented (a new pulldown item in configuration->hosts?).
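The precedence rule in the first bullet (the manual switch overrules any periodic configuration) can be sketched as below. It assumes, hypothetically, that each plan has already been expanded into concrete start/end windows; expanding recurring plans into windows is a separate problem.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Window:
    """One concrete occurrence of a maintenance plan (hypothetical model)."""
    start: datetime
    end: datetime


def in_maintenance(manual_flag: bool, windows: List[Window],
                   now: datetime) -> bool:
    # The manual on/off switch overrules any periodic configuration.
    if manual_flag:
        return True
    # Otherwise the host is in maintenance only inside a planned window
    # (start inclusive, end exclusive).
    return any(w.start <= now < w.end for w in windows)


plan_windows = [Window(datetime(2008, 9, 28, 2, 0), datetime(2008, 9, 28, 4, 0))]
print(in_maintenance(False, plan_windows, datetime(2008, 9, 28, 3, 0)))  # True
print(in_maintenance(False, plan_windows, datetime(2008, 9, 28, 5, 0)))  # False
```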


        Also thinking out loud
        I agree, the basic setup is great, and some minor changes could make it fit 99% of all situations for all users.

        • bcheese
          Junior Member
          • Jun 2006
          • 26

          #5
          Originally posted by xs-
          You mean sort of predefined / templated maintenance 'plans'? I like your thinking
          Effectively, that is what I am thinking. Let's use the phrase "Maintenance Plan" to denote an expected outage window consisting of a date specifier and a from/to time.

          Originally posted by xs-
          I assume you mean that a host can have a periodic maintenance but also be 'manually' set to maintenance mode (which would be indefinite)
          Yes indeed, although it would not need to be indefinite; it could instead be for, say, the next hour.

          Originally posted by xs-
          Basically this all would mean the following:
          • Maintenance mode boolean
            This is the on/off switch for alerting on this host. It's indefinite and overrules any periodic configuration.
          Not quite what I had thought through yet, but yes, that would provide maximum flexibility.
          Originally posted by xs-
          • Maintenance plan time(s)
            A repeatable date/time specification which, at the set times, effectively puts the host in maintenance mode. Outside these times, the host acts normally.
            These 'plans' could be preset / templated in a separate table, for easy selection in host configuration.
          Yes, absolutely, although the date/time functionality will need to be flexible enough to cope with various situations such as:
          1. The 2nd Wednesday of the month
          2. The 23rd of the month
          3. Every 5 weeks from a given date
          4. Possibly even others.
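The three example rules above could each be expressed as a simple date test. A rough sketch with hypothetical function names; weekdays follow Python's convention (Monday = 0, Wednesday = 2), and a complete plan would combine such a date test with the from/to time of the window.

```python
from datetime import date, timedelta


def is_nth_weekday(d: date, n: int, weekday: int) -> bool:
    """'The 2nd Wednesday of the month' is n=2, weekday=2."""
    return d.weekday() == weekday and (d.day - 1) // 7 + 1 == n


def is_day_of_month(d: date, day: int) -> bool:
    """'The 23rd of the month' is day=23."""
    return d.day == day


def is_every_n_weeks(d: date, start: date, n: int) -> bool:
    """'Every 5 weeks from a given date' is n=5 with start set to that date."""
    delta = (d - start).days
    return delta >= 0 and delta % (7 * n) == 0


print(is_nth_weekday(date(2008, 10, 8), 2, 2))  # True: 2nd Wednesday of Oct 2008
start = date(2008, 9, 28)
print(is_every_n_weeks(start + timedelta(weeks=5), start, 5))  # True
```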


          Originally posted by xs-
          • GUI changes
            Management of this functionality could be done in numerous ways. Most obvious are in host configuration (per host basis) and administration->maintenance (bulk?)
            Also management for the plan table must be implemented (a new pulldown item in configuration->hosts?).
          I would see the management of the standard maintenance plans to be done in Administration->Maintenance.

          The assignment of a standard maintenance plan to a host would be done in Configuration->Hosts. In fact, this could simply become part of the Select menu under actions, which would also allow the Maintenance column to be removed, unless we wanted to keep it as a status indicator. The status would show whether the host is in or out of a maintenance window and, if it is in one, some indication of how much longer the window remains in effect.

          The only one I am not sure about yet is ad-hoc outages. Perhaps Monitoring->Maintenance could provide an interface (taking group memberships into account) for a normal Zabbix user to set/clear maintenance windows. As an example, I see this function being useful where a server has physically failed and is waiting for the service agent to repair it. Third-level support staff could then give second-level support staff the ability to add ad-hoc maintenance windows to hosts.

          I think this is an even better start to the whole concept of a host maintenance window than the one I had already reached. If there are no objections to this concept between now and the weekend, or if we reach a suitable agreement on the final concept by then, I'll play around with the code again and see if we can make it a reality.

          Cheers,
          Brian.

          • xs-
            Senior Member
            Zabbix Certified Specialist
            • Dec 2007
            • 393

            #6
            I believe we are very much on the same page. All I can say is: gogo gadget coding fingers.

            Although I am also interested in what the devs think of this.

            • bcheese
              Junior Member
              • Jun 2006
              • 26

              #7
              Originally posted by xs-
              I believe we are very much on the same page. All I can say is: gogo gadget coding fingers.
              Don't worry, will do.

              Originally posted by xs-
              Although I am also interested in what the devs think of this.
              I couldn't agree more; I am surprised they haven't already built this type of function into Zabbix.

              I'll keep you all updated as I progress.

              Cheers,
              Brian.

              • Andreas Bollhalder
                Senior Member
                Zabbix Certified Specialist
                • Apr 2007
                • 144

                #8
                How does it handle the state of triggers on other hosts that depend on a trigger of the host in maintenance?

                Andreas
                Zabbix statistics
                Total hosts: 380 - Total items: 12190 - Total triggers: 4530 - Required server performance: 224.2

                • bcheese
                  Junior Member
                  • Jun 2006
                  • 26

                  #9
                  Originally posted by Andreas Bollhalder
                  How does it handle the state of triggers on other hosts that depend on a trigger of the host in maintenance?
                  Andreas,

                  Thanks for the question. The patch above does not change the state of any trigger associated with the host in maintenance mode; it only stops those triggers from being updated. As a result, any dependent triggers will see them frozen in the state they had when last updated.

                  Having said that, you have made me think about this a little more. I now plan to change the way it works slightly, so that the triggers are still updated and only the alerts are stopped.
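The revised behaviour (keep updating trigger states so dependencies work, and suppress only the notification) might look like this in outline. The in-memory dict of trigger states and the send_alert callback are hypothetical stand-ins for the server's real trigger table and alerting code.

```python
from typing import Callable, Dict


def handle_trigger_change(states: Dict[int, str], trigger_id: int,
                          new_value: str, host_in_maintenance: bool,
                          send_alert: Callable[[int, str], None]) -> None:
    old_value = states.get(trigger_id)
    # Always store the new state so that dependent triggers keep working.
    states[trigger_id] = new_value
    # Only the notification is suppressed while the host is in maintenance.
    if old_value != new_value and not host_in_maintenance:
        send_alert(trigger_id, new_value)


sent = []
states: Dict[int, str] = {}
handle_trigger_change(states, 10, "PROBLEM", True, lambda t, v: sent.append((t, v)))
handle_trigger_change(states, 11, "PROBLEM", False, lambda t, v: sent.append((t, v)))
print(states, sent)  # {10: 'PROBLEM', 11: 'PROBLEM'} [(11, 'PROBLEM')]
```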

                  If you have any thoughts or suggestions I would appreciate hearing them.

                  Cheers,
                  Brian.

                  • Andreas Bollhalder
                    Senior Member
                    Zabbix Certified Specialist
                    • Apr 2007
                    • 144

                    #10
                    Originally posted by bcheese
                    Andreas,

                    Thanks for the question. The patch above does not change the state of any trigger associated with the host in maintenance mode; it only stops those triggers from being updated. As a result, any dependent triggers will see them frozen in the state they had when last updated.

                    Having said that, you have made me think about this a little more. I now plan to change the way it works slightly, so that the triggers are still updated and only the alerts are stopped.

                    If you have any thoughts or suggestions I would appreciate hearing them.

                    Cheers,
                    Brian.
                    Hello Brian

                    Currently, we're using a group called "Maintenance" and have specified in all actions that alarms are only sent when the host isn't in the "Maintenance" group. When we do maintenance tasks, we put the host into the "Maintenance" group. With this, the triggers still show up in the trigger screen, but no alarms go out. Further, data collection continues and all trigger dependencies keep working.
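The workaround amounts to one extra condition on every action. A sketch of that condition's logic in Python, assuming a hypothetical mapping from host names to their group names; in Zabbix itself this is configured as an action condition in the frontend, not written as code.

```python
from typing import Dict, Set

MAINTENANCE_GROUP = "Maintenance"

# Hypothetical host -> groups mapping.
host_groups: Dict[str, Set[str]] = {
    "web01": {"Linux servers"},
    "db01": {"Linux servers", MAINTENANCE_GROUP},
}


def action_should_alert(host: str) -> bool:
    """Mirror of an action condition excluding hosts in the Maintenance group."""
    return MAINTENANCE_GROUP not in host_groups.get(host, set())


print([h for h in host_groups if action_should_alert(h)])  # ['web01']
```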

                    I share the idea of a new page (Configuration -> Maintenance) for configuring one-time and recurring maintenance plans. These plans should be associated with a host or a template. Second, there should be a place to put a host into, or pull it out of, maintenance manually.

                    In the trigger screen, there should be an option to enable/disable the triggers from hosts in maintenance (default disabled).

                    When the host enters the maintenance state, data collection should continue, but triggers shouldn't send any messages. Maybe there could be an option to send a message when the host enters/leaves the maintenance state.
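Sending a message when a host enters or leaves maintenance requires detecting state changes between two checks. A hypothetical sketch that compares two snapshots of per-host maintenance state (host names and message texts are made up for illustration):

```python
from typing import Dict, List, Tuple


def maintenance_transitions(previous: Dict[str, bool],
                            current: Dict[str, bool]) -> List[Tuple[str, str]]:
    """Report hosts whose maintenance state changed between two checks."""
    events = []
    for host, in_maint in sorted(current.items()):
        was_in_maint = previous.get(host, False)
        if in_maint and not was_in_maint:
            events.append((host, "entered maintenance"))
        elif was_in_maint and not in_maint:
            events.append((host, "left maintenance"))
    return events


print(maintenance_transitions({"web01": False, "db01": True},
                              {"web01": True, "db01": False}))
# [('db01', 'left maintenance'), ('web01', 'entered maintenance')]
```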

                    I don't know, but could it be an idea to create a new trigger state called maintenance? This would allow a different icon to be associated with it for display in maps.

                    OK, this is my brainstorm and it still has room for improvement. But in 5 minutes, I can't go further.

                    Andreas

                    • Aly
                      ZABBIX developer
                      • May 2007
                      • 1126

                      #11
                      First of all, yes, we have thought about this, but we just haven't had enough time for detailed planning, coding and implementation. This is one of my tasks, and I should say not the easiest one.

                      Mostly our thinking is similar in many ways.

                      But I have some doubts about the feasibility of "manually" setting a host in/out of maintenance mode. This is very tricky.

                      Also, you should keep overall performance in mind. Overly complicated maintenance functionality will require more checks (queries).

                      I was also thinking about which is better: setting maintenance mode per host or per host group. In large installations the second should be preferred.
                      Zabbix | ex GUI developer

                      • xs-
                        Senior Member
                        Zabbix Certified Specialist
                        • Dec 2007
                        • 393

                        #12
                        Host group maintenance would be nice, but individual host maintenance (servicing hardware, etc.) is at least as important.

                        • bcheese
                          Junior Member
                          • Jun 2006
                          • 26

                          #13
                          Hi Aly,

                          Firstly, thank you for contributing the development team's view to this thread. While I don't fully agree with your comments, I do appreciate your time and perspective on this matter.

                          Originally posted by Aly
                          First of all, yes, we have thought about this, but we just haven't had enough time for detailed planning, coding and implementation. This is one of my tasks, and I should say not the easiest one.
                          It is both good and refreshing to hear that this idea is already on the books.

                          Originally posted by Aly
                          Mostly our thinking is similar in many ways.
                          A number of us are thinking along very similar lines, which means it should be possible to make this a reality.

                          Originally posted by Aly
                          But I have some doubts about the feasibility of "manually" setting a host in/out of maintenance mode. This is very tricky.
                          On this point I have to disagree. Last weekend I created the patch that started this discussion without any training in, or understanding of, the internals of Zabbix. The only changes required to bring what I have already done in line with the current thinking are to remove the code that stops the triggers from being updated, and to add a single SQL query in the alert-sending code that determines whether the alert's host is in a maintenance window, then allow or deny sending the alert accordingly.
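The "single SQL query" in the alert-sending path could look something like the sketch below, again using sqlite3 as a stand-in. The maintenance_windows table, its columns and the composite index are assumptions for illustration (the patch as described only has a flag on the hosts table); an index of this kind is what would keep the lookup cheap, in line with the performance concern raised in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hosts (hostid INTEGER PRIMARY KEY, host TEXT);
-- Hypothetical table: one row per maintenance window, Unix timestamps.
CREATE TABLE maintenance_windows (
    hostid     INTEGER NOT NULL REFERENCES hosts (hostid),
    start_time INTEGER NOT NULL,
    end_time   INTEGER NOT NULL
);
-- Lets the lookup below use an index range scan instead of a full scan.
CREATE INDEX mw_host_time ON maintenance_windows (hostid, start_time);
INSERT INTO hosts VALUES (1, 'web01'), (2, 'db01');
INSERT INTO maintenance_windows VALUES (1, 1000, 2000);
""")


def alert_allowed(conn: sqlite3.Connection, hostid: int, now: int) -> bool:
    """One query: send the alert unless a window covers 'now' for this host."""
    (covered,) = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM maintenance_windows "
        "WHERE hostid = ? AND start_time <= ? AND end_time > ?)",
        (hostid, now, now)).fetchone()
    return covered == 0
```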

                          Originally posted by Aly
                          Also, you should keep overall performance in mind. Overly complicated maintenance functionality will require more checks (queries).
                          I agree wholeheartedly. This must be a priority, although if the indexes are available and used, the impact should be minor.

                          Originally posted by Aly
                          I was also thinking about which is better: setting maintenance mode per host or per host group. In large installations the second should be preferred.
                          My current patch already provides support for hosts and host groups for the manual setting/clearing of maintenance mode.



                          So where am I going next with this patch? First, I will move the "blocking" to where the alerts are sent, which will allow triggers and trigger dependencies to function as they do without the patch. Once this is completed, I will post the new patch here for perusal and move on to designing and implementing the automated maintenance window operations. This is the larger area and may have some DB performance impacts.

                          Cheers,
                          Brian.
                          Last edited by bcheese; 25-09-2008, 00:10. Reason: changed alerts to triggers in 4th paragraph

                          • bcheese
                            Junior Member
                            • Jun 2006
                            • 26

                            #14
                            Originally posted by Andreas Bollhalder
                            Currently, we're using a group called "Maintenance" and have specified in all actions that alarms are only sent when the host isn't in the "Maintenance" group. When we do maintenance tasks, we put the host into the "Maintenance" group. With this, the triggers still show up in the trigger screen, but no alarms go out. Further, data collection continues and all trigger dependencies keep working.
                            I can see why you are working this way, as it is the most practical method of implementing this in the current release of Zabbix.

                            Originally posted by Andreas Bollhalder
                            I share the idea of a new page (Configuration -> Maintenance) for configuring one-time and recurring maintenance plans. These plans should be associated with a host or a template. Second, there should be a place to put a host into, or pull it out of, maintenance manually.
                            Great, it seems like we are all having the same ideas/concepts.

                            Originally posted by Andreas Bollhalder
                            In the trigger screen, there should be an option to enable/disable the triggers from hosts in maintenance (default disabled).
                            If I read this right, you are proposing that whilst the host is in maintenance mode, when a given trigger causes an alert to fire, we should look at the trigger's details to see whether to ignore it? If so, this is good for single-trigger alerts, but how would you see us dealing with multi-trigger alerts where some triggers are ignored and others aren't? Should we ignore the trigger as if it weren't there, or consider it to be false (OK)?
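The two options only give different results when an action needs several triggers to be in PROBLEM at once. A hypothetical sketch, assuming an action that fires only when all of its triggers are in PROBLEM; with an "any trigger" rule, leaving a suppressed trigger out and counting it as OK give the same result.

```python
from typing import List, Tuple


def alert_fires(triggers: List[Tuple[bool, bool]], host_in_maintenance: bool,
                policy: str) -> bool:
    """Action that fires only when ALL of its triggers are in PROBLEM.

    triggers: (is_problem, suppressed_during_maintenance) pairs.
    policy:   "ignore" -> suppressed triggers are left out entirely,
              "ok"     -> suppressed triggers count as OK (False).
    """
    if policy not in ("ignore", "ok"):
        raise ValueError(f"unknown policy: {policy}")
    states = []
    for is_problem, suppressed in triggers:
        if host_in_maintenance and suppressed:
            if policy == "ok":
                states.append(False)
            # policy == "ignore": skip this trigger
        else:
            states.append(is_problem)
    # An action with no remaining triggers does not fire.
    return bool(states) and all(states)


both_problem = [(True, False), (True, True)]
print(alert_fires(both_problem, True, "ignore"))  # True
print(alert_fires(both_problem, True, "ok"))      # False
```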

                            Originally posted by Andreas Bollhalder
                            When the host enters the maintenance state, data collection should continue, but triggers shouldn't send any messages. Maybe there could be an option to send a message when the host enters/leaves the maintenance state.
                            Yep, sounds good.

                            Originally posted by Andreas Bollhalder
                            I don't know, but could it be an idea to create a new trigger state called maintenance? This would allow a different icon to be associated with it for display in maps.
                            I suspect it would be a host state presenting as if it were a trigger, but yes this is a nice idea. I'll look at it and see.

                            Originally posted by Andreas Bollhalder
                            OK, this is my brainstorm and it still has room for improvement. But in 5 minutes, I can't go further.
                            Andreas, it is nice to have other people's ideas blending into the solution. My own ideas and implementations also still have room for improvement.

                            Cheers,
                            Brian
                            Last edited by bcheese; 25-09-2008, 00:44. Reason: corrected a closing quote and grammatical errors

                            • Aly
                              ZABBIX developer
                              • May 2007
                              • 1126

                              #15
                              Originally posted by bcheese
                              On this point I have to disagree. Last weekend I created the patch that started this discussion without any training in, or understanding of, the internals of Zabbix. The only changes required to bring what I have already done in line with the current thinking are to remove the code that stops the triggers from being updated, and to add a single SQL query in the alert-sending code that determines whether the alert's host is in a maintenance window, then allow or deny sending the alert accordingly.
                              I'm afraid you will have to agree. Your patch has no maintenance plans, but once they exist, it will be a problem to determine whether a host is in maintenance mode or not. While we can select hosts in maintenance by timeline, there will be a problem with switching the boolean field (as someone suggested). At what moment will it be switched, and by which process?
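One possible answer to the question of which process switches the flag, and when: a periodic job in the server that recomputes the flag from the plans and the manual override. A sketch under assumptions: the maintenance_windows table and the maintenance_manual column are hypothetical, sqlite3 again stands in for the real database, and the flag can lag the true window boundaries by up to one run interval of the job.

```python
import sqlite3
from typing import List


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE hosts (
        hostid             INTEGER PRIMARY KEY,
        maintenance        INTEGER NOT NULL DEFAULT 0,  -- effective flag
        maintenance_manual INTEGER NOT NULL DEFAULT 0   -- manual override
    );
    CREATE TABLE maintenance_windows (
        hostid INTEGER NOT NULL, start_time INTEGER NOT NULL,
        end_time INTEGER NOT NULL
    );
    INSERT INTO hosts (hostid, maintenance_manual) VALUES (1, 0), (2, 1), (3, 0);
    INSERT INTO maintenance_windows VALUES (1, 1000, 2000);
    """)
    return conn


def refresh_maintenance_flags(conn: sqlite3.Connection, now: int) -> List[int]:
    """Run periodically: recompute each host's flag, return changed hostids."""
    rows = conn.execute("""
        SELECT h.hostid, h.maintenance,
               CASE WHEN h.maintenance_manual = 1 OR EXISTS (
                        SELECT 1 FROM maintenance_windows w
                        WHERE w.hostid = h.hostid
                          AND w.start_time <= ? AND w.end_time > ?)
                    THEN 1 ELSE 0 END
        FROM hosts h ORDER BY h.hostid""", (now, now)).fetchall()
    changed = []
    for hostid, current, wanted in rows:
        if current != wanted:
            conn.execute("UPDATE hosts SET maintenance = ? WHERE hostid = ?",
                         (wanted, hostid))
            changed.append(hostid)
    conn.commit()
    return changed


conn = make_demo_db()
print(refresh_maintenance_flags(conn, 1500))  # [1, 2]
print(refresh_maintenance_flags(conn, 2500))  # [1]
```

The returned list of changed hosts is also a natural hook for the "entered/left maintenance" messages suggested earlier in the thread.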
