Maintenance & Notification Architecture Re-think

  • untergeek
    Senior Member
    Zabbix Certified Specialist
    • Jun 2009
    • 512

    #16
    Here is my current script. There may be some inconsistencies and poor choices for implementation, but it works.

    Please note that we are an Oracle shop and all queries are formulated with this in mind. This would have to be adapted for MySQL or any other supported DB.

    I have removed usernames and passwords and changed URLs and email addresses to use example.com.

    We force the use of Change Requests (CR) or Incidents (IN) to create a maintenance window; otherwise a notification of the infraction is sent. You should be able to remove or adjust this for your setup.

    I would be pleased and gratified to receive any changes, improvements, or comments you'd like to make. I'm a UNIX sysadmin and Java application admin by trade, not a coder; I'm merely handy with shell scripting. Some of the commented-out bits reflect a recent change of our Zabbix server host OS from Solaris to RHEL 5. Again, adapt as needed.
    Attached Files


    • lukaswu
      Junior Member
      Zabbix Certified Trainer

      • Oct 2010
      • 3

      #17
      Originally posted by danrog
      The key is to set up (as another poster mentioned) maintenance with no data collection AND add to the action Maintenance status = not in maintenance.
      Could you shed some light on this? In my case I do not use "Actions" at all.

      Well, in 1.8.4 maintenance is broken again; at least it does not work for me. I found an already-submitted bug report on this (in the commercially supported version). I will be testing 1.8.5 soon, though I did not notice a patch for this problem in the change list.

      --
      luk


      • untergeek
        Senior Member
        Zabbix Certified Specialist
        • Jun 2009
        • 512

        #18
        I am well and truly puzzled. We've used maintenance mode since 1.8.0 and never had it not work, once we figured out that we needed "maintenance status = not in maintenance" in the action conditions. We even still set up our maintenance windows with data collection so we're collecting data if it can be collected during the maintenance period. We've upgraded at each level, from 1.8.0 to .1, .2, etc. right up through 1.8.5. It has always worked. We had 2 Zabbix servers in separate environments, one for staging and one for production. We've added a third for our failover site. Maintenance mode has worked as expected in each case. I don't know why people have been saying that a given release has "broken" for them. It hasn't for us.
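As a rough illustration of the action condition mentioned above, here is a toy Python model. It is only a sketch: the `Event` class and `should_notify` function are hypothetical names made up for this example and are not part of Zabbix. It assumes an action whose only condition is "maintenance status = not in maintenance".

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in for a trigger event on a monitored host."""
    host: str
    trigger: str
    host_in_maintenance: bool


def should_notify(event: Event) -> bool:
    """Toy version of the single action condition
    'maintenance status = not in maintenance'."""
    return not event.host_in_maintenance


if __name__ == "__main__":
    events = [
        Event("web01", "CPU load high", host_in_maintenance=False),
        Event("db01", "Agent unreachable", host_in_maintenance=True),
    ]
    for event in events:
        verdict = "notify" if should_notify(event) else "suppress"
        print(f"{event.host}: {event.trigger} -> {verdict}")
```

Data can still be collected for `db01` in this model; only the notification is held back, which matches the setup described in this post.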


        • lukaswu
          Junior Member
          Zabbix Certified Trainer

          • Oct 2010
          • 3

          #19
          Originally posted by untergeek
          I am well and truly puzzled. We've used maintenance mode since 1.8.0 and never had it not work, once we figured out that we needed "maintenance status = not in maintenance" in the action conditions. We even still set up our maintenance windows with data collection so we're collecting data if it can be collected during the maintenance period. We've upgraded at each level, from 1.8.0 to .1, .2, etc. right up through 1.8.5. It has always worked. We had 2 Zabbix servers in separate environments, one for staging and one for production. We've added a third for our failover site. Maintenance mode has worked as expected in each case. I don't know why people have been saying that a given release has "broken" for them. It hasn't for us.
          All right, maybe I was wrong and this feature has never worked. In fact, I only needed it as of 1.8.4.

          Again, I do not use "Actions" (I hope you are referring to Configuration/Actions), and maintenance mode simply does not work in 1.8.4. Period.

          When I put a machine into maintenance mode, I expect no alarm to show up regardless of the data collection setting. Otherwise it does not make any sense. In my case, when I put a machine into maintenance with no data collection, no alarms show up unless I reboot the machine or shut down the Zabbix agent. We have an item with a "no data" check in case we lose the network, and that one always shows up. Again, maintenance mode should put the server on hold, and the Zabbix server should ignore ANY errors within the described scope. I presume that if you do not use "no data" in your item(s), you would in fact be unaware when the Zabbix server loses communication with a monitored server, and in that case maintenance mode may appear to work (in fact the items are set to the Unsupported state).

          To confirm other people have problems too, see:





          Kind regards.

          --
          luk


          • untergeek
            Senior Member
            Zabbix Certified Specialist
            • Jun 2009
            • 512

            #20
            I apologize. I misunderstood you. Let me try again:

            1. You are only using visual cues in the Dashboard to determine whether a host or item is up/down (No actions).
            2. Maintenance mode with no data collection (which effectively disables the entire host) is the method you're employing.
            3. You're still seeing errors in the Dashboard.

            Do I understand correctly? If so, then I have some additional follow-up questions.

            1. Were the items already alerting before the maintenance?
            2. Do you see the items which are in an alerted state in the "Last 20 issues" section of the Dashboard? If so, are the host names orange (instead of blue)?

            If the answer to #1 is yes, then according to my understanding you should still see the items even after the maintenance has begun. Escalations already in place exist outside of maintenance.

            If the answer to #2 is orange, then maintenance is at least properly happening, whether before the items alert or not. I find that even hosts already alerting will show up orange in the Dashboard. Maintenance will only prevent notifications from going out for new alerts, not for pre-existing ones. I believe that will be the case even when "No data collection" is selected. In my case, because we always continue with data collection throughout maintenance, we see alerts in the Dashboard but no notifications come through. In your case I think that "no data collection" should prevent new alerts from showing up, even in the dashboard. However, it will not prevent existing alerts from continuing to appear.
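The behaviour described in the paragraph above can be written down as a small Python model. This is a sketch of the poster's stated understanding only; it has not been checked against the Zabbix source, and `Alert`, `shows_in_dashboard`, and `sends_notification` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical alert record; not a Zabbix data structure."""
    name: str
    raised_before_maintenance: bool


def shows_in_dashboard(alert: Alert, in_maintenance: bool, collecting_data: bool) -> bool:
    """Pre-existing alerts keep appearing; new ones appear only if data is still collected."""
    if not in_maintenance:
        return True
    if alert.raised_before_maintenance:
        return True
    return collecting_data


def sends_notification(alert: Alert, in_maintenance: bool) -> bool:
    """Maintenance holds back notifications for new alerts, not for pre-existing ones."""
    if not in_maintenance:
        return True
    return alert.raised_before_maintenance


if __name__ == "__main__":
    old = Alert("disk full", raised_before_maintenance=True)
    new = Alert("agent down", raised_before_maintenance=False)
    for alert in (old, new):
        for collecting in (True, False):
            print(
                alert.name,
                "| collecting data:", collecting,
                "| dashboard:", shows_in_dashboard(alert, True, collecting),
                "| notify:", sends_notification(alert, True),
            )
```

If the model is right, a new alert during maintenance with data collection shows in the Dashboard without a notification, while the same alert with no data collection should not show at all.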

            At my location, we only care about whether the notifications come through or not. In fact, we depend upon Zabbix continuing to show host & item status (with the orange links to show maintenance) while we work through issues. That gives us visual confirmation that issues have cleared up or persist. I am not sure what to say about the other side of the coin: what it should look like, and whether or not it's a bug that it does not behave the way you expect. If this worked previously, it may either be a bug now, or it was "broken" then (i.e. you had the desired functionality but that itself was not intended by the Zabbix team).
