Scheduled Maintenance Windows

  • Ash
    Junior Member
    • Sep 2004
    • 6

    #1

    Scheduled Maintenance Windows

    Alexei,

    Is there now, or are there plans for future versions to include, an area of Zabbix where you can configure maintenance windows for various devices, so that Zabbix stops polling them during those periods?

    We have periods where systems are taken offline for various maintenance work. During those times Zabbix begins sending alerts that these systems cannot be contacted, etc.

    It would be useful if there was a means of:
    A. disabling all checks/alerts/etc. for each system individually between user-defined times on certain days of the week, days of the month, etc.
    B. quickly cancelling all checks/alerts/etc. for a particular system or a group of systems for a specified time during unscheduled outages, etc.

    For example, if the main link to a particular data centre goes out unexpectedly, a means of quickly disabling the checking of all systems within that location until the link is fixed needs to exist to prevent 'death by email alerts' from all devices that cannot be contacted during the outage period until the link is restored.
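
    To make requests A and B concrete, here is a minimal sketch of how such windows could be evaluated. It is not Zabbix code: the names `RecurringWindow`, `AdHocWindow` and `in_maintenance` are hypothetical, and the weekly window is assumed to start and end on the same day.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class RecurringWindow:
    """Hypothetical weekly window: suppress checks on given weekdays between two times."""
    weekdays: frozenset  # 0 = Monday ... 6 = Sunday
    start: time
    end: time  # assumed to be later than start, on the same day

    def covers(self, moment: datetime) -> bool:
        return moment.weekday() in self.weekdays and self.start <= moment.time() < self.end


@dataclass(frozen=True)
class AdHocWindow:
    """Hypothetical one-off window for an unscheduled outage."""
    start: datetime
    end: datetime

    def covers(self, moment: datetime) -> bool:
        return self.start <= moment < self.end


def in_maintenance(windows, moment: datetime) -> bool:
    """True if any window attached to a host (or its group) covers the moment."""
    return any(w.covers(moment) for w in windows)


# Case A: a host taken offline every Sunday from 02:00 to 04:00.
weekly = RecurringWindow(frozenset({6}), time(2, 0), time(4, 0))
# Case B: a whole location silenced while its link is down.
outage = AdHocWindow(datetime(2005, 3, 1, 9, 0), datetime(2005, 3, 1, 11, 30))
```

    A poller or alerter would call `in_maintenance` before checking a host or sending a message, and skip the action when it returns True.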

    Regards,
    Ash
  • charles
    Member
    • Sep 2004
    • 54

    #2
    Originally posted by Ash
    For example, if the main link to a particular data centre goes out unexpectedly, a means of quickly disabling the checking of all systems within that location until the link is fixed needs to exist to prevent 'death by email alerts' from all devices that cannot be contacted during the outage period until the link is restored.

    Regards,
    Ash
    Look in the docs for dependencies. Basically, you make your host dependent on that router, and if the router goes down, the alert for the host will not be sent either.

    I need to set this up badly, but in 1.0 this means editing every template. In 1.1 it will be easier.
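
    The dependency idea can be sketched roughly as follows. This only illustrates the principle and is not how Zabbix implements it: `should_alert` and the `depends_on` mapping are hypothetical, and each host is assumed to have at most one parent.

```python
def should_alert(host, down, depends_on):
    """Return True if an alert for `host` should be sent.

    `down` is the set of hosts currently unreachable; `depends_on` maps a
    host to the host it depends on (hypothetical structure, one parent each).
    """
    if host not in down:
        return False
    seen = {host}
    parent = depends_on.get(host)
    # Walk up the dependency chain; any failed ancestor explains the outage.
    while parent is not None and parent not in seen:
        if parent in down:
            return False
        seen.add(parent)
        parent = depends_on.get(parent)
    return True


depends_on = {"web1": "dc-router", "db1": "dc-router"}
down = {"dc-router", "web1", "db1"}
alerts = sorted(h for h in down if should_alert(h, down, depends_on))
print(alerts)  # ['dc-router']
```

    With the router down, only the router itself raises an alert; the hosts behind it stay quiet.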

    charles


    • LEM
      Senior Member
      Zabbix Certified Specialist
      • Sep 2004
      • 112

      #3
      Originally posted by charles
      Look in the docs for dependencies. Basically, you make your host dependent on that router, and if the router goes down, the alert for the host will not be sent either.
      Trigger dependencies are not the only thing to take into account, although they are a good idea for technical dependencies. With "maintenance windows", sometimes only time matters: this means that a somewhat "functional" dependency must be taken into account as well.

      This time-window issue also applies to "service availability" and IT service SLAs.

      I have not found a simple way to bring this functional time-window aspect into SLAs in the current version (1.0). Any ideas?

      --
      lem
      --
      LEM


      • charles
        Member
        • Sep 2004
        • 54

        #4
        Originally posted by LEM
        I have not found a simple way to bring this functional time-window aspect into SLAs in the current version (1.0). Any ideas?
        lem
        Sorry, I was just ignoring that aspect of your post. I know the feature you want - Nagios has it and it can be very useful. It cannot be done today with Zabbix.

        But if enough people say they need it, you will see Alexei reacts very well to feedback.

        charles


        • naparuba
          Junior Member
          • Feb 2005
          • 11

          #5
          It would be a great feature. The documentation is not very clear (to me, at least) about how the SLA is calculated, and this is important functionality for me.


          • LEM
            Senior Member
            Zabbix Certified Specialist
            • Sep 2004
            • 112

            #6
            Time frame consideration for SLA and alerting

            About
            I know the feature you want - Nagios has it and it can be very useful. It cannot be done today with Zabbix.

            But if enough people say they need it, you will see Alexei reacts very well to feedback.
            I guess time frames have three impacts (does anyone see more?):
            1. the time frame in which a given media may (or may not) be used
            2. the time frame in which trigger status counts toward the SLA calculation
            3. the time frame for maintenance windows

            Time usage for a given media
            =================
            We should be able to define the time frame (days, hours within days...) in which a given media may be used. For example, do not use the media 'mail' for the user 'admin-team' when they are all asleep (or at least not at work: say, weekdays from 20h to 08h). This is how it is done in Microsoft Operations Manager 2005.
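
            A quiet period such as 20h to 08h wraps past midnight, which is the only tricky part of such a rule. A minimal sketch, with hypothetical names (`in_time_range`, `media_allowed`) and assuming for simplicity that the quiet hours apply every day:

```python
from datetime import datetime, time


def in_time_range(start: time, end: time, now: time) -> bool:
    """True if `now` is inside [start, end), allowing ranges that wrap past midnight."""
    if start <= end:
        return start <= now < end
    # Wrapping range, e.g. 20:00 to 08:00: late evening or early morning.
    return now >= start or now < end


def media_allowed(moment: datetime,
                  quiet_start: time = time(20, 0),
                  quiet_end: time = time(8, 0)) -> bool:
    """Hypothetical rule: a media is usable only outside the quiet hours."""
    return not in_time_range(quiet_start, quiet_end, moment.time())
```

            A per-weekday restriction could be layered on top by also checking `moment.weekday()` before applying the quiet hours.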

            Time usage for SLA calculation
            ==================
            Most of the time, our SLAs are based on a 'usage time' that is not 24x7x365 for all elements. Take, for example, monitoring the availability of a database when we know an off-line backup is performed each weekend from 12:00 to 16:00.
            It should be possible to indicate 'no SLA calculation for this period' for a given trigger (and/or for a given host).
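
            As a worked illustration of 'no SLA calculation for this period', here is a rough sketch. The function names are hypothetical, times are hours counted from Monday 00:00 of a one-week report, and the backup window is assumed to occur on both Saturday and Sunday.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between intervals [a_start, a_end) and [b_start, b_end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def availability(period, outages, exclusions):
    """Percentage of counted time during which the service was up.

    All intervals are (start, end) pairs in hours. Exclusions are assumed not
    to overlap each other, and outages are assumed not to overlap each other.
    """
    p_start, p_end = period
    excluded = sum(overlap(p_start, p_end, s, e) for s, e in exclusions)
    counted = (p_end - p_start) - excluded
    if counted <= 0:
        raise ValueError("no counted time left in the reporting period")
    down = 0
    for o_start, o_end in outages:
        # Clip the outage to the reporting period.
        o_start, o_end = max(o_start, p_start), min(o_end, p_end)
        if o_end <= o_start:
            continue
        # Downtime inside an excluded window does not count against the SLA.
        hidden = sum(overlap(o_start, o_end, s, e) for s, e in exclusions)
        down += (o_end - o_start) - hidden
    return 100.0 * (counted - down) / counted


week = (0, 168)
backups = [(132, 136), (156, 160)]   # Saturday and Sunday, 12:00 to 16:00
outages = [(10, 12), (133, 135)]     # one real outage, one during the backup
print(availability(week, outages, backups))  # 98.75
```

            Without the exclusions, the same outages would give 164/168, roughly 97.6%, because the backup downtime would be counted as a failure.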

            Time usage for maintenance windows
            ======================
            Periods where no monitoring is to be done (switch to 'unmonitored', then switch back to 'monitored'). For some servers, we do not need to monitor them during certain periods known in advance: say, a given server during a time frame dedicated to an evolution/transition project.

            Any more ideas to help put together the feature request for Alexei? :-)
            --
            LEM
