**MrKen** · 09-10-2010, 07:28

Originally posted by IT_Architect

. . . .there is an essential piece missing, the need to be able to schedule Zabbix to deactivate monitoring of a certain host, or group, during a pre-determined time period.

Have you not heard about Maintenance mode? http://www.zabbix.com/documentation/1.8/manual/maintenance_mode_for_gui

Either that or just do it manually, if it's just one host or host group.

Originally posted by IT_Architect

Useful: Be able to schedule where the notifications are sent based on time periods. Keep it simple. Those with complex requirements would be better served by an application outside of Zabbix to accomplish that task. I would guess that most of your installed base are guys like me who try not to work 24 x 7, can't be glued to their e-mail all day, and whose main job is not to watch a wall of monitors for errors all day long.

You can already do this too! In the User's media set-up you can define periods (days, hours) in which each media is to be used. For example, emails during working hours, sms after-hours.

MrKen
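To make the "emails during working hours, sms after-hours" idea concrete, here is a minimal sketch of how such a per-media schedule could be evaluated. It assumes the Zabbix-style period syntax `d-d,hh:mm-hh:mm;...` with day 1 = Monday and treats the end time as exclusive. The function names and the two example schedules are hypothetical, not taken from Zabbix itself.

```python
from datetime import datetime


def parse_periods(spec):
    """Parse 'd-d,hh:mm-hh:mm;...' into (first_day, last_day, start_min, end_min) tuples."""
    periods = []
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue
        days, hours = part.split(",")
        first_day, _, last_day = days.partition("-")
        last_day = last_day or first_day  # a single day such as "6"
        start, end = hours.split("-")
        start_h, start_m = map(int, start.split(":"))
        end_h, end_m = map(int, end.split(":"))
        periods.append(
            (int(first_day), int(last_day), start_h * 60 + start_m, end_h * 60 + end_m)
        )
    return periods


def media_active(spec, when):
    """Return True if `when` falls inside any period of `spec`."""
    day = when.isoweekday()  # 1 = Monday ... 7 = Sunday
    minute = when.hour * 60 + when.minute
    return any(
        d1 <= day <= d2 and m1 <= minute < m2
        for d1, d2, m1, m2 in parse_periods(spec)
    )


# Hypothetical schedules: e-mail in working hours, SMS at all other times.
EMAIL = "1-5,09:00-18:00"
SMS = "1-5,00:00-09:00;1-5,18:00-24:00;6-7,00:00-24:00"
```

With these two schedules, exactly one medium is active at any given minute, which is the behaviour described above.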

**IT_Architect** · 09-10-2010, 17:29

Thank you for your reply

Originally posted by MrKen

Have you not heard about Maintenance mode? http://www.zabbix.com/documentation/1.8/manual/maintenance_mode_for_gui

The link you posted is not related to deactivating monitoring of a host or group for a predefined period. The link refers to disabling the user interface so people cannot make changes while the database is being maintained. "Zabbix GUI can be temporarily disabled in order to prohibit access to the front-end. This can be useful for protection of Zabbix database from any changes initiated by users, thus protecting integrity of database."

Originally posted by MrKen

Either that or just do it manually, if it's just one host or host group.

That also doesn't address the functionality of deactivating monitoring of a host or group for a predefined time period. For example, doing it manually for a data center maintenance period would entail setting an alarm clock for 2 AM, getting up, going to a computer, turning host monitoring off for a host or group, and setting your alarm again for 3 AM to get up and turn it back on. Nobody is going to do that. However, if you don't do that, Zabbix will send you false alarms at 2 AM when they start maintaining the servers. You will attempt to figure out what the problem is and discover it was a false alarm triggered by a maintenance period in the data center. After that, you're not likely to ever respond to a Zabbix alarm at night.

During the day, you can turn it off manually when you maintain a server, but sometimes you will forget to turn it back on again. It may be quite a while before you notice you forgot and turn it back on. So you learn from that and just don't turn it off anymore, to avoid the risk of forgetting to turn it back on.

The net result of both of these scenarios is that more than 90% of the messages from Zabbix will be false alarms. You surely wouldn't want to be texted with all of these false alarms, so you send them to e-mail. When you get around to processing your e-mail, you will process the Zabbix messages last, because the odds are heavily in favor of all of the Zabbix messages being false alarms. What has happened is that Zabbix has become a source of self-inflicted spam for you, and you will be exactly where I am now, where you're more likely to learn about a problem hours later from a phone call, or when you discover during your own use that something isn't working.

Thus, neither of these two responses addresses the critical need to be able to schedule the deactivation of monitoring for a host, or group of hosts, during periods of scheduled maintenance for the purpose of eliminating false notifications, nor do I see a real-life-usable work-around.

Originally posted by MrKen

You can already do this too! In the User's media set-up you can define periods (days, hours) in which each media is to be used. For example, emails during working hours, sms after-hours.

WOW! I was initially confused by what you wrote, but learned your response was exactly correct. I have multiple installations of Zabbix, but all but one are on 1.6. The one that I upgraded to 1.8 a few days ago does indeed have EXACTLY what I need. That's perfect! Thank you for pointing that out.

Summary:
The only remaining issue I have is the most critical one, and that is being able to schedule the deactivation of monitoring for a host, or group of hosts, during maintenance windows to prevent false alarms. I would be happy to accept a solid work-around, such as how I could write a script to deactivate and reactivate monitoring for a host or group of hosts. I don't need to have Zabbix schedule and run the script. I can manage that outside of Zabbix.

Thanks!

**MrKen** · 10-10-2010, 04:34

Originally posted by IT_Architect

I have multiple installations of Zabbix, but all but one are on 1.6. The one that I upgraded to 1.8 a few days ago does indeed have EXACTLY what I need. That's perfect! Thank you for pointing that out.

This functionality is available in 1.6, and even in 1.4. And judging by the image in the 1.4 manual, it was available in 1.1

MrKen

**IT_Architect** · 10-10-2010, 14:20

Originally posted by MrKen

This functionality is available in 1.6, and even in 1.4. And judging by the image in the 1.4 manual, it was available in 1.1

...In the User's media set-up you can define periods

Hi MrKen,

I'm going to have to say you are wrong on this one too.

None of the 1.6 User setup windows even have the word Media on them. I don't know where you're seeing it, but I'm guessing you don't have a version 1.6 to look at.

Other: Having been a programmer and DBA for a long time, I looked through the data structures and wrote a PHP script that will activate and deactivate hosts or groups of hosts. It works perfectly, and I just finished putting in all of the error checking. The Maintenance Calendar they have in Zabbix is perfect, but I don't see that it does anything useful. Even if it disables changes from the GUI, the database would be changing many times a second from monitor data. I couldn't believe that it didn't also disable monitoring, so I tried it. The manual is right. It does nothing to stop monitoring. The Task Scheduler on the Windows servers that I use for everything else, and hoped to use here, won't work because it doesn't understand end times. Soooo I'm going to need to come up with a scheduler that does. One option is to use some of the scheduling code from one of the ERP packages I've written. I'd have to modify it extensively because it has far too much functionality for this application. Another option is to find a simple system on the web that understands beginning times, end times, and durations.
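One way around a scheduler that does not understand end times is to make the check stateless: a job that fires every minute looks up the window list and decides whether a target should currently be monitored. The sketch below is only an illustration of that idea, not the PHP script described above. The window tuple layout and the names `desired_status` and `WINDOWS` are assumptions.

```python
from datetime import datetime, timedelta


def desired_status(windows, target, now):
    """Return 'disabled' if `now` is inside any window for `target`, else 'enabled'.

    `windows` is a list of (target_name, start, duration) tuples.
    The window end is exclusive, so monitoring resumes exactly at start + duration.
    """
    for name, start, duration in windows:
        if name == target and start <= now < start + duration:
            return "disabled"
    return "enabled"


# Hypothetical schedule: one hour of data center work starting at 2 AM.
WINDOWS = [
    ("Dallas1", datetime(2010, 10, 12, 2, 0), timedelta(hours=1)),
]
```

Because the function is recomputed on every run, any scheduler that can fire repeatedly is enough. A missed run or a reboot in the middle of a window corrects itself on the next run.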

**MrKen** · 11-10-2010, 03:40

Looks like 1.6.5 to me!

Attached Files

**IT_Architect** · 11-10-2010, 04:53

Oh no! I'm going to have to eat crow on this one.

I looked all over that screen for the word Media before I posted. They hid it in plain sight on me. The only thing different between 1.6 and 1.8 is where they put it. Crawling back under my rock.

What remains is the glaring lack of a way within Zabbix to discontinue monitoring of hosts during maintenance periods to prevent the many false alarms that I, and it must be everyone else, are getting. Incorporating this functionality would be a huge boost to Zabbix's real-world usability as a monitoring solution.

I have a php script I can post if there is interest that can be used as a work-around. The problem with it being outside of Zabbix is if you change the name of a host, group, password, etc., it will break, and you will need to provide your own means of scheduling it.

Thanks!

**jpriceit** · 17-11-2010, 21:10

Originally posted by IT_Architect

What remains is the glaring lack of a way within Zabbix to discontinue monitoring of hosts during maintenance periods to prevent the many false alarms that I, and it must be everyone else, are getting. Incorporating this functionality would be a huge boost to Zabbix's real-world usability as a monitoring solution.

I think this option solves that problem. I am just now trying this for the first time, but it would appear to do so. Note: Using v1.8.3 release.

Edit: I would also like to point out that this entire maintenance feature is either not documented or is difficult to find in the manual.

Attached Files

**IT_Architect** · 17-11-2010, 23:04

Originally posted by jpriceit

Does this option not achieve that goal? I am just now trying this for the first time, but it would appear to do so. Note: Using v1.8.3 release.

All I can say is try it. Since I never got anything useful out of it, I wrote my own, and my expectations changed along the way. I wrote a PHP script that accepts inputs from the command line or other scripts. It allows groups inside of groups. Example:
- I have all hosts that a Zabbix instance is servicing in one group.
- You need to have two Zabbix instances in a data center in case a Zabbix machine goes down. Example Dallas1-Z1, Dallas1-Z2.
- I have a Data Center Group that includes both of those groups, Example Dallas1, so that when the DC is under maintenance, I can simply schedule Dallas1 for maintenance, and both groups and anything outside of the DC that is monitoring Dallas1 do not monitor anything at Dallas1 during that period.
- I also have Global Groups. For instance, in the case where you have a data provider that supplies data to web apps scattered across DCs, I schedule that group, and it will automatically make sure those application checks are not made. This is useful in a hosting situation where you want to monitor the server, but not the web applications of certain domains.
- This notification system for the Zabbix servers has been wonderful because in my case, I've been virtual for years. When I need to work on a physical machine, you guessed it, all of the virtual machines on that server are in a group, and whatever is monitoring them gets the message not to monitor during the scheduled maintenance period. There is no more matrix in my head of who's watching what. I can move virtual machines across servers with very few changes.
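The groups-inside-groups idea in the list above can be sketched as a recursive expansion from a group name to the hosts beneath it. This is only an illustration of the concept, not the PHP script itself. The host names and the `GROUPS` mapping are hypothetical, and only the group names Dallas1, Dallas1-Z1 and Dallas1-Z2 come from the example.

```python
def expand(group, members, seen=None):
    """Return the set of hosts under `group`, following nested groups.

    `members` maps a group name to a list of member names. A member that is
    itself a key in `members` is treated as a nested group. `seen` guards
    against groups that (directly or indirectly) contain themselves.
    """
    seen = set() if seen is None else seen
    if group in seen:
        return set()
    seen.add(group)
    hosts = set()
    for member in members.get(group, ()):
        if member in members:
            hosts |= expand(member, members, seen)
        else:
            hosts.add(member)
    return hosts


# Hypothetical layout: a data center group containing two per-instance groups.
GROUPS = {
    "Dallas1": ["Dallas1-Z1", "Dallas1-Z2"],
    "Dallas1-Z1": ["web01", "db01"],
    "Dallas1-Z2": ["web02"],
}
```

Scheduling `Dallas1` would then cover every host returned by `expand("Dallas1", GROUPS)`, without listing the per-instance groups by hand.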

Summary:
It's been a dream. When I get maintenance notices, I just put them on the schedule for 5 minutes before the scheduled down time, and until 30 minutes after the scheduled down time. I can easily see at any time when something will be down. After the expiration period, the checks kick in. If the application server data is messed up, the web application checks fail, and I'll know before morning that I need to get on the phone with the data vendor so come morning, I don't start the day off losing money. I can take expired schedules, change the times, and re-use them. I now have Zabbix text me for disaster-level events when there is a problem, because I know if I get a text, I am losing money, no maybe about it.
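The padding rule described here (start 5 minutes early, end 30 minutes late) is simple date arithmetic. A minimal sketch, with `padded_window` as a hypothetical helper name and the two margins as adjustable defaults:

```python
from datetime import datetime, timedelta


def padded_window(start, end, lead=timedelta(minutes=5), tail=timedelta(minutes=30)):
    """Widen an announced downtime by `lead` before and `tail` after."""
    if end <= start:
        raise ValueError("end must be after start")
    return start - lead, end + tail


# Example: a notice announcing downtime from 02:00 to 03:00.
quiet_from, quiet_until = padded_window(
    datetime(2010, 11, 18, 2, 0), datetime(2010, 11, 18, 3, 0)
)
```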

With Zabbix capability to watch services and applications, and this to cut out all the false alarms, it has freed my mind to where I don't worry about the servers anymore. The only thing I see now is my daily report that tells me how the backups went, what needs to be updated, and server messages from the previous day that show me server load and disk space problems. If I want to analyze a problem, I can go into Zabbix and call up a graph. I live in a lot calmer environment now. If the Zabbix scheduler doesn't work for you, maybe you will want to make up something like this.

**jpriceit** · 17-11-2010, 23:10

I got a chance to test this today. One of our hosts that was scheduled to have updates installed was rebooted multiple times (it had a lot of Windows updates to be installed).

I didn't receive a single alert for that host during this time.

**IT_Architect** · 18-11-2010, 00:02

Originally posted by jpriceit

I would also like to point out that this entire maintenance feature is either not documented or is difficult to find in the manual. ...I got a chance to test this today. One of our hosts that was scheduled to have updates installed was rebooted multiple times (it had a lot of windows updates to be installed). I didn't receive a single alert for that host during this time.

Perfect! That's something that couldn't be answered satisfactorily before, and you must have done something different than I when you did it. That might work for most people. What I have is better now, but if I could have gotten it to work, I perhaps would have taken it.

Thanks for the feedback!

**danrog** · 18-11-2010, 04:48

We use maintenance mode and have over 1000 hosts (a lot set up with only SNMP traps) and we don't receive a single alarm during maintenance. The key is to set up (as another poster mentioned) maintenance with no data collection AND add to the action the condition Maintenance status = not in maintenance. We also don't get many, if any, false alarms. I spent about a week tweaking triggers and actions when we first switched to Zabbix. Taking the time upfront to plan it out definitely helped our deployment.

**IT_Architect** · 18-11-2010, 14:01

Originally posted by danrog

The key is to set up (as another poster mentioned) maintenance with no data collection AND add to the action the condition Maintenance status = not in maintenance.

There we go. That's a clearly spelled-out key piece of information. I wouldn't go back to this after what I have now, since I've come to rely on nested groups, cross-Zabbix-server groups, and global Zabbix server notifications.

**untergeek** · 22-11-2010, 19:09

Maintenance mode is so critical to our operations that I wrote shell scripts to directly access the database with the same commands as the UI.

We now are able to enter maintenance for a host or a group within moments with the same precision or with intervals by hours and/or minutes.

Granted, this bypasses security constraints, but only my team has access to the server with the scripts.

Code:

$ maint.sh 
Usage: maint.sh [OPTIONS]
          -i (run in interactive mode)
          -m (run in manual mode)
          -e [Maintenance ID] (end maintenance now)
          -s (show scheduled maintenance for next 24 hours)
          -x (silence all alerts and delete all escalations)
          -z (put all groups in maintenance (-x will be set also))
        Manual options 
          -H [hours] -M [minutes] -S [seconds] (duration calculations)
          -C [CR Number] -I [IN Number] 
          -g [Group search term]
          -h [Host search term]
          -n ["Maintenance Name/Title (enclose in quotes)"]
                Defaults to "Added by $FULLNAME on $DATE by script"
          -d ["Maintenance Description (enclose in quotes)"] (Only if no CR/IN)
                Defaults to "Quick maintenance window added by script without CR or IN"
          -T [Comma separated list of recipients - in addition to [email protected]]
          -? (Display this help)

We have our unix logins mapped to the same as our zabbix logins, so that's how we know who created a given maintenance. The other functionality which was so enjoyable was the ability to quickly end a maintenance window when a server was done being maintained.

This is by no means a complete implementation. It does not allow for creation of repetitive maintenance (e.g., weekly or daily). We still use the UI for that. This tool is for quick maintenance window creation and for showing servers/groups currently in maintenance, etc.

**fmrapid** · 22-11-2010, 20:27

Maintenance script

Would you care to share the maintenance management script you have created? This is certainly something that is of much interest to all here.

I can also see interest in converting the script to use the API for a more consistent approach, if possible.

You can put it up on the wiki or link it somewhere else, taking care to strip out any passwords.

Thank you,

fmrapid
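As a rough idea of what an API-based version might send, the sketch below only builds the JSON-RPC request body for a one-time maintenance window with no data collection. It assumes a Zabbix version whose API exposes `maintenance.create` and accepts these parameter names (`groupids`, `timeperiods`, `maintenance_type`) with the token passed in an `auth` field. Parameter names and authentication differ between releases, so check the API reference for the installed version. The helper name, group ID and token are hypothetical, and nothing is sent over the network.

```python
import json
from datetime import datetime, timedelta, timezone


def maintenance_request(name, group_ids, start, duration, auth_token, request_id=1):
    """Build (but do not send) a JSON-RPC body for a one-time maintenance window."""
    since = int(start.timestamp())
    till = int((start + duration).timestamp())
    return {
        "jsonrpc": "2.0",
        "method": "maintenance.create",
        "params": {
            "name": name,
            "active_since": since,
            "active_till": till,
            "maintenance_type": 1,  # assumed: 1 = no data collection
            "groupids": list(group_ids),
            "timeperiods": [
                {
                    "timeperiod_type": 0,  # assumed: 0 = one-time period
                    "start_date": since,
                    "period": int(duration.total_seconds()),
                }
            ],
        },
        "auth": auth_token,  # assumed: token from a prior login call
        "id": request_id,
    }


# Hypothetical values: group ID "4", a placeholder token, one hour from 02:00 UTC.
request = maintenance_request(
    "Dallas1 DC work",
    ["4"],
    datetime(2010, 11, 23, 2, 0, tzinfo=timezone.utc),
    timedelta(hours=1),
    "TOKEN",
)
body = json.dumps(request)
```

Going through the API rather than the database would keep the frontend's permission checks in place, which addresses the security caveat mentioned for the shell scripts.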

Maintenance & Notification Architecture Re-think
