How to do flexible per-host alerting periods?

  • tim.mooney
    Senior Member
    • Dec 2012
    • 1427

    #1

    My site's current Zabbix server environment is Zabbix 4.4.7 on RHEL 7.8 x86_64. We've been using Zabbix since the 2.0.x days, so this question is about a change or improvement to an existing environment.

    We're looking to adjust our existing alerting in the following way:
    1. easily configure a custom alerting period on a per-host basis
    2. continue to detect problems for our hosts on a 7x24 basis, so the issue still appears in the web interface and a problem event is still registered
      1. if the problem happens outside of the alerting period, no alert action is generated until the alerting period is reached, at which point alerting should start.

    For most of our enterprise systems, we are configured to alert 7x24 if there is a problem, and that's working fine.

    My workgroup has taken on monitoring a lot of client systems that have limited operational hours, though, and we're trying to adjust our actions to allow for flexible, per-host alerting periods. For example, we've begun monitoring a client that is only used from 7:00 AM until 5:00 PM on weekdays. We would like Zabbix to continue detecting problems for this client on a 7x24 basis, but only alert our staff during weekday business hours. We don't need to be woken up in the middle of the night or paged on a weekend if there is a problem on this client, hence the desire for a per-host custom alerting period. Other clients would have slightly different alerting periods, so we're looking for a method that lets us easily customize it per host.

    I thought that adding "Event time" to the list of action conditions and then having a user macro define the alerting period would be the perfect solution for this problem.

    I modified one of our existing error actions to include Event time as part of the conditions, like this:

    [Screenshot: Zabbix action using Time period]

    I defined {$ERROR_ACTION_PERIOD} globally as 1-7,00:00-24:00, so that the default for any host is 7x24 alerting.

    I then modified the first host to have {$ERROR_ACTION_PERIOD} => 1-5,06:00-17:00
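A rough way to reason about what these two macro values mean is a small checker for the `d-d,hh:mm-hh:mm` time-period syntax. This is a minimal sketch, not Zabbix's own code: it assumes days run 1 = Monday to 7 = Sunday, that several periods may be joined with `;`, and that the start time is inclusive and the end time exclusive. The helper names `to_minutes`, `parse_period` and `in_period` are hypothetical, and the sketch takes the already-resolved macro value rather than `{$ERROR_ACTION_PERIOD}` itself.

```python
from datetime import datetime


def to_minutes(hhmm: str) -> int:
    """Convert 'hh:mm' (or 'h:mm') to minutes since midnight; '24:00' -> 1440."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def parse_period(spec: str) -> list[tuple[int, int, int, int]]:
    """Parse 'd-d,hh:mm-hh:mm' entries separated by ';' into
    (first_day, last_day, start_minute, end_minute) tuples.
    Assumed day numbering: 1 = Monday ... 7 = Sunday."""
    periods = []
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        days, times = entry.split(",")
        first, _, last = days.partition("-")   # '3' -> ('3', '', '')
        start, end = times.split("-")
        periods.append((int(first), int(last or first),
                        to_minutes(start), to_minutes(end)))
    return periods


def in_period(spec: str, when: datetime) -> bool:
    """True if 'when' falls inside the period (assumed start-inclusive, end-exclusive)."""
    day = when.isoweekday()                 # Monday = 1 ... Sunday = 7
    minute = when.hour * 60 + when.minute
    return any(
        first <= day <= last and start <= minute < end
        for first, last, start, end in parse_period(spec)
    )
```

For example, with the host-level value `1-5,06:00-17:00`, Monday 20 June 2022 at 09:00 is inside the period and Saturday 25 June 2022 at 09:00 is not, while the global default `1-7,00:00-24:00` matches every minute of the week.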

    Experienced Zabbix users can tell where this is going. It works as long as the original problem event was generated within the time period defined by the macro, but if the problem was first detected outside of that period and problem event generation mode is set to the default of "Single", then the conditions will never match because the event started outside the time period. That's as designed, but it makes it unusable for our desired alerting goals.
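The gap between what "Single" mode does and what requirement 2.1 asks for can be modelled in a few lines. This is a simplified sketch under stated assumptions, not a description of Zabbix internals: it assumes the Event time condition is compared only against the start time of the one problem event, and that the wanted behaviour is "alert at the first minute the period opens, if the problem is still open". `single_mode_sends_alert` and `desired_first_alert` are hypothetical names, and the minute-by-minute scan is chosen for clarity, not speed.

```python
from datetime import datetime, timedelta
from typing import Optional


def to_minutes(hhmm: str) -> int:
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def in_period(spec: str, when: datetime) -> bool:
    """Simplified 'd-d,hh:mm-hh:mm[;...]' check (1 = Monday; start inclusive, end exclusive)."""
    day, minute = when.isoweekday(), when.hour * 60 + when.minute
    for entry in filter(None, (e.strip() for e in spec.split(";"))):
        days, times = entry.split(",")
        first, _, last = days.partition("-")
        start, end = (to_minutes(t) for t in times.split("-"))
        if int(first) <= day <= int(last or first) and start <= minute < end:
            return True
    return False


def single_mode_sends_alert(spec: str, problem_start: datetime) -> bool:
    """Model of the observed failure: one problem event, and the Event time
    condition is only ever compared against that event's start time."""
    return in_period(spec, problem_start)


def desired_first_alert(spec: str, problem_start: datetime,
                        horizon: timedelta = timedelta(days=7)) -> Optional[datetime]:
    """Wanted behaviour: alert at once if the problem starts inside the period,
    otherwise at the first minute the period opens (problem assumed still open)."""
    if in_period(spec, problem_start):
        return problem_start
    t = problem_start.replace(second=0, microsecond=0) + timedelta(minutes=1)
    while t <= problem_start + horizon:
        if in_period(spec, t):
            return t
        t += timedelta(minutes=1)
    return None  # the period never opens within the horizon
```

With `1-5,06:00-17:00`, a problem first seen at 02:00 on Saturday 25 June 2022 never satisfies the condition in the single-event model, whereas the wanted first alert would be Monday 27 June 2022 at 06:00.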

    We know that it's possible to enable "Multiple problem event generation", but it's not at all clear from the docs what the downsides are to doing that. Since our triggers for most things come from templates, we would need to enable multiple problem event generation for a large number of our hosts, whether they need it or not.

    We know that triggers support time period evaluation too, but we've rejected that method here because setting the time period at the trigger level means that the problem is not registered at all until the host's specific "in use" period is reached.

    We know about recurring maintenance periods as a way to suppress alerts, and right now that seems like it might be the closest match for our requirements. I avoided a per-host recurring maintenance period in my first attempt because we'll have to periodically extend the maintenance period and we'll potentially need a dozen or more of these to handle these hosts with different alerting requirements.

    I'm looking for any advice from other Zabbix admins that have had to do something like this and what methods you've used to accomplish it.

    I would also be interested to hear from sites that are using "Multiple" problem event generation on a widespread basis (not just for log items, but for basically everything), to understand more about the downsides of using it widely.

    Thanks,

    Tim
  • pizarrocz
    Junior Member
    • Jun 2022
    • 4

    #2
    Tim, have you found a solution for notifying the admin when an event occurs before working hours and is still ongoing once working hours start?

    • tim.mooney
      Senior Member
      • Dec 2012
      • 1427

      #3
      Nothing as good as what we had hoped to accomplish with the failed experiment I outlined above.

      What we've ended up doing for a couple of these hosts where we absolutely don't want to be alerted about them on weekends or outside of business hours is just using a long-term recurring maintenance period that covers the times where we shouldn't receive alerts. It's not our perfect solution, but it works OK.
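To work out which times such a maintenance has to cover, the off-hours are simply the complement of the alerting period. This is a minimal sketch, assuming the same `d-d,hh:mm-hh:mm` syntax as above (1 = Monday) and a hypothetical alerting period of `1-5,07:00-17:00` for the weekday 7-to-5 client. `off_hours` is a hypothetical helper; it does not talk to Zabbix, and the resulting windows would still have to be entered by hand as recurring maintenance time periods.

```python
def to_minutes(hhmm: str) -> int:
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def to_hhmm(minutes: int) -> str:
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


def off_hours(spec: str) -> list[tuple[int, str, str]]:
    """Return (day, start, end) windows NOT covered by an alerting period such
    as '1-5,07:00-17:00'. Windows are computed per day; runs that cross
    midnight (e.g. Friday evening through Monday morning) are not merged."""
    intervals = []
    for entry in filter(None, (e.strip() for e in spec.split(";"))):
        days, times = entry.split(",")
        first, _, last = days.partition("-")
        start, end = (to_minutes(t) for t in times.split("-"))
        intervals.append((int(first), int(last or first), start, end))

    windows = []
    for day in range(1, 8):
        # Alerting intervals that apply to this weekday, in start order.
        busy = sorted((s, e) for first_day, last_day, s, e in intervals
                      if first_day <= day <= last_day)
        cursor = 0  # first minute of the day not yet accounted for
        for start, end in busy:
            if start > cursor:
                windows.append((day, to_hhmm(cursor), to_hhmm(start)))
            cursor = max(cursor, end)
        if cursor < 1440:
            windows.append((day, to_hhmm(cursor), "24:00"))
    return windows
```

For `1-5,07:00-17:00` this yields twelve windows: 00:00-07:00 and 17:00-24:00 on each weekday, plus all of Saturday and Sunday. A 7x24 period such as `1-7,00:00-24:00` yields none.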

      Luckily right now we only have a few hosts where we need this.
      Last edited by tim.mooney; 21-06-2022, 23:58.

      • pizarrocz
        Junior Member
        • Jun 2022
        • 4

        #4
        I have found three solutions, but each one has a problem:
        1. If I create a trigger with a time period, I will never be notified about an event that occurs outside that time period.
        2. If I put a machine in maintenance mode and allow notifications from a specific trigger, it alerts me perfectly, but if the client has access to the host, they will ask me why it is in maintenance.
        3. If I create an action that only alerts me on High or Disaster severity and exclude a host I don't want to be notified about, I might have a problem if I still want to receive notifications about availability.