Long-running maintenance in phases

  • scoopex
    Junior Member
    • Sep 2010
    • 9

    #1

    Long-running maintenance in phases

    I want to perform operating system updates for, say, 1,000 hosts spread out over 14 days. Since the entire system remains in live operation during this update phase, I do not want to put all 1,000 hosts into maintenance mode at the same time for 14 days, as this would prevent alerts from being triggered for any issues that arise. To roll out the upgrades, I use an automated process that updates a certain number of systems in stages and reintegrates them into the overall system upon completion. I could extend this process to also coordinate maintenance in Zabbix via its API.

    Creating a maintenance task for each host individually seems a bit questionable to me.
    How could this be coordinated effectively in Zabbix?

    Intuitively, I would create a maintenance task with a duration of 14 days and then add individual hosts to the maintenance list for the duration of their upgrade and remove them again.

    My questions regarding this:
    • Would Zabbix still be able to distinguish between downtime caused by outages and downtime caused by maintenance when reviewing, for example, SLA calculations or problem history?
      (I noticed that adding and removing hosts in Maintenance is logged in the “Audit Log,” at least—though unfortunately, only the host ID is recorded.)
    • Does the tag matching described in the documentation refer to the host’s tags or the trigger’s tags?
      (For example, one could introduce a tag for hosts that could be used for my purpose:
      Code:
      host_status=in_maintenance|in_production
      )
    • What are the technical implications (e.g. performance) of creating thousands of maintenance periods, one for every host?
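A minimal sketch of the tag idea from the second question, assuming Zabbix 6.0 or later (where `maintenance.create` takes `groups` and `tags` as lists of objects). It only builds the JSON-RPC body; the group ID, the maintenance name, and the function name are hypothetical placeholders. The `host_status` tag is the one proposed above.

```python
def tagged_maintenance_request(name, group_id, start, days, request_id=1):
    """Build a maintenance.create body that only suppresses tagged problems."""
    duration = days * 24 * 3600  # whole window in seconds
    return {
        "jsonrpc": "2.0",
        "method": "maintenance.create",
        "params": {
            "name": name,
            "active_since": start,
            "active_till": start + duration,
            # 0 = maintenance with data collection
            "maintenance_type": 0,
            "groups": [{"groupid": str(group_id)}],
            # one-time period spanning the whole window
            "timeperiods": [
                {"timeperiod_type": 0, "start_date": start, "period": duration}
            ],
            "tags_evaltype": 0,  # And/Or
            # operator 0 = equals
            "tags": [
                {"tag": "host_status", "operator": 0, "value": "in_maintenance"}
            ],
        },
        "id": request_id,
    }


request = tagged_maintenance_request("os-upgrade-window", 42, 1700000000, 14)
print(request["params"]["active_till"])
```

How Zabbix treats problems that were already open before a host's tag is switched is not covered by this sketch and would need testing.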

    Another very simple approach could be to deactivate hosts, but in my view, this has the drawback that triggers like
    Code:
    nodata()
    will fire immediately when the host is reactivated.
  • cyber
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Dec 2006
    • 4901

    #2
    Does the tag matching described in the documentation refer to the host’s tags or the trigger’s tags?
    It does not matter; it can also be an item-level tag. Maintenance holds back escalations. What gets escalated are events, and events inherit tags from all levels: items, hosts, triggers, templates, etc. And yes, if the maintenance is tag-based, another event from the same host that does not carry the specified tags will still be escalated.

    If you run your system updates in some automated way, I would include an API call in the script/system/whatever runs them to create a maintenance for this host (or group of hosts; I don't know how you do it all), and when it is finished, that same system could remove the maintenance again.
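A rough sketch of what such API calls could look like from the upgrade script, assuming Zabbix 6.0 or later for the `hosts` parameter and 6.4 or later for Bearer-token authentication (older versions pass the token in an `auth` field of the request body instead). The helper names, the maintenance naming scheme, and the host ID are made up for illustration.

```python
import json
import urllib.request


def rpc(method, params, request_id=1):
    """Wrap a method call in the JSON-RPC 2.0 envelope used by the Zabbix API."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}


def create_host_maintenance(host_id, start, minutes):
    """One-time maintenance covering a single host for `minutes` minutes."""
    period = minutes * 60
    return rpc("maintenance.create", {
        "name": f"os-upgrade-{host_id}",  # hypothetical naming scheme
        "active_since": start,
        "active_till": start + period,
        "hosts": [{"hostid": str(host_id)}],
        "timeperiods": [
            {"timeperiod_type": 0, "start_date": start, "period": period}
        ],
    })


def delete_maintenance(maintenance_id):
    """maintenance.delete takes a plain list of maintenance IDs."""
    return rpc("maintenance.delete", [str(maintenance_id)], request_id=2)


def send(url, token, body):
    """POST one request; Bearer auth is an assumption (Zabbix 6.4+)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json-rpc",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


body = create_host_maintenance(10084, 1700000000, 90)
print(json.dumps(body, indent=2))
```

The upgrade script would call `send()` with the create body before touching the host, keep the returned maintenance ID, and send the delete body once the host is back in service.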


    • scoopex
      Junior Member
      • Sep 2010
      • 9

      #3
      Originally posted by cyber
      It does not matter; it can also be an item-level tag. Maintenance holds back escalations. What gets escalated are events, and events inherit tags from all levels: items, hosts, triggers, templates, etc. And yes, if the maintenance is tag-based, another event from the same host that does not carry the specified tags will still be escalated.
      That sounds good; in that case, a tag at the host level should work just fine.

      Originally posted by cyber
      If you run your system updates in some automated way, I would include an API call in the script/system/whatever runs them to create a maintenance for this host (or group of hosts; I don't know how you do it all), and when it is finished, that same system could remove the maintenance again.
      Yes, that was the idea. As I mentioned, the updates take a long time. For this reason, I would only add a host to maintenance once it’s actually being worked on.

      Once a host has been updated, I would then remove it from maintenance again after, say, 30 minutes.
      (presumably using the tags described above at the host level)

      Is it a problem to add hosts to maintenance while maintenance is active and then remove them later?
      Will Zabbix then subsequently display the host as “under maintenance” for the actual maintenance period?
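For the mechanics of adding and removing hosts on an existing maintenance, here is a small sketch, assuming Zabbix 6.0 or later. As far as I understand the API, the `hosts` list passed to `maintenance.update` replaces the current list, so the script has to send the complete new list each time (the current list can be read with `maintenance.get` and `selectHosts`). The function names and IDs are hypothetical.

```python
def merged_host_ids(current, add=(), remove=()):
    """Return the full host list to send, keeping order and dropping duplicates."""
    removed = {str(h) for h in remove}
    result = []
    for host_id in [*current, *add]:
        host_id = str(host_id)
        if host_id not in removed and host_id not in result:
            result.append(host_id)
    return result


def maintenance_update_request(maintenance_id, host_ids, request_id=1):
    """Build a maintenance.update body carrying the complete host list."""
    return {
        "jsonrpc": "2.0",
        "method": "maintenance.update",
        "params": {
            "maintenanceid": str(maintenance_id),
            "hosts": [{"hostid": h} for h in host_ids],
        },
        "id": request_id,
    }


# Host 2 has finished its upgrade, host 4 is about to start.
hosts = merged_host_ids(["1", "2", "3"], add=["4"], remove=["2"])
print(maintenance_update_request(5, hosts))
```

If I remember correctly, a maintenance must keep at least one host or host group, so removing the last host may mean deleting the maintenance instead; that is worth checking against your version.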
