I want to perform operating system updates for, say, 1,000 hosts spread out over 14 days. Since the entire system remains in live operation during this update phase, I do not want to put all 1,000 hosts into maintenance mode at the same time for 14 days, as this would prevent alerts from being triggered for any issues that arise. To roll out the upgrades, I use an automated process that updates a certain number of systems in stages and reintegrates them into the overall system upon completion. I could extend this process to also coordinate maintenance in Zabbix via its API.
Creating a maintenance task for each host individually seems a bit questionable to me.
How could this be coordinated effectively in Zabbix?
Intuitively, I would create a maintenance task with a duration of 14 days and then add individual hosts to the maintenance list for the duration of their upgrade and remove them again.
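To make that idea concrete, here is a minimal sketch of how the rollout automation might build the request body for the Zabbix API method maintenance.update. It rests on several assumptions. As far as I understand the API, the update call replaces the whole host list rather than adding or removing single entries, so the script has to keep track of the current list itself. The "hosts" parameter as a list of {"hostid": ...} objects is the form used by recent Zabbix versions; older versions expect "hostids" instead, so check the documentation for your version. I also assume Zabbix rejects a maintenance that has neither hosts nor groups, which would mean a placeholder host or group is needed between batches. The function names are hypothetical, and authentication and the HTTP call itself are left out.

```python
from typing import Iterable


def host_list_after_change(current: Iterable[str],
                           add: Iterable[str] = (),
                           remove: Iterable[str] = ()) -> list[str]:
    """Return the host IDs in maintenance after adding and removing hosts."""
    result = (set(current) | set(add)) - set(remove)
    # Sort numerically so the request body is deterministic.
    return sorted(result, key=int)


def build_maintenance_update(maintenance_id: str,
                             host_ids: list[str],
                             request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 body for maintenance.update (hypothetical helper)."""
    if not host_ids:
        # Assumption: a maintenance needs at least one host or host group.
        raise ValueError("host list must not be empty")
    return {
        "jsonrpc": "2.0",
        "method": "maintenance.update",
        "params": {
            "maintenanceid": maintenance_id,
            # Assumption: newer API format; older versions use "hostids".
            "hosts": [{"hostid": host_id} for host_id in host_ids],
        },
        "id": request_id,
    }


# Example: host 10101 has finished its upgrade, host 10103 starts next.
ids = host_list_after_change(["10101", "10102"], add=["10103"], remove=["10101"])
body = build_maintenance_update("42", ids)
```

The resulting dictionary would be serialized to JSON and posted to the api_jsonrpc.php endpoint by whatever HTTP client the rollout tooling already uses.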
My questions regarding this:
- Would Zabbix still be able to distinguish between downtime caused by outages and downtime caused by maintenance when reviewing, for example, SLA calculations or problem history?
(I noticed that adding and removing hosts in Maintenance is logged in the “Audit Log,” at least, though unfortunately only the host ID is recorded.)
- Does the tag matching described in the documentation refer to the host’s tags or the trigger’s tags?
(For example, one could introduce a tag for hosts that could be used for my purpose:)
Code:
host_status=in_maintenance|in_production
- What are the technical implications of creating thousands of maintenance periods when having individual maintenances for every host?
(i.e. performance)
Another very simple approach could be to deactivate hosts, but in my view, this has the drawback that triggers like
Code:
nodata()
will fire immediately when the host is reactivated.