Alerting if all hosts in a group are unreachable

  • davux
    Junior Member
    • Mar 2015
    • 4

    #1

    Hello,

    Some services on my network are HA (highly available), meaning that a few hosts going down within the same cluster is not a problem, as long as at least one host stays up.

    As a consequence, I would like to be alerted with Disaster severity when all hosts in a given cluster (i.e. host group, probably?) become unreachable. How can I do that?

    Ideally, I would even be alerted gradually, as availability decreases. For example:
    • 50% availability: Average
    • 10% availability: High
    • 0% availability: Disaster
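
    The graded alerting described above can be sketched as a small mapping from group availability to severity. This is only an illustration, not Zabbix configuration: the function names are hypothetical, each threshold is assumed to mean "at or below this percentage", and "OK" is a placeholder for "no alert".

```python
def availability(up_hosts: int, total_hosts: int) -> float:
    """Return the percentage of hosts in the group that are up."""
    if total_hosts <= 0:
        raise ValueError("total_hosts must be positive")
    return 100.0 * up_hosts / total_hosts


def severity_for(availability_pct: float) -> str:
    """Map group availability (0-100) to a severity name.

    Assumption: each threshold means "at or below this percentage".
    "OK" is a placeholder meaning that no alert is raised.
    """
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100")
    if availability_pct == 0:
        return "Disaster"
    if availability_pct <= 10:
        return "High"
    if availability_pct <= 50:
        return "Average"
    return "OK"


# Example: a four-node cluster losing hosts one by one.
for up in (4, 2, 0):
    print(up, "up:", severity_for(availability(up, 4)))
```

    Checking the most severe condition first keeps the thresholds from shadowing each other.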


    I tried defining an aggregate item like this:
    Code:
    grpsum["My HA service","agent.ping",last,0]
    Unfortunately, when a host becomes unavailable, agent.ping returns no value at all (nodata) rather than 0. That means "last" is still 1 for every host, so the sum stays constant.
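
    A toy model of that behaviour, assuming that "last" simply returns the most recently stored value however old it is. The host names and helper functions are made up for illustration, and this is not how Zabbix is implemented internally.

```python
def last(history):
    """Return the most recently stored value, however old it is."""
    return history[-1] if history else None


def group_sum_of_last(histories):
    """Sum the last stored value of every host that has one."""
    values = (last(h) for h in histories.values())
    return sum(v for v in values if v is not None)


# agent.ping stores 1 while the agent answers and stores nothing otherwise.
histories = {"node1": [1, 1], "node2": [1, 1]}
before = group_sum_of_last(histories)

# node2 goes down: node1 gets a new value, node2 gets no new value at all.
histories["node1"].append(1)
after = group_sum_of_last(histories)

print(before, after)  # the sum does not change
```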

    It sounded easy at first, but it's turning out to be quite tricky! Any ideas?
  • ingus.vilnis
    Senior Member
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Mar 2014
    • 908

    #2
    Hi,

    Note that the following steps will work only if the hosts have at least one passive Zabbix agent item. If they have active items only, it will fail.

    You can check agent availability with either of two items:
    agent.ping - returns 1 when the agent is running and nothing when it is not
    zabbix[host,agent,available] - this item (type: Zabbix internal) returns 1 when the agent is available and 0 at each update interval when the agent is unavailable for any reason.


    This item can then be used in your aggregate item.
    Code:
    grpsum["My HA service","zabbix[host,agent,available]",last,0]
    Give it a try and compare how the two items behave; maybe it helps.
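
    The difference between the two items can be sketched side by side. This is a simplified model with hypothetical helper names, assuming the aggregate sums the last stored value per host; it is not Zabbix code.

```python
def agent_ping_sample(agent_up):
    """agent.ping: 1 when the agent answers, no value at all otherwise."""
    return 1 if agent_up else None


def agent_available_sample(agent_up):
    """zabbix[host,agent,available]: 1 when available, 0 when not."""
    return 1 if agent_up else 0


def record(history, host, value):
    """Store a sample; a missing value (None) leaves the history unchanged."""
    if value is not None:
        history.setdefault(host, []).append(value)


def group_sum_of_last(history):
    """Sum the most recent stored value of each host."""
    return sum(values[-1] for values in history.values())


ping_history, avail_history = {}, {}

# Poll 1: both hosts are up. Poll 2: node2 is down.
polls = [{"node1": True, "node2": True}, {"node1": True, "node2": False}]
for poll in polls:
    for host, up in poll.items():
        record(ping_history, host, agent_ping_sample(up))
        record(avail_history, host, agent_available_sample(up))

ping_sum = group_sum_of_last(ping_history)    # stale 1 still counted for node2
avail_sum = group_sum_of_last(avail_history)  # node2 now contributes 0
print(ping_sum, avail_sum)
```

    With the availability item, the sum equals the number of reachable hosts, so a sum of 0 corresponds to the whole group being down.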

    Best Regards,
    Ingus

    • davux
      Junior Member
      • Mar 2015
      • 4

      #3
      Originally posted by ingus.vilnis
      zabbix[host,agent,available] - this item (type: Zabbix internal) returns 1 when the agent is available and 0 at each update interval when the agent is unavailable for any reason.


      This item can then be used in your aggregate item.
      Code:
      grpsum["My HA service","zabbix[host,agent,available]",last,0]
      Thanks a lot, Ingus. It works. I had to add the item zabbix[host,agent,available] to all the hosts I want to monitor (I added it to a template that all the hosts are linked to, so it was easy). Once that was done, I could refer to that item in the aggregate check.

      Thank you again!
