Service uptime - ZABBIX vs tickets?
  • dminstrel
    Member
    • Apr 2005
    • 72

    #1

    Hello all,

    The Holy Grail in IT monitoring is a dashboard that shows, for each Service, precise uptime metrics such as Number of Events, Total Downtime, Average Downtime per Event, and so on.
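To make those metrics concrete, here is a minimal sketch of how they could be computed from a list of outage intervals for one Service over a reporting period. The `Outage` record and the `uptime_metrics` function are hypothetical illustrations, not part of ZABBIX; the sketch assumes outages of the same Service do not overlap.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    # Hypothetical outage record: which Service was down, and when.
    service: str
    start: datetime
    end: datetime


def uptime_metrics(outages, service, period_start, period_end):
    """Compute per-Service uptime metrics over a reporting period.

    Assumes outages of the same Service never overlap each other.
    """
    total = timedelta()
    count = 0
    for o in outages:
        if o.service != service:
            continue
        # Clip each outage to the reporting period.
        start = max(o.start, period_start)
        end = min(o.end, period_end)
        if end > start:
            total += end - start
            count += 1
    period = period_end - period_start
    average = total / count if count else timedelta()
    # timedelta / timedelta yields a float ratio.
    availability_pct = 100.0 * (1 - total / period)
    return {
        "events": count,
        "total_downtime": total,
        "avg_downtime": average,
        "availability_pct": availability_pct,
    }
```

For example, two outages of 1 hour and 30 minutes in a 10-hour window give 2 events, 1.5 hours of total downtime, 45 minutes average downtime and 85% availability.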

    While ZABBIX can provide this automatically, we're struggling in the following areas:

    1) It's Just Too Much (Pareto's Principle): for complex Services, such as distributed financial systems, monitoring everything in ZABBIX can become extremely complex. At some point the law of diminishing returns kicks in and adding more checks simply stops being worth the effort.

    2) False Negatives (and Positives): as a consequence of 1), sometimes something goes wrong that the monitoring system does not catch.

    So, would it be good practice to measure actual service uptime from a trouble-ticket database instead?

    ZABBIX Actions can send e-mails to the ticketing system to report that an outage is in progress. System administrators can then record the actual downtime duration in the ticket when closing it.
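One way to combine the two sources is to treat both ZABBIX events and ticket-reported downtime as time intervals and merge overlapping ones, so an outage seen by both is counted once while an outage missed by monitoring is still counted from its ticket. This is a minimal sketch under stated assumptions: the ticket fields `opened_at` and `downtime_minutes` are hypothetical, and downtime is assumed to begin when the ticket is opened.

```python
from datetime import datetime, timedelta


def ticket_to_interval(opened_at, downtime_minutes):
    # Assumption: the admin-reported downtime starts when the ticket is opened.
    return (opened_at, opened_at + timedelta(minutes=downtime_minutes))


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it instead of double-counting.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def combined_downtime(monitoring_events, tickets):
    """Union of monitoring-detected and ticket-reported downtime for one Service.

    monitoring_events: list of (start, end) datetimes, e.g. taken from ZABBIX events.
    tickets: list of dicts with hypothetical keys "opened_at" and "downtime_minutes".
    """
    intervals = list(monitoring_events)
    intervals += [
        ticket_to_interval(t["opened_at"], t["downtime_minutes"]) for t in tickets
    ]
    merged = merge_intervals(intervals)
    total = sum((end - start for start, end in merged), timedelta())
    return merged, total
```

For instance, a ZABBIX event from 01:00 to 02:00, a ticket opened at 01:05 reporting 60 minutes, and a ticket opened at 06:00 reporting 20 minutes (an outage monitoring missed) merge into two intervals totalling 85 minutes. The weak point remains the human-entered duration, which is exactly the concern raised below.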

    I know I might be stating the obvious here, but is this the way to go?

    I hate having a human element involved in getting uptime metrics, but for medium/large environments, is going fully automatic even possible?

    Thanks!

    Jonathan