Dependency Detection Doesn't Always Work?

  • kberrien
    Member
    • Mar 2007
    • 43

    #1

    Dependency Detection Doesn't Always Work?

    We've run Zabbix for many years, and recently upgraded from v1.8 to the v5 branch. We mostly use ping up/down status.

    I half expected to see improvements to the dependency function after moving to version 5. But when a major storm caused a power outage in our environment after the move to v5, I found that not all of my dependency configuration worked.

    To provide BASIC dependency operation, I set devices within a building as dependent on the building core switch. During this recent outage, when some buildings lost power and UPS exhausted their batteries a handful of devices alerted even though they were dependent, and other devices did not alert.

    Does anyone know how the dependency checks actually work, in a way that might explain the behavior I'm seeing? I would imagine that if a test comes up negative (i.e., no ping), the software would IMMEDIATELY check/recheck the status (in this case, ping) of the device it depends on before triggering an alarm. That, I would think, would ensure no trigger fires while the parent device is offline.

    Or do checks run in some sort of processing order, so that when a device tests negative, its trigger is processed based on the last test of the device it depends on (whether that test came earlier or later in the processing order)? Then, depending on the processing order and failure timing, some dependent devices would trigger because the last test of their parent device was not yet in a failed state, even though the parent did enter a failure state during that same processing loop.
  • tim.mooney
    Senior Member
    • Dec 2012
    • 1427

    #2
    Originally posted by kberrien
    To provide BASIC dependency operation, I set devices within a building as dependent on the building core switch. During this recent outage, when some buildings lost power and UPS exhausted their batteries a handful of devices alerted even though they were dependent, and other devices did not alert.

    Does anyone know how the dependency checks actually work, in a way that might explain the behavior I'm seeing? I would imagine that if a test comes up negative (i.e., no ping), the software would IMMEDIATELY check/recheck the status (in this case, ping) of the device it depends on before triggering an alarm. That, I would think, would ensure no trigger fires while the parent device is offline.

    Or do checks run in some sort of processing order, so that when a device tests negative, its trigger is processed based on the last test of the device it depends on (whether that test came earlier or later in the processing order)? Then, depending on the processing order and failure timing, some dependent devices would trigger because the last test of their parent device was not yet in a failed state, even though the parent did enter a failure state during that same processing loop.

    As far as I can tell, it's the 2nd scenario you described, and there is no "recheck the dependencies before treating this as a problem".

    It's essentially a race condition. If the dependency is checked first and a problem event is detected, then you're OK. Otherwise, not.
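    To make the race concrete, here's a toy model (not Zabbix's actual implementation; host names and the single dependency are made up) showing how the same outage can or cannot be suppressed depending purely on evaluation order:

```python
def evaluate(order, down):
    """Toy model of dependency suppression under an evaluation order.

    order: list of host names, whose triggers are evaluated one at a time
    down:  set of hosts that are actually unreachable
    Returns the set of hosts that raise an alert.
    Assumed dependency: 'edge-switch' depends on 'core-switch'.
    """
    problem = set()   # triggers already recorded in PROBLEM state
    alerts = set()
    for host in order:
        if host not in down:
            continue
        # Dependency suppression only sees PROBLEM states recorded so far.
        if host == "edge-switch" and "core-switch" in problem:
            problem.add(host)   # state still recorded, but alert suppressed
            continue
        problem.add(host)
        alerts.add(host)
    return alerts

down = {"core-switch", "edge-switch"}

# Parent evaluated first: the dependency suppresses the edge alert.
print(evaluate(["core-switch", "edge-switch"], down))

# Parent evaluated last: the edge device alerts even though it is dependent,
# because the core switch was not yet in PROBLEM state when the edge
# trigger was processed.
print(evaluate(["edge-switch", "core-switch"], down))
```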

    What you can do to reduce alerts from this type of thing is to check your dependencies as frequently as possible and check the other devices less frequently. Alternately, you can use triggers or escalations so that either the problem isn't raised right away (for example, 2 or 3 consecutive checks of a device need to fail before it's treated as a problem), or it's detected right away but an alert doesn't go out immediately (alerting happens on later escalation steps).

    These are both just workarounds, though. Having a way to force an immediate recheck of all dependencies would be very nice, but with multiple dependencies in a complicated network, doing that recheck can itself become complicated.

    • Clontarf[X]
      Member
      • Jan 2017
      • 80

      #3
      I am a supporter of the "more than one failed check before triggering" workaround, and it's good practice anyway to avoid trigger flapping.

      • kberrien
        Ok, thanks. Glad I've confirmed my suspicions about the application's behavior. I'm not sure how to change my rules to require two failures before alarming, and editing that for 600 hosts doesn't sound like fun...

      • Clontarf[X]
        Well, to solve the "updating hundreds of hosts" problem: your item should be in a Template, so all you have to do is change your trigger definition in the Template.

        Find the templated trigger and change the condition from last() (or whatever you might have) to something like last(3m) or whatever your criteria is. Voila, all your hundreds of hosts now have fixed triggers.
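        For reference, in the pre-5.4 trigger expression syntax that the v5.0 branch uses, a "several consecutive failures" condition on an ICMP ping item would look something like the following (the template name and item key here are assumptions; substitute your own):

```
# Fires only when the last 3 ping checks have all failed:
{Template ICMP Ping:icmpping.max(#3)}=0

# Or: fires only when no ping has succeeded in the last 3 minutes:
{Template ICMP Ping:icmpping.max(3m)}=0
```

        Using max() over several values (rather than last() on a single value) is what makes the trigger wait out a single missed ping, since max() is 0 only if every value in the window is 0.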
