Event Log Monitoring Syntax
-
Is there a way to configure an item that collects only the events generated by a predefined source? Something like eventlog[application].source[NTBackup]?
If nothing like that already exists, it would be a nice thing to include in the next release. I hope developers will read this.
Cheers
-
The information on the wiki about eventlog monitoring is somewhat confusing, especially its use of nodata, which I consider unreliable (the initial message we track may be followed by a stream of other errors for a considerable time) and poorly suited to monitoring (I need a trigger to stay ON until the situation clears).
So, here's how we do it. First, all of this applies to a situation where certain data is logged when an anomaly starts (e.g. RAID status changes from healthy, UPS goes on battery, etc.), and later a different message is logged when the anomaly ends (RAID changes back to healthy, etc.). We want the trigger to stay ON for the entire anomaly period.
To that end, we have to write a condition that becomes TRUE when the bad message appears in the log:
(({host:eventlog[Application].logsource(ServiceName)}=1)&({host:eventlog[Application].str(something BAD happened)}=1))
This will fire the trigger whenever the service named ServiceName sends to the Application log an event containing the words "something BAD happened" in its description.
Next, we need to negate that expression (as far as I can tell there's no negation operator in 1.4.5, and that's bad) and replace "something BAD happened" with "something BAD has ENDED". In effect, such a condition becomes FALSE *ONLY* when we encounter the string "something BAD has ENDED" from ServiceName in the log, and otherwise remains TRUE, regardless of what other services or even this particular service send to the eventlog. This keeps the trigger ON until the situation is resolved. So the second expression would be:
(({host:eventlog[Application].logsource(ServiceName)}#1)|({host:eventlog[Application].str(something BAD has ENDED)}#1))
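Note that this expression is negated according to the rules of Boolean logic (De Morgan's law): NOT (A AND B) is the same as (NOT A) OR (NOT B), so each =1 becomes #1 and the & becomes |.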
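The negation step can be checked outside Zabbix with a small truth table. This is only a sketch in Python, not Zabbix syntax: the function names and the two boolean flags (`from_service` for the logsource match, `has_text` for the str match) are hypothetical stand-ins for the two item functions.

```python
from itertools import product


def original(from_service, has_text):
    # Models (logsource(...)=1) & (str(...)=1)
    return from_service and has_text


def negated(from_service, has_text):
    # Models (logsource(...)#1) | (str(...)#1)
    return (not from_service) or (not has_text)


def negation_is_exact():
    # True when the rewritten form is the exact opposite of the original
    # for every combination of inputs.
    return all(
        negated(a, b) == (not original(a, b))
        for a, b in product([False, True], repeat=2)
    )


if __name__ == "__main__":
    for a, b in product([False, True], repeat=2):
        print(a, b, original(a, b), negated(a, b))
    print("exact negation:", negation_is_exact())
```

The only row where the negated form is FALSE is the one where both the source and the text match, which is what the how-to relies on.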
Now we need to combine the second expression with the state of the trigger itself, so that it does not fire the trigger on its own whenever something else is logged to the eventlog:
(({TRIGGER.VALUE}=1)&(({host:eventlog[Application].logsource(ServiceName)}#1)|({host:eventlog[Application].str(something BAD has ENDED)}#1)))
This new expression is TRUE when the trigger is ON and any non-relevant message is logged to the eventlog, and FALSE when the trigger is ON and the relevant clearing message is logged; it is also FALSE when the trigger is OFF.
And finally, we need to combine the first expression with the last one using the logical OR operator:
((({host:eventlog[Application].logsource(ServiceName)}=1)&({host:eventlog[Application].str(something BAD happened)}=1))|(({TRIGGER.VALUE}=1)&(({host:eventlog[Application].logsource(ServiceName)}#1)|({host:eventlog[Application].str(something BAD has ENDED)}#1))))
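To see why the combined expression behaves like a latch, it can help to replay a few log events through the same boolean formula. The following is a rough Python model under stated assumptions, not how Zabbix evaluates triggers internally: each event is reduced to three hypothetical flags (did it come from ServiceName, does it contain the "bad" text, does it contain the "ended" text), and `prev_value` plays the role of {TRIGGER.VALUE}.

```python
def trigger_next(prev_value, from_service, has_bad, has_end):
    # First half: (logsource=1) & (str(bad)=1) turns the trigger ON.
    fires = from_service and has_bad
    # Second half: (TRIGGER.VALUE=1) & ((logsource#1) | (str(ended)#1))
    # keeps it ON until the clearing message arrives from the service.
    holds = prev_value == 1 and ((not from_service) or (not has_end))
    return 1 if (fires or holds) else 0


def replay(events, start=0):
    # Feed events in order, carrying the trigger value forward.
    values = []
    value = start
    for from_service, has_bad, has_end in events:
        value = trigger_next(value, from_service, has_bad, has_end)
        values.append(value)
    return values


if __name__ == "__main__":
    events = [
        (False, False, False),  # unrelated event, trigger still OFF
        (True, True, False),    # ServiceName logs the bad message
        (False, False, False),  # another service logs something
        (True, False, False),   # ServiceName logs something else
        (True, False, True),    # ServiceName logs the clearing message
        (False, False, False),  # unrelated event afterwards
    ]
    print(replay(events))
```

In this model the trigger turns ON at the bad message, ignores everything else, and drops back to OFF only on the clearing message from the same service.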
Now, if you try to use this expression, it will default to the UNKNOWN state until something is logged to the Application eventlog that matches the severity criteria specified in the Items for this eventlog. You can set the trigger to the OFF state just by logging any message with the desired severity to the eventlog using the eventcreate utility available in W2K3 Server.
However, suppose your manager has a paranoid habit of watching eventlogs for errors and subtracting 1K from your salary for each Error posted there, regardless of what it means. We'll leave aside the fact that you would be bankrupt by now, and just assume that you cannot afford to log any fake event and need the trigger to be in the OFF state 30 seconds after its creation.
To deal with this situation we need to devise a condition that becomes FALSE when the trigger state is UNKNOWN (2) and there has been no data for 30 seconds, and is TRUE otherwise.
Such a condition is easy to construct; it's going to look like this:
(({TRIGGER.VALUE}#2)|({host:eventlog[Application].nodata(30)}#1))
We will then combine this condition with the previous version of our expression using the logical AND operator. As a result we get TRUE (trigger firing) when the previous expression returns TRUE, and FALSE (trigger falling) when the previous expression returns FALSE and/or when the trigger state is UNKNOWN and there has been no data for 30 seconds.
So, given the above, the final version of the trigger is going to look like this:
((({TRIGGER.VALUE}#2)|({host:eventlog[Application].nodata(30)}#1))&((({host:eventlog[Application].logsource(ServiceName)}=1)&({host:eventlog[Application].str(something BAD happened)}=1))|(({TRIGGER.VALUE}=1)&(({host:eventlog[Application].logsource(ServiceName)}#1)|({host:eventlog[Application].str(something BAD has ENDED)}#1)))))
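For anyone who wants to sanity-check the final formula, here is a small Python model of it. It is a sketch of the boolean logic only and assumes that every function returns a plain 0/1 value; how Zabbix itself handles UNKNOWN states during evaluation may differ. All names (`startup_guard`, `final_trigger`, the flag arguments) are hypothetical.

```python
UNKNOWN = 2


def startup_guard(prev_value, no_data_30s):
    # Models ({TRIGGER.VALUE}#2) | (nodata(30)#1):
    # FALSE only when the trigger is UNKNOWN and nothing arrived for 30 s.
    return prev_value != UNKNOWN or not no_data_30s


def final_trigger(prev_value, no_data_30s, from_service, has_bad, has_end):
    # Same latch as before: fire on the bad message, hold until cleared.
    fires = from_service and has_bad
    holds = prev_value == 1 and ((not from_service) or (not has_end))
    # The guard is ANDed with the latch, as in the final expression.
    guarded = startup_guard(prev_value, no_data_30s) and (fires or holds)
    return 1 if guarded else 0


if __name__ == "__main__":
    # Fresh trigger, quiet log: falls to OFF instead of staying UNKNOWN.
    print(final_trigger(UNKNOWN, True, False, False, False))
    # Trigger already ON, quiet log: stays ON.
    print(final_trigger(1, True, False, False, False))
```

The guard only matters while the trigger is UNKNOWN; once the trigger is ON or OFF, the first half is always TRUE and the latch alone decides the result.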
Well, I know it looks horrible, but this is how things actually work for me.
-
Thanks Salmon for the time and effort invested in writing this howto.
I have tried to implement it to track event log entries for Exchange database size, and I have one question:
What should I put in instead of TRIGGER.VALUE ?
My first two keys look like this:
(({hostname:eventlog[Application].logsource(MSExchangeIS Mailbox Store)}=1)&({hostname:eventlog[Application].str(approaching the size limit of 75 GB)}=1))
(({hostname:eventlog[Application].logsource(MSExchangeIS Mailbox Store)}#1)|({hostname:eventlog[Application].str(is 70 GB)}#1))
Zabbix accepts both with no complaints. I then moved on to the third key in the howto (I have followed them one by one to see how they are accepted):
(({TRIGGER.VALUE}=1)&(({hostname:eventlog[Application].logsource(MSExchangeIS Mailbox Store)}#1)|({hostname:eventlog[Application].str(is 70 GB)}#1))
When I replace TRIGGER.VALUE with TRUE or with FALSE or just leave it as it is I get an error "Incorrect trigger expression. [( ... ]"
Any help will be appreciated.