Question about web.page.regexp and timeouts


danrog
02-03-2010, 11:12
I have a (hopefully quick) question about web.page.regexp and timeouts. I have a trigger that looks like this.


item interval = 120 secs
{server:web.page.regexp[server,/path/monitor.display,8080,"Smoke test = success"].count(#3,EOF)}=3
|
{server:web.page.regexp[server,/path/monitor.display,8080,"Smoke test = success"].nodata(360)}=1

If the check times out, we don't get data (as expected, hence the .nodata clause). However, the .nodata clause clears as soon as the server starts sending data again, even if it's responding with EOF (which is still a failure). This results in false recoveries: the trigger clears although the check is still failing. We need the trigger to stay TRUE while there is no data and also afterwards, when data comes back but the check still fails. (I've played around with nodata at different time intervals, but it doesn't fix this situation.)

Here is a timeline
T1. Nodata seen for web.page.regexp
T2. Trigger fires on .nodata clause
T3. web.page.regexp returns data; value = EOF
T4. Nodata clears, all actions cleared, escalation clears
T5. web.page.regexp still returns EOF
T6. web.page.regexp still returns EOF
T7. Trigger fires on the web.page.regexp .count clause, alerts sent
T8. people get escalations and fix it
T9. All triggers clear, we are happy
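
The timeline above can be replayed with a small simulation. This is only a rough sketch in Python, not Zabbix itself: count_last and nodata are hypothetical helpers that approximate count(#3,EOF) and nodata(360), the trigger is assumed to be re-evaluated once per 120-second poll, and the poll sequence is made up to match T1 to T9.

```python
from typing import List, Optional, Tuple

INTERVAL = 120                  # item interval from the post, in seconds
OK = "Smoke test = success"     # value stored when the regexp matches

def count_last(history: List[Tuple[int, str]], n: int, pattern: str) -> int:
    """Simplified count(#n,pattern): matches among the last n stored values."""
    return sum(1 for _, value in history[-n:] if pattern in value)

def nodata(history: List[Tuple[int, str]], now: int, period: int) -> int:
    """Simplified nodata(period): 1 if nothing was stored in the last `period` seconds."""
    return 1 if not history or now - history[-1][0] >= period else 0

def trigger_states(polls: List[Optional[str]]) -> List[bool]:
    """Evaluate count(#3,EOF)=3 | nodata(360)=1 after every poll."""
    history: List[Tuple[int, str]] = []
    states = []
    for i, value in enumerate(polls):
        now = i * INTERVAL
        if value is not None:   # a timed-out check stores nothing
            history.append((now, value))
        fired = count_last(history, 3, "EOF") == 3 or nodata(history, now, 360) == 1
        states.append(fired)
    return states

# None = check timed out; sequence assumed for illustration only
polls = [OK, None, None, None, "EOF", "EOF", "EOF", OK]
states = trigger_states(polls)
for i, (value, state) in enumerate(zip(polls, states)):
    print(f"t={i * INTERVAL:4d}s value={value!r:24} trigger={'PROBLEM' if state else 'OK'}")
```

In this model the trigger is in PROBLEM state at t=360s (nodata), drops back to OK at t=480s and t=600s while the item is already returning EOF, and only fires again at t=720s once three EOF values have accumulated. That window corresponds to T4 through T7.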

My issue is at T4 and the time until T7: even though the trigger clears, it's not really fixed. I think changing .count(#3,EOF)}=3 to .count(#1,EOF)}=1 will do it, but this could also introduce some false positives, depending on the load of the server or quick restarts of the app for whatever reason. Is there anything else I can try?
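
The trade-off between .count(#3,EOF)}=3 and .count(#1,EOF)}=1 can be illustrated the same way. Again this is a simplified Python model under stated assumptions, not Zabbix behaviour verified against a server: trigger_states is a hypothetical helper, the trigger is assumed to be evaluated once per 120-second poll, and both poll sequences (an outage and a single transient EOF) are invented for the comparison.

```python
from typing import List, Optional

INTERVAL = 120                  # item interval from the post, in seconds
OK = "Smoke test = success"

def trigger_states(polls: List[Optional[str]], n: int) -> List[bool]:
    """Model of count(#n,EOF)=n | nodata(360)=1, evaluated after each poll."""
    values: List[str] = []      # stored values, oldest first
    last_stored = 0             # time of the most recent stored value
    states = []
    for i, value in enumerate(polls):
        now = i * INTERVAL
        if value is not None:   # a timed-out check stores nothing
            values.append(value)
            last_stored = now
        eof_count = sum(1 for v in values[-n:] if "EOF" in v)
        states.append(eof_count == n or now - last_stored >= 360)
    return states

# Assumed sequences: an outage followed by EOF responses, and a one-poll blip
outage = [OK, None, None, None, "EOF", "EOF", "EOF", OK]
blip = [OK, OK, "EOF", OK, OK]

for n in (3, 1):
    print(f"#{n} outage:", trigger_states(outage, n))
    print(f"#{n} blip:  ", trigger_states(blip, n))
```

With #1 the modelled trigger stays TRUE from the nodata alert until the item recovers, so the gap between T4 and T7 disappears, but the single EOF in the blip sequence also fires it. With #3 the blip is ignored and the gap remains.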

bashman
14-04-2010, 11:04
I recommend using the "nodata" function only with "agent.ping".

danrog
14-04-2010, 12:45
We've actually had a lot of different scenarios where .nodata works quite well (especially for web.page.regexp). Sometimes a single web server (behind a load-balanced virtual IP) does not respond to web.page.regexp checks (they time out) and returns no data. The last or count function then never fires, because technically the last value seen by Zabbix is the correct value. I've only been able to detect this with .nodata.

Why do you recommend this? Have you seen issues with it?
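
The hung-server scenario described above can be sketched as follows. This is a simplified Python model, not Zabbix: the two lists approximate the count(#3,EOF)=3 and nodata(360)=1 clauses separately, evaluation is assumed to happen once per 120-second poll, and the poll sequence (two good values, then only timeouts) is made up for illustration.

```python
from typing import List, Optional

INTERVAL = 120                  # item interval from the post, in seconds
OK = "Smoke test = success"

# Assumed sequence: two good responses, then the server stops answering
polls: List[Optional[str]] = [OK, OK, None, None, None, None]

values: List[str] = []          # stored values, oldest first
last_stored = 0                 # time of the most recent stored value
count_clause: List[bool] = []   # model of count(#3,EOF)=3
nodata_clause: List[bool] = []  # model of nodata(360)=1

for i, value in enumerate(polls):
    now = i * INTERVAL
    if value is not None:       # a timed-out check stores nothing
        values.append(value)
        last_stored = now
    count_clause.append(sum(1 for v in values[-3:] if "EOF" in v) == 3)
    nodata_clause.append(now - last_stored >= 360)

print("count clause: ", count_clause)
print("nodata clause:", nodata_clause)
```

Because timeouts store nothing, the last stored values remain the success string and the count clause never becomes TRUE in this model. Only the nodata clause notices that the item has gone quiet.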