Maintenance status not in "maintenance"
Trigger value = "PROBLEM"
Trigger severity >= "Average"
Trigger name like "RANDOM_STRING"
Host = "server_in_cluster"
I have an action configured with the above conditions. There is no recovery message (generally we don't care when there's a recovery), and there is one operation: sending an email.
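To make the conditions concrete, here is a rough Python sketch of how I understand them to combine. It assumes all five conditions are ANDed together, which may not match the action's actual type of calculation, and the `event` dictionary and its keys are made up for illustration (they are not Zabbix field names).

```python
# Zabbix trigger severities, lowest to highest.
SEVERITIES = ["Not classified", "Information", "Warning",
              "Average", "High", "Disaster"]


def action_matches(event):
    """Return True if a (hypothetical) event dict passes every condition.

    Assumption: the conditions are combined with AND.
    """
    return (
        not event["in_maintenance"]                      # Maintenance status not in "maintenance"
        and event["trigger_value"] == "PROBLEM"          # Trigger value = "PROBLEM"
        and SEVERITIES.index(event["severity"])
            >= SEVERITIES.index("Average")               # Trigger severity >= "Average"
        and "RANDOM_STRING" in event["trigger_name"]     # Trigger name like "RANDOM_STRING"
        and event["host"] == "server_in_cluster"         # Host = "server_in_cluster"
    )


problem_event = {
    "in_maintenance": False,
    "trigger_value": "PROBLEM",
    "severity": "High",
    "trigger_name": "RANDOM_STRING queue has no consumers",
    "host": "server_in_cluster",
}
ok_event = dict(problem_event, trigger_value="OK")
```

Under that reading, `problem_event` passes and `ok_event` does not, purely because of the trigger value condition. Whether Zabbix applies the conditions to recovery events in the same way is exactly what I'm unsure about.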
I've noticed two things. First, I never received recovery emails (I originally had this set up with a recovery message). Are recovery emails tied to the host or to the trigger? I'm hoping the trigger: on a host with lots of different triggers, some triggers go to some people and others go to other people, and a server can be in a good state while some of its triggers are in a bad state. Either way, there was no sign of those emails.
The second thing is that I'm getting a TON of messages on a repeating basis. The event list shows:
Event list [previous 20]
Time                  Status   Duration     Age          Ack      Actions
14 Nov 2013 11:58:15  OK       20h 46m 15s  20h 46m 15s  No
14 Nov 2013 11:43:14  PROBLEM  15m 1s       21h 1m 16s   Yes (1)  13
13 Nov 2013 16:30:14  OK       19h 13m      1d 16h 14m   Yes (1)  40
13 Nov 2013 16:24:18  PROBLEM  5m 56s       1d 16h 20m   Yes (1)  2
13 Nov 2013 16:23:19  OK       59s          1d 16h 21m   Yes (1)  2
11 Nov 2013 11:31:44  PROBLEM  2d 4h 51m    3d 21h 12m   Yes (1)
So the trigger is now "OK", but I'm still getting emails for the event "14 Nov 2013 11:43:14 PROBLEM 15m 1s 21h 1m 16s Yes (1) 13" at every operation interval. Is there something I'm missing here?
Thanks for any advice/help! This is driving me nuts right now! Here's the expression for the trigger:
{server_in_cluster:rabbitmq[exchange,queue_consumers,queue.name.unknown.errors].max(120)}=0 & {server_in_cluster:rabbitmq[exchange,queue_msgs,queue.name.unknown.errors].max(120)}#0
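In plain terms, the expression is meant to go to PROBLEM when the queue has had zero consumers for the whole of the last 120 seconds while it held at least one message at some point in that window (`&` is AND and `#` is "not equal" in this expression syntax). A minimal Python sketch of that logic, where the function name and the sample lists are made up for illustration and are not anything Zabbix provides:

```python
def trigger_is_problem(consumer_samples, message_samples):
    """Mirror the trigger expression on values from the last 120 seconds.

    consumer_samples: queue_consumers values collected in the window
    message_samples:  queue_msgs values collected in the window
    (Both lists are hypothetical stand-ins for the item history.)
    """
    no_consumers = max(consumer_samples) == 0   # max(120) = 0
    has_messages = max(message_samples) != 0    # max(120) # 0
    return no_consumers and has_messages
```

For example, `trigger_is_problem([0, 0], [0, 5])` is true (no consumers, messages waiting), while `trigger_is_problem([0, 1], [3, 3])` is false because a consumer showed up within the window.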