Hi guys,
We're running Zabbix 2.0.5 in a clustered CentOS environment with a remote clustered MySQL database (Corosync and Pacemaker for both clusters). Our current numbers are roughly:
Number of hosts (monitored/not monitored/templates): 2300 (2077 / 66 / 157)
Number of items (monitored/disabled/not supported): 356961 (240523 / 23373 / 93065)
Number of triggers (enabled/disabled) [problem/unknown/ok]: 33121 (26309 / 6812) [201 / 0 / 26108]
Number of users (online): 158 (13)
Required server performance, new values per second: 1359.51
We have run into a fairly annoying issue.
I've checked a number of possible causes and searched online, but I can't find anyone with a similar issue.
Basically, when ServerActive in /etc/zabbix/zabbix_agentd.conf lists only the server or only the proxy, everything works and there are no errors in the logs.
However, when I list both the server AND the proxy, the agent log on the host starts showing the common "No active checks on server: host [boweb41.csnzoo.com] not found" error.
The only pattern I've found in my troubleshooting is that, for the error to go away, the server or proxy the host is assigned to in the UI has to be the sole ServerActive entry.
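For clarity, the two-entry ServerActive value is a comma-separated list of IP:port (or hostname:port) targets, with the default port 10051 used when none is given. The minimal Python sketch below only illustrates that format; parse_server_active is a hypothetical helper, not the agent's own parser, and it does not model how the agent schedules requests to each target.

```python
DEFAULT_ACTIVE_PORT = 10051  # default port of a Zabbix server/proxy for active checks


def parse_server_active(value):
    """Split a ServerActive value into (host, port) tuples.

    Hypothetical helper for illustration only; not the agent's own parser.
    """
    targets = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if entry.startswith("["):
            # Bracketed IPv6 form: [addr] or [addr]:port
            host, _, rest = entry[1:].partition("]")
            port = int(rest[1:]) if rest.startswith(":") else DEFAULT_ACTIVE_PORT
        elif entry.count(":") == 1:
            # hostname:port or IPv4:port
            host, port_text = entry.split(":")
            port = int(port_text)
        else:
            # No port given (or a bare IPv6 address): fall back to the default port
            host, port = entry, DEFAULT_ACTIVE_PORT
        targets.append((host, port))
    return targets


print(parse_server_active("10.22.65.20,10.22.165.100"))
# [('10.22.65.20', 10051), ('10.22.165.100', 10051)]
```

Each tuple is a separate target; the sketch stops at parsing and makes no claim about which target owns a given host.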
I've tried:
- Changing the order of the two entries
- Validating the permissions, which are OK: -rw-r--r--. 1 zabbix zabbix 5957 Jan 22 15:53 zabbix_agentd.conf
- Declaring the server and/or proxy via DNS names instead of IPs
- Validating that the "Hostname" of each server is exactly what the Zabbix server has in the UI
- Validating that the host has the correct name listed in the UI, as confirmed by running zabbix_get against it with system.hostname
- Changing all passive checks to active in my templates
The strangest thing is that the hosts have no trouble reporting data, and all the checks (both active and passive) work.
This is happening for ALL CentOS hosts in our infrastructure (over 1,500).
Here are the agent logs from both states (hostnames sanitized):
For reference, 10.22.65.20 = proxy and 10.22.165.100 = primary Zabbix server.
The host in question is currently associated with the proxy in the UI.
Switching it to the server via the UI results in the same issue if two IPs are in ServerActive.
With: ServerActive=10.22.65.20
10973:20140122:155710.131 Zabbix Agent stopped. Zabbix 2.0.5 (revision 33558).
11524:20140122:155712.337 Starting Zabbix Agent [<hostname>]. Zabbix 2.0.5 (revision 33558).
11530:20140122:155712.381 agent #0 started [collector]
11531:20140122:155712.383 agent #1 started[listener]
11532:20140122:155712.385 agent #2 started[listener]
11533:20140122:155712.387 agent #3 started[listener]
11534:20140122:155712.389 agent #4 started [active checks]
11535:20140122:155712.392 agent #5 started [active checks]
Added back in: ServerActive=10.22.65.20,10.22.165.100
11524:20140122:160755.870 Zabbix Agent stopped. Zabbix 2.0.5 (revision 33558).
13892:20140122:160756.082 Starting Zabbix Agent [<hostname>]. Zabbix 2.0.5 (revision 33558).
13898:20140122:160756.143 agent #0 started [collector]
13899:20140122:160756.148 agent #1 started[listener]
13900:20140122:160756.149 agent #2 started[listener]
13903:20140122:160756.152 agent #5 started [active checks]
13901:20140122:160756.154 agent #3 started[listener]
13902:20140122:160756.151 agent #4 started [active checks]
13903:20140122:160756.160 No active checks on server: host [<hostname>] not found
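To see which agent process emits the error, the log lines can be correlated by PID: in the second excerpt, PID 13903 is the process that logged "agent #5 started [active checks]" and later "No active checks on server". A minimal Python sketch of that correlation, assuming the PID:YYYYMMDD:HHMMSS.mmm layout visible in the excerpts above (attribute_errors is a hypothetical helper name):

```python
import re

# Assumed layout, taken from the excerpts above: PID:YYYYMMDD:HHMMSS.mmm message
LOG_LINE = re.compile(r"^(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)$")
# Matches both "started [collector]" and "started[listener]" (space is optional)
AGENT_START = re.compile(r"agent #(\d+) started ?\[(.+)\]")


def attribute_errors(lines):
    """Map each 'No active checks' message to the agent process that logged it."""
    roles = {}     # pid -> (agent number, role) learned from the startup lines
    findings = []  # (pid, role or None, message)
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m:
            continue
        pid, message = int(m.group(1)), m.group(4)
        start = AGENT_START.search(message)
        if start:
            roles[pid] = (int(start.group(1)), start.group(2))
        elif message.startswith("No active checks on server"):
            findings.append((pid, roles.get(pid), message))
    return findings


sample = """13898:20140122:160756.143 agent #0 started [collector]
13903:20140122:160756.152 agent #5 started [active checks]
13902:20140122:160756.151 agent #4 started [active checks]
13903:20140122:160756.160 No active checks on server: host [<hostname>] not found""".splitlines()

for pid, role, msg in attribute_errors(sample):
    print(pid, role, msg)
```

Run against the full log of a host, this lists each occurrence together with the active-checks process (agent number) that reported it.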
I've done a lot of searching and have tried everything I can think of. The problem is clearly tied to having two entries in the file, yet the documentation says multiple entries are allowed. This seems especially strange because, if you use a proxy, you want to be able to move hosts back to the primary server when you need to do maintenance on the proxy, and unless both entries are present, all of your active checks will fail once a host is switched back to the server.
Any ideas?
Great minds think alike!