Some preliminary information:
[root@centosbase ~]# zabbix_server -V
Zabbix server v2.0.5 (revision 33558) (12 February 2013)
Compilation time: Feb 13 2013 20:16:05
All zabbix_agentd are 2.0+
A few issues I've run into as a moderately new user:
1. If I unlink a template from a host, any warnings previously generated for it are still shown in Monitoring>Overview and do not clear on their own.
2. If I update an item prototype, it takes some time (something like 20 minutes) for the change to be reflected in Zabbix Monitoring->Overview->Data.
3. When an agent auto-registers, on the first try the server replies with: cannot send list of active checks to [77.74.76.78]: host [test-host.com] not found. If I restart the agent, it successfully gets a list of items to check:
12317:20130520:104809.982 Starting Zabbix Agent [test-host.com]. Zabbix 2.0.6 (revision 35158).
12318:20130520:104809.983 agent #0 started [collector]
12320:20130520:104809.983 agent #1 started [active checks]
This is probably just a necessary, and I think intentional, lag. I only list it here in case auto-registration could be made to have the host's item checks ready before the initial check.
4. I get errors on many of my active agents like:
13129:20130520:104923.019 Starting Zabbix Agent [test-host.com]. Zabbix 2.0.6 (revision 35158).
13130:20130520:104923.019 agent #0 started [collector]
13132:20130520:104923.020 agent #1 started [active checks]
zabbix_agentd [13780]: Is this process already running? Could not lock PID file [/var/run/zabbix/zabbix_agentd.pid]: [11] Resource temporarily unavailable
zabbix_agentd [3980]: Is this process already running? Could not lock PID file [/var/run/zabbix/zabbix_agentd.pid]: [11] Resource temporarily unavailable
zabbix_agentd [24719]: Is this process already running? Could not lock PID file [/var/run/zabbix/zabbix_agentd.pid]: [11] Resource temporarily unavailable
I assume this is normal since three processes/threads are trying to access the same PID file? However, a message letting me know this isn't an error per se would be nice (assuming it isn't).
5. I cannot seem to find the interval at which zabbix-agent sends and receives updates, though I do understand the process (the agent asks for items using its hostname, the server responds with a list, the agent sends back the results). I would like to set this as small as possible.
I am using the template OS-Linux on all of my hosts; all items, item prototypes and so on have been changed to agent (active). However, when I intentionally do something to set off a trigger, such as updating /etc/passwd, it takes far too long for me to get alerted. If this were a disk filling up on a production server, I'd really need the alert within 30 seconds, a minute at the most (phone calls would be coming in well before this).
6. On the frontend, when I go to graphs, select Eth0 Traffic and set it to hosts:all, only one host's graph actually shows. I was expecting more of a screen-like view with eth0 traffic for all hosts.
7. I get this one annoying log message every now and then on the Zabbix server:
3664:20130519:193201.978 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,key_,status,filter,error,lifetime from items where itemid=23521]
I've read about this problem and increased the memory limit and several settings in my my.cnf:
max_connections = 20000
wait_timeout = 28000
max_allowed_packet = 64M
Obviously I don't want that many connections on a 4GB box, and it's not solving the problem anyway. Any help is appreciated.
I am very new to Zabbix, so I assume all of these come down to personal ignorance. That said, I think this is the best monitoring solution by miles, compared with Nagios and its annoyingly complex nrpe setups to monitor local disks and its complete lack of easy graphing. Your auto-registration has also allowed me to deploy Zabbix with Puppet, which has been key. If posting all of these at once is a headache, my apologies.
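To make sure the exchange in item 5 is described correctly, here is how I picture it, as a toy sketch in Python. This is only a model of the three steps (agent asks by hostname, server answers with a list, agent sends results back), not Zabbix's real wire protocol. ToyServer, agent_cycle and the collect callback are names made up for illustration, and the item key is just an example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Stand-in for the server side; NOT the real Zabbix protocol."""
    # host name -> item keys configured as active checks (hypothetical data)
    active_checks: dict = field(default_factory=dict)
    received: list = field(default_factory=list)

    def list_active_checks(self, host):
        # An unknown host gets an error, mirroring the
        # "host [...] not found" line from item 3.
        if host not in self.active_checks:
            raise LookupError(f"host [{host}] not found")
        return list(self.active_checks[host])

    def accept_values(self, host, values):
        # Record what the agent sent back.
        self.received.append((host, values))


def agent_cycle(server, host, collect):
    """One round: ask for checks by host name, collect, send results back."""
    keys = server.list_active_checks(host)
    values = {key: collect(key) for key in keys}
    server.accept_values(host, values)
    return values


server = ToyServer(
    active_checks={"test-host.com": ["vfs.file.cksum[/etc/passwd]"]}
)
# The collector here returns a placeholder instead of a real checksum.
result = agent_cycle(server, "test-host.com", collect=lambda key: "placeholder")
```

If that picture is right, the delay I am seeing has to come from how often each of those steps runs, which is the setting I am trying to find.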