zabbix_server (1.1 alpha 5) fails to start up


Mark Ramm-Christensen
09-02-2005, 21:34
I am consistently having trouble getting zabbix_server to start.

When the server fails to start, no zabbix_server.pid file is ever created. The error log shows only the standard server startup messages, and then, after one of the trapper processes starts, I get "One child process died. Exiting".

Every time I start the server once, it fails. But I can always get it to work by starting zabbix_server several times in quick succession.

When I do this I get a zabbix_server.pid file with an incorrect pid -- but everything works.

If it would be helpful I can provide level 4 log files, although they don't seem to have anything particularly useful.

--Mark Ramm

Alexei
09-02-2005, 21:49
I've never heard of such situations.

Do the following:

1. killall zabbix_server
2. remove zabbix_server.pid
3. start zabbix_server

Does the server work?

Mark Ramm-Christensen
09-02-2005, 22:15
I've tried this multiple times. When I kill the processes, the pid file is removed automatically. Then I start the server, and when I check the logs a second or so later, I see the "One child process died" error and everything shuts down. The shutdown logs an error that no zabbix_server.pid file exists.

I verify that the zabbix_server.pid file does not exist, and try restarting the server. When I check the logs again, I get the same thing. A ps -u zabbix shows nothing (I am not running the agent on this machine).

However, if I run /opt/zabbix/zabbix_server two or three times very quickly, the zabbix_server.pid file shows up in /tmp and everything runs. The number in the PID file, though, is 20 or 30 below the actual PID of the last zabbix_server process.

I recently installed fping and updated the zabbix_server conf file with the location of fping and a new pid location (to match the init.d configuration script). But I don't see how either of those things could be the cause of this problem.

--Mark



Here is some information that might be helpful.

This time I tried starting zabbix_server, looking at the log and then tried again two more times.

First the zabbix_server log


003633:20050209:151702 Starting zabbix_server...
003635:20050209:151702 #server 1 started [Alerter]
003636:20050209:151702 server #2 started [nodata() calculator]
003637:20050209:151702 server #3 started [ICMP pinger]
003639:20050209:151702 server #5 started [Trapper]
003640:20050209:151702 server #6 started [Trapper]
003641:20050209:151702 server #7 started [Trapper]
003633:20050209:151702 One child process died. Exiting ...
003633:20050209:151702 Got QUIT or INT or TERM or PIPE signal. Exiting...
003635:20050209:151702 Got QUIT or INT or TERM or PIPE signal. Exiting...
003636:20050209:151702 Got QUIT or INT or TERM or PIPE signal. Exiting...
003637:20050209:151702 Got QUIT or INT or TERM or PIPE signal. Exiting...
003639:20050209:151702 Got QUIT or INT or TERM or PIPE signal. Exiting...
003639:20050209:151702 Cannot remove PID file [/tmp/zabbix_server.pid] [No such file or directory]
003638:20050209:151702 Got QUIT or INT or TERM or PIPE signal. Exiting...
003748:20050209:151750 Starting zabbix_server...
003750:20050209:151750 #server 1 started [Alerter]
003752:20050209:151750 server #2 started [nodata() calculator]
003754:20050209:151750 server #3 started [ICMP pinger]
003757:20050209:151750 server #5 started [Trapper]
003759:20050209:151750 server #6 started [Trapper]
003748:20050209:151750 One child process died. Exiting ...
003750:20050209:151750 Got QUIT or INT or TERM or PIPE signal. Exiting...
003752:20050209:151750 Got QUIT or INT or TERM or PIPE signal. Exiting...
003748:20050209:151750 Got QUIT or INT or TERM or PIPE signal. Exiting...
003754:20050209:151750 Got QUIT or INT or TERM or PIPE signal. Exiting...
003756:20050209:151750 Got QUIT or INT or TERM or PIPE signal. Exiting...
003757:20050209:151750 Got QUIT or INT or TERM or PIPE signal. Exiting...
003757:20050209:151750 Cannot remove PID file [/tmp/zabbix_server.pid] [No such file or directory]
003766:20050209:151755 Starting zabbix_server...
003768:20050209:151755 #server 1 started [Alerter]
003770:20050209:151755 server #2 started [nodata() calculator]
003772:20050209:151755 server #3 started [ICMP pinger]
003776:20050209:151755 server #5 started [Trapper]
003775:20050209:151755 server #4 started [Sucker. SNMP:ON]
003778:20050209:151756 server #6 started [Trapper]
003779:20050209:151755 server #7 started [Trapper]


The zabbix PID file contains just one line:

3766

and the results of a ps -u zabbix are:

PID TTY TIME CMD
3768 ? 00:00:00 zabbix_server
3770 ? 00:00:00 zabbix_server
3772 ? 00:00:00 zabbix_server
3775 ? 00:00:00 zabbix_server
3776 ? 00:00:00 zabbix_server

Kayou
11-02-2005, 09:47
I had this problem once too, and here is what happened:

The zabbix_server process will not run as the root user.

You need to make it run under zabbix user for example (just create a user called zabbix or whatever). Then you need to modify the startup script so the process is launched with that user and not with root user.

In my case the startup script had an error in the parameters passed to the launcher, so by default the process tried to start as root, which zabbix_server does not support.

I'm running SUSE 9.2, and the line that launches the process looks like this:

startproc -u zabbix -p ${ZABBIX_PID} ${ZABBIX_BIN}

In the original file, the -u option was at the end of the line, which couldn't work.
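
To illustrate the kind of startup guard described above, here is a minimal sketch in C. It is not taken from the zabbix_server sources: the function name refuse_root and the messages are hypothetical, and the only assumption carried over from this post is that the server should be started as an unprivileged user such as "zabbix".

```c
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical startup guard (not the actual zabbix_server code).
 * Returns 0 if the effective UID is not root, -1 if the caller should refuse to start.
 * `run_as` is the unprivileged account the admin is expected to use.
 */
int refuse_root(uid_t euid, const char *run_as)
{
    if (euid != 0)
        return 0;

    /* getpwnam() returns NULL when the account does not exist. */
    if (getpwnam(run_as) == NULL)
        fprintf(stderr, "Refusing to run as root, and user \"%s\" does not exist.\n", run_as);
    else
        fprintf(stderr, "Refusing to run as root; start the server as \"%s\" instead.\n", run_as);

    return -1;
}
```

A daemon could call refuse_root(geteuid(), "zabbix") early in main() and exit on a non-zero result, which would make a misconfigured startup script fail with a clear message.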

Hope this helps.

Kayou

petkovsc
12-02-2005, 02:25
What does strace/truss report? Make sure zabbix_server is not running. Then run it once as follows:

Linux:
strace -f -o trace.log /opt/zabbix/bin/zabbix_server

BSD/Solaris (if strace isn't installed):
truss -f -o trace.log /opt/zabbix/bin/zabbix_server

Upload or post trace.log. Install strace or truss if your system doesn't have it. Strace is available for all Linux distributions.

When you run /opt/zabbix/zabbix_server two or three times in quick succession, each invocation tells zabbix_server to daemonize itself. I don't know whether zabbix_server checks for an already running instance before initializing. Most likely one instance grabs the pid file and writes to it, while a different instance happens to meet whatever conditions it needs to keep running. That would explain why the PID in the file does not match the processes that survive: they belong to two different instances.
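
A minimal sketch of what a "look for yourself before initializing" check could look like, assuming the daemon creates its PID file exclusively before forking any workers. The helper name create_pid_file is hypothetical and this is not how zabbix_server is known to do it; it only shows the technique.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the zabbix_server sources.
 * Creates the PID file exclusively and writes `pid` into it.
 * Returns 0 on success, -1 if the file already exists or cannot be written.
 */
int create_pid_file(const char *path, pid_t pid)
{
    char buf[32];
    int len;
    /* O_EXCL makes open() fail with EEXIST when another instance got there first. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);

    if (fd == -1)
        return -1;

    len = snprintf(buf, sizeof(buf), "%ld\n", (long)pid);
    if (len < 0 || len >= (int)sizeof(buf) || write(fd, buf, (size_t)len) != (ssize_t)len) {
        /* Do not leave a half-written PID file behind. */
        close(fd);
        unlink(path);
        return -1;
    }
    return close(fd);
}
```

With a check like this, a second instance started a moment later would fail at startup instead of racing the first one. The trade-off is that a stale file left by a crash blocks the next start until someone removes it.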

Alexei
17-02-2005, 22:34
Hi Mark,

I think I found the cause of the problem.

Here is a quick fix. In file server.c:

// Replace!
//pids=calloc(CONFIG_SUCKERD_FORKS-1,sizeof(pid_t));
pids=calloc(CONFIG_SUCKERD_FORKS+CONFIG_TRAPPERD_FORKS-1,sizeof(pid_t));

....

// Comment this line
// pids = calloc(CONFIG_TRAPPERD_FORKS, sizeof(pid_t));

Recompile zabbix_server and restart it. Let me know if it works.
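
One reading of this fix is that the parent needs a single PID table large enough for every child it forks, rather than a table sized for one group of workers that is later replaced by a second allocation. The sketch below shows that idea in isolation. The struct and function names are hypothetical stand-ins, not code from server.c; only the sizing expression mirrors the line above.

```c
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical stand-ins for CONFIG_SUCKERD_FORKS and CONFIG_TRAPPERD_FORKS. */
struct fork_config {
    int suckerd_forks;
    int trapperd_forks;
};

/* Number of child PIDs the parent tracks; the "- 1" mirrors the expression in the fix. */
size_t child_count(const struct fork_config *cfg)
{
    return (size_t)(cfg->suckerd_forks + cfg->trapperd_forks - 1);
}

/* One zero-initialised table for all children; the caller frees it. */
pid_t *alloc_child_pids(const struct fork_config *cfg)
{
    return calloc(child_count(cfg), sizeof(pid_t));
}

/* Stores a child's PID, refusing to write past the end of the table. */
int record_child(pid_t *pids, size_t capacity, size_t index, pid_t pid)
{
    if (index >= capacity)
        return -1;
    pids[index] = pid;
    return 0;
}
```

For example, with illustrative values of 5 sucker forks and 3 trapper forks, the table holds 7 entries, so indexes 0 through 6 are valid and an attempt to record an eighth child is rejected instead of writing past the allocation.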

Thanks for your report!

Mark Ramm-Christensen
21-02-2005, 16:38
So far so good!

--Mark

Lovespider
22-02-2005, 09:51
Works for me too... thank you, Alexei.