Zabbix_suckerd randomly stops


thesaintjim
23-11-2004, 20:55
Is there a log or anything that shows the last thing suckerd did before it was killed? It just randomly stops. Any ideas? I noticed this happens when I add hosts.

thesaintjim
23-11-2004, 21:02
Well, I think I solved that one. I noticed that if you set the host to unmonitored, suckerd keeps running even though I am running a simple check. Once you have added the host, set it to monitored. I knew you had to do this for an agent installed on a client, but wasn't sure about a simple HTTP check, etc.

All is good again in the land of no idea.

thesaintjim
23-11-2004, 21:09
OK, scratch the above post. It still shut down.

Alexei
23-11-2004, 21:23
Anything in Logfile?

thesaintjim
23-11-2004, 21:36
Zabbix creates a log? In which directory? I didn't know it did.

I have checked:
/var/log/daemon.log
/var/log/dmesg
/var/log/messages

thesaintjim
23-11-2004, 21:49
Well, I'll tell you this: I started suckerd again, went to the restroom, came back, and it had killed itself or something.

Alexei
23-11-2004, 22:12
Check /etc/zabbix/zabbix_suckerd.conf, parameter Logfile. Also, pay attention to DebugLevel.
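
A minimal sketch (not part of ZABBIX) for pulling those two settings out of the config file. It assumes the file uses plain `Name=Value` lines with `#` comments; the function name `read_settings` is made up for illustration, and names are matched case-insensitively because this thread spells the parameter both "Logfile" and "LogFile".

```python
def read_settings(text, names=("LogFile", "DebugLevel")):
    """Return {name: value} for the requested settings found in text.

    Assumption: the config uses 'Name=Value' lines and '#' comments.
    """
    wanted = {n.lower(): n for n in names}
    found = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a key=value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip().lower()
        if key in wanted:
            found[wanted[key]] = value.strip()
    return found
```

To use it, read /etc/zabbix/zabbix_suckerd.conf (the path given in the post above) into a string and pass it to `read_settings`; the returned LogFile value is where to look for the daemon's output.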

thesaintjim
23-11-2004, 22:45
This is with debug level 2:
002640:20041123:112627 Got QUIT or INT or TERM or PIPE signal. Exiting...
002642:20041123:112627 Got QUIT or INT or TERM or PIPE signal. Exiting...
002643:20041123:112627 Got QUIT or INT or TERM or PIPE signal. Exiting...
~


This is now with debug level 3:
002695:20041123:112742 Starting zabbix_suckerd...
002697:20041123:112742 zabbix_suckerd #1 started [Alerter]
002698:20041123:112742 zabbix_suckerd #2 started [nodata() calculator]
002699:20041123:112742 zabbix_suckerd #3 started [ICMP pinger]
002695:20041123:112742 zabbix_suckerd #0 started [Housekeeper]
002705:20041123:112742 zabbix_suckerd #4 started [Sucker. SNMP:ON]
002695:20041123:113007 One child process died. Exiting ...
002697:20041123:113007 Got QUIT or INT or TERM or PIPE signal. Exiting...
002698:20041123:113007 Got QUIT or INT or TERM or PIPE signal. Exiting...
002699:20041123:113007 Got QUIT or INT or TERM or PIPE signal. Exiting...


Waiting for it to shut off again. I'm sure it will have by the time I return home from work. I'll show you debug level 4 (just the part showing what is causing it to die).

thesaintjim
23-11-2004, 22:50
Nothing unusual. This is the last part of the log before it died:

[}]
002965:20041123:114435 Macro:JimsComp:diskfree[c:].last(0)
002965:20041123:114435 Before find_char:JimsComp:diskfree[c:].last(0)[:]
002965:20041123:114435 Host:JimsComp
002965:20041123:114435 Before find_char:diskfree[c:].last(0)[.]
002965:20041123:114435 Key:diskfree[c:]
002965:20041123:114435 Before find_char:last(0)[(]
002965:20041123:114435 Function:last
002965:20041123:114435 Before find_char:0)[)]
002965:20041123:114435 Parameter:0
002965:20041123:114435 In get_lastvalue()
002965:20041123:114435 Executing query:select i.itemid,i.prevvalue,i.lastvalue,i.value_type,i.multiplier,i.units from items i,hosts h where h.host='JimsComp' and h.hostid=i.hostid and i.key_='diskfree[c:]'
002965:20041123:114435 In DBnum_rows
002965:20041123:114435 Result of DBnum_rows [1]
002965:20041123:114435 Itemid:17210
002965:20041123:114435 Before evaluate_FUNCTION()
002965:20041123:114435 Function [last]
002965:20041123:114435 In evaluate_FUNCTION() 1
002965:20041123:114435 In evaluate_FUNCTION() 2
002958:20041123:114435 One child process died. Exiting ...
002960:20041123:114435 Got QUIT or INT or TERM or PIPE signal. Exiting...
002961:20041123:114435 Got QUIT or INT or TERM or PIPE signal. Exiting...
002964:20041123:114435 Got QUIT or INT or TERM or PIPE signal. Exiting...
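
In excerpts like this one, every process except one logs an "Exiting" line, and the silent one (002965 here) is the child that was busy when things went wrong. A rough sketch of that heuristic, assuming each log line has the `PID:YYYYMMDD:HHMMSS message` shape seen in this thread; the function name `suspect_children` is hypothetical, and this is only a guess at which process died, not something ZABBIX itself reports.

```python
import re

# Line shape as seen in this thread: PID:YYYYMMDD:HHMMSS message
LINE_RE = re.compile(r"^(\d+):(\d{8}):(\d{6}) (.*)$")


def suspect_children(log_text):
    """Map each PID that never logged an 'Exiting' line to its last message."""
    last_message = {}
    exited = set()
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # blank lines, wrapped continuations, other noise
        pid, _date, _time, message = match.groups()
        last_message[pid] = message
        if "Exiting" in message:
            exited.add(pid)
    # Heuristic: a crashed child gets no chance to log its own exit.
    return {pid: msg for pid, msg in last_message.items() if pid not in exited}
```

Run over the excerpt above, this should single out PID 002965 with "In evaluate_FUNCTION() 2" as its last message, which is the same place the later logs in this thread stop.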

thesaintjim
23-11-2004, 23:49
I took down the Windows agent being monitored and now zabbix_suckerd doesn't shut off. More tests soon.

thesaintjim
25-11-2004, 19:23
002958:20041123:114435 One child process died. Exiting ...

Any idea what is causing it to die?

Alexei
29-11-2004, 10:38
No idea so far. I'll try to reproduce this problem.

obstler
30-11-2004, 12:05
I think this might have something to do with triggers and email alerts. 1.1a2 was running fine for the whole time since it was released, until I added a trigger with email alert today. Now the same happens here, suckerd just exits after one child process has died.

As soon as I disable the trigger/emailalert suckerd runs fine again. I tested with both localhost as smtp server as well as a remote smtp.

And no email was ever sent, regardless of SMTP configuration, so either it dies before it even gets to sending the email, or somewhere within the email code.

I hope this helps you narrow down and solve the problem, as email alerts are somewhat critical to the whole network monitoring enterprise ;)

update2: when I remove the email action from the trigger suckerd keeps on running fine, so at least in my case it's linked to the email action for the trigger.

with the email action this is the output from the log:


032060:20041130:112450 Before find_char:last(0)[(]
032060:20041130:112450 Function:last
032060:20041130:112450 Before find_char:0)[)]
032060:20041130:112450 Parameter:0
032060:20041130:112450 In get_lastvalue()
032060:20041130:112450 Executing query:select i.itemid,i.prevvalue,i.lastvalue,i.value_type,i.multiplier,i.units from items i,hosts h where h.host='EVK_Router' and h.hostid=i.hostid and i.key_='icmppingsec'
032060:20041130:112450 In DBnum_rows
032060:20041130:112450 Result of DBnum_rows [1]
032060:20041130:112450 Itemid:17215
032060:20041130:112450 Before evaluate_FUNCTION()
032060:20041130:112450 Function [last]
032060:20041130:112450 In evaluate_FUNCTION() 1
032060:20041130:112450 In evaluate_FUNCTION() 2
032054:20041130:112450 One child process died. Exiting ...
032056:20041130:112450 Got QUIT or INT or TERM or PIPE signal. Exiting...
032058:20041130:112450 Got QUIT or INT or TERM or PIPE signal. Exiting...
032062:20041130:112450 Got QUIT or INT or TERM or PIPE signal. Exiting...


the trigger is a simple check for icmppingsec being greater than a set value.

tom.

Alexei
30-11-2004, 15:28
I've added more debug printing. Please, get the latest include/functions.c from CVS and recompile everything.

Then run ZABBIX. If it crashes, post the debug output from LogFile here.

Thanks!

obstler
30-11-2004, 16:49
see email for the logs and further debug info.

obstler
01-12-2004, 11:45
any idea yet what might be the problem from the log and info I sent you? or are there further tests you need done?

Alexei
01-12-2004, 12:00
Thanks for the log files! I've managed to replicate this problem on my test system. It's fixed now.

I plan to release 1.1alpha3 next Monday, the fix will be included.

thesaintjim
02-12-2004, 05:59
Great news :)

thesaintjim
02-12-2004, 22:35
Now, the other guy said he removed the email action from his trigger and his setup works, but I did the same and mine still dies. I will wait and try the new release and see if that fixes my problem too.

obstler
06-12-2004, 15:33
> Thanks for the log files! I've managed to replicate this problem on my test system. It's fixed now.
>
> I plan to release 1.1alpha3 next Monday, the fix will be included.

What's the status on 1.1a3? We're all eagerly awaiting the release and ready to test the new bugfixes!

regards.

Alexei
06-12-2004, 16:08
I'm not sure if it is worth releasing 1.1alpha3 now with bug fixes only. I'd like to put new features (such as hard-linked templates or escalation; maybe something else) into 1.1alpha3 as well. This may take another 3-7 days.

thesaintjim
06-12-2004, 21:37
Well, let's do it :) I've got to present this to my CEO to explain why we should use it, LOL.

obstler
13-12-2004, 10:52
> I'm not sure if it is worth releasing 1.1alpha3 now with bug fixes only. I'd like to put new features (such as hard-linked templates or escalation; maybe something else) into 1.1alpha3 as well. This may take another 3-7 days.

Alexei,

another week, and we're eagerly awaiting the next release. I think many of us just need the known bugfixes right now, and can wait for the new features.

Time until christmas is running short, and I know that at least I do need/want the trigger stuff implemented before then.

regards, tom.

Alexei
13-12-2004, 16:45
> another week, and we're eagerly awaiting the next release. I think many of us just need the known bugfixes right now, and can wait for the new features.
>
> Time until christmas is running short, and I know that at least I do need/want the trigger stuff implemented before then.

Please, expect 1.1alpha3 release very soon, maybe even today, if Santa Claus helps me of course.

thesaintjim
22-12-2004, 23:10
I think Santa and all his elves helped you :)

whoever
08-01-2005, 19:48
1.1alpha4

zabbix_server stops if action for trigger contains macro

022151:20050108:192830 Macro:switch-10:2.last(0)
022151:20050108:192830 Before find_char:switch-10:2.last(0)[:]
022151:20050108:192830 Host:switch-10
022151:20050108:192830 Key:2
022151:20050108:192830 Before find_char:last(0)[(]
022151:20050108:192830 Function:last
022151:20050108:192830 Before find_char:0)[)]
022151:20050108:192830 Parameter:0
022151:20050108:192830 In get_lastvalue()
022151:20050108:192830 Executing query:select i.itemid,i.prevvalue,i.lastvalue,i.value_type,i.multiplier,i.units from items i,hosts h where h.host='switch-1' and h.hostid=i.hostid and i.key_='2'
022151:20050108:192830 In DBnum_rows
022151:20050108:192830 Result of DBnum_rows [1]
022151:20050108:192830 Itemid:20054
022151:20050108:192830 Before evaluate_FUNCTION()
022151:20050108:192830 In evaluate_FUNCTION() Function [last] flag [1]
022151:20050108:192830 In evaluate_FUNCTION() 1
022151:20050108:192830 In evaluate_FUNCTION() 2 value [3372425.090909]
022151:20050108:192830 In evaluate_FUNCTION() pre-7
022145:20050108:192830 One child process died. Exiting ...

mysql> select i.itemid,i.prevvalue,i.lastvalue,i.value_type,i.multiplier,i.units from items i,hosts
-> h where h.host='switch-10' and h.hostid=i.hostid and i.key_='2';
+--------+-----------------+-----------------+------------+------------+--------+
| itemid | prevvalue | lastvalue | value_type | multiplier | units |
+--------+-----------------+-----------------+------------+------------+--------+
| 20054 | 40291984.258065 | 35184466.105263 | 0 | 1 | bits/s |
+--------+-----------------+-----------------+------------+------------+--------+

Item gets values from switch through SNMP, custom multiplier 8 (to get bits/s instead of bytes/s).
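
To make the multiplier concrete, here is a small sketch of the arithmetic. It assumes the stored value is simply the raw SNMP reading times the custom multiplier, which is how the post describes it; the function name is made up for illustration, and the input is the lastvalue from the query result above.

```python
BITS_PER_BYTE = 8  # the custom multiplier mentioned in the post


def bytes_to_bits_per_sec(bytes_per_sec, multiplier=BITS_PER_BYTE):
    """Scale a bytes/s reading to bits/s (assumed: stored = raw * multiplier)."""
    return bytes_per_sec * multiplier


# lastvalue from the table above, already stored in bits/s
stored_bits_per_sec = 35184466.105263
# Under the assumption above, the raw reading would have been:
raw_bytes_per_sec = stored_bits_per_sec / BITS_PER_BYTE
```

So the item's `units` column says bits/s while the switch itself reports bytes/s; the multiplier sits between the two.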

Alexei
08-01-2005, 23:11
Are you sure you have 1.1alpha4 running? Please, double check! Thanks.

whoever
09-01-2005, 11:33
root@:~# md5sum zabbix-1.1alpha4.tar.gz
fc9cc13158b1b0c7ac1b620c9666e8fc zabbix-1.1alpha4.tar.gz

Alexei
09-01-2005, 11:51
I shouldn't have asked my question, I just noticed that you're talking about zabbix_server which obviously was part of 1.1alpha4. Sorry for the silly question.

Alexei
09-01-2005, 21:17
> 1.1alpha4
>
> zabbix_server stops if action for trigger contains macro

Fixed. The fix will be available in ZABBIX 1.1alpha5. Thanks for reporting this issue!

Alexei
10-01-2005, 11:05
I'd like to add that the problem happens only if a notification contains a macro referring to an item that has a custom multiplier defined.

Tarball
05-02-2005, 01:20
I'm having a similar problem on a fresh install -- never used zabbix before. This is 1.1 alpha 5 downloaded from the website -- not a cvs copy.
Mysql/php parts are all configured, seem to work fine.

I try to start the zabbix server, and I get this:

031606:20050204:155423 Starting zabbix_server...
031608:20050204:155423 #server 1 started [Alerter]
031609:20050204:155423 server #2 started [nodata() calculator]
031611:20050204:155423 server #3 started [ICMP pinger]
031615:20050204:155423 server #5 started [Trapper]
031616:20050204:155423 server #6 started [Trapper]
031606:20050204:155423 One child process died. Exiting ...
031606:20050204:155423 Got QUIT or INT or TERM or PIPE signal. Exiting...
031608:20050204:155423 Got QUIT or INT or TERM or PIPE signal. Exiting...
031609:20050204:155423 Got QUIT or INT or TERM or PIPE signal. Exiting...
031611:20050204:155423 Got QUIT or INT or TERM or PIPE signal. Exiting...
031613:20050204:155423 Got QUIT or INT or TERM or PIPE signal. Exiting...
031615:20050204:155423 Got QUIT or INT or TERM or PIPE signal. Exiting...
031615:20050204:155423 Cannot remove PID file [/var/tmp/zabbix_server.pid] [No such file or directory]

I tried turning the debug level to 4, but it didn't seem to provide any additional useful information. I haven't modified anything other than some user permissions and attempted to add a host through the web interface.

Any thoughts?

Rick LEE
07-04-2005, 12:33
I run Zabbix on FreeBSD 4.10, installed from a package. One day, I think when the MySQL database size reached almost 1 GB, one of suckerd's child processes died. I've changed many things in the suckerd.conf file, but I still have this problem. I usually see the message below when the process dies.

===========================================================================================================
033569:20050407:091831 zabbix_suckerd #9 started [Sucker. SNMP:ON]
033568:20050407:110825 Query::select i.itemid,i.key_,h.host,h.port,i.delay,i.description,i.nextcheck,i.type,i.snmp_community,i.snmp_oid,h.useip,h.ip,i.history,i.lastvalue,i.prevvalue,i.hostid,h.status,i.value_type,h.network_errors,i.snmp_port,i.delta,i.prevorgvalue,i.lastclock from items i,hosts h where i.nextcheck<=1112839705 and i.status=0 and i.type not in (2) and (h.status=0 or (h.status=2 and h.disable_until<=1112839705)) and h.hostid=i.hostid and i.itemid%6=4 and i.key_<>'status' and i.key_<>'icmpping' and i.key_<>'icmppingsec' order by i.nextcheck
033568:20050407:110825 Query failed:Lost connection to MySQL server during query [2013]
033560:20050407:110825 One child process died. Exiting ...
033561:20050407:110825 Got QUIT or INT or TERM or PIPE signal. Exiting...
033562:20050407:110825 Got QUIT or INT or TERM or PIPE signal. Exiting...
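
Unlike the earlier crashes in this thread, this log names its cause: the line before "One child process died" is a failed query with MySQL client error 2013 (lost connection during query). A small sketch for finding such lines in a long log, assuming the `PID:DATE:TIME Query failed:message [code]` shape shown above; the function name `failed_queries` is hypothetical.

```python
import re

# Shape taken from the log above:
#   PID:YYYYMMDD:HHMMSS Query failed:<message> [<error code>]
FAILED_RE = re.compile(r"^(\d+):(\d{8}):(\d{6}) Query failed:(.*) \[(\d+)\]$")


def failed_queries(log_text):
    """Return (pid, date, time, message, code) for every 'Query failed' line."""
    hits = []
    for raw in log_text.splitlines():
        match = FAILED_RE.match(raw.strip())
        if match:
            pid, date, time, message, code = match.groups()
            hits.append((pid, date, time, message, int(code)))
    return hits
```

If every crash in the log is preceded by a hit like this, the database connection is the thing to investigate rather than the trigger code.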

klavs
09-04-2005, 20:04
I've posted a bug about zabbix_serverd (and probably also suckerd from the 1.0 release) dying when MySQL is restarted. Perhaps this is what happens here?