**Petya** · 23-01-2008, 18:13

Do you mean "Zabbix Agent (active)" when saying
"zabbix_agentd with active checks"?

If yes, then I'm another one who has a similar problem
(I don't have this problem when items are of type "Zabbix Agent").

Try changing item types (you can use the "Mass update" button) --
this works well when you don't have many hosts (and it's the default actually).

Also there's a similar issue here:

http://www.zabbix.com/forum/showthread.php?t=8718

**pascalp** · 24-01-2008, 16:11

Originally posted by Petya

Do you mean "Zabbix Agent (active)" when saying
"zabbix_agentd with active checks"?

If yes then I'm another one who have similar problem,
(I don't have such problem when items are of type "Zabbix Agent").

Try changing item types (you can use "Mass update" button) --
this works well when you have not many hosts (and it's the default actually).

Exactly, I can't use passive checks because all my servers are behind routers whose maintenance I'm not responsible for.

Originally posted by Petya

Also there's similar issue here:
http://www.zabbix.com/forum/showthread.php?t=8718

but in fact, my zabbix_server.log in /tmp is filled with the statement
"(...) Error while sending list of active checks"
This message is found at the place in the source code where the patch described in the thread http://www.zabbix.com/forum/showthread.php?t=8703 (this thread is mentioned in your link) is applied. Could the patch that fixes the average load problem cause this problem? I'm not at all sure, because my server already logs these messages while it's still collecting data.

Regards,
Pascal

**torti-** · 17-03-2008, 12:30

This is exactly the situation I have. Has anyone already solved this?

**xs-** · 17-03-2008, 13:37

I believe this is fixed in 1.4.5-pre (1.4.4 nightly build, on website -> developer)

**Alexei** · 17-03-2008, 21:30

It is fixed in pre 1.4.5.

**torti-** · 18-03-2008, 09:54

Well, if the mentioned archive is http://www.zabbix.com/downloads/nigh...bix-1.4.tar.gz then it is not fixed.

zabbix_server still stops responding (and collecting data) without an error. The only thing I can see in the logs is the following.

This is what I guess it should look like when the server process is still OK:

Code:

  2016:20080228:223009 In process_httptests()
  2016:20080228:223009 Query [select httptestid,name,applicationid,nextcheck,status,delay,macros,agent from httptest where status=0 and nextcheck<=1204234209 and mod(httptestid,5)=2 and  httptestid>=100000000000000*0 and httptestid<=(100000000000000*0+99999999999999) ]
  2016:20080228:223009 End process_httptests()
  2016:20080228:223009 Spent 0 seconds while processing HTTP tests
  2016:20080228:223009 Query [select count(*),min(nextcheck) from httptest t where t.status=0 and mod(t.httptestid,5)=2 and  t.httptestid>=100000000000000*0 and t.httptestid<=(100000000000000*0+99999999999999) ]
  2016:20080228:223009 Nextcheck:1204234259 Time:1204234209
  2016:20080228:223009 Sleeping for 5 seconds

and this is what I get when the server process hangs:

Code:

  2015:20080228:223009 In process_httptests()
  2015:20080228:223009 Query [select httptestid,name,applicationid,nextcheck,status,delay,macros,agent from httptest where status=0 and nextcheck<=1204234209 and mod(httptestid,5)=1 and  httptestid>=100000000000000*0 and httptestid<=(100000000000000*0+99999999999999) ]
  2015:20080228:223009 End process_httptests()
  2015:20080228:223009 Spent 0 seconds while processing HTTP tests
  2015:20080228:223009 Query [select count(*),min(nextcheck) from httptest t where t.status=0 and mod(t.httptestid,5)=1 and  t.httptestid>=100000000000000*0 and t.httptestid<=(100000000000000*0+99999999999999) ]
  2015:20080228:223009 No httptests to process in get_minnextcheck.
  2015:20080228:223009 Nextcheck:-1 Time:1204234209
  2015:20080228:223009 Sleeping for 5 seconds
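To compare the two excerpts above across a whole debug log, one option is to record the last line each server process wrote and flag the processes that went quiet. The following is only a minimal sketch: it assumes every log line has the `PID:YYYYMMDD:HHMMSS message` layout seen in the excerpts, and the helper names (`parse_line`, `last_seen_per_pid`, `silent_pids`) are hypothetical, not part of Zabbix.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the excerpts: "PID:YYYYMMDD:HHMMSS message"
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\s+(.*)$")


def parse_line(line):
    """Return (pid, timestamp, message), or None if the line does not match."""
    m = LINE_RE.match(line)
    if not m:
        return None
    pid, date, time, msg = m.groups()
    return int(pid), datetime.strptime(date + time, "%Y%m%d%H%M%S"), msg


def last_seen_per_pid(lines):
    """Map each PID to the (timestamp, message) of the last line it logged."""
    last = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that do not follow the assumed layout
        pid, ts, msg = parsed
        last[pid] = (ts, msg)
    return last


def silent_pids(lines, max_gap_seconds):
    """PIDs whose last line is older than the newest line by more than the gap."""
    last = last_seen_per_pid(lines)
    if not last:
        return []
    newest = max(ts for ts, _ in last.values())
    return sorted(
        pid
        for pid, (ts, _) in last.items()
        if (newest - ts).total_seconds() > max_gap_seconds
    )
```

Feeding it the lines of the server log (for example `silent_pids(open(path), 60)`, with `path` pointing at your own logfile) lists the processes that stopped logging, and `last_seen_per_pid` shows the final message each one wrote before going quiet.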

**xs-** · 18-03-2008, 12:14

Heh, well yesterday we had a similar thing again.
It very much looked like the problems we had before (trapper not receiving data) but this time the load was 0, no zabbix threads going haywire.

After not finding anything to blame, we restarted zabbix_server (master node in a distributed setup) and all was well again.
Shortly after that we saw that one of the distributed nodes had its zabbix_server stopped (connection to db lost; local database, which didn't stop). After inspection we saw it had stopped around the same time the master node stopped receiving data.

Maybe this is related, maybe not. Worth looking into though.
It might be possible the trapper part of zabbix can experience problems when another server node dies during a send or action (or vice versa).

-- Edit
We are running 1.4.5-pre on the main node, 1.4.4 on the child nodes

**torti-** · 18-03-2008, 12:38

Hm, you might be right that the problem is in the db-connection part of zabbix.

I am currently not running a distributed setup of zabbix_server, so I don't think it is a problem related to multiple servers.

**bbrendon** · 19-03-2008, 06:39

http://www.zabbix.com/forum/showthread.php?p=31732#post31732

Seems to be related to the mysql server being very busy, which seems to sometimes be caused by the web monitoring. I don't use web monitoring in production, so I deleted all of it.

We'll see if things improve. My zabbix has been down for the past week because of this.

**torti-** · 19-03-2008, 14:52

I have thought about that too, and disabling web monitoring didn't help at all. I tried various 1.4.* versions, including the developer pre-1.4.5 from Monday.

Actually, the problem arose when I started using active agents.

This is a major issue for me, because at this point zabbix isn't useful at all if you need to use active agents and the zabbix_server process has stability issues.

PLEASE fix this as soon as possible, Alexei.

**bbrendon** · 19-03-2008, 18:41

Originally posted by torti-

I have thought about that too, and disabling web monitoring didn't help at all. I tried various 1.4.* versions, including the developer pre-1.4.5 from Monday.

Actually, the problem arose when I started using active agents.

This is a major issue for me, because at this point zabbix isn't useful at all if you need to use active agents and the zabbix_server process has stability issues.

PLEASE fix this as soon as possible, Alexei.

FYI:
- My zabbix seems to die between 3:50 AM and 4:10 AM (almost every night, but not quite)
- I only use active agents.
- I'm running 1.4.4 with the load patch
- I disabled web monitoring last night
- I looked at the mysql-slow logs and it seems that the problem is related to a busy mysql server
- Non-active agent related items appear to get data, while active agent items don't. 90% of my system is active agent items though.
- I updated SNMP to 5.4.1 hoping it was SNMP lib related, recompiled, and no change
- server didn't stop recording data last night. We'll see how long it lasts...

That's about it here.
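The observation that passive items keep getting data while active ones don't can be checked with numbers instead of by eye. The sketch below is only an illustration under stated assumptions: it expects `(item_type, lastclock)` pairs fetched however you like from the database, it assumes the `items` table exposes `type` and `lastclock` columns in your version, and it assumes type 0 means "Zabbix Agent" and type 7 means "Zabbix Agent (active)". Verify all of that against your own schema. The function name `stale_share_by_type` is hypothetical.

```python
from collections import defaultdict


def stale_share_by_type(rows, now, max_age_seconds):
    """Return {item_type: share of items with no data newer than max_age_seconds}.

    rows: iterable of (item_type, lastclock) pairs; lastclock is a Unix
    timestamp, or None for an item that never received a value.
    """
    totals = defaultdict(int)
    stale = defaultdict(int)
    for item_type, lastclock in rows:
        totals[item_type] += 1
        if lastclock is None or now - lastclock > max_age_seconds:
            stale[item_type] += 1
    return {t: stale[t] / totals[t] for t in totals}


# Made-up sample data: type 0 assumed passive, type 7 assumed active.
rows = [(0, 1000), (0, 990), (7, 100), (7, 995)]
print(stale_share_by_type(rows, now=1000, max_age_seconds=300))
```

If the share for the active type climbs toward 1.0 while the passive type stays near 0.0, that matches the symptom described in this thread.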

**bbrendon** · 19-03-2008, 20:01

Okay. I have a fix! ...You'll love it, I swear!

Code:

# tail -2 crontab
# disable zabbix actions before zabbix_server breaks at 4 AM
22 1 * * * root mysql --user=zabbix --password=mypass zabbix -e "update actions set status = 1"

**torti-** · 20-03-2008, 11:13

Well, that is not my definition of a 'fix'.

Last night it broke at 22:05 or so. Restarting the server process works fine, but this is no solution for serious use of a program.

I have attached the server logfile with debuglevel 4. Maybe someone more familiar with zabbix could look over it?

I'm not really sure that the server breaks at the same time every time...

thanks,
torti-

ps:
please increase the maximal size of the zip attachment - Your file of 262.5 KB bytes exceeds the forum's limit of 97.7 KB for this filetype.
I have renamed the archive for now to .c

Attached Files

zabbix_server.zip.c (262.5 KB, 181 views)

**bbrendon** · 20-03-2008, 18:18

I have narrowed it down to a plain old busy server. It doesn't appear to have anything to do with mysql. Mysql just has long queries because the server gets very busy, causing zabbix to malfunction.

[1.4.4] zabbix_server doesn't crash, but no longer collects data