suckerd dying while trying to delete history


mucknet
05-01-2005, 10:48
Howdy --

I have a large zabbix DB (~50GB). Whenever I try to delete an item's history, I get entries similar to the following in my zabbix_suckerd.log file, and then zabbix_suckerd dies:

006020:20050104:231610 Query::insert into history (clock,itemid,value) values (1104909319,18614,72.912990)
006020:20050104:231610 Query failed:Lock wait timeout exceeded; Try restarting transaction [1205]
006020:20050104:231717 Query::insert into history (clock,itemid,value) values (1104909386,18614,72.802160)
006020:20050104:231717 Query failed:Lock wait timeout exceeded; Try restarting transaction [1205]
006020:20050104:231817 Query::insert into history (clock,itemid,value) values (1104909446,18614,72.897020)
006020:20050104:231817 Query failed:Lock wait timeout exceeded; Try restarting transaction [1205]
006018:20050104:231823 Query::insert into history (clock,itemid,value) values (1104909503,18588,2.665790)
006018:20050104:231823 Query failed:Lost connection to MySQL server during query [2013]
006018:20050104:231823 Query::select num,value_min,value_avg,value_max from trends where itemid=18588 and clock=1104908400
006020:20050104:231823 Query::select function,parameter,itemid from functions where itemid=18386 group by 1,2,3 order by 1,2,3
006018:20050104:231823 Query failed:Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (111) [2002]
006020:20050104:231823 Query failed:Lost connection to MySQL server during query [2013]
006021:20050104:231823 Query::insert into history (clock,itemid,value) values (1104909503,18567,19.882370)
006021:20050104:231823 Query failed:Lost connection to MySQL server during query [2013]
006021:20050104:231823 Query::select num,value_min,value_avg,value_max from trends where itemid=18567 and clock=1104908400
006021:20050104:231823 Query failed:Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (111) [2002]
006009:20050104:231823 One child process died. Exiting ...
006017:20050104:231823 Got QUIT or INT or TERM or PIPE signal. Exiting...
006015:20050104:231823 Got QUIT or INT or TERM or PIPE signal. Exiting...
006019:20050104:231823 Got QUIT or INT or TERM or PIPE signal. Exiting...
006023:20050104:231823 Got QUIT or INT or TERM or PIPE signal. Exiting...
006022:20050104:231823 Got QUIT or INT or TERM or PIPE signal. Exiting...
006016:20050104:231823 Got QUIT or INT or TERM or PIPE signal. Exiting...


This happens when I try to delete an individual item for a host, or when I try to delete a whole host.

I have been storing item values every 10 seconds for about the last 6 months.

Any ideas or suggestions?

Thanks! :)

Alexei
05-01-2005, 12:38
Interesting problem! I'm pretty sure you're using a MySQL InnoDB database.

006020:20050104:231610 Query::insert into history (clock,itemid,value) values (1104909319,18614,72.912990)
006020:20050104:231610 Query failed:Lock wait timeout exceeded; Try restarting transaction [1205]

The ZABBIX suckerd process (PID=6020) tries to insert a record into history but fails because of a lock wait timeout (the table is locked, obviously). ZABBIX ignores the error; I'm not sure if that is OK.

006020:20050104:231823 Query::select function,parameter,itemid from functions where itemid=18386 group by 1,2,3 order by 1,2,3
006020:20050104:231823 Query failed:Lost connection to MySQL server during query [2013]

Then we lose the MySQL connection. It seems that MySQL dropped the connection. Why? I have no idea. A bug on the MySQL side?

Anyway, when deleting a host, the ZABBIX v1.0 housekeeper tries to delete all the data from the history table at once. This is not the most efficient way. I plan to improve it in 1.1; however, I still have no clear vision of how to do it most efficiently.

Several workarounds exist:

1. Convert the 50GB database to MyISAM :rolleyes:
2. Purge the data manually from the history table
3. Don't delete hosts. Leave them in Unreachable status
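
For workaround 2, one possible approach is to purge old rows in small batches and commit after each batch, so that no single delete holds locks long enough to make the suckerd inserts time out. Below is a minimal sketch of that idea, not ZABBIX code. It assumes only the history columns visible in the log above (clock, itemid, value). The function name purge_history, the batch size, and the cutoff are hypothetical, and it uses Python's built-in sqlite3 as a stand-in so it runs without a MySQL server.

```python
import sqlite3
import time


def purge_history(conn, itemid, cutoff, batch_size=10000, pause=0.0):
    """Delete rows of one item older than `cutoff` (unix time) in small batches.

    Hypothetical helper for illustration; returns the number of rows deleted.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM history WHERE rowid IN ("
            " SELECT rowid FROM history"
            " WHERE itemid = ? AND clock < ? LIMIT ?)",
            (itemid, cutoff, batch_size),
        )
        conn.commit()  # end the transaction so locks are released per batch
        if cur.rowcount == 0:
            break
        total += cur.rowcount
        time.sleep(pause)  # optional breathing room for concurrent inserts
    return total


# Demo on an in-memory stand-in table (schema assumed from the log lines).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (clock INTEGER, itemid INTEGER, value REAL)")
base = 1104900000
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?)",
    [(base + 10 * i, 18614, 72.9) for i in range(1000)],
)
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?)",
    [(base + 10 * i, 18588, 2.6) for i in range(100)],
)
conn.commit()

cutoff = base + 10 * 500  # keep only the newer half of item 18614
deleted = purge_history(conn, 18614, cutoff, batch_size=100)
remaining = conn.execute("SELECT COUNT(*) FROM history").fetchone()[0]
print(deleted, remaining)
```

On MySQL the same idea can be expressed directly, since its single-table DELETE accepts a LIMIT clause: repeat something like "delete from history where itemid=18614 and clock<CUTOFF limit 10000" until no rows are affected. Treat the batch size as something to tune, and try it on a copy of the data first.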

festivalman
11-05-2005, 16:33
Hi, I'm running into the same problem and would like to go the "Delete old records manually" route. I have a DB with many hosts that I had set to keep 2 years of history. It has grown too large and now gives me the lock timeout problem. I've set all of the history times on the hosts to 6 months, but I need to manually erase the history older than that so the cleanup starts working again. Can you list exactly what I should be deleting, or what query/function would accomplish this? Any help would be appreciated. Thanks.

festivalman
25-05-2005, 16:33
*bump* Anyone have any idea on this one? Thanks.