Ad Widget

**Alexei** · 21-08-2008, 13:20

We experience seldom deadlocks on MySQL as well. No ideas why this happens so far, still investigating...

**theologu** · 22-08-2008, 09:46

I am constantly trying to solve deadlocks problem for three days now.... I am really out of ideas and under these condintions zabbix 1.5.4 is unusable, it dies at random intervals from minutes to tens of minutes.

No. of hosts monitored: 72
No. of active items: 1298
No. of triggers: 544

Server is IBM, CPU Xeon E5320 quad core, 2GB of RAM, disks S-ATA in raid 0;

I don`t think it a problem with resources and load... average load is at 0,27.

Items are configured for check at different time periods, so the collection is distributed. Database is tuned for performance, the zabbix interface is really speedy; I indexed and reindexed manually data in DB. I don`t know what to do to solve deadlocks.

Alexei, is is normal for zabbix to die, if deadlocks are detected? Can it be configured to re-start the transaction that failed without exiting? I see in logs that "One child process died. Exiting ..." . It is normal? If one child dies, all the processes dies too?

I give you another log extract:

Code:

update items set lastclock=1219343631,prevvalue=lastvalue,lastvalue='0.000000' where itemid=20310;
update items set lastclock=1219343635,prevvalue=lastvalue,lastvalue='2423653' where itemid=21233;
 
  4890:20080821:214853 Query failed:PGRES_FATAL_ERROR:ERROR:  deadlock detected
DETAIL:  Process 4894 waits for ShareLock on transaction 204674; blocked by process 4920.
Process 4920 waits for ShareLock on transaction 204635; blocked by process 4894.
 
  4890:20080821:214853 Query::select distinct i.itemid,i.key_,h.host,h.port,i.delay,i.description,i.nextcheck,i.type,i.snmp_community,i.snmp_oid,h.useip,h.ip,i.history,i.lastvalue,i.prevvalue,i.hostid,h.status,i.value_type,h.errors_from,i.snmp_port,i.delta,i.prevorgvalue,i.lastclock,i.units,i.multiplier,i.snmpv3_securityname,i.snmpv3_securitylevel,i.snmpv3_authpassphrase,i.snmpv3_privpassphrase,i.formula,h.available,i.status,i.trapper_hosts,i.logtimefmt,i.valuemapid,i.delay_flex,h.dns,i.params,f.function,f.parameter,f.itemid,f.lastvalue from hosts h, items i,functions f,triggers t where f.itemid=i.itemid and h.hostid=i.hostid and f.triggerid=t.triggerid and t.status in (0) and f.itemid in (20463,22159,20164,20226,20286,20046,21110,20270,22357,20938,20363,20093,20023,20338,20413,20488,22138,20513,20438,21308,20273,22133,20183,21134,20114,19859,21135,21137,20182,22042,22287,20937,19857,22017,20067,20199,22329,21099,22164,20139,20934,19854,22014,19855,20185,20935,21136,20311,21101,20186,21146,20951,22136,20251,21309,22299,22359,22659,20140,22300,20950,21348,20928,21390,22140,20250,20310,21621,22341,20030,22010,21140,20330,22340,21351,21200,22670,21230,21410,22250,22820,21350,22301,22361,22381,22331,22661,21391,20312,21409,20929,20252,20412,20952,21381,21392,22142,22342,22371,21352,22702,21141,21111,22701,21622,22641,22251,22612,20931,20512,20092,22611,21231,22642,22610,22163,21353,22703,21623,21553,20113,22613,22043,20439,22143,22269,22383,20930,21170,21382,21232,21532,22132,22144,20464,21383,20033,20034,21533,21534,20514,21554,20153,20414,22334,20214,21624,21354,20115,22335,21233)
  4890:20080821:214853 Query failed:PGRES_FATAL_ERROR:ERROR:  current transaction is aborted, commands ignored until end of transaction block
 
  4827:20080821:214853 One child process died. Exiting ...
  4827:20080821:214903 ZABBIX Server stopped. ZABBIX 1.5.4.

This very long SQL statement that seems to produce deadlocks, can be optimized? Before running the query, is it possible to order the itemid`s?

Query::select distinct i.itemid,i.key_,h.host,h.port,i.delay,i.descriptio n,i.nextcheck,i.type,i.snmp_community,i.snmp_oid,h .useip,h.ip,i.history,i.lastvalue,i.prevvalue,i.ho stid,h.status,i.value_type,h.errors_from,i.snmp_po rt,i.delta,i.prevorgvalue,i.lastclock,
i.units,i.multiplier,i.snmpv3_securityname,i.snmpv 3_securitylevel,i.snmpv3_authpassphrase,i.snmpv3_p rivpassphrase,i.formula,h.available,i.status,i.tra pper_hosts,i.logtimefmt,i.valuemapid,i.delay_flex, h.dns,i.params,f.function,f.parameter,f.itemid,f.l astvalue from hosts h, items i,functions f,triggers t where f.itemid=i.itemid and h.hostid=i.hostid and f.triggerid=t.triggerid and t.status in (0) and f.itemid in (20463,22159,20164,20226,20286,20046,21110,20270,2 2357,20938,20363,20093,20023,20338,20413,20488,221 38,20513,20438,
21308,20273,22133,20183,21134,20114,19859,21135,21 137,20182,22042,22287,20937,19857,22017,20067,2019 9,22329,21099,
22164,20139,20934,19854,22014,19855,20185,20935,21 136,20311,21101,20186,21146,20951,22136,20251,2130 9,22299,22359,
22659,20140,22300,20950,21348,20928,21390,22140,20 250,20310,21621,22341,20030,22010,21140,20330,2234 0,21351,21200,
22670,21230,21410,22250,22820,21350,22301,22361,22 381,22331,22661,21391,20312,21409,20929,20252,2041 2,20952,21381,
21392,22142,22342,22371,21352,22702,21141,21111,22 701,21622,22641,22251,22612,20931,20512,20092,2261 1,21231,22642,
22610,22163,21353,22703,21623,21553,20113,22613,22 043,20439,22143,22269,22383,20930,21170,21382,2123 2,21532,22132,
22144,20464,21383,20033,20034,21533,21534,20514,21 554,20153,20414,22334,20214,21624,21354,20115,2233 5,21233)

These itemids are unordered, and the query will lock tables inefficiently for long time.

**Alexei** · 22-08-2008, 10:12

Originally posted by theologu

Alexei, is is normal for zabbix to die, if deadlocks are detected? Can it be configured to re-start the transaction that failed without exiting? I see in logs that "One child process died. Exiting ..." . It is normal? If one child dies, all the processes dies too?

What is not normal is that we get the deadlock situations. I do not know how easy it would be to recover from a failed transaction...

**theologu** · 22-08-2008, 12:01

me again...
I turned on in postgres logs reporting of statemens that take long than 3000 ms. After a while, some sql statements ( updates on items ) take really long time - > 600000 ms. Here is the log:

LOG: process 16869 acquired ShareLock on transaction 237528 after 606098.903 ms
STATEMENT: update items set nextcheck=1219397813 where itemid in (20463);
update items set nextcheck=1219397818 where itemid in (21438);
update items set nextcheck=1219397838 where itemid in (22008,22698,21198,21378);
update items set nextcheck=1219397868 where itemid in (21228,22248);
update items set nextcheck=1219398108 where itemid in (21108,22308);
update items set nextcheck=1219398408 where itemid in (22608,20208);

LOG: process 16935 acquired ShareLock on transaction 237528 after 601931.944 ms
STATEMENT: update items set nextcheck=1219397816 where itemid in (20046,20226,20286);
update items set nextcheck=1219397821 where itemid in (21111);
update items set nextcheck=1219397841 where itemid in (21621,21381,22371,21351,22701,22341,22641,21141);
update items set nextcheck=1219397871 where itemid in (22611,21231,22251);
update items set nextcheck=1219398111 where itemid in (20511);
update items set nextcheck=1219399011 where itemid in (20211);

LOG: process 16939 acquired ShareLock on transaction 237528 after 603590.828 ms
STATEMENT: update items set nextcheck=1219397816 where itemid in (21146,20951,21101,20186,22136);
update items set nextcheck=1219397821 where itemid in (22661,22301,22331,22361);

LOG: process 16953 acquired ShareLock on transaction 237528 after 605514.468 ms
STATEMENT: update items set nextcheck=1219397814 where itemid in (19859,21134,20114);
update items set nextcheck=1219397819 where itemid in (20339,22289,20489);

LOG: duration: 550131.989 ms statement: update items set nextcheck=1219397922 where itemid in (22182);

LOG: duration: 604135.652 ms statement: update items set nextcheck=1219397815 where itemid in (21135);
update items set nextcheck=1219397820 where itemid in (22140);
update items set nextcheck=1219397830 where itemid in (21390,20310,20250);

LOG: duration: 602975.519 ms statement: update items set nextcheck=1219397816 where itemid in (21136,20251,20311);
update items set nextcheck=1219397821 where itemid in (21391,22381);

LOG: duration: 605525.407 ms statement: update items set nextcheck=1219397814 where itemid in (19859,21134,20114);
update items set nextcheck=1219397819 where itemid in (20339,22289,20489);

LOG: duration: 606856.136 ms statement: update items set nextcheck=1219397813 where itemid in (20183,21308,20273,20093,20513,22133,20438);
update items set nextcheck=1219397818 where itemid in (22118,22658);

LOG: duration: 604866.955 ms statement: update items set nextcheck=1219397814 where itemid in (20199,21099,22164,22329,22014,20934,20139,19854);
update items set nextcheck=1219397819 where itemid in (22269,21309,22659,22299,20439,22359);

What can be the cause? I think that-s the cause of deadlocks, these updates are taking verry long time, meantime table is locked and generates deadlocks when another transaction is trying to use the table.

**theologu** · 25-08-2008, 11:36

Hi to all,

After three full days of trying to solve deadlocks problems with postgres and zabbix 1.5.4 I had no alternative but return to my old zabbix 1.4.3; I am sorry, but 1.5.4 is unusable to me. With the same database, 1.4.3 does not give a single deadlock. Something has changed in SQL code in 1.5.x branch that makes deadlocks a real problem if you have many hosts, items, etc. Maybe for 10 monitored hosts it won`t be a problem... I liked what I saw in 1.5.4, there is a good improvement in web management console, also other features in server and agent.
Alexei, Aly and all the other hard workers, you did a good job...zabbix is heading in the right way. I think that more attention has to be given to stability and usability of the code, more than fancy graphs and other nice things. These are really great too, but if zabbix is not stable enough, it is not optimised to work with sql databases in the best way possible, it will not be a competing product. Not to mention that later will be harder to make radical changes in design...

This problem with deadlocks is quite serios in my oppinion; maybe it has something to do with my configuration.... maybe I have too many items? I don`t know... but with 1.4.3 works and with 1.5.4 don`t; I am far from an SQL expert, i don`t think you need to be told what deadlocks are and why they appear, there is documentation out there; The application should be optimized to avoid deadlocks; I imagine that are applications bigger than a zabbix with 200 hosts and 2000 active items, that do not generate deadlocks. I realize that it is also hard to test, because you need a big configuration of monitored items, but i think many users will be glad to help on testing.

**Alexei** · 25-08-2008, 11:50

Originally posted by theologu

This problem with deadlocks is quite serios in my oppinion; maybe it has something to do with my configuration.... maybe I have too many items? I don`t know... but with 1.4.3 works and with 1.5.4 don`t; I am far from an SQL expert, i don`t think you need to be told what deadlocks are and why they appear, there is documentation out there; The application should be optimized to avoid deadlocks; I imagine that are applications bigger than a zabbix with 200 hosts and 2000 active items, that do not generate deadlocks. I realize that it is also hard to test, because you need a big configuration of monitored items, but i think many users will be glad to help on testing.

Yes, the problem IS serious. We are working on the problem while I write the message. Remember that we are talking about beta software, so it is quite unfair to compare stability of beta 1.5.4 with 1.4.x.

Tons of changes have been made in 1.5.4, and most of the changes are related to improving interaction with database engine. Set StartDBSyncers=0 to disable database cache and to get old 1.4.x-like behavior. I am pretty sure this will "fix" the deadlock problem for you.

I appreciate your reports. Stay tuned!

Ad Widget

Postgresql Deadlocks

Postgresql Deadlocks

Comment

Comment

Comment

Comment

Comment

Comment