Ad Widget

Collapse

Postgresql Deadlocks

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • theologu
    Junior Member
    • Dec 2007
    • 23

    #1

    Postgresql Deadlocks

    Hello,

    Since yesterday, I am using zabbix 1.5.4 on a postgresql 8.2.5 database ( just upgraded to 8.3.3) but I have big problems with deadlocks:

    Code:
    8499:20080821:112737 Query failed:PGRES_FATAL_ERROR:ERROR:  deadlock detected
    DETAIL:  Process 8504 waits for ShareLock on transaction 84652; blocked by process 8517.
    Process 8517 waits for ShareLock on transaction 84649; blocked by process 8504.
     
      8499:20080821:112737 Query::select distinct i.itemid,i.key_,h.host,h.port,i.delay,i.description,i.nextcheck,i.type,i.snmp_community,i.snmp_oid,h.useip,h.ip,i.history,i.lastvalue,i.prevvalue,i.hostid,h.status,i.value_type,h.errors_from,i.snmp_port,i.delta,i.prevorgvalue,i.lastclock,i.units,i.multiplier,i.snmpv3_securityname,i.snmpv3_securitylevel,i.snmpv3_authpassphrase,i.snmpv3_privpassphrase,i.formula,h.available,i.status,i.trapper_hosts,i.logtimefmt,i.valuemapid,i.delay_flex,h.dns,i.params,f.function,f.parameter,f.itemid,f.lastvalue from hosts h, items i,functions f,triggers t where f.itemid=i.itemid and h.hostid=i.hostid and f.triggerid=t.triggerid and t.status in (0) and f.itemid in (20185,20951,20311,21101,20046,21136,22042,20182,20272,20273,22133,20513,20363,20186,20286,21146,20226,22136,22329,20269,20338,22357,19857,21137,20067,22287,20937,21134,20114,22164,22014,19854,20164,20934,19855,21135,21110,20270,20488,20938,22138,21308,20438,20023,20139,20199,22159,21099,20093,20183,20463,20413,20061,22641,21141,21621,21391,22821,21111,22331,22361,22381,22041,22661,22301,22342,22382,20362,21352,21322,20062,22702,20482,20332,21622,22642,21381,22701,21503,20063,22643,21353,20483,22163,22703,20333,21623,20163,21382,22612,20312,20022,20462,22362,21392,22142,21442,20952,22282,21142,21383,22143,22043,22383,22343,22285,21355,21384,20215,22335,22145,20994,20115,22645,22115,21203,20903,20214,21354,21624,22794,22704,21385,20285,20024,20126,21444,22324,20094,20996,21356,21206,22646,20364,21533,21506,22613,21536,22116,22766,20464,20514,20414,22286,21416,21554,20066,21326)
      8499:20080821:112737 Query failed:PGRES_FATAL_ERROR:ERROR:  current transaction is aborted, commands ignored until end of transaction block
     
      8450:20080821:112737 One child process died. Exiting ...
      8450:20080821:112748 ZABBIX Server stopped. ZABBIX 1.5.4.
    I tried everything I can think of, tuned various parameters in postgresql.conf, reindexed the indexes on DB, but nothing helped! With zabbix 1.4.X I never had problems with deadlocks.

    I don`t know why zabbix dies when postgres reports deadlocks, because deadlocks are not fatal, just aborted transactions. The application should know to re-start the transaction.


    Hi All, Does this message make any sense? Is PostgreSQL in fact implicitly locking tables for INSERTs, UPDATEs, and DELETEs? …


    I know that deadlocks are PSQL related, but are triggered by transactions made by zabbix, which I think can be optimized but is beyond my skills.

    So please give me an advice to get rid of deadlocks.
    Last edited by theologu; 21-08-2008, 12:01.
  • Alexei
    Founder, CEO
    Zabbix Certified Trainer
    Zabbix Certified SpecialistZabbix Certified Professional
    • Sep 2004
    • 5654

    #2
    We experience seldom deadlocks on MySQL as well. No ideas why this happens so far, still investigating...
    Alexei Vladishev
    Creator of Zabbix, Product manager
    New York | Tokyo | Riga
    My Twitter

    Comment

    • theologu
      Junior Member
      • Dec 2007
      • 23

      #3
      I am constantly trying to solve deadlocks problem for three days now.... I am really out of ideas and under these condintions zabbix 1.5.4 is unusable, it dies at random intervals from minutes to tens of minutes.

      No. of hosts monitored: 72
      No. of active items: 1298
      No. of triggers: 544

      Server is IBM, CPU Xeon E5320 quad core, 2GB of RAM, disks S-ATA in raid 0;

      I don`t think it a problem with resources and load... average load is at 0,27.

      Items are configured for check at different time periods, so the collection is distributed. Database is tuned for performance, the zabbix interface is really speedy; I indexed and reindexed manually data in DB. I don`t know what to do to solve deadlocks.

      Alexei, is is normal for zabbix to die, if deadlocks are detected? Can it be configured to re-start the transaction that failed without exiting? I see in logs that "One child process died. Exiting ..." . It is normal? If one child dies, all the processes dies too?

      I give you another log extract:

      Code:
      update items set lastclock=1219343631,prevvalue=lastvalue,lastvalue='0.000000' where itemid=20310;
      update items set lastclock=1219343635,prevvalue=lastvalue,lastvalue='2423653' where itemid=21233;
       
        4890:20080821:214853 Query failed:PGRES_FATAL_ERROR:ERROR:  deadlock detected
      DETAIL:  Process 4894 waits for ShareLock on transaction 204674; blocked by process 4920.
      Process 4920 waits for ShareLock on transaction 204635; blocked by process 4894.
       
        4890:20080821:214853 Query::select distinct i.itemid,i.key_,h.host,h.port,i.delay,i.description,i.nextcheck,i.type,i.snmp_community,i.snmp_oid,h.useip,h.ip,i.history,i.lastvalue,i.prevvalue,i.hostid,h.status,i.value_type,h.errors_from,i.snmp_port,i.delta,i.prevorgvalue,i.lastclock,i.units,i.multiplier,i.snmpv3_securityname,i.snmpv3_securitylevel,i.snmpv3_authpassphrase,i.snmpv3_privpassphrase,i.formula,h.available,i.status,i.trapper_hosts,i.logtimefmt,i.valuemapid,i.delay_flex,h.dns,i.params,f.function,f.parameter,f.itemid,f.lastvalue from hosts h, items i,functions f,triggers t where f.itemid=i.itemid and h.hostid=i.hostid and f.triggerid=t.triggerid and t.status in (0) and f.itemid in (20463,22159,20164,20226,20286,20046,21110,20270,22357,20938,20363,20093,20023,20338,20413,20488,22138,20513,20438,21308,20273,22133,20183,21134,20114,19859,21135,21137,20182,22042,22287,20937,19857,22017,20067,20199,22329,21099,22164,20139,20934,19854,22014,19855,20185,20935,21136,20311,21101,20186,21146,20951,22136,20251,21309,22299,22359,22659,20140,22300,20950,21348,20928,21390,22140,20250,20310,21621,22341,20030,22010,21140,20330,22340,21351,21200,22670,21230,21410,22250,22820,21350,22301,22361,22381,22331,22661,21391,20312,21409,20929,20252,20412,20952,21381,21392,22142,22342,22371,21352,22702,21141,21111,22701,21622,22641,22251,22612,20931,20512,20092,22611,21231,22642,22610,22163,21353,22703,21623,21553,20113,22613,22043,20439,22143,22269,22383,20930,21170,21382,21232,21532,22132,22144,20464,21383,20033,20034,21533,21534,20514,21554,20153,20414,22334,20214,21624,21354,20115,22335,21233)
        4890:20080821:214853 Query failed:PGRES_FATAL_ERROR:ERROR:  current transaction is aborted, commands ignored until end of transaction block
       
        4827:20080821:214853 One child process died. Exiting ...
        4827:20080821:214903 ZABBIX Server stopped. ZABBIX 1.5.4.
      This very long SQL statement that seems to produce deadlocks, can be optimized? Before running the query, is it possible to order the itemid`s?

      Query::select distinct i.itemid,i.key_,h.host,h.port,i.delay,i.descriptio n,i.nextcheck,i.type,i.snmp_community,i.snmp_oid,h .useip,h.ip,i.history,i.lastvalue,i.prevvalue,i.ho stid,h.status,i.value_type,h.errors_from,i.snmp_po rt,i.delta,i.prevorgvalue,i.lastclock,
      i.units,i.multiplier,i.snmpv3_securityname,i.snmpv 3_securitylevel,i.snmpv3_authpassphrase,i.snmpv3_p rivpassphrase,i.formula,h.available,i.status,i.tra pper_hosts,i.logtimefmt,i.valuemapid,i.delay_flex, h.dns,i.params,f.function,f.parameter,f.itemid,f.l astvalue from hosts h, items i,functions f,triggers t where f.itemid=i.itemid and h.hostid=i.hostid and f.triggerid=t.triggerid and t.status in (0) and f.itemid in (20463,22159,20164,20226,20286,20046,21110,20270,2 2357,20938,20363,20093,20023,20338,20413,20488,221 38,20513,20438,
      21308,20273,22133,20183,21134,20114,19859,21135,21 137,20182,22042,22287,20937,19857,22017,20067,2019 9,22329,21099,
      22164,20139,20934,19854,22014,19855,20185,20935,21 136,20311,21101,20186,21146,20951,22136,20251,2130 9,22299,22359,
      22659,20140,22300,20950,21348,20928,21390,22140,20 250,20310,21621,22341,20030,22010,21140,20330,2234 0,21351,21200,
      22670,21230,21410,22250,22820,21350,22301,22361,22 381,22331,22661,21391,20312,21409,20929,20252,2041 2,20952,21381,
      21392,22142,22342,22371,21352,22702,21141,21111,22 701,21622,22641,22251,22612,20931,20512,20092,2261 1,21231,22642,
      22610,22163,21353,22703,21623,21553,20113,22613,22 043,20439,22143,22269,22383,20930,21170,21382,2123 2,21532,22132,
      22144,20464,21383,20033,20034,21533,21534,20514,21 554,20153,20414,22334,20214,21624,21354,20115,2233 5,21233)

      These itemids are unordered, and the query will lock tables inefficiently for long time.
      Last edited by theologu; 22-08-2008, 09:53.

      Comment

      • Alexei
        Founder, CEO
        Zabbix Certified Trainer
        Zabbix Certified SpecialistZabbix Certified Professional
        • Sep 2004
        • 5654

        #4
        Originally posted by theologu
        Alexei, is is normal for zabbix to die, if deadlocks are detected? Can it be configured to re-start the transaction that failed without exiting? I see in logs that "One child process died. Exiting ..." . It is normal? If one child dies, all the processes dies too?
        What is not normal is that we get the deadlock situations. I do not know how easy it would be to recover from a failed transaction...
        Alexei Vladishev
        Creator of Zabbix, Product manager
        New York | Tokyo | Riga
        My Twitter

        Comment

        • theologu
          Junior Member
          • Dec 2007
          • 23

          #5
          me again...
          I turned on in postgres logs reporting of statemens that take long than 3000 ms. After a while, some sql statements ( updates on items ) take really long time - > 600000 ms. Here is the log:

          LOG: process 16869 acquired ShareLock on transaction 237528 after 606098.903 ms
          STATEMENT: update items set nextcheck=1219397813 where itemid in (20463);
          update items set nextcheck=1219397818 where itemid in (21438);
          update items set nextcheck=1219397838 where itemid in (22008,22698,21198,21378);
          update items set nextcheck=1219397868 where itemid in (21228,22248);
          update items set nextcheck=1219398108 where itemid in (21108,22308);
          update items set nextcheck=1219398408 where itemid in (22608,20208);

          LOG: process 16935 acquired ShareLock on transaction 237528 after 601931.944 ms
          STATEMENT: update items set nextcheck=1219397816 where itemid in (20046,20226,20286);
          update items set nextcheck=1219397821 where itemid in (21111);
          update items set nextcheck=1219397841 where itemid in (21621,21381,22371,21351,22701,22341,22641,21141);
          update items set nextcheck=1219397871 where itemid in (22611,21231,22251);
          update items set nextcheck=1219398111 where itemid in (20511);
          update items set nextcheck=1219399011 where itemid in (20211);

          LOG: process 16939 acquired ShareLock on transaction 237528 after 603590.828 ms
          STATEMENT: update items set nextcheck=1219397816 where itemid in (21146,20951,21101,20186,22136);
          update items set nextcheck=1219397821 where itemid in (22661,22301,22331,22361);

          LOG: process 16953 acquired ShareLock on transaction 237528 after 605514.468 ms
          STATEMENT: update items set nextcheck=1219397814 where itemid in (19859,21134,20114);
          update items set nextcheck=1219397819 where itemid in (20339,22289,20489);

          LOG: duration: 550131.989 ms statement: update items set nextcheck=1219397922 where itemid in (22182);

          LOG: duration: 604135.652 ms statement: update items set nextcheck=1219397815 where itemid in (21135);
          update items set nextcheck=1219397820 where itemid in (22140);
          update items set nextcheck=1219397830 where itemid in (21390,20310,20250);

          LOG: duration: 602975.519 ms statement: update items set nextcheck=1219397816 where itemid in (21136,20251,20311);
          update items set nextcheck=1219397821 where itemid in (21391,22381);

          LOG: duration: 605525.407 ms statement: update items set nextcheck=1219397814 where itemid in (19859,21134,20114);
          update items set nextcheck=1219397819 where itemid in (20339,22289,20489);

          LOG: duration: 606856.136 ms statement: update items set nextcheck=1219397813 where itemid in (20183,21308,20273,20093,20513,22133,20438);
          update items set nextcheck=1219397818 where itemid in (22118,22658);

          LOG: duration: 604866.955 ms statement: update items set nextcheck=1219397814 where itemid in (20199,21099,22164,22329,22014,20934,20139,19854);
          update items set nextcheck=1219397819 where itemid in (22269,21309,22659,22299,20439,22359);


          What can be the cause? I think that-s the cause of deadlocks, these updates are taking verry long time, meantime table is locked and generates deadlocks when another transaction is trying to use the table.

          Comment

          • theologu
            Junior Member
            • Dec 2007
            • 23

            #6
            Hi to all,

            After three full days of trying to solve deadlocks problems with postgres and zabbix 1.5.4 I had no alternative but return to my old zabbix 1.4.3; I am sorry, but 1.5.4 is unusable to me. With the same database, 1.4.3 does not give a single deadlock. Something has changed in SQL code in 1.5.x branch that makes deadlocks a real problem if you have many hosts, items, etc. Maybe for 10 monitored hosts it won`t be a problem... I liked what I saw in 1.5.4, there is a good improvement in web management console, also other features in server and agent.
            Alexei, Aly and all the other hard workers, you did a good job...zabbix is heading in the right way. I think that more attention has to be given to stability and usability of the code, more than fancy graphs and other nice things. These are really great too, but if zabbix is not stable enough, it is not optimised to work with sql databases in the best way possible, it will not be a competing product. Not to mention that later will be harder to make radical changes in design...

            This problem with deadlocks is quite serios in my oppinion; maybe it has something to do with my configuration.... maybe I have too many items? I don`t know... but with 1.4.3 works and with 1.5.4 don`t; I am far from an SQL expert, i don`t think you need to be told what deadlocks are and why they appear, there is documentation out there; The application should be optimized to avoid deadlocks; I imagine that are applications bigger than a zabbix with 200 hosts and 2000 active items, that do not generate deadlocks. I realize that it is also hard to test, because you need a big configuration of monitored items, but i think many users will be glad to help on testing.

            Comment

            • Alexei
              Founder, CEO
              Zabbix Certified Trainer
              Zabbix Certified SpecialistZabbix Certified Professional
              • Sep 2004
              • 5654

              #7
              Originally posted by theologu
              This problem with deadlocks is quite serios in my oppinion; maybe it has something to do with my configuration.... maybe I have too many items? I don`t know... but with 1.4.3 works and with 1.5.4 don`t; I am far from an SQL expert, i don`t think you need to be told what deadlocks are and why they appear, there is documentation out there; The application should be optimized to avoid deadlocks; I imagine that are applications bigger than a zabbix with 200 hosts and 2000 active items, that do not generate deadlocks. I realize that it is also hard to test, because you need a big configuration of monitored items, but i think many users will be glad to help on testing.
              Yes, the problem IS serious. We are working on the problem while I write the message. Remember that we are talking about beta software, so it is quite unfair to compare stability of beta 1.5.4 with 1.4.x.

              Tons of changes have been made in 1.5.4, and most of the changes are related to improving interaction with database engine. Set StartDBSyncers=0 to disable database cache and to get old 1.4.x-like behavior. I am pretty sure this will "fix" the deadlock problem for you.

              I appreciate your reports. Stay tuned!
              Alexei Vladishev
              Creator of Zabbix, Product manager
              New York | Tokyo | Riga
              My Twitter

              Comment

              Working...