Zabbix 1.8.3 and PostgreSQL: deadlock + transaction errors

  • floriang
    Junior Member
    • Jan 2009
    • 4

    #1

    Zabbix 1.8.3 and PostgreSQL: deadlock + transaction errors

    Hi,

    Our Zabbix server ran into database deadlocks several times today. Afterwards there seems to be a bug in the transaction handling.
    These are the messages from the PostgreSQL log:

    Nov 12 15:46:09 zs postgres[16928]: [2-1] ERROR: deadlock detected
    Nov 12 15:46:09 zs postgres[16928]: [2-2] DETAIL: Process 16928 waits for ShareLock on transaction 278998374; blocked by process 31052.
    Nov 12 15:46:09 zs postgres[16928]: [2-3] Process 31052 waits for ShareLock on transaction 278998391; blocked by process 16928.
    Nov 12 15:46:09 zs postgres[16928]: [2-4] Process 16928: update triggers set value=0,lastchange=1289570142,error='' where triggerid=14881
    Nov 12 15:46:09 zs postgres[16928]: [2-5] Process 31052: update triggers set value=1,lastchange=1289573165,error='' where triggerid=14596
    Nov 12 15:46:09 zs postgres[16928]: [2-6] HINT: See server log for query details.
    Nov 12 15:46:09 zs postgres[16928]: [2-7] STATEMENT: update triggers set value=0,lastchange=1289570142,error='' where triggerid=14881
    Nov 12 15:46:09 zs postgres[16928]: [3-1] ERROR: current transaction is aborted, commands ignored until end of transaction block
    Nov 12 15:46:09 zs postgres[16928]: [3-2] STATEMENT: select description,priority,comments,url,type from triggers where triggerid=14881
    Nov 12 15:46:09 zs postgres[16928]: [4-1] ERROR: current transaction is aborted, commands ignored until end of transaction block
    Nov 12 15:46:09 zs postgres[16928]: [4-2] STATEMENT: select eventid,value from events where source=0 and object=0 and objectid=14881 and value in (0,1) order by object desc,objectid desc,eventid desc limit 1
    Nov 12 15:46:09 zs postgres[16928]: [5-1] ERROR: current transaction is aborted, commands ignored until end of transaction block
    Nov 12 15:46:09 zs postgres[16928]: [5-2] STATEMENT: insert into events (eventid,source,object,objectid,clock,value) values (609568,0,0,14881,1289570142,0)
    Nov 12 15:46:09 zs postgres[16928]: [6-1] ERROR: current transaction is aborted, commands ignored until end of transaction block
    Nov 12 15:46:09 zs postgres[16928]: [6-2] STATEMENT: select serviceid from services where triggerid=14881
    Nov 12 15:46:09 zs postgres[16928]: [7-1] ERROR: current transaction is aborted, commands ignored until end of transaction block
    Nov 12 15:46:09 zs postgres[16928]: [7-2] STATEMENT: select distinct i.itemid,i.key_,h.host,h.port,i.delay,i.description,i.type,h.useip,h.ip,i.history,i.lastvalue,i.prevvalue,i.hostid,i.value_type,i.delta,i.prevorgvalue,i.lastclock,i.units,i.multiplier,i.formula,i.status,i.valuemapid,h.dns,i.trends,i.lastlogsize,i.data_type,i.mtime,f.function,f.parameter from hosts h,items i,functions f where i.hostid=h.hostid and i.itemid=f.itemid and f.functionid=18625
    Nov 12 15:46:09 zs postgres[16928]: [8-1] ERROR: current transaction is aborted, commands ignored until end of transaction block
    Nov 12 15:46:09 zs postgres[16928]: [8-2] STATEMENT: select t.triggerid, t.value from trigger_depends d,triggers t where d.triggerid_down=14882 and d.triggerid_up=t.triggerid
    Nov 12 15:46:09 zs postgres[16928]: [9-1] ERROR: current transaction is aborted, commands ignored until end of transaction block
    Nov 12 15:46:09 zs postgres[16928]: [9-2] STATEMENT: update triggers set value=2,lastchange=1289570113,error='Could not obtain function and item for functionid: 18625' where triggerid=14882
    Nov 12 15:46:09 zs postgres[16928]: [10-1] ERROR: current transaction is aborted, commands ignored until end of transaction block
    Nov 12 15:46:09 zs postgres[16928]: [10-2] STATEMENT: select description,priority,comments,url,type from triggers where triggerid=14882
    Nov 12 15:46:09 zs postgres[16928]: [11-1] ERROR: current transaction is aborted, commands ignored until end of transaction block
    Nov 12 15:46:09 zs postgres[16928]: [11-2] STATEMENT: insert into events (eventid,source,object,objectid,clock,value) values (609569,0,0,14882,1289570113,2)
    Nov 12 15:46:09 zs postgres[16928]: [12-1] ERROR: current transaction is aborted, commands ignored until end of transaction block
    Nov 12 15:46:09 zs postgres[16928]: [12-2] STATEMENT: select serviceid from services where triggerid=14882
    Nov 12 15:46:09 zs postgres[16928]: [13-1] ERROR: current transaction is aborted, commands ignored until end of transaction block
    Nov 12 15:46:09 zs postgres[16928]: [13-2] STATEMENT: select distinct i.itemid,i.key_,h.host,h.port,i.delay,i.description,i.type,h.useip,h.ip,i.history,i.lastvalue,i.prevvalue,i.hostid,i.value_type,i.delta,i.prevorgvalue,i.lastclock,i.units,i.multiplier,i.formula,i.status,i.valuemapid,h.dns,i.trends,i.lastlogsize,i.data_type,i.mtime,f.function,f.parameter from hosts h,items i,functions f where i.hostid=h.hostid and i.itemid=f.itemid and f.functionid=18586
    Nov 12 15:46:09 zs postgres[16928]: [14-1] ERROR: current transaction is aborted, commands ignored until end of transaction block
    Nov 12 15:46:09 zs postgres[16928]: [14-2] STATEMENT: update triggers set error='Could not obtain function and item for functionid: 18586' where triggerid=14884
    [ several hundred of these messages per second ]

    I "fixed" it by selectively killing the database connection processes from zabbix_server to Postgres until everything worked again. Meanwhile the load on the system rose above 50.

    Maybe the following error (also several times today) has something to do with it:

    Nov 12 16:07:36 zs postgres[4817]: [2-1] ERROR: duplicate key value violates unique constraint "trends_pkey"
    Nov 12 16:07:36 zs postgres[4817]: [2-2] STATEMENT: insert into trends (itemid,clock,num,value_min,value_avg,value_max) values (23283,1289574000,1,1.183105,1.183105,1.183105);
    Nov 12 16:07:36 zs postgres[4817]: [2-3]
    Nov 12 16:07:36 zs postgres[4817]: [3-1] WARNING: there is no transaction in progress


    Is this information enough to debug the issue, or do you need more?
  • sh1ny
    Junior Member
    • Nov 2010
    • 2

    #2
    I am experiencing exactly the same problem, and it is killing my database.
    Any suggestions?


    • floriang
      Junior Member
      • Jan 2009
      • 4

      #3
      Dirty workaround

      As I wrote, I fixed it by killing every database connection from a zabbix_server process that had an aborted transaction. So for log lines like:

      > postgres[16928]: [14-1] ERROR: current transaction is aborted, commands ignored until end of transaction block

      I killed process ID 16928 and so on...
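
      To find the affected backends without reading the log by eye, something like the following might help. It is only a rough sketch: it assumes the syslog line format shown in post #1, and the function name `aborted_backend_pids` is made up for this example. It counts the "current transaction is aborted" errors per PostgreSQL backend PID and kills nothing itself.

```python
import re
from collections import Counter

# Assumed line format (taken from the log excerpt in post #1):
#   Nov 12 15:46:09 zs postgres[16928]: [14-1] ERROR: current transaction is aborted, ...
ABORTED_RE = re.compile(
    r"postgres\[(?P<pid>\d+)\]: \[\d+-\d+\] ERROR:\s+current transaction is aborted"
)


def aborted_backend_pids(log_lines):
    """Count 'current transaction is aborted' errors per backend PID."""
    counts = Counter()
    for line in log_lines:
        match = ABORTED_RE.search(line)
        if match:
            counts[int(match.group("pid"))] += 1
    return counts


# Sample lines copied (shortened) from the log in post #1.
sample = [
    "Nov 12 15:46:09 zs postgres[16928]: [2-1] ERROR: deadlock detected",
    "Nov 12 15:46:09 zs postgres[16928]: [3-1] ERROR: current transaction is aborted, commands ignored until end of transaction block",
    "Nov 12 15:46:09 zs postgres[16928]: [4-1] ERROR: current transaction is aborted, commands ignored until end of transaction block",
    "Nov 12 16:07:36 zs postgres[4817]: [3-1] WARNING: there is no transaction in progress",
]

# Backends with the most aborted-transaction errors come first.
for pid, count in aborted_backend_pids(sample).most_common():
    print(pid, count)
```

      The PIDs it prints are the candidates for the kill described above. To run it on a real log, pass an open file object instead of `sample`.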


      • sh1ny
        Junior Member
        • Nov 2010
        • 2

        #4
        I stopped both the Zabbix server and the PostgreSQL server, then started them fresh, and I am still getting this issue. The Zabbix server is on a different box than the PostgreSQL server. I rebooted both, still no luck. As soon as I start the Zabbix server it starts throwing those errors, and the load on the PostgreSQL server goes to around 10-20.


        • floriang
          Junior Member
          • Jan 2009
          • 4

          #5
          Originally posted by sh1ny
          I stopped both the Zabbix server and the PostgreSQL server, then started them fresh, and I am still getting this issue. The Zabbix server is on a different box than the PostgreSQL server. I rebooted both, still no luck. As soon as I start the Zabbix server it starts throwing those errors, and the load on the PostgreSQL server goes to around 10-20.
          Restarting did not work for me either. As described above, I chose a different way of solving it.

          Killing the connection between the running server process and the database seems to reset the transaction in a way that the zabbix_server process can handle. A couple of kills later the problem stopped here. Maybe it's a race condition or the problematic data is just not inserted again by the same zabbix_server process.
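
          For background: once a statement fails inside a PostgreSQL transaction block, the server rejects every further command on that connection with "current transaction is aborted" until the client issues a ROLLBACK (or otherwise ends the transaction). Killing the backend forces exactly that. Below is a minimal sketch of what a client would normally do instead. It is not Zabbix's code; `run_in_transaction`, `FakeConnection` and `flaky_update` are names invented for this illustration, and the fake connection only stands in for a real DB-API connection so the example runs without a database.

```python
def run_in_transaction(conn, work, attempts=3):
    """Run work(conn) in one transaction; roll back and retry if it fails.

    `conn` is assumed to be a DB-API style connection offering commit()
    and rollback(). A real client should catch only the driver's
    deadlock/serialization error rather than every Exception.
    """
    for attempt in range(1, attempts + 1):
        try:
            result = work(conn)
            conn.commit()
            return result
        except Exception:
            # Without this rollback the server keeps answering
            # "current transaction is aborted" on this connection.
            conn.rollback()
            if attempt == attempts:
                raise


class FakeConnection:
    """Hypothetical stand-in for a database connection; it only counts calls."""

    def __init__(self):
        self.commits = 0
        self.rollbacks = 0

    def commit(self):
        self.commits += 1

    def rollback(self):
        self.rollbacks += 1


calls = []


def flaky_update(conn):
    """Simulated unit of work that fails once, like a deadlock victim."""
    calls.append(1)
    if len(calls) == 1:
        raise RuntimeError("deadlock detected")
    return "updated"


conn = FakeConnection()
outcome = run_in_transaction(conn, flaky_update)
print(outcome, conn.commits, conn.rollbacks)
```

          The point of the sketch is the ordering: the failed attempt is rolled back first, and only then is the work retried on the same connection.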
