Ad Widget

Collapse

Recursive SQL query following deadlock

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • nmail_uk
    Member
    • May 2009
    • 65

    #1

    Recursive SQL query following deadlock

    I had an unusual situation on my Zabbix box this evening - processor usage jumped up unusually high shortly after 9:30pm - went from less than 1% to over 20%.

    When I looked at the processes it was PostgreSQL hitting the CPU quite hard, specifically one of the ZABBIX server processes. I watched what it was doing using the pg_stat_activity view and it seemed to be running the same query over and over again.

    select t.triggerid,i.itemid from triggers t,functions f,items i where t.triggerid=f.triggerid and f.itemid=i.itemid and t.s
    tatus in (0) and t.value not in (2) and i.itemid in (25,26)
    This was the query - it would then run an UPDATE, then COMMIT, then do the above SELECT, then keep repeating itself. This had been going on for 2 hours.

    I restarted the Zabbix server process and all is well - the processor usage is back below 1% again. Looking through the server log file, it seemed like it hit a deadlock in PostgreSQL which caused it to go into a spiral.

    18761:20091221:212937.423 [Z3005] Query failed: [0] PGRES_FATAL_ERROR:ERROR: deadlock detected
    DETAIL: Process 18762 waits for ShareLock on transaction 64356; blocked by process 882.
    Process 882 waits for ShareLock on transaction 64359; blocked by process 18762.
    HINT: See server log for query details.
    [update items set status=3,lastclock=1261430976,error='Unsupported parameters' where itemid=25;
    update items set status=3,lastclock=1261430976,error='Unsupported parameters' where itemid=26;
    ]
    The same detail was logged in the PostgreSQL error log.
  • nmail_uk
    Member
    • May 2009
    • 65

    #2
    OK the restart didn't do the trick - Zabbix is still trying to execute the same query, and it's doing it over and over again.

    The two items in question (25 and 26) have got incorrect parameters defined in Zabbix (I'm waiting for a reply to another forum post to get them configured correctly.) However the update interval is set to 60 seconds but Zabbix seems to be updating them at least once a second.

    Code:
    zabbix=# select current_query from pg_stat_activity where procpid=2754;
                                             current_query                                          
    ------------------------------------------------------------------------------------------------
     update items set status=3,lastclock=1261440218,error='Unsupported parameters' where itemid=25;
     update items set status=3,lastclock=1261440218,error='Unsupported parameters' where itemid=26;
    
    zabbix=# select current_query from pg_stat_activity where procpid=2754;
                                             current_query                                          
    ------------------------------------------------------------------------------------------------
     update items set status=3,lastclock=1261440219,error='Unsupported parameters' where itemid=25;
     update items set status=3,lastclock=1261440219,error='Unsupported parameters' where itemid=26;
    Any ideas? I'll try disabling the items in the meantime.

    Comment

    • nmail_uk
      Member
      • May 2009
      • 65

      #3
      This is actually pretty easy to reproduce on my setup.

      I had a working ICMP ping command defined as "icmpping[]" which was active and had been returning data.

      I then changed it to "icmpping[,4,,8,2000]" which caused the command to fail with "unsupported parameters" presumably because I didn't give a value for the third parameter.

      Within a minute (that's the interval on the command) I'd started getting high CPU load on the server again. I changed the command definition back correctly but if I tried to re-activate the commands on the two servers, I got a "deadlock in transaction" error, or it succeeded but then immediately reverted back to "unsupported parameters."

      After killing the PostgreSQL backend process that was stuck (sudo kill <pid>) I could re-activate the command and all is hunky-dory again.

      If it's of use, my server agents are set up as active-only; there are no passive checks.

      Comment

      Working...