[1.4.1] Strange behavior. Possible bug in housekeeper

  • bobrivers
    Senior Member
    • Feb 2007
    • 115

    #1

    [1.4.1] Strange behavior. Possible bug in housekeeper

    Hi,

    After upgrading from 1.4 to 1.4.1, I noticed that every hour Zabbix stops gathering data for about 10 minutes.

    I was checking the graphs this afternoon and I can see that all monitored items have a gap of 10 minutes, all of them at the same time.

    I checked the log and saw that the problem occurs at the same time the housekeeper starts. While the housekeeper is running, all the inserts fail with an error like this: Query failed: [insert into history (clock,itemid,value) values (1183301962,20930,0.658615)] Lock wait timeout exceeded; try restarting transaction [1205]

    After the housekeeper stops (I can see a message that says Deleted 575150 records from history and trends), everything works fine.

    Every time the housekeeper runs, it deletes about 570000 records.

    Is this expected behavior (locking the table)? Or is the machine where Zabbix is running not powerful enough (P4 3.0 GHz, 512 MB)?

    I'm running Zabbix 1.4.1 on RedHat EL 5 (PHP 5.1.6 with MySQL 5.0.22). MySQL is running with the default installation. I didn't make any changes or tuning to it.

    Any advice?

    I didn't have this problem with 1.4, and I'm running Zabbix with the same number of hosts (25 hosts using the default templates for Linux and Windows, plus maybe 3 or 4 additional items that I created).

    TIA,

    Bob
  • Alexei
    Founder, CEO
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Sep 2004
    • 5654

    #2
    Do you use InnoDB or MyISAM? MyISAM locks the whole table on update. The housekeeper was fixed in 1.4.1, so it now actually deletes old data. The first housekeeping runs may take some time, but at some point it will return to a normal state.
    Alexei Vladishev
    Creator of Zabbix, Product manager
    New York | Tokyo | Riga

    • bobrivers
      Senior Member
      • Feb 2007
      • 115

      #3
      Hi,

      It seems to be InnoDB. I don't know how to check, but in MySQL Administrator, when I click on the zabbix tables, it shows that the engine is InnoDB.

      I will try 1.4.1 again. Due to the problems I had, I rolled back to 1.4.

      Thanks.


      • bbrendon
        Senior Member
        • Sep 2005
        • 870

        #4
        Sounds like you should try to stick with 1.4.1, because the problem may only get worse as your DB keeps growing under 1.4.

        From the mysql command line, run "show table status;" and it'll show the engine for each table along with other information.

        Also, if your database is larger than maybe 500 MB, you might have performance issues, because the default settings for MySQL and InnoDB aren't very good, from what I remember.
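
If there are many tables, the engine check can be scripted. The following is a minimal sketch, assuming the output of "show table status" was captured in tab-separated form (for example with the mysql client's --batch option). The function name and the sample rows are made up for illustration; real output has many more columns than the three shown.

```python
import csv
import io


def engines_by_table(batch_output):
    """Map table name -> storage engine from tab-separated SHOW TABLE STATUS output."""
    reader = csv.DictReader(
        io.StringIO(batch_output), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return {row["Name"]: row["Engine"] for row in reader}


# Invented sample: only the first three columns, two illustrative tables.
SAMPLE = "Name\tEngine\tVersion\nhistory\tInnoDB\t10\nsome_table\tMyISAM\t10\n"

if __name__ == "__main__":
    for table, engine in engines_by_table(SAMPLE).items():
        print(table, engine)
```

Any table that reports MyISAM would be subject to the whole-table locking mentioned earlier in the thread.
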
        Unofficial Zabbix Expert

        • Alexei
          Founder, CEO
          Zabbix Certified Trainer
          Zabbix Certified Specialist
          Zabbix Certified Professional
          • Sep 2004
          • 5654

          #5
          Yes, it does look like MySQL was not tuned for better performance.
          Alexei Vladishev
          Creator of Zabbix, Product manager
          New York | Tokyo | Riga

          • StanZoid
            Member
            • Oct 2005
            • 47

            #6
            Housekeeper causing gaps in data collection

            I am having the same problem with my new 1.4.1 installation. Every hour when the housekeeper kicks off, I get a 3-minute gap in data collection across the board, with the following entries in the zabbix_server.log file:

            25417:20070703:131857 Executing housekeeper
            25401:20070703:131953 Query failed: [insert into history_uint (clock,itemid,value) values (1183493942,18482,1)] Lock wait timeout exceeded; try restarting transaction [1205]
            25399:20070703:131957 Query failed: [insert into history (clock,itemid,value) values (1183493946,19205,98.691915)] Lock wait timeout exceeded; try restarting transaction [1205]
            25402:20070703:131959 Query failed: [insert into history_str (clock,itemid,value) values (1183493948,19148,'1.4')] Lock wait timeout exceeded; try restarting transaction [1205]
            25403:20070703:132000 Query failed: [insert into history (clock,itemid,value) values (1183493949,19329,91.926488)] Lock wait timeout exceeded; try restarting transaction [1205]
            25400:20070703:132003 Query failed: [insert into history (clock,itemid,value) values (1183493952,19211,99.926415)] Lock wait timeout exceeded; try restarting transaction [1205]
            25401:20070703:132044 Query failed: [insert into history_str (clock,itemid,value) values (1183493993,18542,'mysql Ver 14.12 Distrib 5.0.22, for redhat-linux-gnu (i686) using readline 5.0')] Lock wait timeout exceeded; try restarting transaction [1205]
            25399:20070703:132048 Query failed: [insert into history_uint (clock,itemid,value) values (1183493997,19295,1183493997)] Lock wait timeout exceeded; try restarting transaction [1205]
            25402:20070703:132050 Query failed: [insert into history (clock,itemid,value) values (1183493999,19208,91.396894)] Lock wait timeout exceeded; try restarting transaction [1205]
            25403:20070703:132051 Query failed: [insert into history_uint (clock,itemid,value) values (1183494000,19509,1)] Lock wait timeout exceeded; try restarting transaction [1205]
            25400:20070703:132054 Query failed: [insert into history_uint (clock,itemid,value) values (1183494003,19511,1)] Lock wait timeout exceeded; try restarting transaction [1205]
            25417:20070703:132120 Deleted 451258 records from history and trends


            The number of records deleted every hour is roughly the same as shown above. I have tuned my.cnf somewhat:

            [mysqld]
            datadir=/var/lib/mysql
            socket=/var/lib/mysql/mysql.sock
            # Default to using old password format for compatibility with mysql 3.x
            # clients (those using the mysqlclient10 compatibility package).
            old_passwords=1
            innodb_buffer_pool_size=640M
            innodb_log_file_size=128M
            innodb_flush_log_at_trx_commit=0
            innodb_thread_concurrency=2
            thread_cache_size=50
            thread_concurrency=2
            max_connections=100


            During the housekeeping run, network traffic drops off dramatically, CPU utilization doubles (20% to 40%), and load rises from 0.7 to 2.5.

            Any ideas as to how to convince housekeeper to play nice?
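
The log excerpt above can be summarized with a short script. This is a minimal sketch, assuming the "PID:YYYYMMDD:HHMMSS message" line format seen in the excerpt; the function names are made up for illustration, and the sample keeps only a few lines with the query text shortened to "[...]". It reports how long the housekeeper ran, how many lock wait timeouts occurred, and the number of seconds between consecutive timeouts of the same server process.

```python
import re
from collections import defaultdict
from datetime import datetime

# Line format assumed from the excerpt: "PID:YYYYMMDD:HHMMSS message".
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}) (.*)$")


def parse(line):
    """Return (pid, timestamp, message), or None if the line does not match."""
    m = LINE_RE.match(line)
    if not m:
        return None
    pid, day, clock, message = m.groups()
    return int(pid), datetime.strptime(day + clock, "%Y%m%d%H%M%S"), message


def summarize(lines):
    """Return (housekeeper duration in seconds, timeout count, per-PID gaps)."""
    start = end = None
    timeouts = defaultdict(list)  # pid -> timestamps of lock wait timeouts
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue
        pid, ts, message = parsed
        if message.startswith("Executing housekeeper"):
            start = ts
        elif message.startswith("Deleted "):
            end = ts
        elif "Lock wait timeout exceeded" in message:
            timeouts[pid].append(ts)
    # Seconds between consecutive timeouts reported by the same process.
    gaps = {
        pid: [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]
        for pid, stamps in timeouts.items()
    }
    duration = int((end - start).total_seconds()) if start and end else None
    return duration, sum(len(s) for s in timeouts.values()), gaps


# A few lines from the excerpt; query text shortened to "[...]" for readability.
SAMPLE = """\
25417:20070703:131857 Executing housekeeper
25401:20070703:131953 Query failed: [...] Lock wait timeout exceeded; try restarting transaction [1205]
25399:20070703:131957 Query failed: [...] Lock wait timeout exceeded; try restarting transaction [1205]
25401:20070703:132044 Query failed: [...] Lock wait timeout exceeded; try restarting transaction [1205]
25399:20070703:132048 Query failed: [...] Lock wait timeout exceeded; try restarting transaction [1205]
25417:20070703:132120 Deleted 451258 records from history and trends
"""

if __name__ == "__main__":
    print(summarize(SAMPLE.splitlines()))
```

On these lines the housekeeper run lasts 143 seconds, and each process reports its next timeout 51 seconds after the previous one. That spacing is consistent with InnoDB's default innodb_lock_wait_timeout of 50 seconds, which would mean each writer waited out the full timeout before giving up. This reading is inferred from the timestamps only and is not confirmed anywhere in the thread.
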

            StanZoid
            Hosts (m/n/t/d): 104(72/6/26/0) Items (m/d/n)[t]: 7318(3305/3657/356)[24] Triggers (e/d)[t/u/f]: 2114(1976/138)[3/27/1946] Number of events: 43718 Number of alerts: 23892
            Last edited by StanZoid; 03-07-2007, 23:40.


            • qix
              Senior Member
              Zabbix Certified Specialist
              Zabbix Certified Professional
              • Oct 2006
              • 423

              #7
              Still a problem in 1.4.4

              Hi all,

              I seem to be hitting the same problem with 1.4.4.

              I get these error messages with housekeeping enabled:

              Code:
               24815:20080107:165524 Query failed: [update ids set nextid=nextid+1 where nodeid=1 and table_name='events' and field_name='eventid'] Lock wait timeout exceeded; try restarting transaction [1205]
               24800:20080107:165527 Query failed: [update ids set nextid=nextid+1 where nodeid=1 and table_name='events' and field_name='eventid'] Lock wait timeout exceeded; try restarting transaction [1205]
               24816:20080107:165612 Query failed: [update ids set nextid=nextid+1 where nodeid=1 and table_name='events' and field_name='eventid'] Lock wait timeout exceeded; try restarting transaction [1205]
               24815:20080107:165616 Query failed: [update ids set nextid=nextid+1 where nodeid=1 and table_name='events' and field_name='eventid'] Lock wait timeout exceeded; try restarting transaction [1205]
               24800:20080107:165619 Query failed: [update ids set nextid=nextid+1 where nodeid=1 and table_name='events' and field_name='eventid'] Lock wait timeout exceeded; try restarting transaction [1205]
               24816:20080107:165705 Query failed: [update ids set nextid=nextid+1 where nodeid=1 and table_name='events' and field_name='eventid'] Lock wait timeout exceeded; try restarting transaction [1205]
               24815:20080107:165709 Query failed: [update ids set nextid=nextid+1 where nodeid=1 and table_name='events' and field_name='eventid'] Lock wait timeout exceeded; try restarting transaction [1205]
               24800:20080107:165712 Query failed: [update ids set nextid=nextid+1 where nodeid=1 and table_name='events' and field_name='eventid'] Lock wait timeout exceeded; try restarting transaction [1205]
              No triggers are reset/fired during this period.
              When I disable housekeeping in the Zabbix server config file, all seems well.

              This is, however, far from ideal.

              Any ideas on how to get it all going?
              With kind regards,

              Raymond


              • ahanson
                Junior Member
                • Sep 2007
                • 15

                #8
                Anybody have any information on this???

                This is becoming an issue for me as well. Not only do I get a gap in my graphs, but I am also getting around 50 e-mails every hour for the systems it thinks are no longer reachable. If I could get an answer on this (that isn't "disable the housekeeper") I would be very appreciative.
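
For context, one general technique for keeping a large purge from blocking inserts is to delete in small batches and commit after each one, so locks are held only briefly. The following is a minimal sketch of that idea, not a description of what the Zabbix housekeeper actually does. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the function names, batch size, and test data are assumptions, and only the history columns (clock, itemid, value) come from the log messages in this thread. In MySQL the same effect is usually obtained with DELETE ... LIMIT n run in a loop.

```python
import sqlite3


def purge_in_batches(conn, cutoff, batch_size=1000):
    """Delete history rows with clock < cutoff, committing after each batch.

    Every batch is its own short transaction, so locks are released between
    batches and concurrent inserts get a chance to run.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM history WHERE rowid IN ("
            "SELECT rowid FROM history WHERE clock < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total


def demo():
    # In-memory SQLite stands in for the real database (assumption of this sketch).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE history (itemid INTEGER, clock INTEGER, value REAL)")
    conn.executemany(
        "INSERT INTO history (itemid, clock, value) VALUES (?, ?, ?)",
        [(20930, clock, 0.5) for clock in range(10_000)],
    )
    conn.commit()
    deleted = purge_in_batches(conn, cutoff=7_500, batch_size=1000)
    remaining = conn.execute("SELECT COUNT(*) FROM history").fetchone()[0]
    return deleted, remaining


if __name__ == "__main__":
    print(demo())
```

The trade-off is that the purge as a whole takes longer, but no single statement has to hold locks on hundreds of thousands of rows at once.
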


                • qix
                  Senior Member
                  Zabbix Certified Specialist
                  Zabbix Certified Professional
                  • Oct 2006
                  • 423

                  #9
                  Just curious, do your messages look like what is described in this thread?

                  With kind regards,

                  Raymond


                  • ahanson
                    Junior Member
                    • Sep 2007
                    • 15

                    #10
                    No, I've never seen any messages like that. The housekeeper is locking the tables, preventing any data from getting in, and causing big gaps in my monitoring graphs.
