
Active agents stopped reporting after upgrading to 1.6

  • Jason
    Senior Member
    • Nov 2007
    • 430

    #1

    Active agents stopped reporting after upgrading to 1.6

    I've just upgraded from 1.4 to 1.6, and all of the active agents are showing as offline and unable to contact the Zabbix server.

    The database upgrade ran suspiciously quickly (5 GB database), but I couldn't find any errors reported.

    Any ideas on this? Otherwise I'll have to revert to 1.4, and I really like the new interface.
  • Jason
    Senior Member
    • Nov 2007
    • 430

    #2
    Some more info... This is from the zabbix_server.log on the server.

    6486:20080924:101124 Trapper got [ZBX_GET_ACTIVE_CHECKS
    6486:20080924:101124 In send_list_of_active_checks()
    6486:20080924:101124 Query [select i.key_,i.delay,i.lastlogsize from items i,hosts h where i.hostid=h.hostid and h.status=0 and i.type=7 and h.host='shepherdsvr01.shepherd.local' and h.proxy_hostid=0 and (i.status=0 or (i.status=3 and i.nextcheck<=1222247484)) and h.hostid between 100000000000000 and 199999999999999]
    6486:20080924:101124 Sending [agent.ping:90:0
    6486:20080924:101124 Error while sending list of active checks
    6486:20080924:101124 Trapper got [<req><host>c2VydmVyMDEuYmhsLmxvY2Fs</host><key>ZXZlbnRsb2dbQXBwbGljYXRpb25d</key><data>Q2xpZW50IGNvbXB1dGVycyBhcmUgaW5zdGFsbGluZyB1cGRhdGVzIHdpdGggYSBoaWdoZXIgdGhhbiAyNSBwZXJjZW50IGZhaWx1cmUgcmF0ZS4gVGhpcyBpcyBub3Qgbm9ybWFsLg0K</data><lastlogsize>NDM3NTU=</lastlogsize><timestamp>MTIyMjI0MjI5OQ==</timestamp><source>V2luZG93cyBTZXJ2ZXIgVXBkYXRlIFNlcnZpY2Vz</source><severity>NA==</severity></req>] len 396
    6486:20080924:101124 XML received [<req><host>c2VydmVyMDEuYmhsLmxvY2Fs</host><key>ZXZlbnRsb2dbQXBwbGljYXRpb25d</key><data>Q2xpZW50IGNvbXB1dGVycyBhcmUgaW5zdGFsbGluZyB1cGRhdGVzIHdpdGggYSBoaWdoZXIgdGhhbiAyNSBwZXJjZW50IGZhaWx1cmUgcmF0ZS4gVGhpcyBpcyBub3Qgbm9ybWFsLg0K</data><lastlogsize>NDM3NTU=</lastlogsize><timestamp>MTIyMjI0MjI5OQ==</timestamp><source>V2luZG93cyBTZXJ2ZXIgVXBkYXRlIFNlcnZpY2Vz</source><severity>NA==</severity></req>]
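The `<req>` payload in the trapper log above carries its fields base64-encoded, so the host and item are not readable at a glance. A minimal sketch using only the Python standard library can decode them; it assumes every child element of `<req>` holds base64 text (true for this sample), and `decode_trapper_xml` is a hypothetical helper name, not part of Zabbix.

```python
import base64
import xml.etree.ElementTree as ET

# Payload copied from the "Trapper got [...]" log line (one <req> element).
PAYLOAD = (
    "<req><host>c2VydmVyMDEuYmhsLmxvY2Fs</host>"
    "<key>ZXZlbnRsb2dbQXBwbGljYXRpb25d</key>"
    "<data>Q2xpZW50IGNvbXB1dGVycyBhcmUgaW5zdGFsbGluZyB1cGRhdGVzIHdpdGggYSBoaWdoZXIg"
    "dGhhbiAyNSBwZXJjZW50IGZhaWx1cmUgcmF0ZS4gVGhpcyBpcyBub3Qgbm9ybWFsLg0K</data>"
    "<lastlogsize>NDM3NTU=</lastlogsize>"
    "<timestamp>MTIyMjI0MjI5OQ==</timestamp>"
    "<source>V2luZG93cyBTZXJ2ZXIgVXBkYXRlIFNlcnZpY2Vz</source>"
    "<severity>NA==</severity></req>"
)


def decode_trapper_xml(payload: str) -> dict[str, str]:
    """Return each child element of <req> with its base64 text decoded."""
    root = ET.fromstring(payload)
    fields = {}
    for child in root:
        raw = (child.text or "").strip()
        # Decode as UTF-8, replacing any bytes that are not valid UTF-8.
        fields[child.tag] = base64.b64decode(raw).decode("utf-8", errors="replace")
    return fields


for name, value in decode_trapper_xml(PAYLOAD).items():
    print(f"{name:12} {value!r}")
```

For this payload the host decodes to `server01.bhl.local`, the key to `eventlog[Application]` and the source to `Windows Server Update Services`, so this request is an event log value sent by a different host than the one in the failing active-checks query.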


    • Jason
      Senior Member
      • Nov 2007
      • 430

      #3
      Hmmm... the debug output from zabbix_agentd is:

      5858:20080924:112635 Sending [ZBX_GET_ACTIVE_CHECKS
      hostname.ourdomain
      ]
      5858:20080924:112635 Before read
      5858:20080924:112655 Timeout while answering request
      5858:20080924:112655 Get active checks error: ZBX_TCP_READ() failed [Interrupted system call]
      5858:20080924:112655 Getting list of active checks failed. Will retry after 60 seconds
      5858:20080924:112755 get_active_checks('192.168.124.25',10051)

      It looks like the server isn't responding, but it worked fine in 1.4.5...

      I've dumped and reloaded a backup of the DB and re-ran the upgrade patch with no errors, but still no joy.
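The two logs show both halves of the exchange: the agent connects to the server on port 10051, sends `ZBX_GET_ACTIVE_CHECKS` followed by its hostname, and the server appears to answer with `key:delay:lastlogsize` lines (`agent.ping:90:0` in #2), matching the columns selected by the query. A small script can check whether the server answers at all, independently of the agent. This is a sketch under stated assumptions: it sends only the plain text visible in the agent log and adds no other framing, the helper names are hypothetical, and the 20-second default timeout simply mirrors the gap between `Before read` (11:26:35) and the timeout (11:26:55) in the agent log.

```python
import socket


def build_request(hostname: str) -> bytes:
    """The request as the agent debug log shows it: command line, then hostname line."""
    return f"ZBX_GET_ACTIVE_CHECKS\n{hostname}\n".encode("utf-8")


def parse_active_checks(reply: str) -> list[tuple[str, int, int]]:
    """Parse 'key:delay:lastlogsize' lines (e.g. 'agent.ping:90:0'); skip anything else."""
    checks = []
    for line in reply.splitlines():
        parts = line.rsplit(":", 2)  # split from the right in case a key contains ':'
        if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
            checks.append((parts[0], int(parts[1]), int(parts[2])))
    return checks


def fetch_active_checks(server: str, hostname: str, port: int = 10051,
                        timeout: float = 20.0) -> list[tuple[str, int, int]]:
    """Ask the server for a host's active checks, as the agent does.

    Raises socket.timeout if nothing at all comes back, the same symptom as the
    agent's 'Timeout while answering request'.
    """
    chunks = []
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(build_request(hostname))
        while True:
            try:
                data = sock.recv(4096)
            except socket.timeout:
                if chunks:
                    break  # got a reply, but the server left the socket open
                raise
            if not data:
                break  # server closed the connection after replying
            chunks.append(data)
    return parse_active_checks(b"".join(chunks).decode("utf-8", errors="replace"))
```

With the addresses from the agent log, the call would be `fetch_active_checks("192.168.124.25", "hostname.ourdomain")`; an empty list or a timeout points at the server side rather than the agent.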


      • Jason
        Senior Member
        • Nov 2007
        • 430

        #4
        There is this thread from the 1.5.4 beta about a similar problem, and it links to a code patch for trapper.c that apparently should be in 1.6, but I can't find it in the 1.6 source. Adding the code from this patch fails because processed_fail isn't defined in process_new_values.



        From the server log:

        30943:20080924:122153 Error sending result back
        30943:20080924:122157 Error while sending list of active checks
        30943:20080924:122245 Error sending result back
        30943:20080924:122338 Error sending result back
        30943:20080924:122427 Error while sending list of active checks
        30943:20080924:122430 Error sending result back
        Last edited by Jason; 24-09-2008, 13:15. Reason: added extra info


        • Jason
          Senior Member
          • Nov 2007
          • 430

          #5
          I've just set up a completely fresh 1.6 database as per the instructions and added an active agent to connect to it. No data comes in, and the remote agent (1.4.5) reports that it failed to connect to the server, which suggests that active agents are broken in the release version.

          I've reverted to my 1.4 database and binaries, and the agents are all reporting in again.

          The system is built on CentOS 5.2 (64-bit). Has anyone else experienced this problem with active agents?


          • Antras
            Junior Member
            • Oct 2007
            • 12

            #6
            Did you change the StartAgents parameter in zabbix_agentd.conf?
            I had the same problem when StartAgents was 2. When I set it to 5 (the default), everything began to work.
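To check many Linux hosts for the setting described above, a small helper can read the effective StartAgents value out of the agent config text. This is a hypothetical sketch: it assumes simple `Name=value` lines with `#` comments, uses the default of 5 mentioned in this post when the parameter is absent, and keeps the last value if the parameter is repeated (a choice of this sketch, not documented agent behaviour).

```python
def start_agents(conf_text: str, default: int = 5) -> int:
    """Return the effective StartAgents value from zabbix_agentd.conf text.

    Blank lines and '#' comment lines are ignored. If StartAgents is not set,
    `default` is returned. If it appears more than once, the last value wins.
    """
    value = default
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, rest = line.partition("=")
        if sep and name.strip() == "StartAgents":
            value = int(rest.strip())
    return value
```

A host would then be flagged when `start_agents(text) < 5`, where `text` is the contents of that host's agent config (the file location varies by install).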


            • Jason
              Senior Member
              • Nov 2007
              • 430

              #7
              I'll test that tomorrow... Surely that would only affect Linux agents, though, and not Windows agents?


              • Antras
                Junior Member
                • Oct 2007
                • 12

                #8
                I've only tested it on my Linux machines. I'll try it on Windows tomorrow too.


                • Jason
                  Senior Member
                  • Nov 2007
                  • 430

                  #9
                  I still can't get active agents to work at all, on either Linux or Windows. Some of the Windows checks eventually come in (UserParameter ones only), but nothing from the rest.

                  On the agents I'm still getting:

                  21905:20080926:093956 Getting list of active checks failed. Will retry after 60 seconds
                  21905:20080926:094116 Timeout while answering request


                  and on the Zabbix server, the MySQL server appears maxed out at 100% CPU usage.


                  • Alexei
                    Founder, CEO
                    Zabbix Certified Trainer
                    Zabbix Certified Specialist
                    Zabbix Certified Professional
                    • Sep 2004
                    • 5654

                    #10
                    Originally posted by Jason
                    and on the Zabbix server, the MySQL server appears maxed out at 100% CPU usage.
                    This is interesting. What does 'mysqladmin processlist' show?
                    Alexei Vladishev
                    Creator of Zabbix, Product manager
                    New York | Tokyo | Riga

                    • Jason
                      Senior Member
                      • Nov 2007
                      • 430

                      #11
                      Here is the processlist:
                      +-------+-------------+-----------+---------+---------+------+----------------+------------------------------------------------------------------------------------------------+
                      | Id | User | Host | db | Command | Time | State | Info |
                      +-------+-------------+-----------+---------+---------+------+----------------+------------------------------------------------------------------------------------------------+
                      | 81070 | zabbix_user | localhost | zabbix2 | Query | 95 | Updating | update ids set nextid=nextid+1 where nodeid=1 and table_name='history_log' and field_name='id' |
                      | 81071 | zabbix_user | localhost | zabbix2 | Query | 3 | Updating | update ids set nextid=nextid+1 where nodeid=1 and table_name='history_log' and field_name='id' |
                      | 81072 | zabbix_user | localhost | zabbix2 | Sleep | 0 | | |
                      | 81073 | zabbix_user | localhost | zabbix2 | Query | 3 | Sorting result | select value from history_log where itemid=100100000022859 order by id desc limit 1 |
                      | 81074 | zabbix_user | localhost | zabbix2 | Query | 95 | Updating | update ids set nextid=nextid+1 where nodeid=1 and table_name='history_log' and field_name='id' |
                      | 81075 | zabbix_user | localhost | zabbix2 | Sleep | 6 | | |
                      | 81076 | zabbix_user | localhost | zabbix2 | Sleep | 24 | | |
                      | 81077 | zabbix_user | localhost | zabbix2 | Query | 0 | optimizing | select min(clock) from history where itemid=100100000022604 |
                      | 81080 | zabbix_user | localhost | zabbix2 | Sleep | 2 | | |
                      | 81081 | zabbix_user | localhost | zabbix2 | Sleep | 2 | | |
                      | 81084 | zabbix_user | localhost | zabbix2 | Sleep | 5 | | |
                      | 81085 | zabbix_user | localhost | zabbix2 | Sleep | 0 | | |
                      | 81086 | zabbix_user | localhost | zabbix2 | Sleep | 2 | | |
                      | 81087 | zabbix_user | localhost | zabbix2 | Sleep | 4 | | |
                      | 81088 | zabbix_user | localhost | zabbix2 | Sleep | 109 | | |
                      | 81089 | zabbix_user | localhost | zabbix2 | Sleep | 1 | | |
                      | 81090 | zabbix_user | localhost | zabbix2 | Sleep | 4 | | |
                      | 81113 | root | localhost | | Query | 0 | | show processlist |
                      +-------+-------------+-----------+---------+---------+------+----------------+------------------------------------------------------------------------------------------------+
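On a busy server the processlist gets long, and what matters is how many threads are stuck on the same statement. A minimal sketch can tally `mysqladmin processlist` table output by command and query text; it assumes the first `|` row is the header and that no cell contains a `|` character, and `tally_processlist` is a hypothetical helper.

```python
from collections import Counter


def tally_processlist(table: str) -> Counter:
    """Count threads per (Command, Info) pair in `mysqladmin processlist` table output."""
    counts: Counter = Counter()
    header = None
    for raw in table.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip the +----+ border lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # Id, User, Host, db, Command, Time, State, Info
            continue
        row = dict(zip(header, cells))
        counts[(row.get("Command", ""), row.get("Info", ""))] += 1
    return counts
```

Fed the table above and printed with `most_common()`, this groups the idle `Sleep` threads together and shows the `update ids set nextid=nextid+1 ... table_name='history_log'` statement three times (threads 81070, 81071 and 81074).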


                      Also, something I just noticed in the log file:

                      13199:20080929:094441 Query failed: [update ids set nextid=nextid+1 where nodeid=1 and table_name='history_log' and field_name='id'] Lock wait timeout exceeded; try restarting transaction [1205]
                      13195:20080929:094441 Query failed: [update ids set nextid=nextid+1 where nodeid=1 and table_name='history_log' and field_name='id'] Lock wait timeout exceeded; try restarting transaction [1205]
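The statements in the processlist and the failed queries both point at the same pattern: IDs appear to be handed out from a single counter row per (nodeid, table_name, field_name) in the `ids` table, so every writer to `history_log` must update that one row. On InnoDB an UPDATE holds an exclusive lock on the matched row until its transaction commits, so concurrent writers queue behind each other, and one that waits too long fails with error 1205 as in the log. The sketch below reproduces only the allocation pattern with Python's built-in `sqlite3`; the schema and function names are hypothetical, inferred from the logged SQL, and this is not Zabbix's actual code.

```python
import sqlite3


def create_ids_table(conn: sqlite3.Connection) -> None:
    """Hypothetical minimal ids table: one counter row per (nodeid, table_name, field_name)."""
    conn.execute(
        "CREATE TABLE ids (nodeid INTEGER, table_name TEXT, field_name TEXT,"
        " nextid INTEGER, PRIMARY KEY (nodeid, table_name, field_name))"
    )
    conn.execute("INSERT INTO ids VALUES (1, 'history_log', 'id', 0)")  # illustrative start value
    conn.commit()


def next_id(conn: sqlite3.Connection, nodeid: int, table: str, field: str) -> int:
    """Allocate one ID as the logged statement suggests: bump the row, then read it back.

    Every caller touches the same row, so on a row-locking engine such as InnoDB
    concurrent callers are serialized until each transaction commits.
    """
    key = (nodeid, table, field)
    with conn:  # sqlite3: commit on success, roll back on exception
        conn.execute(
            "UPDATE ids SET nextid = nextid + 1"
            " WHERE nodeid = ? AND table_name = ? AND field_name = ?",
            key,
        )
        row = conn.execute(
            "SELECT nextid FROM ids WHERE nodeid = ? AND table_name = ? AND field_name = ?",
            key,
        ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
create_ids_table(conn)
print([next_id(conn, 1, "history_log", "id") for _ in range(3)])  # [1, 2, 3]
```

If the allocation runs inside a longer transaction (for example, one that also inserts a batch of history rows), the lock on that counter row is held for the whole transaction, which would fit the 95-second waits on the `ids` update in the processlist.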


                      • Jason
                        Senior Member
                        • Nov 2007
                        • 430

                        #12
                        Any ideas on this, or a workaround?


                        • msklizmantas
                          Junior Member
                          • Jun 2008
                          • 1

                          #13
                          I am experiencing a similar issue:

                          Id User Host/IP DB Time Cmd Query or State
                          -- ---- ------- -- ---- --- ----------
                          10864 zabbix localhost zabbix 0 Sleep
                          10868 zabbix localhost zabbix 0 Query commit
                          10871 zabbix localhost zabbix 0 Query commit
                          10874 saint localhost mysql 0 Query show full processlist
                          26 zabbix localhost zabbix 1 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil
                          515 zabbix localhost zabbix 1 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil
                          5257 zabbix localhost zabbix 1 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil
                          10853 zabbix localhost zabbix 1 Query commit
                          10860 zabbix localhost zabbix 1 Query delete from history_uint where itemid=19651 and clock<1222362731
                          10869 zabbix localhost zabbix 1 Query commit
                          10872 zabbix localhost zabbix 1 Query commit
                          10873 zabbix localhost zabbix 1 Query commit
                          22 zabbix localhost zabbix 2 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil
                          23 zabbix localhost zabbix 2 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil
                          24 zabbix localhost zabbix 2 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil
                          25 zabbix localhost zabbix 2 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil
                          58 zabbix localhost zabbix 2 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil
                          129 zabbix localhost zabbix 2 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil
                          207 zabbix localhost zabbix 2 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil
                          209 zabbix localhost zabbix 2 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil
                          429 zabbix localhost zabbix 2 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil
                          625 zabbix localhost zabbix 2 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil
                          10876 zabbix localhost zabbix 2 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil
                          381 zabbix localhost zabbix 3 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil
                          416 zabbix localhost zabbix 3 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil
                          540 zabbix localhost zabbix 3 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil
                          624 zabbix localhost zabbix 3 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil
                          10857 zabbix localhost zabbix 3 Sleep
                          10870 zabbix localhost zabbix 3 Sleep
                          165 zabbix localhost zabbix 4 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil
                          10677 zabbix localhost zabbix 4 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil
                          5263 zabbix localhost zabbix 5 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil
                          10863 zabbix localhost zabbix 5 Sleep
                          57 zabbix localhost zabbix 6 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil
                          490 zabbix localhost zabbix 6 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil
                          10788 zabbix localhost zabbix 6 Query UPDATE ids SET nextid=nextid+1 WHERE nodeid=0 AND table_name='profiles' AND field_name='profil

                          The web frontend can't even load.


                          • disgruntleddutch
                            Member
                            • Oct 2006
                            • 34

                            #14
                            Yup, this is exactly the same problem I encountered.

                            Active checks fail, and the mysqld process ramps up to the maximum the box can give, which in my case is 400%. Not very fun, and it forces the use of non-active agents everywhere :-(


                            • dantheman
                              Senior Member
                              • May 2006
                              • 209

                              #15
                              I also have the same behavior: my active agents stopped working after the upgrade to 1.6, but my MySQL CPU usage is staying low (between 0 and 30 percent).
