Zabbix 1.1a9 - agentd-crashes

  • raa
    Junior Member
    • May 2005
    • 11

    #1

    Zabbix 1.1a9 - agentd-crashes

    Hi,

    I am testing 1.1a9 and have found some problems in zabbix_agentd, maybe pointer mistakes?

    Systems: Debian woody, Debian testing
    I have configured most of the items as 'active'.

    1) crash after deactivating an active check:
    010036:20050520:230733 Active check [uffers]] is not supported. Disabled.
    010026:20050520:230833 One child process died. Exiting ...
    010033:20050520:230833 Got signal. Exiting ...

    It seems there is a pointer problem, because the 'b' of 'buffers' is missing from this message.

    2) Same problem here, but without a crash of agentd:
    016523:20050521:054147 Active check [wblk]] is not supported. Disabled.
    016523:20050521:060748 Active check [_wio]] is not supported. Disabled.
    016523:20050521:060948 Active check [ared]] is not supported. Disabled.
    016523:20050521:061150 Active check [_rio]] is not supported. Disabled.
    016523:20050521:061348 Active check [_wio]] is not supported. Disabled.
    016523:20050521:061848 Active check [_wio]] is not supported. Disabled.
    016523:20050521:062348 Active check [_wio]] is not supported. Disabled.

    3) Once, the logfile was filled with more than 10000 of these messages:
    007821:20050520:210023 No sleeping
    007821:20050520:210023 No sleeping
    007821:20050520:210023 No sleeping

    agentd used 80% CPU time while this was happening (until I killed it).


    I have been using older versions of Zabbix for about 2 years; 1.1 looks like it will be a great new version!
    Many thanks, andi
  • Alexei
    Founder, CEO
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Sep 2004
    • 5654

    #2
    I tried to replicate this on my test system today, no luck so far. Are you sure both the server and the agent are 1.1alpha9?

    Please, may I ask you to run zabbix_agentd with more debug information? Thanks.
    Alexei Vladishev
    Creator of Zabbix, Product manager
    New York | Tokyo | Riga
    My Twitter


    • raa
      Junior Member
      • May 2005
      • 11

      #3
      Originally posted by Alexei
      I tried to replicate this on my test system today, no luck so far. Are you sure both the server and the agent are 1.1alpha9?
      Yes!

      Originally posted by Alexei
      Please, may I ask you to run zabbix_agentd with more debug information? Thanks.
      Yes of course.

      I had an idea: I compiled the agentd on my Debian testing machine, but my clients run Debian woody and I did not use --enable-static.
      Now I have set up 2 tests: one client running an agentd built with --enable-static
      and one with a locally compiled version of agentd.

      I'll report back when I have some results...

      andi


      • raa
        Junior Member
        • May 2005
        • 11

        #4
        Originally posted by raa
        I had an idea: I compiled the agentd on my Debian testing machine, but my clients run Debian woody and I did not use --enable-static.
        Now I have set up 2 tests: one client running an agentd built with --enable-static
        and one with a locally compiled version of agentd.

        I'll report back when I have some results...
        OK, it happened faster than I thought:

        023884:20050521:081429 Active check [wblk]] is not supported. Disabled.
        023884:20050521:081632 Active check [lk]] is not supported. Disabled.
        023884:20050521:081828 Active check [_wio]] is not supported. Disabled.
        023884:20050521:082132 Active check [lk]] is not supported. Disabled.
        023884:20050521:082232 One child process died. Exiting ...
        023884:20050521:082232 Got signal. Exiting ...

        The bug appears on both clients...

        andi


        • Alexei
          Founder, CEO
          Zabbix Certified Trainer
          Zabbix Certified Specialist
          Zabbix Certified Professional
          • Sep 2004
          • 5654

          #5
          Please replace your src/zabbix_agent/active.c with the attached file (latest from CVS) and report the results. This should fix the high CPU usage issue, at least.

          Thank you!

          • raa
            Junior Member
            • May 2005
            • 11

            #6
            Originally posted by Alexei
            Please replace your src/zabbix_agent/active.c with the attached file (latest from CVS) and report the results. This should fix the high CPU usage issue, at least.

            Thank you!
            Done, but sorry, the agentd crashed again:

            024578:20050521:090238 One child process died. Exiting ...
            024580:20050521:090238 Got signal. Exiting ...
            024587:20050521:090238 Got signal. Exiting ...
            024586:20050521:090238 Got signal. Exiting ...

            and

            025238:20050521:092704 Active check [_wio]] is not supported. Disabled.
            025228:20050521:092903 One child process died. Exiting ...
            025229:20050521:092903 Got signal. Exiting ...

            The high CPU usage issue happened only once, last night, so I can't currently verify that this version fixes it.

            andi


            • Wolfgang
              Senior Member
              Zabbix Certified Trainer
              Zabbix Certified Specialist
              • Apr 2005
              • 116

              #7
              @raa
              Please note that Debian woody uses an older version of libc than Debian testing does, so they are incompatible and you must compile statically.
              http://www.intellitrend.de
              Specialised in monitoring large environments and Zabbix API programming.


              • raa
                Junior Member
                • May 2005
                • 11

                #8
                Originally posted by Wolfgang
                @raa
                Please note that Debian woody uses an older version of libc than Debian testing does, so they are incompatible and you must compile statically.
                Yes, of course. That is why I compiled the agent with --enable-static, which didn't solve the problem.
                I also compiled the agentd directly on the client machine, with the same result.
                On the other hand, I also have agentd running on the same host as the server (testing), and the same error occurs there.

                andi


                • Alexei
                  Founder, CEO
                  Zabbix Certified Trainer
                  Zabbix Certified Specialist
                  Zabbix Certified Professional
                  • Sep 2004
                  • 5654

                  #9
                  I still cannot reproduce it. Hmm... interesting.

                  • raa
                    Junior Member
                    • May 2005
                    • 11

                    #10
                    Originally posted by Alexei
                    I still cannot reproduce it. Hmm... interesting.
                    No problem, at most 30 minutes until the next crash here...

                    But I have new facts:
                    On my zabbix_server host the agentd crashed only once.
                    Also, the message "Active check [...]] is not supported. Disabled." appears very rarely there.
                    This is the only host where the communication is not tunneled through ssh.
                    All other hosts are connected via ssh tunnels.

                    After I changed all items to "Zabbix agent", there was no crash for over 7 hours.

                    Then I changed all items to "Zabbix agent active" and the following message appeared immediately (more than 100000 times; the logfile rotated several times):
                    003035:20050522:xxxxxx No sleeping
                    zabbix_agentd uses about 97% CPU time and crashes after a few minutes.

                    Alexei, can I send you an SQL dump of my configuration, and some logfiles if they would be useful?

                    andi


                    • Alexei
                      Founder, CEO
                      Zabbix Certified Trainer
                      Zabbix Certified Specialist
                      Zabbix Certified Professional
                      • Sep 2004
                      • 5654

                      #11
                      Originally posted by raa
                      Then I changed all items to "Zabbix agent active" and the following message appeared immediately (more than 100000 times; the logfile rotated several times):
                      003035:20050522:xxxxxx No sleeping
                      I don't get it. The message can appear only if DebugLevel is set to 4, but I think you have it set to 3. So my guess is that your active.c is outdated or something, although that's quite hard to believe.

                      On my test system it works perfectly: no crashes at all with the agent running for days with a delay of 1 second for all active items. Weird...

                      • raa
                        Junior Member
                        • May 2005
                        • 11

                        #12
                        Hi, sorry for the long delay...
                        Originally posted by Alexei
                        I don't get it. The message can appear only if DebugLevel is set to 4, but I think you have it set to 3. So my guess is that your active.c is outdated or something, although that's quite hard to believe.

                        On my test system it works perfectly: no crashes at all with the agent running for days with a delay of 1 second for all active items. Weird...
                        The bug only appeared when the connection was made through an ssh tunnel.
                        Every direct connection worked.
                        Unfortunately, most of my clients are firewalled, so I need ssh tunnels.

                        Today I tested the new version 1.1alpha10:
                        Whatever you changed, it was good work.
                        The bug seems to be fixed.

                        regards, andi
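
For anyone landing here with similar symptoms: nothing in this thread says what actually changed in 1.1alpha10, so the following is only a guess at the class of problem. A tunnel can deliver a line in several pieces, and code that treats each read() result as a complete line could end up with a tail such as "uffers]" as a key name, which would match the truncated names in the log. Below is a minimal sketch in C of reading until the newline arrives. read_line is a hypothetical helper, not Zabbix code, and the key "mem[buffers]" mentioned afterwards is made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from the Zabbix sources).
 * Reads one '\n'-terminated line from a stream descriptor, calling read()
 * repeatedly so that a line arriving in several pieces is reassembled.
 * Returns the line length (newline excluded), or -1 on a read error,
 * on end of stream before the newline, or if the line does not fit.
 */
ssize_t read_line(int fd, char *buf, size_t size)
{
    size_t used = 0;

    assert(buf != NULL && size > 0);

    while (used + 1 < size) {
        char c;
        ssize_t n = read(fd, &c, 1);   /* one byte at a time: never reads past the line */

        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted by a signal: retry */
            return -1;                 /* real read error */
        }
        if (n == 0)
            return -1;                 /* peer closed before a full line arrived */
        if (c == '\n') {
            buf[used] = '\0';          /* terminate, do not store the newline */
            return (ssize_t)used;
        }
        buf[used++] = c;
    }
    return -1;                         /* line longer than the buffer */
}
```

If the peer sends "mem[b" and then "uffers]" followed by a newline as two separate writes, this loop hands back the whole key "mem[buffers]" instead of two fragments. Reading byte by byte is slow but keeps the sketch short; a real implementation would more likely buffer larger reads and split on the newline itself.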
