1.4 Node Watcher Thread

  • agehring
    Member
    • Oct 2006
    • 49

    #1

    1.4 Node Watcher Thread

    Are there any patches for the "node watcher" thread for server?

    I'm getting segment violations (SIGSEGV SIG_DFL) on zabbix server running on OS X, and it always happens with the node watcher thread.

    Is there a way to turn off the distributed attribute at run time and/or compile time?

    Thanks,
    Andrew

    P.S. Working on debugging it "deeper"...
  • tronite
    Senior Member
    • Jun 2007
    • 147

    #2
    Originally posted by agehring
    Are there any patches for the "node watcher" thread for server?

    I'm getting segment violations (SIGSEGV SIG_DFL) on zabbix server running on OS X, and it always happens with the node watcher thread.

    Is there a way to turn off the distributed attribute at run time and/or compile time?

    Thanks,
    Andrew

    P.S. Working on debugging it "deeper"...
    Have you checked what they say here? http://sciss.de/jcollider/doc/api/de...deWatcher.html


    • agehring
      Member
      • Oct 2006
      • 49

      #3
      What does a Java class have to do with zabbix?


      • agehring
        Member
        • Oct 2006
        • 49

        #4
        More Info...

        I'm getting "deeper".

        The node watcher server SIGSEGVs at the 103942nd call of

        In send_to_master_and_slave(node:0)

        of the node watcher...

        I've repeated this 10 times, without fail...


        memory leak?

        Andrew


        • Alexei
          Founder, CEO
          Zabbix Certified Trainer
          Zabbix Certified Specialist
          Zabbix Certified Professional
          • Sep 2004
          • 5654

          #5
          Please could you apply this patch and let me know if it helped.
          Attached Files
          Alexei Vladishev
          Creator of Zabbix, Product manager
          New York | Tokyo | Riga
          My Twitter


          • agehring
            Member
            • Oct 2006
            • 49

            #6
            I applied the patch, and the only change is that it now processes 103941 calls instead of 103942.

            This is the log output just before the crash dump...

            19307:20070608:134636 In get_master_node(0)
            19307:20070608:134636 Query [select masterid from nodes where nodeid=0]
            19307:20070608:134636 In get_slave_node(0)
            19307:20070608:134636 Query [select masterid from nodes where nodeid=0]
            19307:20070608:134636 Query [select nodeid from nodes where masterid=0]
            19307:20070608:134636 In process_node(node:0)
            19307:20070608:134636 In send_to_master_and_slave(node:0)
            19292:20070608:134637 One child process died. Exiting ...
            19293:20070608:134637 Got signal. Exiting ...
            19294:20070608:134637 Got signal. Exiting ...
            19295:20070608:134637 Got signal. Exiting ...
            19296:20070608:134637 Got signal. Exiting ...
            19297:20070608:134637 Got signal. Exiting ...
            19306:20070608:134637 Got signal. Exiting ...
            19308:20070608:134637 Got signal. Exiting ...
            19299:20070608:134637 Got signal. Exiting ...
            19298:20070608:134637 Got signal. Exiting ...
            19302:20070608:134637 Got signal. Exiting ...
            19304:20070608:134637 Got signal. Exiting ...
            19301:20070608:134637 Got signal. Exiting ...
            19300:20070608:134637 Got signal. Exiting ...
            19305:20070608:134637 Got signal. Exiting ...
            19303:20070608:134637 Got signal. Exiting ...
            19292:20070608:134639 ZABBIX Server stopped
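
To count how many times a function was entered before the crash, a small script over the log can help. This is only a sketch: it assumes every line has the `PID:YYYYMMDD:HHMMSS message` shape seen above, and the names `LOG_LINE` and `count_calls` are made up for illustration, not part of Zabbix.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log excerpt: "PID:YYYYMMDD:HHMMSS message"
LOG_LINE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\s+(.*)$")


def count_calls(lines, func_name):
    """Count 'In <func_name>(' debug entries per process ID."""
    prefix = "In " + func_name + "("
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(4).startswith(prefix):
            counts[int(match.group(1))] += 1
    return counts


sample = [
    "19307:20070608:134636 In get_master_node(0)",
    "19307:20070608:134636 Query [select masterid from nodes where nodeid=0]",
    "19307:20070608:134636 In process_node(node:0)",
    "19307:20070608:134636 In send_to_master_and_slave(node:0)",
    "19292:20070608:134637 One child process died. Exiting ...",
]
calls = count_calls(sample, "send_to_master_and_slave")
```

On a real log, passing an open file object instead of `sample` gives the per-process totals, which makes it easy to confirm whether the crash always lands on the same call count.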

            I also compiled the code with -g (CCFLAGS="-g"), and ran it under gdb.

            What is interesting in that scenario (to me anyway) is that it DOES NOT crash while either it or its parent is running under gdb, but the minute I exit gdb, it crashes immediately.

            Thanks,
            Andrew
            Last edited by agehring; 08-06-2007, 22:00. Reason: error


            • agehring
              Member
              • Oct 2006
              • 49

              #7
              GDB Output

              GDB reported the following...

              Program received signal EXC_BAD_ACCESS, Could not access memory.
              Reason: KERN_INVALID_ADDRESS at address: 0xbf7fffec
              0x0001b1dc in zabbix_log (level=4, fmt=0x3979c "Query [%s]") at log.c:180
              180 {


              • agehring
                Member
                • Oct 2006
                • 49

                #8
                More GDB

                Program received signal EXC_BAD_ACCESS, Could not access memory.
                Reason: KERN_INVALID_ADDRESS at address: 0xbf7fffdc
                0x0001b1dc in zabbix_log (level=4, fmt=0x3979c "Query [%s]") at log.c:180
                180 {

                (gdb) x 0x0001b1dc
                0x1b1dc <zabbix_log+12>: 0x01ec3be8
                (gdb)


                And from here I'm lost...

                Thanks,
                Andrew


                • agehring
                  Member
                  • Oct 2006
                  • 49

                  #9
                  Yet more GDB

                  Program received signal EXC_BAD_ACCESS, Could not access memory.
                  Reason: KERN_INVALID_ADDRESS at address: 0xbf7fffec
                  0x9000319e in szone_malloc ()

                  0x9000319e <szone_malloc+12>: 0x16786de8
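
The faulting addresses reported so far (0xbf7fffec, 0xbf7fffdc and 0xbf7fffec again) sit very close together. A quick arithmetic check shows how far each lies below the lowest valid stack address. It assumes a 32-bit OS X process whose main-thread stack tops out at 0xc0000000 with an 8 MiB limit; both values are assumptions about this machine, not something the gdb output states. An address just under that boundary would be consistent with the stack running out, for example through very deep recursion, though nothing in this thread confirms that.

```python
# Both constants are assumptions, not values taken from the gdb output.
STACK_TOP = 0xC0000000           # assumed top of the main-thread stack (32-bit OS X)
STACK_LIMIT = 8 * 1024 * 1024    # assumed default stack size limit of 8 MiB


def bytes_below_stack_floor(addr, top=STACK_TOP, limit=STACK_LIMIT):
    """Return how many bytes addr lies below the lowest valid stack address.

    A small positive result means the access fell just past the end of the
    stack; zero or a negative result means the address is inside it.
    """
    floor = top - limit          # 0xbf800000 under the assumptions above
    return floor - addr


offsets = {
    hex(addr): bytes_below_stack_floor(addr)
    for addr in (0xBF7FFFEC, 0xBF7FFFDC)
}
# offsets == {'0xbf7fffec': 20, '0xbf7fffdc': 36}
```

If the real limit differs (it can be checked with `ulimit -s`), pass it as `limit` and recompute.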

                  Program starts at approx 161M in size, and grows to 161G in about 46 seconds...
                  Last edited by agehring; 08-06-2007, 23:09. Reason: more info


                  • Alexei
                    Founder, CEO
                    Zabbix Certified Trainer
                    Zabbix Certified Specialist
                    Zabbix Certified Professional
                    • Sep 2004
                    • 5654

                    #10
                    I am pretty sure that the configuration of nodes is messed up on your system. Could you please send me the result of:

                    select * from nodes
                    NodeID from zabbix_server.log


                    Thanks.


                    • agehring
                      Member
                      • Oct 2006
                      • 49

                      #11
                      mysql> select * from nodes;
                      +--------+------------+----------+-----------+-------+---------------+--------------+--------------+----------------+--------------------+---------------------+----------+----------+
                      | nodeid | name       | timezone | ip        | port  | slave_history | slave_trends | event_lastid | history_lastid | history_str_lastid | history_uint_lastid | nodetype | masterid |
                      +--------+------------+----------+-----------+-------+---------------+--------------+--------------+----------------+--------------------+---------------------+----------+----------+
                      |      0 | Local node |        0 | 127.0.0.1 | 10051 |            30 |          365 |            0 |              0 |                  0 |                   0 |        1 |        0 |
                      +--------+------------+----------+-----------+-------+---------------+--------------+--------------+----------------+--------------------+---------------------+----------+----------+
                      1 row in set (0.00 sec)
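
The single row above has nodeid 0 and masterid 0, so the slave lookup seen in the log (`select nodeid from nodes where masterid=0`) would return node 0 itself. Whether the node watcher actually descends into that result is not confirmed anywhere in this thread. The sketch below only models that assumption: it treats the table as a `{nodeid: masterid}` mapping and reports the first loop found while walking from a node down through its slaves. `slaves_of` and `find_slave_cycle` are hypothetical helper names, not Zabbix functions.

```python
def slaves_of(nodes, nodeid):
    """Mimic `select nodeid from nodes where masterid=<nodeid>`."""
    return [n for n, master in nodes.items() if master == nodeid]


def find_slave_cycle(nodes, start):
    """Walk from start through its slaves; return the looping path, or None."""
    path = []
    on_path = set()

    def visit(nodeid):
        if nodeid in on_path:
            # Reached a node already on the current path: a loop.
            return path + [nodeid]
        path.append(nodeid)
        on_path.add(nodeid)
        for slave in slaves_of(nodes, nodeid):
            cycle = visit(slave)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(nodeid)
        return None

    return visit(start)


standalone = {0: 0}                # the row shown above: nodeid 0, masterid 0
print(find_slave_cycle(standalone, 0))   # prints [0, 0]

# Assumed shape of a healthy tree: node 1 has no master, 2 and 3 report to 1.
tree = {1: 0, 2: 1, 3: 1}
print(find_slave_cycle(tree, 1))         # prints None
```

A walk that never stops on such a loop would keep adding stack frames until the process dies, which is one possible reading of the crash, offered here as a hypothesis only.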


                      zabbix:~ root# cat /etc/zabbix/zabbix_server.conf
                      # This is config file for ZABBIX server process
                      # To get more information about ZABBIX,
                      # go http://www.zabbix.com

                      ############ GENERAL PARAMETERS #################

                      # This defines unique NodeID in distributed setup,
                      # Default value 0 (standalone server)
                      # This parameter must be between 0 and 999
                      #NodeID=0


                      • agehring
                        Member
                        • Oct 2006
                        • 49

                        #12
                        libMallocDebug

                        I enabled libMallocDebug, and am getting the following...

                        libMallocDebug[zabbix_server-719]: frame pointer goes from bffee098 to bfffe758 -- assuming invalid.


                        relevant?

                        Thanks,
                        Andrew


                        • Alexei
                          Founder, CEO
                          Zabbix Certified Trainer
                          Zabbix Certified Specialist
                          Zabbix Certified Professional
                          • Sep 2004
                          • 5654

                          #13
                          Please could you try the latest code from svn://svn.zabbix.com/branches/1.4.1 ? We fixed several memory related issues, it could affect the reported problem.


                          • agehring
                            Member
                            • Oct 2006
                            • 49

                            #14
                            It still crashes...

                            I'm going to compile with -g again...


                            Andrew


                            • Alexei
                              Founder, CEO
                              Zabbix Certified Trainer
                              Zabbix Certified Specialist
                              Zabbix Certified Professional
                              • Sep 2004
                              • 5654

                              #15
                              I'm very very interested in getting more details about this crash! Can you get a backtrace of executed functions? Thank you.
