Zabbix Server Flapping on CentOS 7

  • e-coder
    Junior Member
    • Sep 2016
    • 8

    #1

    Zabbix Server Flapping on CentOS 7

    Hi There!

    I don't know what to do anymore.

    The Zabbix server keeps stopping and restarting, and I think this is why the server just stops working from time to time.

    I am running Zabbix 3.2.6.

    /var/log/messages says:
    Code:
    Jun  6 09:20:38 zabbix systemd: zabbix-server.service: control process exited, code=exited status=1
    Jun  6 09:20:38 zabbix systemd: Unit zabbix-server.service entered failed state.
    Jun  6 09:20:38 zabbix systemd: zabbix-server.service failed.
    Jun  6 09:20:49 zabbix systemd: zabbix-server.service holdoff time over, scheduling restart.
    Jun  6 09:20:49 zabbix systemd: Starting Zabbix Server...
    Jun  6 09:20:49 zabbix systemd: zabbix-server.service: Supervising process 20752 which is not our child. We'll most likely not notice when it exits.
    Jun  6 09:20:49 zabbix systemd: Started Zabbix Server.
    Jun  6 09:25:37 zabbix kill: Usage:
    Jun  6 09:25:37 zabbix kill: kill [options] <pid|name> [...]
    Jun  6 09:25:37 zabbix kill: Options:
    Jun  6 09:25:37 zabbix kill: -a, --all              do not restrict the name-to-pid conversion to processes
    Jun  6 09:25:37 zabbix kill: with the same uid as the present process
    Jun  6 09:25:37 zabbix kill: -s, --signal <sig>     send specified signal
    Jun  6 09:25:37 zabbix kill: -q, --queue <sig>      use sigqueue(2) rather than kill(2)
    Jun  6 09:25:37 zabbix kill: -p, --pid              print pids without signaling them
    Jun  6 09:25:37 zabbix kill: -l, --list [=<signal>] list signal names, or convert one to a name
    Jun  6 09:25:37 zabbix kill: -L, --table            list signal names and numbers
    Jun  6 09:25:37 zabbix kill: -h, --help     display this help and exit
    Jun  6 09:25:37 zabbix kill: -V, --version  output version information and exit
    Jun  6 09:25:37 zabbix kill: For more details see kill(1).

    zabbix_server.log says:
    Code:
     21229:20170606:093034.931 Got signal [signal:11(SIGSEGV),reason:1,refaddr:0x7f0031bb11c8]. Crashing ...
     21229:20170606:093034.931 ====== Fatal information: ======
     21229:20170606:093034.931 Program counter: 0x7f829d6c7491
     21229:20170606:093034.931 === Registers: ===
     21229:20170606:093034.931 r8      =     7ffe31bb00f4 =      140729732759796 =      140729732759796
     21229:20170606:093034.931 r9      =     7ffe31bb0110 =      140729732759824 =      140729732759824
     21229:20170606:093034.931 r10     =              200 =                  512 =                  512
     21229:20170606:093034.931 r11     =                1 =                    1 =                    1
     21229:20170606:093034.931 r12     =     7f829d71ea74 =      140198963964532 =      140198963964532
     21229:20170606:093034.931 r13     =                5 =                    5 =                    5
     21229:20170606:093034.931 r14     =                0 =                    0 =                    0
     21229:20170606:093034.931 r15     =                0 =                    0 =                    0
     21229:20170606:093034.931 rdi     =     7f829eefdf70 =      140198988996464 =      140198988996464
     21229:20170606:093034.931 rsi     =     7f829ef1e470 =      140198989128816 =      140198989128816
     21229:20170606:093034.931 rbp     =     7ffe31bb1170 =      140729732764016 =      140729732764016
     21229:20170606:093034.931 rbx     =                5 =                    5 =                    5
     21229:20170606:093034.931 rdx     =                2 =                    2 =                    2
     21229:20170606:093034.931 rax     =     7f0031bb11a0 =      139638811070880 =      139638811070880
     21229:20170606:093034.931 rcx     =                0 =                    0 =                    0
     21229:20170606:093034.931 rsp     =     7ffe31bb0910 =      140729732761872 =      140729732761872
     21229:20170606:093034.931 rip     =     7f829d6c7491 =      140198963606673 =      140198963606673
     21229:20170606:093034.931 efl     =            10293 =                66195 =                66195
     21229:20170606:093034.931 csgsfs  =               33 =                   51 =                   51
     21229:20170606:093034.931 err     =                4 =                    4 =                    4
     21229:20170606:093034.932 trapno  =                e =                   14 =                   14
     21229:20170606:093034.932 oldmask =                0 =                    0 =                    0
     21229:20170606:093034.932 cr2     =     7f0031bb11c8 =      139638811070920 =      139638811070920
     21229:20170606:093034.932 === Backtrace: ===
     21229:20170606:093034.932 14: /usr/sbin/zabbix_server: poller #5 [got 9 values in 0.009092 sec, getting values](print_fatal_info+0x114) [0x7f829d686b40]
     21229:20170606:093034.933 13: /usr/sbin/zabbix_server: poller #5 [got 9 values in 0.009092 sec, getting values](+0xccefc) [0x7f829d686efc]
     21229:20170606:093034.933 12: /lib64/libc.so.6(+0x35250) [0x7f829a1be250]
     21229:20170606:093034.933 11: /usr/sbin/zabbix_server: poller #5 [got 9 values in 0.009092 sec, getting values](odbc_DBfetch+0x260) [0x7f829d6c7491]
     21229:20170606:093034.933 10: /usr/sbin/zabbix_server: poller #5 [got 9 values in 0.009092 sec, getting values](+0x587d9) [0x7f829d6127d9]
     21229:20170606:093034.933 9: /usr/sbin/zabbix_server: poller #5 [got 9 values in 0.009092 sec, getting values](get_value_db+0xa9) [0x7f829d612a1c]
     21229:20170606:093034.933 8: /usr/sbin/zabbix_server: poller #5 [got 9 values in 0.009092 sec, getting values](+0x4aead) [0x7f829d604ead]
     21229:20170606:093034.933 7: /usr/sbin/zabbix_server: poller #5 [got 9 values in 0.009092 sec, getting values](+0x4bff5) [0x7f829d605ff5]
     21229:20170606:093034.933 6: /usr/sbin/zabbix_server: poller #5 [got 9 values in 0.009092 sec, getting values](poller_thread+0x1b2) [0x7f829d606cae]
     21229:20170606:093034.933 5: /usr/sbin/zabbix_server: poller #5 [got 9 values in 0.009092 sec, getting values](zbx_thread_start+0x37) [0x7f829d687bd3]
     21229:20170606:093034.933 4: /usr/sbin/zabbix_server: poller #5 [got 9 values in 0.009092 sec, getting values](MAIN_ZABBIX_ENTRY+0x5b3) [0x7f829d5f3c6d]
     21229:20170606:093034.933 3: /usr/sbin/zabbix_server: poller #5 [got 9 values in 0.009092 sec, getting values](daemon_start+0x32f) [0x7f829d686020]
     21229:20170606:093034.933 2: /usr/sbin/zabbix_server: poller #5 [got 9 values in 0.009092 sec, getting values](main+0x2ba) [0x7f829d5f36b8]
     21229:20170606:093034.933 1: /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f829a1aab35]
     21229:20170606:093034.933 0: /usr/sbin/zabbix_server: poller #5 [got 9 values in 0.009092 sec, getting values](+0x2df49) [0x7f829d5e7f49]
     21229:20170606:093034.933 === Memory map: ===
     21229:20170606:093034.933 7f81f4840000-7f81f4842000 r-xp 00000000 08:14 3941978                    /usr/lib64/gconv/CP1252.so
     21229:20170606:093034.933 7f81f4842000-7f81f4a41000 ---p 00002000 08:14 3941978                    /usr/lib64/gconv/CP1252.so
     21229:20170606:093034.933 7f81f4a41000-7f81f4a42000 r--p 00001000 08:14 3941978                    /usr/lib64/gconv/CP1252.so
     21229:20170606:093034.933 7f81f4a42000-7f81f4a43000 rw-p 00002000 08:14 3941978                    /usr/lib64/gconv/CP1252.so
    MEMORY MAP HERE
     21229:20170606:093034.942 ================================
     21229:20170606:093034.942 Please consider attaching a disassembly listing to your bug report.
     21229:20170606:093034.942 This listing can be produced with, e.g., objdump -DSswx zabbix_server.
     21229:20170606:093034.942 ================================
     21222:20170606:093034.945 One child process died (PID:21229,exitcode/signal:1). Exiting ...
     21222:20170606:093036.946 syncing history data...
     21222:20170606:093036.946 syncing history data done
     21222:20170606:093036.946 syncing trend data...
     21222:20170606:093037.842 syncing trend data done
     21222:20170606:093037.842 Zabbix Server stopped. Zabbix 3.2.6 (revision 67849).
     21697:20170606:093047.886 Starting Zabbix Server. Zabbix 3.2.6 (revision 67849).
    I also see this in systemctl status zabbix-server:

    Code:
     zabbix-server.service - Zabbix Server
       Loaded: loaded (/usr/lib/systemd/system/zabbix-server.service; enabled; vendor preset: disabled)
       Active: active (running) since Die 2017-06-06 09:35:47 CEST; 48s ago
      Process: 22172 ExecStop=/bin/kill -SIGTERM $MAINPID (code=exited, status=1/FAILURE)
      Process: 22174 ExecStart=/usr/sbin/zabbix_server -c /etc/zabbix/zabbix_server.conf (code=exited, status=0/SUCCESS)
     Main PID: 22176 (zabbix_server)
    CHILD PROCESSES HERE
    Jun 06 09:35:47 zabbix.idm-lab.local systemd[1]: Starting Zabbix Server...
    Jun 06 09:35:47 zabbix.idm-lab.local systemd[1]: zabbix-server.service: Supervising process 22176 which is not our child. We'll most likely not notice when it exits.
    Jun 06 09:35:47 zabbix.idm-lab.local systemd[1]: Started Zabbix Server.

    The PID file paths in the config file and the unit file are identical!

    Does anyone know what to do?

    Thanks in Advance!
  • Atsushi
    Senior Member
    • Aug 2013
    • 2028

    #2
    Judging by the log, the server is terminating abnormally inside its ODBC processing.
    Please temporarily disable the items that use ODBC.
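
    If there are too many ODBC items to disable by hand, the Zabbix JSON-RPC API can do it in bulk. The following is only a minimal sketch under assumptions: it assumes the ODBC checks are items of type "Database monitor" (type 11 in the API), and the frontend URL, user name and password passed to disable_all_odbc_items() are hypothetical placeholders. It has not been run against this installation.

```python
import json
import urllib.request

DATABASE_MONITOR = 11  # item type "Database monitor" (assumed to cover the ODBC checks)
ITEM_ENABLED = 0       # item status: enabled
ITEM_DISABLED = 1      # item status: disabled


def rpc(method, params, auth=None, req_id=1):
    """Build one JSON-RPC 2.0 request body for the Zabbix API."""
    return {"jsonrpc": "2.0", "method": method, "params": params,
            "auth": auth, "id": req_id}


def login_request(user, password):
    """user.login is sent without an auth token (auth is null)."""
    return rpc("user.login", {"user": user, "password": password})


def odbc_items_request(auth):
    """Ask for every enabled item of type Database monitor."""
    return rpc("item.get",
               {"output": ["itemid", "name"],
                "filter": {"type": DATABASE_MONITOR, "status": ITEM_ENABLED}},
               auth)


def disable_request(auth, itemid):
    """Set one item to disabled."""
    return rpc("item.update", {"itemid": itemid, "status": ITEM_DISABLED}, auth)


def send(url, body):
    """POST one request; raises KeyError if the API answers with an error."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json-rpc"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["result"]


def disable_all_odbc_items(url, user, password):
    """Disable all enabled Database monitor items; return their item IDs."""
    auth = send(url, login_request(user, password))
    items = send(url, odbc_items_request(auth))
    for item in items:
        send(url, disable_request(auth, item["itemid"]))
    return [item["itemid"] for item in items]
```

    Keeping the returned list of item IDs makes it possible to re-enable exactly the same items later by sending item.update with status 0.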


    • e-coder
      Junior Member
      • Sep 2016
      • 8

      #3
      Thank you for replying!

      Where do you see that this is caused by an ODBC crash?

      I don't have any ODBC logs.

      The thing is, if I disable all the ODBC checks, Zabbix becomes almost useless, because we have many SQL checks that use the ODBC drivers.

      What also bugs me is why it always says "not our child" and why there is a kill usage error.

      Thank you!

      Regards
      David


      • Atsushi
        Senior Member
        • Aug 2013
        • 2028

        #4
        Please check the contents of zabbix_server.log carefully.
        First, the following line reports a SIGSEGV, which means the process terminated abnormally because of an illegal memory access.

        Code:
         21229:20170606:093034.931 Got signal [signal:11(SIGSEGV),reason:1,refaddr:0x7f0031bb11c8]. Crashing ...
        Next, look carefully at the backtrace:

        Code:
         21229:20170606:093034.932 === Backtrace: ===
         21229:20170606:093034.932 14: /usr/sbin/zabbix_server: poller #5 [got 9 values in 0.009092 sec, getting values](print_fatal_info+0x114) [0x7f829d686b40]
         21229:20170606:093034.933 13: /usr/sbin/zabbix_server: poller #5 [got 9 values in 0.009092 sec, getting values](+0xccefc) [0x7f829d686efc]
         21229:20170606:093034.933 12: /lib64/libc.so.6(+0x35250) [0x7f829a1be250]
         21229:20170606:093034.933 11: /usr/sbin/zabbix_server: poller #5 [got 9 values in 0.009092 sec, getting values](odbc_DBfetch+0x260) [0x7f829d6c7491]
         21229:20170606:093034.933 10: /usr/sbin/zabbix_server: poller #5 [got 9 values in 0.009092 sec, getting values](+0x587d9) [0x7f829d6127d9]
        "poller #5" is one of the poller processes, which collect item values.
        The problem occurred in this process.

        The function "print_fatal_info()" is called after a fatal error has already occurred, so it is not the cause of the failure.
        The failure therefore most likely occurred in the function "odbc_DBfetch()" listed below it.

        So this log suggests that the problem lies in ODBC processing.
        To narrow down the cause, I suggested that you confirm whether the Zabbix server runs without crashing when no ODBC processing is used.

        Please also recheck the driver and settings used for ODBC access.
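
        The way of reading the backtrace described above can be sketched as a small script. This is only an illustration of the reasoning, not a Zabbix tool: the regular expression is written for the frame lines posted in this thread, and the function name faulting_function is made up for this example. It skips the crash reporter "print_fatal_info" and the unnamed signal-handling frames, and returns the first named function after them.

```python
import re

# One backtrace frame: "<n>: <path>[: process title](<symbol>+0x<off>) [0x<addr>]"
FRAME = re.compile(
    r"\s(\d+): (/[^\s:(]+).*\((\w*)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]\s*$"
)


def faulting_function(log_lines, reporter="print_fatal_info"):
    """Return (frame number, binary, symbol) of the first named frame
    that is not the crash reporter, or None if there is no such frame."""
    for line in log_lines:
        match = FRAME.search(line)
        if not match:
            continue  # not a backtrace frame line
        number, binary, symbol = match.groups()
        if symbol and symbol != reporter:
            return int(number), binary, symbol
    return None


# The top frames of the backtrace from post #1.
SAMPLE = [
    r" 21229:20170606:093034.932 === Backtrace: ===",
    r" 21229:20170606:093034.932 14: /usr/sbin/zabbix_server: poller #5 [got 9 values in 0.009092 sec, getting values](print_fatal_info+0x114) [0x7f829d686b40]",
    r" 21229:20170606:093034.933 13: /usr/sbin/zabbix_server: poller #5 [got 9 values in 0.009092 sec, getting values](+0xccefc) [0x7f829d686efc]",
    r" 21229:20170606:093034.933 12: /lib64/libc.so.6(+0x35250) [0x7f829a1be250]",
    r" 21229:20170606:093034.933 11: /usr/sbin/zabbix_server: poller #5 [got 9 values in 0.009092 sec, getting values](odbc_DBfetch+0x260) [0x7f829d6c7491]",
]

if __name__ == "__main__":
    print(faulting_function(SAMPLE))
```

        On the frames posted here it selects frame 11, odbc_DBfetch, in /usr/sbin/zabbix_server, which matches the manual reading.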


        • e-coder
          Junior Member
          • Sep 2016
          • 8

          #5
          Hi!

          Thanks for the reply.

          I downgraded the MySQL ODBC connector by two versions, but the problem still exists.

          Right now I am on mysql-connector-odbc.x86_64 0:5.3.6-1.el7

          Disabling all database checks by hand would take days, and 70% of the functionality would be gone, because most of our checks are MySQL checks.
