reason:1,refaddr:(nil) on Zabbix Server 5.4.4

  • josh.baker
    Junior Member
    • Oct 2021
    • 6

    #1

    reason:1,refaddr:(nil) on Zabbix Server 5.4.4

    Hello

    I am experiencing constant restarts of Zabbix server 5.4.4, and the log points to reason:1,refaddr:(nil). Does anyone have an idea what causes this? I have seen an article somewhere that said it could be a misconfiguration of one of the tables, but I have no idea which one it could be.

    Some help with this issue would be greatly appreciated.
    Thanks

    Additional Info
    On proxmox virtual environment
    Ubuntu 20.04
    Zabbix 5.4.4
    10.3.31-MariaDB

    Code:
    1287339:20211004:132457.724 Starting Zabbix Server. Zabbix 5.4.4 (revision 1765c4f1bc).
    1287339:20211004:132457.724 ****** Enabled features ******
    1287339:20211004:132457.724 SNMP monitoring: YES
    1287339:20211004:132457.724 IPMI monitoring: YES
    1287339:20211004:132457.724 Web monitoring: YES
    1287339:20211004:132457.724 VMware monitoring: YES
    1287339:20211004:132457.724 SMTP authentication: YES
    1287339:20211004:132457.724 ODBC: YES
    1287339:20211004:132457.724 SSH support: YES
    1287339:20211004:132457.724 IPv6 support: YES
    1287339:20211004:132457.724 TLS support: YES
    ... redacted for character limit...
    1287373:20211004:133139.687 Got signal [signal:11(SIGSEGV),reason:1,refaddr:(nil)]. Crashing ...
    1287373:20211004:133139.687 ====== Fatal information: ======
    1287373:20211004:133139.687 Program counter: (nil)
    1287373:20211004:133139.687 === Registers: ===
    1287373:20211004:133139.687 r8 = 0 = 0 = 0
    1287373:20211004:133139.687 r9 = 5645f1afd010 = 94858407563280 = 94858407563280
    1287373:20211004:133139.687 r10 = 5645f1afd010 = 94858407563280 = 94858407563280
    1287373:20211004:133139.687 r11 = 7f54d4adabe0 = 140002322131936 = 140002322131936
    1287373:20211004:133139.687 r12 = ffffffff = 4294967295 = 4294967295
    1287373:20211004:133139.687 r13 = 0 = 0 = 0
    1287373:20211004:133139.687 r14 = 0 = 0 = 0
    1287373:20211004:133139.687 r15 = 0 = 0 = 0
    1287373:20211004:133139.687 rdi = 5645f1b71320 = 94858408039200 = 94858408039200
    1287373:20211004:133139.687 rsi = 2 = 2 = 2
    1287373:20211004:133139.687 rbp = 0 = 0 = 0
    1287373:20211004:133139.687 rbx = 0 = 0 = 0
    1287373:20211004:133139.687 rdx = 0 = 0 = 0
    1287373:20211004:133139.687 rax = 5645f1b99240 = 94858408202816 = 94858408202816
    1287373:20211004:133139.687 rcx = 0 = 0 = 0
    1287373:20211004:133139.687 rsp = 7ffdd640f670 = 140728198035056 = 140728198035056
    1287373:20211004:133139.687 rip = 0 = 0 = 0
    1287373:20211004:133139.687 efl = 10206 = 66054 = 66054
    1287373:20211004:133139.687 csgsfs = 2b000000000033 = 12103423998558259 = 12103423998558259
    1287373:20211004:133139.687 err = 14 = 20 = 20
    1287373:20211004:133139.687 trapno = e = 14 = 14
    1287373:20211004:133139.687 oldmask = 0 = 0 = 0
    1287373:20211004:133139.687 cr2 = 0 = 0 = 0
    1287373:20211004:133139.687 === Backtrace: ===
    1287373:20211004:133139.688 3: /usr/sbin/zabbix_server: poller #5 [got 1 values in 0.002742 sec, getting values](zbx_backtrace+0x52) [0x5645f089ba1a]
    1287373:20211004:133139.688 2: /usr/sbin/zabbix_server: poller #5 [got 1 values in 0.002742 sec, getting values](zbx_log_fatal_info+0x183) [0x5645f089bd13]
    1287373:20211004:133139.688 1: /usr/sbin/zabbix_server: poller #5 [got 1 values in 0.002742 sec, getting values](+0x23c6c5) [0x5645f089c6c5]
    1287373:20211004:133139.688 0: /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) [0x7f54d56de3c0]
    1287373:20211004:133139.688 === Memory map: ===
    ... redacted for character limit ...
    1287373:20211004:133139.707 ================================
    1287373:20211004:133139.707 Please consider attaching a disassembly listing to your bug report.
    1287373:20211004:133139.707 This listing can be produced with, e.g., objdump -DSswx zabbix_server.
    1287373:20211004:133139.707 ================================
    1287339:20211004:133139.709 One child process died (PID:1287373,exitcode/signal:1). Exiting ...
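
For readers decoding the "Got signal" line in the log above, here is a minimal Python sketch. It assumes the `reason` field is the `si_code` value from the signal's `siginfo` and that `refaddr` is the faulting address; the `SEGV_CODES` table uses the Linux values and the function name is hypothetical. Under those assumptions, `reason:1` with `refaddr:(nil)` reads as an access to an unmapped address at NULL, which is consistent with the `rip = 0` register line (a jump or call through a null pointer).

```python
import re

# si_code values for SIGSEGV as defined on Linux (assumption: the log's
# "reason" field carries si_code; other platforms may use other numbers).
SEGV_CODES = {
    1: "SEGV_MAPERR (address not mapped to object)",
    2: "SEGV_ACCERR (invalid permissions for mapped object)",
}

SIGNAL_RE = re.compile(
    r"^\s*(?P<pid>\d+):\d{8}:\d{6}\.\d{3} "
    r"Got signal \[signal:(?P<signo>\d+)\((?P<name>\w+)\),"
    r"reason:(?P<reason>\d+),refaddr:(?P<addr>[^\]]+)\]"
)


def parse_crash_line(line):
    """Return a dict describing a 'Got signal' log line, or None if it does not match."""
    m = SIGNAL_RE.match(line)
    if m is None:
        return None
    reason = int(m.group("reason"))
    if m.group("name") == "SIGSEGV":
        meaning = SEGV_CODES.get(reason, "si_code %d" % reason)
    else:
        meaning = "si_code %d" % reason
    return {
        "pid": int(m.group("pid")),
        "signal": m.group("name"),
        "reason": meaning,
        "null_address": m.group("addr") == "(nil)",
    }


line = ("1287373:20211004:133139.687 Got signal "
        "[signal:11(SIGSEGV),reason:1,refaddr:(nil)]. Crashing ...")
print(parse_crash_line(line))
```

The PID in the result (1287373) is the crashing child, not the main server process (1287339), which only logs that a child died and then exits.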
  • josh.baker
    Junior Member
    • Oct 2021
    • 6

    #2
    Following up after some personal research: reading through some related bug reports led me down this path.

    I switched Zabbix server logging to level 4 (debug level),

    waited for the error to come around again,

    and got this log output when the crash happened:

    Code:
    1299783:20211006:083139.352 In substitute_key_macros_impl() data:'lldp.discovery[{HOST.CONN},{$SNMP_COMMUNITY},remote]'
    1299783:20211006:083139.352 In substitute_simple_macros_impl() data:'{HOST.CONN}'
    1299783:20211006:083139.352 End substitute_simple_macros_impl() data:'192.30.1.17'
    1299783:20211006:083139.352 In substitute_simple_macros_impl() data:'{$SNMP_COMMUNITY}'
    1299783:20211006:083139.352 In DCget_user_macro() macro:'{$SNMP_COMMUNITY}'
    1299783:20211006:083139.352 End of DCget_user_macro()
    1299783:20211006:083139.352 End substitute_simple_macros_impl() data:'public'
    1299783:20211006:083139.352 End of substitute_key_macros_impl():SUCCEED data:'lldp.discovery[192.30.1.17,public,remote]'
    1299783:20211006:083139.352 In substitute_simple_macros_impl() data:EMPTY
    1299783:20211006:083139.352 In substitute_simple_macros_impl() data:EMPTY
    1299783:20211006:083139.352 In get_value() key:'lldp.discovery[{HOST.CONN},{$SNMP_COMMUNITY},remote]'
    1299783:20211006:083139.352 In get_value_simple() key_orig:'lldp.discovery[{HOST.CONN},{$SNMP_COMMUNITY},remote]' addr:'192.30.1.17'
    1299783:20211006:083139.353 Got signal [signal:11(SIGSEGV),reason:1,refaddr:(nil)]. Crashing ...
    1299783:20211006:083139.353 ====== Fatal information: ======
    1299783:20211006:083139.353 Program counter: (nil)
    1299783:20211006:083139.353 === Registers: ===
    1299783:20211006:083139.353 r8 = 0 = 0 = 0
    1299783:20211006:083139.353 r9 = 7fa688e62b80 = 140353238084480 = 140353238084480
    1299783:20211006:083139.353 r10 = 0 = 0 = 0
    1299783:20211006:083139.353 r11 = 7 = 7 = 7
    1299783:20211006:083139.353 r12 = ffffffff = 4294967295 = 4294967295
    1299783:20211006:083139.353 r13 = 0 = 0 = 0
    1299783:20211006:083139.353 r14 = 0 = 0 = 0
    1299783:20211006:083139.353 r15 = 0 = 0 = 0
    1299783:20211006:083139.353 rdi = 564c516a50f0 = 94885783425264 = 94885783425264
    1299783:20211006:083139.353 rsi = 2 = 2 = 2
    1299783:20211006:083139.353 rbp = 0 = 0 = 0
    1299783:20211006:083139.353 rbx = 0 = 0 = 0
    1299783:20211006:083139.353 rdx = 0 = 0 = 0
    1299783:20211006:083139.353 rax = 564c516a4df0 = 94885783424496 = 94885783424496
    1299783:20211006:083139.354 rcx = 0 = 0 = 0
    1299783:20211006:083139.354 rsp = 7ffd3b49ba80 = 140725598141056 = 140725598141056
    1299783:20211006:083139.354 rip = 0 = 0 = 0
    1299783:20211006:083139.354 efl = 10202 = 66050 = 66050
    1299783:20211006:083139.354 csgsfs = 2b000000000033 = 12103423998558259 = 12103423998558259
    1299783:20211006:083139.354 err = 14 = 20 = 20
    1299783:20211006:083139.354 trapno = e = 14 = 14
    1299783:20211006:083139.354 oldmask = 0 = 0 = 0
    1299783:20211006:083139.354 cr2 = 0 = 0 = 0
    1299783:20211006:083139.354 === Backtrace: ===
    1299783:20211006:083139.355 3: /usr/sbin/zabbix_server: poller #3 [got 1 values in 0.005125 sec, getting values](zbx_backtrace+0x52) [0x564c5124aa1a]
    1299783:20211006:083139.355 2: /usr/sbin/zabbix_server: poller #3 [got 1 values in 0.005125 sec, getting values](zbx_log_fatal_info+0x183) [0x564c5124ad13]
    1299783:20211006:083139.355 1: /usr/sbin/zabbix_server: poller #3 [got 1 values in 0.005125 sec, getting values](+0x23c6c5) [0x564c5124b6c5]
    1299783:20211006:083139.355 0: /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) [0x7fa689a663c0]
    It looks to me like this is an SNMP call to my Ubiquiti switch. So the question now is: why would this device return a null response?
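
The manual step of reading backwards from the crash to the last item key can be sketched in Python. This is only an illustration, assuming debug-level (4) log lines in the `PID:date:time message` layout shown above; the function name is hypothetical.

```python
import re

# "PID:YYYYMMDD:HHMMSS.mmm message", as in the log excerpts in this thread.
LINE_RE = re.compile(r"^\s*(\d+):\d{8}:\d{6}\.\d{3} (.*)$")
KEY_RE = re.compile(r"^In get_value\(\) key:'(.*)'$")


def last_key_before_crash(lines):
    """Map each crashing PID to the last item key it logged before the signal."""
    last_key = {}   # most recent get_value() key seen per PID
    crashes = {}    # PID -> key being processed when the signal arrived
    for raw in lines:
        m = LINE_RE.match(raw.rstrip())
        if m is None:
            continue
        pid, msg = int(m.group(1)), m.group(2)
        key = KEY_RE.match(msg)
        if key:
            last_key[pid] = key.group(1)
        elif msg.startswith("Got signal ["):
            crashes[pid] = last_key.get(pid)
    return crashes


sample = [
    "1299783:20211006:083139.352 In get_value() key:'lldp.discovery[{HOST.CONN},{$SNMP_COMMUNITY},remote]'",
    "1299783:20211006:083139.352 In get_value_simple() key_orig:'lldp.discovery[{HOST.CONN},{$SNMP_COMMUNITY},remote]' addr:'192.30.1.17'",
    "1299783:20211006:083139.353 Got signal [signal:11(SIGSEGV),reason:1,refaddr:(nil)]. Crashing ...",
]
print(last_key_before_crash(sample))
```

Tracking keys per PID matters because several pollers write to the same log file, so the line immediately before a crash may belong to a different process.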

    As part of the troubleshooting process, I will now disable the SNMP templates applied to this device to see if Zabbix server becomes stable again. If it does, then I at least know that this device and/or the SNMP template needs to be adjusted.

    Another odd thing to note as I go through the process in my head: I have multiple Ubiquiti devices on my network, so why is only this particular switch causing the issue?

    The templates involved are Network Generic Device SNMP (EtherLike-MIB SNMP, Generic SNMP, Interfaces Simple SNMP) and Template LLDP - General. Template LLDP - General comes from the L2 Discovery Module for LLDP found on Zabbix Share.

    Further details when I find them.


    • josh.baker
      Junior Member
      • Oct 2021
      • 6

      #3
      After some preliminary review, I now think I know what happened.

      Removing the L2 Discovery Module template from the switch seems to have stopped all the crashing. That is fine; I was only using that template to see if I could expand automation and discovery of switch port status. I will have to find another way to do that. Manually assigning ports on the network map is not my favorite way of doing it, but maybe my approach is wrong. I am still learning Zabbix, so I am sure someone could point out a way to do it.

      All of that aside, here are some things to note about this scenario...

      When the publisher of a template ships a README for its plugin module (e.g. libget.so) and states that it should be used on a CentOS or Red Hat flavor, take that at face value: no cross-compatibility with other distributions, such as Ubuntu, should be expected.

      Child processes or plugins can cause Zabbix to crash. I don't know if this is expected behavior, but it is a fact. I do wish that instead of crashing outright it would just report which module failed and attempt to continue. I am no programmer, so I can't really say if that is attainable.

      I also initially concluded that all SNMP templates were affected. That is incorrect: it was just the one module, and that template was only applied to the one switch at the time. (Applying it to a single device for testing is a good way of doing it; don't apply your templates system-wide until you have verified that they give the expected results.)

      I hope these findings will help someone in the future.


      • tim.mooney
        Senior Member
        • Dec 2012
        • 1427

        #4
        Nice job debugging this, and it's also great that you shared your results.

        Regarding the crash: child processes really shouldn't be able to cause Zabbix to crash. If that's happening, it's indicative of a bug: somewhere, Zabbix isn't handling an error condition that it should be.

        Plugins are a little different if the plugin is actually compiled code and it's loaded into Zabbix's address space (like a shared library). If that's the case, then it is definitely possible that a problem in the plugin could cause the Zabbix server process to crash. I would be very hesitant to load plugins into the same address space as the Zabbix server if those plugins didn't come from a trusted source. They would also have to be confirmed to work with the Zabbix series that you're using. A plugin that was written for Zabbix 4.0 or even 5.0 just may not be compatible with Zabbix 5.2 or 5.4.
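
The address-space point can be illustrated in general terms with Python. This is not Zabbix code, only a sketch assuming a POSIX system with `ctypes` available: native code that dereferences NULL is run in a separate process, so the fault kills that process alone and the caller merely observes the exit status. Run in-process, the same call would take the whole interpreter down, which is the situation a loaded shared library puts its host in.

```python
import signal
import subprocess
import sys

# Child program: read from address 0 through ctypes, which normally
# triggers a segmentation fault in the process that executes it.
CRASHING_CODE = "import ctypes; ctypes.string_at(0)"


def run_isolated(code):
    """Run Python code in a separate process and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


status = run_isolated(CRASHING_CODE)
if status < 0:
    # On POSIX, a negative return code means "killed by signal -status".
    print("child killed by", signal.Signals(-status).name, "- caller still running")
else:
    print("child exited with status", status)
```

Isolation only contains the fault; what the parent does next is a separate design decision. In the logs in this thread, the main server process chose to exit once it noticed a child had died.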

        You don't say exactly which template you're using or where it came from, but if it's having you load shared libraries (like libget.so) into zabbix server, that's something that I would investigate very carefully before I would even consider loading it on my Zabbix server.
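
As a starting point for that kind of investigation, a short Python sketch can list which modules a server configuration asks Zabbix to load. It assumes the usual `Key=Value` layout of `zabbix_server.conf` with `#` comments and the `LoadModule` parameter; the sample path and module file name below are hypothetical placeholders, not taken from this thread.

```python
def loaded_modules(conf_text):
    """Return the value of every active (uncommented) LoadModule line."""
    modules = []
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and commented-out settings
        key, sep, value = line.partition("=")
        if sep and key.strip() == "LoadModule":
            modules.append(value.strip())
    return modules


# Hypothetical configuration fragment, for illustration only.
sample = """\
# LoadModule=commented_out.so
LoadModulePath=/usr/lib/zabbix/modules
LoadModule=example_module.so
"""
print(loaded_modules(sample))
```

Each file name it reports is compiled code that will run inside the server's own processes, so each one deserves a check of its origin and of the Zabbix series it was built for.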


        • josh.baker
          Junior Member
          • Oct 2021
          • 6

          #5
          All good points tim.mooney,

          Just for clarification

          The plugin was called L2 Discovery Module for LLDP, and it was listed on Zabbix Share at
          https://share.zabbix.com/network_devices/l2-discovery-module-for-lldp
          In the release notes the author says it works on 5.4, but a closer look shows that it mentions el7 or el8, which means a different flavor of Linux than the one I was using.
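
A check like this can be done before installing a prebuilt module. The Python sketch below assumes the host provides the standard `os-release` format (`KEY=value` lines, values optionally quoted) and treats a distribution as EL-family when `ID` or `ID_LIKE` names `rhel` or `centos`. The function names are hypothetical, and the two sample texts are shortened illustrations rather than complete files.

```python
import shlex


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict, removing quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        parts = shlex.split(value)  # handles "quoted values" and bare ones
        fields[key] = parts[0] if parts else ""
    return fields


def is_el_family(fields):
    """True if ID or ID_LIKE indicates a RHEL/CentOS-compatible system."""
    ids = {fields.get("ID", "")} | set(fields.get("ID_LIKE", "").split())
    return bool(ids & {"rhel", "centos"})


# Shortened, illustrative samples (assumptions, not full files).
ubuntu_sample = 'NAME="Ubuntu"\nID=ubuntu\nID_LIKE=debian\nVERSION_ID="20.04"\n'
centos_sample = 'NAME="CentOS Linux"\nID="centos"\nID_LIKE="rhel fedora"\nVERSION_ID="8"\n'

print(is_el_family(parse_os_release(ubuntu_sample)))
print(is_el_family(parse_os_release(centos_sample)))
```

On a real host the text would come from `/etc/os-release`. A match only says the distribution family lines up; it does not prove that a given binary module is compatible with the installed libraries or Zabbix version.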

          I am glad you liked my troubleshooting, and again, I hope it will help someone out in the future.
