Zabbix 6 upgrade - HA manager error

  • markfree
    Senior Member
    • Apr 2019
    • 868

    #1

    Zabbix 6 upgrade - HA manager error

    I'm facing an issue with a Zabbix installation on a 64-bit Raspberry Pi 4 with MariaDB 10.5.12.

    I updated Zabbix from 5.4 to 6.0, and when the server started, it upgraded the database correctly.
    Right after that, it froze for a couple of minutes, then issued an HA manager error and restarted.
    25580:20220221:174743.418 server #2 started [configuration syncer #1]
    25578:20220221:174747.410 HA manager has been paused
    25580:20220221:174903.958 slow query: 80.129512 sec, "select i.itemid,f.functionid,f.name,f.parameter,t.triggerid,i.hostid from hosts h,items i,functions f,triggers t where h.hostid=i.hostid and i.itemid=f.itemid and f.triggerid=t.triggerid and h.status in (0,1) and t.flags<>2"
    25558:20220221:174904.158 cannot obtain HA status: the server HA registry record has changed ownership
    25578:20220221:174904.161 HA manager has been stopped
    25558:20220221:174904.165 Zabbix Server stopped. Zabbix 6.0.0 (revision 5203d2ea7d).


    After restarting, the server proceeded and all processes were started.
    But, again, the HA manager was paused with an error.
    25725:20220221:174918.308 HA manager has been paused
    25724:20220221:174918.308 HA manager error: the server HA registry record has changed ownership
    free(): double free detected in tcache 2

    Code:
    25558:20220221:174743.376 completed 98% of database upgrade
    25558:20220221:174743.380 completed 99% of database upgrade
    25558:20220221:174743.381 completed 100% of database upgrade
    25558:20220221:174743.381 database upgrade fully completed
    25558:20220221:174743.384 database could be upgraded to use primary keys in history tables
    25578:20220221:174743.407 starting HA manager
    25578:20220221:174743.413 HA manager started in active mode
    25558:20220221:174743.416 server #0 started [main process]
    25579:20220221:174743.417 server #1 started [service manager #1]
    25580:20220221:174743.418 server #2 started [configuration syncer #1]
    25578:20220221:174747.410 HA manager has been paused
    25580:20220221:174903.958 slow query: 80.129512 sec, "select i.itemid,f.functionid,f.name,f.parameter,t.triggerid,i.hostid from hosts h,items i,functions f,triggers t where h.hostid=i.hostid and i.itemid=f.itemid and f.triggerid=t.triggerid and h.status in (0,1) and t.flags<>2"
    25558:20220221:174904.158 cannot obtain HA status: the server HA registry record has changed ownership
    25578:20220221:174904.161 HA manager has been stopped
    25558:20220221:174904.165 Zabbix Server stopped. Zabbix 6.0.0 (revision 5203d2ea7d).
    25724:20220221:174914.244 Starting Zabbix Server. Zabbix 6.0.0 (revision 5203d2ea7d).
    25724:20220221:174914.244 ****** Enabled features ******
    25724:20220221:174914.245 SNMP monitoring: YES
    25724:20220221:174914.245 IPMI monitoring: YES
    25724:20220221:174914.245 Web monitoring: YES
    25724:20220221:174914.245 VMware monitoring: YES
    25724:20220221:174914.245 SMTP authentication: YES
    25724:20220221:174914.245 ODBC: YES
    25724:20220221:174914.245 SSH support: YES
    25724:20220221:174914.245 IPv6 support: YES
    25724:20220221:174914.245 TLS support: YES
    25724:20220221:174914.245 ******************************
    25724:20220221:174914.246 using configuration file: /etc/zabbix/zabbix_server.conf
    25724:20220221:174914.260 current database version (mandatory/optional): 06000000/06000000
    25724:20220221:174914.260 required mandatory version: 06000000
    25724:20220221:174914.266 database could be upgraded to use primary keys in history tables
    25725:20220221:174914.304 starting HA manager
    25725:20220221:174914.311 HA manager started in active mode
    25724:20220221:174914.315 server #0 started [main process]
    25726:20220221:174914.317 server #1 started [service manager #1]
    25727:20220221:174914.319 server #2 started [configuration syncer #1]
    25729:20220221:174914.966 server #3 started [alert manager #1]
    25730:20220221:174914.967 server #4 started [alerter #1]
    25731:20220221:174914.969 server #5 started [alerter #2]
    25732:20220221:174914.971 server #6 started [alerter #3]
    25733:20220221:174914.972 server #7 started [preprocessing manager #1]
    25734:20220221:174914.974 server #8 started [preprocessing worker #1]
    25735:20220221:174914.975 server #9 started [preprocessing worker #2]
    25736:20220221:174914.977 server #10 started [preprocessing worker #3]
    25737:20220221:174914.978 server #11 started [lld manager #1]
    25738:20220221:174914.980 server #12 started [lld worker #1]
    25739:20220221:174914.981 server #13 started [lld worker #2]
    25741:20220221:174914.983 server #14 started [housekeeper #1]
    25742:20220221:174914.984 server #15 started [timer #1]
    25744:20220221:174914.986 server #16 started [http poller #1]
    25745:20220221:174914.987 server #17 started [discoverer #1]
    25748:20220221:174914.991 server #18 started [history syncer #1]
    25749:20220221:174914.992 server #19 started [history syncer #2]
    25751:20220221:174914.995 server #20 started [history syncer #3]
    25753:20220221:174914.997 server #21 started [history syncer #4]
    25758:20220221:174915.002 server #24 started [self-monitoring #1]
    25757:20220221:174915.002 server #23 started [proxy poller #1]
    25760:20220221:174915.005 server #26 started [poller #1]
    25755:20220221:174915.005 server #22 started [escalator #1]
    25763:20220221:174915.009 server #28 started [poller #3]
    25762:20220221:174915.011 server #27 started [poller #2]
    25759:20220221:174915.015 server #25 started [task manager #1]
    25765:20220221:174915.017 server #29 started [poller #4]
    25768:20220221:174915.018 server #32 started [trapper #1]
    25767:20220221:174915.022 server #31 started [unreachable poller #1]
    25772:20220221:174915.024 server #35 started [trapper #4]
    25766:20220221:174915.025 server #30 started [poller #5]
    25770:20220221:174915.027 server #33 started [trapper #2]
    25776:20220221:174915.031 server #36 started [trapper #5]
    25771:20220221:174915.032 server #34 started [trapper #3]
    25778:20220221:174915.035 server #38 started [alert syncer #1]
    25779:20220221:174915.036 server #39 started [history poller #1]
    25782:20220221:174915.037 server #40 started [history poller #2]
    25777:20220221:174915.039 server #37 started [icmp pinger #1]
    25783:20220221:174915.040 server #41 started [history poller #3]
    25786:20220221:174915.042 server #42 started [history poller #4]
    25789:20220221:174915.044 server #43 started [history poller #5]
    25790:20220221:174915.046 server #44 started [availability manager #1]
    25794:20220221:174915.048 server #45 started [report manager #1]
    25796:20220221:174915.051 server #46 started [report writer #1]
    25797:20220221:174915.053 server #47 started [trigger housekeeper #1]
    25799:20220221:174915.056 server #48 started [odbc poller #1]
    25725:20220221:174918.308 HA manager has been paused
    25724:20220221:174918.308 HA manager error: the server HA registry record has changed ownership
    free(): double free detected in tcache 2
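
For anyone comparing timestamps in a log like the one above, here is a minimal Python sketch that parses the `PID:YYYYMMDD:HHMMSS.mmm message` prefix and pulls out the HA-related lines. The function names (`parse_line`, `ha_events`) and the filter pattern are my own assumptions for illustration, not part of Zabbix, and the sample is just a few lines copied from the log.

```python
import re
from datetime import datetime

# Zabbix server log lines look like "PID:YYYYMMDD:HHMMSS.mmm message".
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)$")

def parse_line(line):
    """Return (pid, timestamp, message), or None for lines without the prefix."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # e.g. the bare "free(): double free ..." line
    pid, date, time, message = m.groups()
    ts = datetime.strptime(date + time, "%Y%m%d%H%M%S.%f")
    return int(pid), ts, message

def ha_events(lines):
    """Yield parsed entries that mention HA or a slow query (hypothetical filter)."""
    for line in lines:
        parsed = parse_line(line)
        if parsed and re.search(r"\bHA\b|slow query", parsed[2]):
            yield parsed

# A few lines copied from the log in this post.
sample = """\
25578:20220221:174743.413 HA manager started in active mode
25580:20220221:174743.418 server #2 started [configuration syncer #1]
25578:20220221:174747.410 HA manager has been paused
25558:20220221:174904.158 cannot obtain HA status: the server HA registry record has changed ownership
free(): double free detected in tcache 2
""".splitlines()

events = list(ha_events(sample))
for pid, ts, message in events:
    print(pid, ts.time(), message)

# Seconds between the HA manager pausing and the ownership error.
gap = (events[2][1] - events[1][1]).total_seconds()
print(f"paused -> error: {gap:.3f} s")
```

On the first run in this log, the gap between "HA manager has been paused" and the ownership error comes out at roughly 76.7 seconds, which overlaps the window of the 80-second slow query. That is only a correlation in the timestamps, not a confirmed cause.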


    Now I can access the Zabbix frontend, and it is actually collecting new data as if nothing were wrong.
    Despite that, systemd shows zabbix-server as "deactivating".

    Code:
    $ sudo systemctl status zabbix-server
    ● zabbix-server.service - Zabbix Server
    Loaded: loaded (/lib/systemd/system/zabbix-server.service; enabled; vendor preset: enabled)
    Active: deactivating (stop-sigterm) (Result: signal) since Mon 2022-02-21 17:49:18 -03; 13min ago
    Process: 25722 ExecStart=/usr/sbin/zabbix_server -c $CONFFILE (code=exited, status=0/SUCCESS)
    Main PID: 25724 (code=killed, signal=ABRT)
    Tasks: 49 (limit: 4915)
    CPU: 17.244s
    (...)
    Feb 21 17:49:14 Double systemd[1]: Starting Zabbix Server...
    Feb 21 17:49:14 Double systemd[1]: zabbix-server.service: Supervising process 25724 which is not our child. We'll most likely not notice when it exits.
    Feb 21 17:49:14 Double systemd[1]: Started Zabbix Server.
    Feb 21 17:49:18 Double systemd[1]: zabbix-server.service: Main process exited, code=killed, status=6/ABRT


    Any thoughts on that?
    Last edited by markfree; 21-02-2022, 23:55.
  • markfree
    Senior Member
    • Apr 2019
    • 868

    #2
    I found some topics that relate to mine.
    The second one was resolved by using a 64-bit RPi.
    But I'm not sure about that because my Pi was already set to 64-bit in the boot config.

    Code:
    # 64bit arch
    arm_64bit=1


    Anyhow, I reinstalled my Pi OS with the new 64-bit release, restored the database, and now it works fine.
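
As far as I know, `arm_64bit=1` only makes the Pi boot a 64-bit kernel; on a 32-bit Raspberry Pi OS image the userland binaries stay 32-bit. A quick way to see the difference is sketched below in Python. It assumes the Python interpreter was built for the same architecture as the rest of the userland (true for distro packages, but still an assumption), and `arch_summary` is just a name made up for this example.

```python
import platform
import struct

def arch_summary():
    """Return (kernel machine string, pointer width in bits of this Python binary)."""
    # Machine string reported by the kernel, e.g. "aarch64" or "armv7l".
    machine = platform.machine()
    # Pointer size of the running interpreter: 32 or 64 bits.
    userland_bits = struct.calcsize("P") * 8
    return machine, userland_bits

machine, bits = arch_summary()
print(f"kernel reports: {machine}, userland: {bits}-bit")

# Assumption: "aarch64" plus a 32-bit interpreter means a 64-bit kernel
# running a 32-bit OS image.
if machine == "aarch64" and bits == 32:
    print("64-bit kernel with 32-bit userland")
```

If this reports a 64-bit kernel with a 32-bit userland, the installed Zabbix packages are most likely 32-bit builds too, regardless of the boot config setting.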

    • maxonthenet
      Junior Member
      • Mar 2022
      • 4

      #3
      I think a (very) similar issue has been reported in ZBX-20715, which has been closed and merged into ZBX-20661.
      Since I am also experiencing the same issue on a 32-bit Raspberry Pi OS distribution, I have just added my findings as a comment to the latter issue.

      • maxonthenet
        Junior Member
        • Mar 2022
        • 4

        #4
        After some recent code changes, it is very likely that this problem has been addressed and solved.
        I think that at least 3 issues contributed to the problem.
        A few days ago, release 6.0.3rc1 of Zabbix was published. Its release notes mention the above issues (among others), and address them as indicated in parentheses above.
        Looking at the release history, 6.0.3 will likely be out in a few days but, in the meantime, recompiling the Zabbix server from source has indeed completely solved the problem for me on a 32-bit system.
