Galera Concurrency Issues

  • the.monitor
    Junior Member
    Zabbix Certified Specialist
    • Aug 2019
    • 22

    #1


    Hi all,

    This one has been causing me a lot of headaches.

    We're running a Galera backend to support our Zabbix infrastructure. Recently the database has been falling over with the following error:

    Code:
    ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 0
    Our investigation of the MySQL logs points towards a concurrency error taking down the cluster.

    We're still on Zabbix 4.0.25.

    On the database nodes we're using 1Gb NICs, with 256GB RAM on each node and PCI SSDs for storage.

    We have ~10 DBSyncers running to handle any excessive load (such as a proxy needing to send catch-up data).

    Our NVPS is rather low, though, averaging ~2200 in normal times.
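For a sense of scale, the ~2200 NVPS figure can be turned into a rough daily write volume. This is only a back-of-the-envelope sketch: it assumes the rate is constant over the day, and the function name `values_per_day` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def values_per_day(nvps: float) -> int:
    """Approximate number of new values written per day at a constant NVPS."""
    return int(nvps * SECONDS_PER_DAY)


# ~2200 NVPS, as quoted above for normal times
print(values_per_day(2200))  # 190080000
```

So "rather low" still means on the order of 190 million new rows a day arriving in the history and trends tables.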

    We've patched Galera, we've patched Zabbix, and we've tuned down the number of DBSyncers, but I'm a little stuck on how to mitigate this further. Strangely, this only really started happening after we moved from Zabbix 3.2 to Zabbix 4.0.

    If anyone can help or provide any insight, it'd be very welcome.

    Edit: Additional Information.

    This behaviour ALWAYS happens immediately after a housekeeper run.
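One way to check the housekeeper correlation more rigorously is to line up housekeeper completion times against failure times. The following is a minimal sketch, not a finished tool: the timestamps are hypothetical placeholders rather than values from our logs, the 5-minute window is an arbitrary assumption, and `failures_after_housekeeper` is an illustrative name.

```python
from datetime import datetime, timedelta


def failures_after_housekeeper(housekeeper_ends, failures,
                               window=timedelta(minutes=5)):
    """Return the failures that occur within `window` after a housekeeper run ended."""
    matched = []
    for failure in failures:
        # A failure counts if it falls at or after some housekeeper end time,
        # but no later than `window` after it.
        if any(timedelta(0) <= failure - end <= window for end in housekeeper_ends):
            matched.append(failure)
    return matched


# Hypothetical timestamps, for illustration only
housekeeper_ends = [
    datetime(2020, 10, 12, 12, 0),
    datetime(2020, 10, 12, 13, 0),
]
failures = [
    datetime(2020, 10, 12, 12, 2),
    datetime(2020, 10, 12, 13, 1),
    datetime(2020, 10, 12, 14, 30),
]

matched = failures_after_housekeeper(housekeeper_ends, failures)
print(len(matched), "of", len(failures))  # 2 of 3
```

In practice the two lists would be parsed from the Zabbix server log and the MySQL error log; if every failure lands inside the window, that backs up the "always after a housekeeper run" observation.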
    Last edited by the.monitor; 12-10-2020, 13:42.