MySQL Server has gone away

  • syndeysider
    Senior Member
    • Oct 2013
    • 115

    #1

    MySQL Server has gone away

    Zabbix 2.0.7
    Zabbix Proxy 2.0.7

    Both running on quad-core 3.4GHz
    8GB Ram
    15k Disks


    103045 items
    NVPS = 508
    Hosts = 750

    my.cnf

    [mysqld]
    port = 3306
    socket = /var/lib/mysql/mysql.sock
    # Change following line if you want to store your database elsewhere
    datadir = /var/lib/mysql
    skip-external-locking
    key_buffer_size = 16M
    max_allowed_packet = 128M
    table_open_cache = 64
    sort_buffer_size = 512K
    net_buffer_length = 8K
    read_buffer_size = 256K
    read_rnd_buffer_size = 512K
    myisam_sort_buffer_size = 8M
    max_connections = 10000
    query_cache_size = 512M
    query_cache_limit = 2M
    table_cache = 1024
    # Changes
    innodb_buffer_pool_size = 5120M
    innodb_additional_mem_pool_size = 20M
    innodb_log_file_size = 64M
    thread_cache = 16
    innodb_flush_log_at_trx_commit = 2
    ## /End Changes
    innodb_file_per_table
    thread_cache_size = 8
    wait_timeout = 10800
    net_read_timeout = 3600
    expire_logs_days = 7
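Two pairs of lines in this `[mysqld]` section configure the same setting twice. `table_cache` is the older, deprecated name of `table_open_cache` (1024 vs 64 here). `thread_cache` is not a variable of its own; it appears to be accepted as an abbreviation of `thread_cache_size`, since MySQL 5.x allows unambiguous option prefixes (16 vs 8 here). Only one value of each pair can take effect, so it is worth removing the duplicates before tuning further. Below is a minimal Python sketch for spotting such collisions. The `ALIASES` map is a hand-written assumption that covers only these two cases; it is not a list taken from MySQL.

```python
"""Flag my.cnf options that are set more than once, possibly under different names.

Hypothetical helper: ALIASES is an assumption for illustration, not MySQL's own list.
"""

# Assumed mapping of old or abbreviated option names to the canonical variable name.
ALIASES = {
    "table_cache": "table_open_cache",    # older, deprecated name
    "thread_cache": "thread_cache_size",  # accepted as an unambiguous option prefix
}


def find_collisions(cnf_text, section="mysqld"):
    """Return {canonical_name: [(line_no, name_as_written, value), ...]} for repeats."""
    current = None
    seen = {}
    for line_no, raw in enumerate(cnf_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith(("#", ";", "!")):
            continue  # blank line, comment or !include directive
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            continue
        if current != section:
            continue
        name, _, value = line.partition("=")
        # MySQL treats '-' and '_' in option names as equivalent.
        name = name.strip().replace("-", "_")
        canonical = ALIASES.get(name, name)
        seen.setdefault(canonical, []).append((line_no, name, value.strip()))
    return {key: hits for key, hits in seen.items() if len(hits) > 1}


if __name__ == "__main__":
    sample = """[mysqld]
table_open_cache = 64
table_cache = 1024
thread_cache = 16
thread_cache_size = 8
innodb_file_per_table
"""
    for canonical, hits in find_collisions(sample).items():
        print(canonical, hits)
```

Which value of a pair actually wins depends on line order and on how your MySQL version treats the old name, so confirm on the running server with `SHOW VARIABLES LIKE 'table_open_cache'` and `SHOW VARIABLES LIKE 'thread_cache_size'`.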

    zabbix_log
    28150:20131205:143311.053 [Z3005] query failed: [2006] MySQL server has gone away [begin;]
    28355:20131205:143319.143 item [host1.1.com:system.topcpu[15]] became not supported: Not supported by Zabbix Agent
    28137:20131205:143335.192 [Z3005] query failed: [2006] MySQL server has gone away [begin;]
    28075:20131205:143339.489 SNMP item [ifInOctets[FastEthernet0/23]] on host [host1.1.com] failed: first network error, wait for 60 seconds
    28135:20131205:143351.636 [Z3005] query failed: [2006] MySQL server has gone away [begin;]
    28135:20131205:143351.658 SNMP item [ifInOctets[FastEthernet2]] on host [host1.1.com] failed: another network error, wait for 60 seconds
    28054:20131205:143402.650 [Z3005] query failed: [2006] MySQL server has gone away [select hostid,key_,status,filter,error,lifetime from items where itemid=100100000253540]
    28114:20131205:143410.546 resuming SNMP checks on host host1.1.com]: connection restored
    28114:20131205:143410.546 [Z3005] query failed: [2006] MySQL server has gone away [begin;]
    28183:20131205:143419.425 [Z3005] query failed: [2006] MySQL server has gone away [begin;]
    28360:20131205:143419.509 item [host1.1.com:system.topcpu[15]] became supported
    Timeout: No Response from xxxxxxx
    28081:20131205:143431.265 [Z3005] query failed: [2006] MySQL server has gone away [begin;]
    28136:20131205:143438.639 resuming SNMP checks on host [host1.1.com]: connection restored
    28136:20131205:143438.674 [Z3005] query failed: [2006] MySQL server has gone away [begin;]
    28106:20131205:143457.554 [Z3005] query failed: [2006] MySQL server has gone away [begin;]
    28106:20131205:143457.572 SNMP item [ifOutErrors[FastEthernet3]] on host [host1.1.com] failed: another network error, wait for 60 seconds
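You can check whether these errors are spread evenly or arrive in bursts, for example while a long housekeeper run holds locks, by tallying the log per minute. Below is a minimal Python sketch. It assumes the `pid:YYYYMMDD:HHMMSS.mmm message` prefix visible in the excerpt. `summarize_gone_away` is a hypothetical helper, not part of Zabbix.

```python
import re
from collections import Counter

# Prefix seen in the excerpt: "<pid>:<YYYYMMDD>:<HHMMSS>.<ms> <message>"
LINE_RE = re.compile(r"^(\d+):(\d{8}):(\d{6})\.\d{3} (.*)$")
# Error 2006 followed by the failing statement in square brackets.
GONE_AWAY_RE = re.compile(r"\[2006\] MySQL server has gone away \[(.*)\]$")


def summarize_gone_away(log_text):
    """Count '[2006] gone away' errors per minute and per statement type."""
    per_minute = Counter()
    per_query = Counter()
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # e.g. "Timeout: No Response from ..." has no prefix
        _pid, day, hhmmss, message = m.groups()
        g = GONE_AWAY_RE.search(message)
        if not g:
            continue  # some other log message
        per_minute[f"{day} {hhmmss[:2]}:{hhmmss[2:4]}"] += 1
        # Group by the statement's first word (begin, select, ...).
        words = g.group(1).split()
        per_query[words[0].rstrip(";").lower() if words else "?"] += 1
    return per_minute, per_query
```

For example, call `summarize_gone_away(open(path).read())`, where `path` points at your server log. Its location depends on the `LogFile` setting in the server configuration.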


    I've tried tweaking the MySQL setup as much as possible...

    Any pointers in the right direction here?
    Last edited by syndeysider; 20-03-2014, 00:57.
  • jan.garaj
    Senior Member
    Zabbix Certified Specialist
    • Jan 2010
    • 506

    #2
    1.) Check the health of your network between the Zabbix server and the DB server.
    I also see "MySQL server has gone away" messages in my Zabbix server log, but only in my test environment, where the DB and Zabbix servers are in geographically different locations (the high ping latency there is the problem).

    2.) Check the health of your MySQL server:
    - check status: number of Aborted_clients, Aborted_connects, Max_used_connections, ...
    - swapping
    - CPU usage/load
    - dmesg log
    - network/MySQL bandwidth
    ...
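You can pull the status counters from point 2 with `mysql -B -N -e "SHOW GLOBAL STATUS"`, which prints one tab-separated name/value pair per line. Below is a minimal Python sketch that turns that output into a few ratios. The choice of counters is an assumption, and no thresholds are implied.

```python
def parse_status(text):
    """Parse tab-separated 'Variable_name<TAB>Value' lines into a dict."""
    result = {}
    for line in text.splitlines():
        name, sep, value = line.partition("\t")
        if sep:  # skip lines without a tab
            result[name.strip()] = value.strip()
    return result


def health_report(status, max_connections):
    """Derive a few numbers worth watching from SHOW GLOBAL STATUS output."""
    connections = int(status.get("Connections", "0"))
    aborted_clients = int(status.get("Aborted_clients", "0"))
    aborted_connects = int(status.get("Aborted_connects", "0"))
    max_used = int(status.get("Max_used_connections", "0"))
    return {
        "aborted_clients": aborted_clients,
        # Share of connection attempts that failed.
        "aborted_connects_pct": 100.0 * aborted_connects / connections if connections else 0.0,
        # How close the peak ever came to the configured limit.
        "max_used_connections_pct": 100.0 * max_used / max_connections,
    }
```

The first post sets `max_connections = 10000`. If `max_used_connections_pct` stays low, the connection limit itself is probably not the problem. A steadily rising `Aborted_clients` deserves a closer look, because it counts connections that the server saw end without being closed properly.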

    Your log file also shows messages related to network issues:
    28075:20131205:143339.489 SNMP item [ifInOctets[FastEthernet0/23]] on host [host1.1.com] failed: first network error, wait for 60 seconds
    28135:20131205:143351.658 SNMP item [ifInOctets[FastEthernet2]] on host [host1.1.com] failed: another network error, wait for 60 seconds
    28114:20131205:143410.546 resuming SNMP checks on host host1.1.com]: connection restored
    IMHO: focus on your network health.
    Devops Monitoring Expert advice: Dockerize/automate/monitor all the things.
    My DevOps stack: Docker / Kubernetes / Mesos / ECS / Terraform / Elasticsearch / Zabbix / Grafana / Puppet / Ansible / Vagrant


    • tchjts1
      Senior Member
      • May 2008
      • 1605

      #3
      In addition to the excellent tips that jan has offered above, also take a look at my post regarding overall system health here: https://www.zabbix.com/forum/showthread.php?t=41219

      In particular, the last paragraph and the graphs that I posted. What are yours looking like over a 24 hour period?


      • syndeysider
        Senior Member
        • Oct 2013
        • 115

        #4
        As always, thanks for the comments.

        1. The Zabbix instance and the MySQL DB are on the same host.
        2. Network comms is intermittent for some hosts as they are on unreliable links. This is acceptable in this particular instance.
        3. I apologise for not posting the screens from the Zabbix performance charts to begin with.

        I've since found that this is caused by my housekeeper process (currently running for 5+ days) locking tables, etc. It's always at 100% busy.

        I've disabled Housekeeping and will report back in the next 5 days to see if this problem has been alleviated. From my reading, I'm almost certain this is the cause.
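Some rough arithmetic shows why the housekeeper struggles at this scale. Each received value becomes roughly one history row. At the 508 NVPS from the first post, the history tables therefore gain about NVPS × 86,400 rows per day. Once the retention period is reached, the housekeeper has to delete about as many rows as are inserted, which is around 508 rows per second on average on top of the insert load. Below is a Python sketch of the estimate. The 7-day retention is an illustrative assumption, not the poster's setting.

```python
SECONDS_PER_DAY = 86_400


def rows_per_day(nvps):
    """History rows written per day, assuming one row per received value."""
    return int(nvps * SECONDS_PER_DAY)


def rows_retained(nvps, retention_days):
    """Approximate steady-state history row count for a given retention."""
    return rows_per_day(nvps) * retention_days


if __name__ == "__main__":
    nvps = 508           # from the first post
    retention_days = 7   # assumption, for illustration only
    print(f"{rows_per_day(nvps):,} rows/day")               # 43,891,200 rows/day
    print(f"{rows_retained(nvps, retention_days):,} rows")  # 307,238,400 rows
```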

        I'm currently running 2.0.8 on MySQL 5.5... If I'd only known about the complexities of MySQL, foreign keys and partitioning before I started, I would have gone the Postgres route.

        I did manage to find some articles on partitioning SOME of the tables, but it's not a clean solution for 2.0.8. My option now is to move my Zabbix instance to a beefier server and upgrade to 2.2, in the hope that my database stays reasonably well maintained. Not being a MySQL guy really hurt me this time.
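On the partitioning route, the usual idea is to range-partition the large history tables on their integer `clock` column, one partition per day. Old data is then dropped a whole partition at a time instead of with row-by-row DELETEs. Below is a minimal Python sketch that only *generates* the DDL. It assumes a table named `history` with an integer `clock` column (Unix seconds) and a hypothetical `pYYYY_MM_DD` naming scheme. MySQL does not allow foreign keys on partitioned InnoDB tables, so this only applies to tables without them. Review the output against your own schema and try it on a copy first.

```python
from datetime import date, datetime, timedelta, timezone


def day_partitions(table, first_day, days):
    """Build ALTER TABLE ... PARTITION BY RANGE (clock) with one partition per UTC day.

    VALUES LESS THAN is exclusive, so each bound is the epoch second at which
    the *next* day starts.
    """
    parts = []
    for offset in range(days):
        day = first_day + timedelta(days=offset)
        start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
        bound = int((start + timedelta(days=1)).timestamp())
        parts.append(f"  PARTITION p{day:%Y_%m_%d} VALUES LESS THAN ({bound})")
    return (
        f"ALTER TABLE `{table}` PARTITION BY RANGE (clock) (\n"
        + ",\n".join(parts)
        + "\n);"
    )


def drop_partition(table, day):
    """Statement that discards one whole day of data in a single operation."""
    return f"ALTER TABLE `{table}` DROP PARTITION p{day:%Y_%m_%d};"


if __name__ == "__main__":
    print(day_partitions("history", date(2014, 3, 20), 3))
    print(drop_partition("history", date(2014, 3, 20)))
```

There is no catch-all `MAXVALUE` partition, so future partitions must be added ahead of time (`ALTER TABLE ... ADD PARTITION`). Otherwise inserts with a `clock` beyond the last bound will fail. Converting an existing large table also rewrites it completely, which can take a long time.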

        How would I go about suggesting that the documentation be updated to prefer PostgreSQL for medium to large environments, since MySQL clearly has its limitations for this particular application?


        • jan.garaj
          Senior Member
          Zabbix Certified Specialist
          • Jan 2010
          • 506

          #5
          Notes:
          - PostgreSQL is not always the better DB; see
          http://www.zabbix.com/img/zabconf201...for_Zabbix.pdf

          - check the webinar "Tune your Zabbix for Better Performance" and the threads about Zabbix in large environments for some performance tips.

          • syndeysider
            Senior Member
            • Oct 2013
            • 115

            #6
            Thank you, jan.garaj.

            That's a good article on the performance implications of choosing MySQL vs. Postgres; however, I was referring to the limitations around partitioning off historical data when comparing DB engines.

            I've since fixed this issue. For anyone who finds this through Google:

            1. RAM... I was on 8GB and suffering. I've upgraded to 32GB, with a 2GB RAM disk for the MySQL tmpdir, and I increased the InnoDB buffer pool as well.

            2. Changed my.cnf to use innodb_file_per_table and rebuilt the MySQL DB after the change: took a new DB dump, dropped the current zabbix DB, stopped mysqld, deleted ibdata1 and all ib_logfile* files, restarted mysqld and re-imported the DB dump.
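Below, the rebuild steps from point 2 are written out as a dry run, a Python sketch that only *returns* the shell commands in order and executes nothing. The database name, dump path and `service mysqld` service name are assumptions to adjust. Deleting `ibdata1` is only safe when no other database still keeps InnoDB tables in the shared tablespace. A single-database mysqldump contains no `CREATE DATABASE` statement, so the sketch recreates the database before the import, using the utf8/utf8_bin settings from the Zabbix install docs.

```python
import shlex

# Hypothetical values: adjust to your installation.
DB_NAME = "zabbix"
DATADIR = "/var/lib/mysql"      # datadir from the posted my.cnf
DUMP_FILE = "/root/zabbix.sql"  # assumed dump location


def rebuild_plan(db=DB_NAME, datadir=DATADIR, dump=DUMP_FILE):
    """Return the dump/drop/reimport commands in order (dry run, nothing is executed)."""
    q = shlex.quote
    return [
        f"mysqldump --single-transaction {q(db)} > {q(dump)}",
        f"mysql -e {q('DROP DATABASE ' + db)}",
        "service mysqld stop",
        # Removes the shared tablespace and redo logs; safe only if no other
        # database still stores InnoDB tables inside ibdata1.
        f"rm {q(datadir + '/ibdata1')} {q(datadir)}/ib_logfile*",
        # innodb_file_per_table must already be in my.cnf at this point.
        "service mysqld start",
        f"mysql -e {q('CREATE DATABASE ' + db + ' CHARACTER SET utf8 COLLATE utf8_bin')}",
        f"mysql {q(db)} < {q(dump)}",
    ]


if __name__ == "__main__":
    for cmd in rebuild_plan():
        print(cmd)
```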

            3. Rebuilt my RAID 5 array as RAID 1. Do not underestimate the importance of disk speed, RAID configuration (RAID 5 vs 1) and RAM. I've got a medium-sized installation: 3 proxies, 1 core (Node 0 and MySQL running on the same box), 700 hosts, 500 NVPS.

            *** This step requires a rebuild. You cannot go directly from RAID 5 to RAID 1.
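The RAID level matters for a write-heavy Zabbix database because of the write penalty. With the commonly quoted figures, one logical write costs 4 disk I/Os on RAID 5 and 2 on mirrored arrays (RAID 1, or RAID 10 with four or more disks). Below is a Python sketch of the usual estimate, front-end IOPS = raw IOPS / (read share + write share × penalty). The per-disk IOPS and the write share are assumptions for illustration, not measurements from this setup.

```python
# Commonly quoted write penalties: disk I/Os needed per logical write.
WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}


def effective_iops(disks, iops_per_disk, level, write_fraction):
    """Front-end IOPS an array can sustain for a given read/write mix."""
    raw = disks * iops_per_disk
    penalty = WRITE_PENALTY[level]
    # A read costs one disk I/O; a write costs `penalty` disk I/Os.
    return raw / ((1 - write_fraction) + write_fraction * penalty)


if __name__ == "__main__":
    # Assumptions: 4 x 15k disks at ~180 IOPS each, workload 80% writes.
    for level in ("raid5", "raid10"):
        print(level, round(effective_iops(4, 180, level, 0.8)))
```

Under these assumed numbers the same four disks deliver roughly 212 front-end IOPS as RAID 5 versus 400 as RAID 10, which matches the poster's experience.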

            This was my first implementation of Zabbix and I've learned some valuable lessons. In my DEV and TEST environments I was not able to generate 500 NVPS, and therefore heavily undershot on the hardware and architecture. Reading:

            One of the questions for those of us that use Zabbix on a large scale is “Just how much data can Zabbix ingest before it blows up spectacularly?” Some of the work I’ve been doing lately revolves around that question. I have an extremely large environment (around 32000+ devices) that could potentially be monitored entirely […]


            is an absolute necessity!

            Cheers again for the help.
