[Z3005] query failed: [2013] Lost connection to MySQL server

  • asteroidyorkton
    Member
    • Aug 2016
    • 53

    #16
    On it. Trying it out right away.
    Last edited by asteroidyorkton; 30-08-2016, 11:03.


    • asteroidyorkton

      #17
      Originally posted by glebs.ivanovskis
      Sorry to interrupt you guys. Have you seen my second message in this thread? leftlanef4, have you tried 3.0.5rc1?

      The connection between pollers, discovery items and the DB works as follows. Every Zabbix server process has its own independent connection to the database and uses it when necessary. Pollers don't use their connections most of the time, since they mostly communicate with agents and the history write cache. By the time they next need the DB, the DB server has usually considered the connection timed out, so the pollers need to reconnect. One of the very few occasions when a poller needs the DB is when it processes LLD data. Zabbix also had a reconnection bug specific to MySQL, which was fixed very recently; I suspect it may be related.
      After updating to 3.0.5 I still got that error, but reducing the number of pollers helped a bit.

      Pollers reconnecting could be the reason for the query error containing "select hostid,key_,state". It was happening every hour (and in between) because the vfs.fs.discovery update interval was set to 1 hour. Every time I restart the Zabbix server, there is no "Lost connection" error for 1 hour. Decreasing the update interval produced more errors.

      I had 256 pollers running. I reduced the number of pollers from 256 to 120, and the query error with the select statement has NOT occurred since. I have been monitoring for 4 hours now.

      But I still see this one once in a while: [Z3005] query failed: [2013] Lost connection to MySQL server during query [begin;]
      This one also starts 1 hour after a restart, though with fewer errors: usually 4 to 5 errors within 10 minutes, then a long break before the next few.

      What wait_timeout do you have set on your DB? I'm on the default of 8 hours and want to try reducing it, perhaps to 600 seconds.
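      A toy model may help show why an hourly discovery rule lines up with errors roughly one hour after a restart. The sketch below assumes each poller touches its DB connection only when it processes LLD data, and that something on the path drops the connection once it has been idle longer than some timeout. `IdleConnection` and `count_reconnects` are hypothetical names, and the 1800 s network-device timer is an invented value for illustration, not something measured in this thread.

```python
from dataclasses import dataclass


@dataclass
class IdleConnection:
    """Hypothetical model of one poller's DB connection (not Zabbix code).

    idle_timeout: seconds after which something on the path (MySQL's
    wait_timeout, or a firewall/NAT idle timer) silently drops the connection.
    """
    idle_timeout: float
    last_used: float = 0.0  # connection assumed to be opened at t = 0
    reconnects: int = 0

    def query(self, now: float) -> str:
        idle = now - self.last_used
        self.last_used = now
        if idle > self.idle_timeout:
            # The old connection is gone: the first attempt fails
            # ("Lost connection to MySQL server") and the process reconnects.
            self.reconnects += 1
            return "reconnected"
        return "ok"


def count_reconnects(interval: float, idle_timeout: float, runs: int) -> int:
    """Run an LLD-style check every `interval` seconds; the DB is used only then."""
    conn = IdleConnection(idle_timeout)
    for i in range(1, runs + 1):
        conn.query(i * interval)
    return conn.reconnects


if __name__ == "__main__":
    # Hourly vfs.fs.discovery against the default wait_timeout (8 h = 28800 s)
    print(count_reconnects(3600, 28800, 24))  # 0
    # The same rule if wait_timeout were lowered to 600 s
    print(count_reconnects(3600, 600, 24))    # 24
    # The same rule behind a hypothetical 30-minute idle timer on a network device
    print(count_reconnects(3600, 1800, 24))   # 24
```

      With the default wait_timeout of 28800 s, a one-hour idle gap should not trip MySQL itself, so in this model hourly failures need a shorter timer somewhere: either a lowered wait_timeout or a device between the two hosts.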
      Last edited by asteroidyorkton; 31-08-2016, 08:15.


      • asteroidyorkton

        #18
        Update: the workaround is reducing the TCP keepalive time on Linux. See the next post for why it's happening.
        It could be a network issue. The first request never reaches the DB.

        The client sends the query and gets a reset (RST, ACK) back.
        On the second try, the client logs in again, the "use zabbix_db" query runs, and then the failed query is retried successfully, though the response packet is always flagged as malformed. But I guess that's okay?

        I need to find out whether the network firewall or the database server is causing the issue. When both the DB and the Zabbix server were put in the same VLAN (for 5 hours, just to test), these errors never occurred.


        Reset, ACK


        Code:
        2087 13.646936758 Zabbix_Server -> Mysql_DB  MySQL 77 Request Query <- select hostid,key_,state,evaltype,formula,error,lifetime from items where itemid=23278
        2088 13.647469097  Mysql_DB -> Zabbix_Server TCP 77 3306 → 48241 [RST, ACK] Seq=1 Ack=12 Win=229 Len=11 TSval=87420570 TSecr=40022071
        2089 13.648100050  Mysql_DB -> Zabbix_Server MySQL 117 Response OK
        2090 13.649004957 Zabbix_Server -> Mysql_DB  MySQL 794 Request Query
        2091 13.652714816  Mysql_DB -> Zabbix_Server MySQL 117 Response OK
        2092 13.652978835 Zabbix_Server -> Mysql_DB  TCP 74 52992 → 3306 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=87420572 TSecr=0 WS=128
        2093 13.654013359  Mysql_DB -> Zabbix_Server TCP 74 3306 → 52992 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1380 SACK_PERM=1 TSval=41139130 TSecr=87420572 WS=128
        2094 13.654093447 Zabbix_Server -> Mysql_DB  TCP 66 52992 → 3306 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=87420572 TSecr=41139130
        2095 13.654697658 Zabbix_Server -> Mysql_DB  MySQL 78 Request Query
        2096 13.655540138  Mysql_DB -> Zabbix_Server MySQL 148 Server Greeting proto=10 version=5.7.14-log
        2097 13.655767188 Zabbix_Server -> Mysql_DB  TCP 66 52992 → 3306 [ACK] Seq=1 Ack=83 Win=29312 Len=0 TSval=87420573 TSecr=41139130
        2098 13.656227833 Zabbix_Server -> Mysql_DB  MySQL 243 Login Request user=zabbix db=zabbix
        2099 13.656692019  Mysql_DB -> Zabbix_Server TCP 66 3306 → 52992 [ACK] Seq=83 Ack=178 Win=30080 Len=0 TSval=41139130 TSecr=87420573
        2100 13.657359209  Mysql_DB -> Zabbix_Server MySQL 88 Response OK
        2101 13.657533669 Zabbix_Server -> Mysql_DB  MySQL 85 Request Query
        2102 13.659093789  Mysql_DB -> Zabbix_Server MySQL 168 Response OK
        2103 13.659437646 Zabbix_Server -> Mysql_DB  MySQL 77 Request Use Database
        2104 13.661070543  Mysql_DB -> Zabbix_Server MySQL 88 Response OK
        2105 13.661777445 Zabbix_Server -> Mysql_DB  MySQL 77 Request Query <- select hostid,key_,state,evaltype,formula,error,lifetime from items where itemid=23278
        2106 13.662639706  Mysql_DB -> Zabbix_Server MySQL 77 Response OK[Malformed Packet]
        2107 13.664955260 Zabbix_Server -> Mysql_DB  MySQL 77 Request Query
        2108 13.665126899  Mysql_DB -> Zabbix_Server MySQL 77 Response OK
        2109 13.666563745  Mysql_DB -> Zabbix_Server MySQL 77 Response OK
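        The reset-then-reconnect pattern in the capture above can be picked out mechanically. This is a minimal sketch, assuming the one-line summary format pasted here (frame number, time, source, "->", destination, protocol, then details); `parse_line` and `resets_and_reconnects` are hypothetical helpers, not part of Wireshark or tshark. For every frame flagged RST, it reports the first later bare SYN, i.e. the start of the new connection the client opens to retry.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Matches TCP flag lists such as "[SYN]" or "[RST, ACK]". It does not match
# "[Malformed Packet]" because that text contains lowercase letters.
FLAGS_RE = re.compile(r"\[([A-Z]+(?:, [A-Z]+)*)\]")


class Frame(NamedTuple):
    number: int
    time: float
    src: str
    dst: str
    protocol: str
    flags: frozenset


def parse_line(line: str) -> Optional[Frame]:
    """Parse one summary line: 'N TIME SRC -> DST PROTO LEN INFO...'."""
    parts = line.split()
    if len(parts) < 6 or parts[3] != "->":
        return None  # not a packet line (e.g. "Code:")
    m = FLAGS_RE.search(line)
    flags = frozenset(f.strip() for f in m.group(1).split(",")) if m else frozenset()
    return Frame(int(parts[0]), float(parts[1]), parts[2], parts[4], parts[5], flags)


def resets_and_reconnects(lines: List[str]) -> List[Tuple[int, Optional[int]]]:
    """For each RST frame, return (rst_frame, first following bare-SYN frame or None)."""
    frames = [f for f in map(parse_line, lines) if f is not None]
    result = []
    for i, frame in enumerate(frames):
        if "RST" in frame.flags:
            syn = next((g for g in frames[i + 1:] if g.flags == {"SYN"}), None)
            result.append((frame.number, syn.number if syn else None))
    return result
```

        Fed frames 2087 to 2094 from the capture, it pairs the reset in frame 2088 with the new handshake that starts in frame 2092.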
        Last edited by asteroidyorkton; 21-09-2016, 20:57.


        • asteroidyorkton

          #19
          Solved

          Seems fixed. No errors for the last 2 days.

          Just change net.ipv4.tcp_keepalive_time to 300 seconds (instead of the default 7200 seconds) on the database server; this is only needed for a remote DB. Then there won't be that sequence #1 packet triggering the TCP reset (RST, ACK). The TCP connections used by these hourly discovery rules start off with sequence number 1, which then gets reset by network devices.

          To check the current keepalive settings:

          sysctl -a | grep keepalive
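          As a sketch of the same fix from a program's point of view (Python on Linux assumed), the helpers below read the keepalive sysctls that `sysctl -a | grep keepalive` lists, and set keepalive on a single socket instead of system-wide. These settings only affect sockets that have SO_KEEPALIVE enabled. The function names and the 3600 s device idle timer in the example are assumptions for illustration; the thread does not say what timeout the network devices actually use.

```python
import socket
from pathlib import Path
from typing import Dict

KEEPALIVE_SYSCTLS = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")


def read_keepalive_sysctls(base: str = "/proc/sys/net/ipv4") -> Dict[str, int]:
    """Read the values `sysctl -a | grep keepalive` shows (Linux only).

    Returns an empty dict on systems without /proc/sys (e.g. macOS, Windows).
    """
    values = {}
    for name in KEEPALIVE_SYSCTLS:
        path = Path(base) / name
        if path.is_file():
            values[name] = int(path.read_text().strip())
    return values


def keepalive_refreshes_idle_timer(keepalive_time_s: int, device_idle_timeout_s: int) -> bool:
    """True if a keepalive probe goes out before a (hypothetical) device forgets the flow."""
    return keepalive_time_s < device_idle_timeout_s


def enable_keepalive(sock: socket.socket, idle_s: int = 300) -> None:
    """Per-socket alternative to the system-wide sysctl (hypothetical helper)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE is the per-socket counterpart of tcp_keepalive_time on Linux;
    # it is not exposed on every platform, hence the guard.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)


if __name__ == "__main__":
    print(read_keepalive_sysctls())
    print(keepalive_refreshes_idle_timer(300, 3600))   # True
    print(keepalive_refreshes_idle_timer(7200, 3600))  # False
```

          Keepalive probes go out after tcp_keepalive_time seconds of idleness and, once acknowledged, the idle timer starts over, so the connection produces traffic at least that often. With 300 s that stays under the assumed one-hour device timer; with the default 7200 s it does not.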
          Last edited by asteroidyorkton; 21-09-2016, 20:51.
