Zabbix Proxy Performance in AWS VPC

  • bunkzilla
    Junior Member
    • Feb 2017
    • 6

    #1

    Zabbix Proxy Performance in AWS VPC

    Version: Zabbix 2.4.2

    I've been trying to debug this for a while without much luck. I've done packet captures and haven't seen anything glaringly obvious.


    We have two different Zabbix masters. One runs in EC2 Classic (non-VPC); its zabbix_proxy handles about 725 hosts and 123,000 checks, and its poller busy percentage is about 40% with 60 pollers.


    The same size machine in an AWS VPC, monitoring across accounts in the same region over AWS peering connections, can't seem to get past 300 hosts before the zabbix_proxy poller busy percentage hits 100% with 60 pollers. I start to see connection errors in the zabbix_proxy logs. If I move 70 hosts onto another zabbix_proxy, they do stabilize somewhat. I've tried running in debug mode, and that wasn't enlightening.

    I'm leaning towards it being a network connection issue, but nothing is standing out as a smoking gun.

    Code:
    23977:20170131:040020.101 [Z3005] query failed: [2006] MySQL server has gone away [begin;]
     23977:20170131:040020.103 Zabbix agent item "mysql.Created_tmp_disk_tables" on host "dke4-dbtxbs01b.aue1p" failed: first network error, wait for 15 seconds
     23980:20170131:040020.103 [Z3005] query failed: [2006] MySQL server has gone away [begin;]
     23980:20170131:040020.104 Zabbix agent item "proc.num[,,run]" on host "dke4-roapp01e.aue1p" failed: first network error, wait for 15 seconds
     23997:20170131:040020.130 [Z3005] query failed: [2006] MySQL server has gone away [begin;]
     23997:20170131:040020.131 Zabbix agent item "net.if.out[eth0,bytes]" on host "dke4-dbrpss01a.aue1p" failed: first network error, wait for 15 seconds
     23982:20170131:040020.131 [Z3005] query failed: [2006] MySQL server has gone away [begin;]
     23994:20170131:040020.132 [Z3005] query failed: [2006] MySQL server has gone away [begin;]
     24026:20170131:040020.132 [Z3005] query failed: [2006] MySQL server has gone away [begin;]
     23982:20170131:040020.132 Zabbix agent item "vfs.fs.size[/,free]" on host "dke4-smcn01c.aue1p" failed: first network error, wait for 15 seconds
     24026:20170131:040020.133 Zabbix agent item "mailq.queue_size" on host "dke4-dbtxss01c.aue1p" failed: first network error, wait for 15 seconds
     23994:20170131:040020.133 Zabbix agent item "net.if.in[eth0,bytes]" on host "dke4-esenm01b.aue1p" failed: first network error, wait for 15 seconds
     24006:20170131:040020.139 [Z3005] query failed: [2006] MySQL server has gone away [begin;]
     24006:20170131:040020.141 Zabbix agent item "system.cpu.util[,softirq,avg1]" on host "dke4-lbmo02b.aue1p" failed: first network error, wait for 15 seconds
     23987:20170131:040020.142 [Z3005] query failed: [2006] MySQL server has gone away [begin;]
     23987:20170131:040020.144 Zabbix agent item "custom.vfs.dev.read.sectors[xvdb]" on host "dke4-dbenss01c.aue1p" failed: first network error, wait for 15 seconds
     23967:20170131:040023.404 received configuration data from server, datalen 5490406
     24038:20170131:040024.714 cannot send list of active checks to [10.21.123.108]: host [dke-crtr01a.aue1m] not monitored
     24037:20170131:040027.061 cannot send list of active checks to [10.21.75.39]: host [dke5-monorpc01d.aue1m] not monitored
     24039:20170131:040028.927 cannot send list of active checks to [10.21.75.18]: host [dke5-mossfs02d.aue1m] not monitored
     23967:20170131:040029.804 received configuration data from server, datalen 5490406
     24038:20170131:040031.703 cannot send list of active checks to [10.21.79.124]: host [dke5-mosspm01e.aue1m] not monitored
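
A minimal sketch for tallying the error types in a proxy log like the excerpt above, so it is easier to see whether the database errors and the agent network errors come from the same poller processes. It assumes every log line starts with the `PID:YYYYMMDD:HHMMSS.mmm` prefix shown in the excerpt; the category names and the `classify` and `summarize` helpers are made up for illustration and are not part of Zabbix.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpt: "PID:YYYYMMDD:HHMMSS.mmm message"
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.(\d{3})\s+(.*)$")


def classify(message):
    """Map a log message to a coarse, hypothetical category name."""
    if "query failed" in message:
        return "db_query_failed"
    if "network error" in message:
        return "agent_network_error"
    if message.startswith("cannot send list of active checks"):
        return "active_check_not_monitored"
    return "other"


def summarize(lines):
    """Return (count per category, error count per PID) for the given lines."""
    by_category = Counter()
    errors_by_pid = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not follow the assumed layout
        pid, _date, _time, _millis, message = match.groups()
        category = classify(message)
        by_category[category] += 1
        if category != "other":
            errors_by_pid[pid] += 1
    return by_category, errors_by_pid


# A few lines copied from the excerpt above.
SAMPLE = [
    '23977:20170131:040020.101 [Z3005] query failed: [2006] MySQL server has gone away [begin;]',
    ' 23977:20170131:040020.103 Zabbix agent item "mysql.Created_tmp_disk_tables" on host "dke4-dbtxbs01b.aue1p" failed: first network error, wait for 15 seconds',
    ' 23967:20170131:040023.404 received configuration data from server, datalen 5490406',
    ' 24038:20170131:040024.714 cannot send list of active checks to [10.21.123.108]: host [dke-crtr01a.aue1m] not monitored',
]

if __name__ == "__main__":
    categories, pids = summarize(SAMPLE)
    for name, count in categories.most_common():
        print(name, count)
    for pid, count in pids.most_common():
        print("pid", pid, "errors", count)
```

To run it against a real log, pass the open file object to `summarize` in place of `SAMPLE`; the path to the proxy log depends on the installation.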
  • JohnSnow
    Junior Member
    • Jul 2017
    • 1

    #2
    Any solution?

    Hey, were you able to fix this issue? If so, how?

    Thanks


    • bunkzilla
      Junior Member
      • Feb 2017
      • 6

      #3
      Not as of yet. I've been working with AWS support to try to determine the root cause. I'll post back if I find anything conclusive. I was hoping the 3.2 upgrade would resolve it, but it did not.
