Suddenly my Zabbix monitoring went haywire: high queues

  • bercerobry
    Junior Member
    • Oct 2019
    • 17

    #1


    To start off:

    My server is a CentOS VPS with 2 vCPUs and 2 GB RAM, hosted in Singapore.
    The current database size is 17 GB.
    The server has been running well for the two years since deployment.

    I have three proxy servers in different locations:
    - proxy1: 4 vCPUs, 4 GB RAM, SSD
    - proxy2: 2 vCPUs, 2 GB RAM, HDD
    - proxy3: 2 vCPUs, 2 GB RAM, SSD

    Yesterday, 10/23/2019 at 8:00 PM, I suddenly got a flood of alerts about agents not checking in.
    I checked those remote locations and confirmed everything was fine EXCEPT that the Zabbix proxies and the server had suddenly stopped talking to each other!

    On one of my proxy servers the log below keeps repeating. Ping connectivity between the proxy and the Zabbix server is good, with 40-50 ms latency.
    I have been scratching my head over what is wrong.

    Virtually nothing was changed; the problem just appeared.
    Queues have been very high for hours on end, and I have confirmed that the graphs and latest data on the Zabbix server are incomplete.

    Agents that don't go through any proxy are working fine.

    Here are the logs from one of my proxy servers. In the Zabbix server logs I see slow query entries from time to time.
    7760:20191024:210943.025 housekeeper [deleted 157585 records in 1.894099 sec, idle for 1 hour(s)]
    7757:20191024:211436.640 received configuration data from server at "zabbix.bryanit.net", datalen 1115427
    7759:20191024:211922.719 cannot send proxy data to server at "zabbix.bryanit.net": ZBX_TCP_WRITE() timed out
    7759:20191024:211933.789 cannot send proxy data to server at "zabbix.bryanit.net": ZBX_TCP_WRITE() failed: [32] Broken pipe
    7757:20191024:211956.776 received configuration data from server at "zabbix.bryanit.net", datalen 1115427
    7757:20191024:212516.775 received configuration data from server at "zabbix.bryanit.net", datalen 1115427
    7759:20191024:212656.165 cannot send proxy data to server at "zabbix.bryanit.net": ZBX_TCP_WRITE() failed: [32] Broken pipe
    7757:20191024:213112.790 received configuration data from server at "zabbix.bryanit.net", datalen 1115427
    7759:20191024:213203.214 cannot send proxy data to server at "zabbix.bryanit.net": ZBX_TCP_WRITE() failed: [104] Connection reset by peer
    7757:20191024:213614.468 received configuration data from server at "zabbix.bryanit.net", datalen 1115427
    7757:20191024:214116.723 received configuration data from server at "zabbix.bryanit.net", datalen 1115427
    7759:20191024:214205.555 cannot send proxy data to server at "zabbix.bryanit.net": ZBX_TCP_WRITE() timed out
    7759:20191024:214216.063 cannot send proxy data to server at "zabbix.bryanit.net": ZBX_TCP_WRITE() failed: [32] Broken pipe
    7757:20191024:214618.423 received configuration data from server at "zabbix.bryanit.net", datalen 1115427
    7757:20191024:215120.484 received configuration data from server at "zabbix.bryanit.net", datalen 1115427
    7759:20191024:215130.160 cannot send proxy data to server at "zabbix.bryanit.net": ZBX_TCP_WRITE() failed: [32] Broken pipe
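The send failures in this excerpt come in three flavours: write timeouts, broken pipes and connection resets. A minimal Python sketch for tallying them over a whole proxy log, assuming the `PID:YYYYMMDD:HHMMSS.mmm message` line layout shown above; `summarize_send_failures` is a hypothetical helper, not part of Zabbix:

```python
import re
import sys
from collections import Counter

# Zabbix daemon log lines look like "PID:YYYYMMDD:HHMMSS.mmm message".
LOG_LINE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)$")
# Failure lines written when the proxy cannot push its data, e.g.
# cannot send proxy data to server at "host": ZBX_TCP_WRITE() timed out
SEND_FAILURE = re.compile(r'cannot send proxy data to server at "([^"]+)": (.*)$')


def summarize_send_failures(lines):
    """Count 'cannot send proxy data' failures per error reason."""
    reasons = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not follow the standard layout
        failure = SEND_FAILURE.search(match.group(4))
        if failure:
            reasons[failure.group(2)] += 1
    return reasons


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 summarize_failures.py /path/to/zabbix_proxy.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for reason, count in summarize_send_failures(log).most_common():
            print(f"{count:5d}  {reason}")
```

The script name and log path in the usage comment are placeholders. Counting by reason, rather than just counting failures, makes it easier to see whether the errors are mostly timeouts while writing (the connection opened but stalled) or resets.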

    Querying backlogs in proxy server gives this result:
    MariaDB [zabbix]> select max(id)-(select nextid from ids where table_name = "proxy_history" limit 1) from proxy_history;
    +-----------------------------------------------------------------------------+
    | max(id)-(select nextid from ids where table_name = "proxy_history" limit 1) |
    +-----------------------------------------------------------------------------+
    | 230366 |
    +-----------------------------------------------------------------------------+
    1 row in set (0.00 sec)
    Could you please help me figure out where else to look?
  • gofree
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Dec 2017
    • 400

    #2
    Ping will not tell you whether the proxy can talk to the server on port 10051. Try telnet server_ip 10051 from your proxy.
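If telnet is not installed on the proxy, roughly the same check can be done in Python. This is a sketch assuming the default server port 10051; `can_reach` is a hypothetical helper, and it only proves that a TCP connection can be opened:

```python
import socket


def can_reach(host, port=10051, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    Unlike ping (ICMP), this exercises the same TCP path the proxy
    uses to reach the Zabbix server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

Run on the proxy, e.g. `can_reach("zabbix.bryanit.net")`. A successful connect does not rule out errors like `ZBX_TCP_WRITE() timed out`, which by definition happen after the connection is already open.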


    • bercerobry
      Junior Member
      • Oct 2019
      • 17

      #3
      Hello Guys,

      Just to update on this case: it seems that my VPS resources are not quite enough for the amount of data the database is currently holding.

      After adding a 1GB swap file to my VPS, all services and functions went back to normal.
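To confirm that a box like this is short on memory, before and after adding swap, a minimal Linux-only Python sketch can read `/proc/meminfo`. The helper names are hypothetical, and what percentage of available memory counts as "too low" is left as a judgment call:

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values (kB for sizes)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_report(info):
    """Summarize available RAM (percent) and swap size/usage (MB)."""
    total = info["MemTotal"]
    # MemAvailable is the kernel's estimate; fall back to MemFree if absent.
    available = info.get("MemAvailable", info.get("MemFree", 0))
    swap_total = info.get("SwapTotal", 0)
    swap_used = swap_total - info.get("SwapFree", 0)
    return {
        "available_pct": round(100.0 * available / total, 1),
        "swap_total_mb": swap_total // 1024,
        "swap_used_mb": swap_used // 1024,
    }


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print(memory_report(parse_meminfo(f.read())))
```

Sustained low available memory together with steadily growing swap usage would be consistent with the resource shortage described here.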

      I am considering migrating to a VPS with 2 vCPUs, 4 GB RAM and SSD storage in the next couple of weeks.
