dial tcp: lookup zabbixserver: i/o timeout logged on active agent

  • troffasky
    Senior Member
    • Jul 2008
    • 567

    #1

    A single active agent (out of many) logs messages like this:
    Code:
    2025/09/09 07:57:18.430756 [101] history upload to [zabbixserver:10051] [webserver] is working again
    2025/09/09 07:57:35.001695 [101] cannot connect to [zabbixserver:10051]: dial tcp: lookup zabbixserver: i/o timeout
    2025/09/09 07:57:35.001781 [101] history upload to [zabbixserver:10051] [webserver] started to fail
    2025/09/09 07:58:26.020860 [101] cannot connect to [zabbixserver:10051]: dial tcp :0->198.51.100.156:10051: i/o timeout
    2025/09/09 07:58:26.020916 [101] history upload to [zabbixserver:10051] [webserver] started to fail
    2025/09/09 07:58:29.021350 [101] cannot connect to [zabbixserver:10051]: dial tcp: lookup zabbixserver: i/o timeout
    2025/09/09 07:58:29.021461 [101] history upload to [zabbixserver:10051] [webserver] started to fail
    2025/09/09 07:58:42.712875 [101] history upload to [zabbixserver:10051] [webserver] is working again
    2025/09/09 07:58:48.002899 [101] cannot connect to [zabbixserver:10051]: dial tcp: lookup zabbixserver: i/o timeout
    2025/09/09 07:58:48.002963 [101] history upload to [zabbixserver:10051] [webserver] started to fail
    2025/09/09 07:59:29.030794 [101] history upload to [zabbixserver:10051] [webserver] is working again
    2025/09/09 07:59:32.033567 [101] cannot connect to [zabbixserver:10051]: dial tcp: lookup zabbixserver: i/o timeout
    2025/09/09 07:59:32.033623 [101] history upload to [zabbixserver:10051] [webserver] started to fail
    2025/09/09 08:00:30.026128 [101] cannot connect to [zabbixserver:10051]: dial tcp :0->198.51.100.156:10051: i/o timeout
    2025/09/09 08:00:30.026185 [101] history upload to [zabbixserver:10051] [webserver] started to fail
    2025/09/09 08:00:30.053666 [101] active check configuration update from [zabbixserver:10051] is working again
    2025/09/09 08:00:33.055204 [101] cannot connect to [zabbixserver:10051]: dial tcp: lookup zabbixserver: i/o timeout
    2025/09/09 08:00:33.055251 [101] history upload to [zabbixserver:10051] [webserver] started to fail
    2025/09/09 08:00:38.002707 [101] cannot connect to [zabbixserver:10051]: dial tcp: lookup zabbixserver: i/o timeout
    2025/09/09 08:00:38.002776 [101] active check configuration update from host [webserver] started to fail
    2025/09/09 08:01:45.715660 [101] history upload to [zabbixserver:10051] [webserver] is working again
    2025/09/09 08:02:29.002684 [101] cannot connect to [zabbixserver:10051]: dial tcp: lookup zabbixserver: i/o timeout
    2025/09/09 08:02:29.002740 [101] history upload to [zabbixserver:10051] [webserver] started to fail
    2025/09/09 08:02:36.004770 [101] cannot connect to [zabbixserver:10051]: dial tcp :0->198.51.100.156:10051: i/o timeout
    2025/09/09 08:02:36.004821 [101] history upload to [zabbixserver:10051] [webserver] started to fail
    2025/09/09 08:02:39.006052 [101] cannot connect to [zabbixserver:10051]: dial tcp: lookup zabbixserver: i/o timeout
    2025/09/09 08:02:39.006099 [101] history upload to [zabbixserver:10051] [webserver] started to fail
    2025/09/09 08:02:45.166155 [101] history upload to [zabbixserver:10051] [webserver] is working again
    2025/09/09 08:02:54.003149 [101] cannot connect to [zabbixserver:10051]: dial tcp: lookup zabbixserver: i/o timeout
    2025/09/09 08:02:54.003217 [101] history upload to [zabbixserver:10051] [webserver] started to fail
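The log above mixes two different failures. In "dial tcp: lookup zabbixserver: i/o timeout" the hostname never resolved, so no connection was attempted. In "dial tcp :0->198.51.100.156:10051: i/o timeout" an address was obtained and the TCP connect to port 10051 timed out (in Go, a dial timeout includes name resolution, so a slow lookup can also surface this way). Counting each kind separately shows where most failures happen. This is a minimal sketch that assumes the Zabbix agent 2 log format shown above; classify and summarize are illustrative names, not part of Zabbix.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Name resolution timed out before any TCP connection was attempted.
LOOKUP_TIMEOUT = re.compile(r"dial tcp: lookup (?P<host>\S+): i/o timeout")
# An address was known, but the TCP connect to it timed out.
CONNECT_TIMEOUT = re.compile(r"dial tcp \S*->(?P<addr>\S+): i/o timeout")


def classify(line: str) -> Optional[str]:
    """Return 'dns' for a lookup timeout, 'tcp' for a connect timeout, else None."""
    if LOOKUP_TIMEOUT.search(line):
        return "dns"
    if CONNECT_TIMEOUT.search(line):
        return "tcp"
    return None


def summarize(lines: Iterable[str]) -> Counter:
    """Count 'dns' and 'tcp' failures across agent log lines; other lines are ignored."""
    return Counter(kind for kind in map(classify, lines) if kind is not None)
```

For the excerpt above this gives 9 lookup timeouts and 3 connect timeouts, which points at name resolution rather than at reaching port 10051.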
    As far as I can tell, only the Zabbix metrics are affected.

    [image: image.png]

    The web service continues to serve pages, and the Nginx metrics polled by the Zabbix server have no gaps:

    [image: image.png]

    What would explain this odd behaviour? Is this a DNS issue?
  • Answer selected by troffasky at 09-09-2025, 11:46.
    troffasky
    Senior Member
    • Jul 2008
    • 567


    Yes, this was a DNS issue. For some reason, 'dig zabbixserver' never showed a problem; it always responded instantly. A colleague tried 'nslookup zabbixserver', and it always showed an error after the A record was returned:

    Code:
    # nslookup zabbixserver
    Server: 127.0.0.53
    Address: 127.0.0.53#53
    
    Non-authoritative answer:
    Name: zabbixserver
    Address: 198.51.100.156
    ;; communications error to 127.0.0.53#53: timed out
    ;; communications error to 127.0.0.53#53: timed out
    ;; communications error to 127.0.0.53#53: timed out
    ;; no servers could be reached
    The fix was to use proper DNS servers instead of the random DNS server that the cloud provider set.
    I am not sure exactly why the Zabbix agent was choking after getting the A record, though. Is it doing a reverse DNS lookup as well?
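A possible explanation, which the thread does not confirm: by default dig asks only for the A record, while current versions of nslookup ask for both A and AAAA, so the timeouts printed after the A answer most likely belong to the AAAA query rather than to a reverse lookup. A resolver that sends both queries and waits for both answers before returning (to my understanding, this is how Go's resolver, used by Zabbix agent 2, behaves) would stall in the same way even though the A record arrived at once. The sketch below times the A and AAAA lookups separately. time_lookup and its resolver parameter are hypothetical names, and the 3-second default timeout is an arbitrary choice.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, List, Tuple


def time_lookup(
    host: str,
    family: int,
    timeout: float = 3.0,
    resolver: Callable = socket.getaddrinfo,
) -> Tuple[str, List[str], float]:
    """Resolve host for a single address family and time it.

    On a typical glibc system, AF_INET asks for A records only and AF_INET6
    for AAAA only, so a slow or unanswered query shows up in just one result.
    Returns (status, addresses, seconds); status is 'ok', 'error' or 'timeout'.
    """
    start = time.monotonic()
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(resolver, host, None, family, socket.SOCK_STREAM)
    try:
        infos = future.result(timeout=timeout)
        status, addrs = "ok", sorted({info[4][0] for info in infos})
    except FutureTimeout:
        # The resolver has not answered within `timeout` seconds; stop waiting.
        status, addrs = "timeout", []
    except socket.gaierror:
        # For example: no record of this type, or the name does not exist.
        status, addrs = "error", []
    finally:
        # Do not block here on a stuck lookup; the worker thread finishes on its own.
        pool.shutdown(wait=False)
    return status, addrs, time.monotonic() - start
```

On the affected host, comparing time_lookup("zabbixserver", socket.AF_INET) with time_lookup("zabbixserver", socket.AF_INET6) would be expected to show the A lookup returning quickly and the AAAA lookup timing out, if this explanation is right. This goes through the C library resolver, while Zabbix agent 2 may use Go's own resolver, so it approximates the agent's behaviour rather than reproducing it.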
