Linux host with active agent isn't working

  • Markku
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
    • Sep 2018
    • 1781

    #16
    Thanks for the data again.

    This line in the agent configuration is incorrect (that's why your proxy didn't see the fe2 agent connecting):

    ServerActive=127.0.0.1;192.168.24.169;192.168.24.170;192.168.24.178

    (It lists all the IPs as HA node addresses of a single cluster, which is wrong.)

    When testing the active agent functionality, list only the specific test target there:
    - if testing with the HA servers, set it to ServerActive=192.168.24.169;192.168.24.170
    - if testing with the proxy, set it to ServerActive=192.168.24.178 (127.0.0.1 only works when testing the agent on the server or on the proxy itself)
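
    [Editor's note: the comma-vs-semicolon semantics of ServerActive can be sketched as follows. This is a simplified illustration of the documented Zabbix 6.x parsing rules, not the actual agent code: commas separate independent servers/proxies, while semicolons within one entry group the HA nodes of a single Zabbix server cluster.]

    ```python
    # Sketch (not Zabbix source code) of how a Zabbix 6.x agent splits
    # ServerActive: comma = separate entries, semicolon = HA nodes of one entry.

    def parse_server_active(value: str) -> list[list[str]]:
        """Return one list of node addresses per comma-separated entry."""
        entries = [e.strip() for e in value.split(",") if e.strip()]
        return [[n.strip() for n in entry.split(";")] for entry in entries]

    # The incorrect line treats all four addresses as nodes of ONE cluster:
    bad = parse_server_active("127.0.0.1;192.168.24.169;192.168.24.170;192.168.24.178")
    # one entry with four "HA nodes" -- not four independent destinations

    # Testing against the HA pair only:
    ha = parse_server_active("192.168.24.169;192.168.24.170")
    # one entry whose two nodes are the two HA servers
    ```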

    On which OS are you running these btw?

    Did you already test an active agent on the proxy, reporting directly to the server?

    Markku

    Comment

    • Spectator
      Member
      • Sep 2021
      • 71

      #17
      Dear Markku,

      All hosts run the AlmaLinux 9 operating system.

      Currently, there are two Zabbix servers in native HA:
      Code:
      test-nw-zbx-srv1 (192.168.24.169)
      test-nw-zbx-srv2 (192.168.24.170)
      There is an active Zabbix proxy:
      Code:
      test-zbx-proxy1 (192.168.24.178)
      And there is a linux host:
      Code:
      test-zbx-fe2 (192.168.24.176)
      The Zabbix proxy works well and can also monitor itself; these are its settings on the Zabbix web interface:
      Code:
      host: test-zbx-proxy1
      Agent: 127.0.0.1
      Template: Linux by Zabbix agent active
      Monitored by: test-zbx-proxy1
      The /etc/zabbix/zabbix_proxy.conf on the test-zbx-proxy1:
      Code:
      Server=192.168.24.169;192.168.24.170
      Hostname=test-zbx-proxy1
      LogFile=/var/log/zabbix/zabbix_proxy.log
      LogFileSize=0
      PidFile=/run/zabbix/zabbix_proxy.pid
      SocketDir=/run/zabbix
      DBName=zabbix_proxy
      DBUser=zabbix
      DBPassword=password
      ProxyOfflineBuffer=48
      ConfigFrequency=100
      StartPollers=5
      StartPollersUnreachable=5
      StartTrappers=5
      StartPingers=5
      StartDiscoverers=2
      StartHTTPPollers=2
      SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
      CacheSize=128M
      HistoryCacheSize=64M
      HistoryIndexCacheSize=32M
      Timeout=15
      LogSlowQueries=3000
      StatsAllowedIP=127.0.0.1
      The /etc/zabbix/zabbix_agent2.conf on the test-zbx-proxy1:
      Code:
      PidFile=/run/zabbix/zabbix_agent2.pid
      LogFile=/var/log/zabbix/zabbix_agent2.log
      LogFileSize=0
      Server=127.0.0.1,192.168.24.169,192.168.24.170,192.168.24.178
      ServerActive=192.168.24.178
      Hostname=test-zbx-proxy1
      Include=/etc/zabbix/zabbix_agent2.d/*.conf
      ControlSocket=/tmp/agent.sock
      Include=./zabbix_agent2.d/plugins.d/*.conf
      Unfortunately, monitoring the test-zbx-fe2 Linux host with an active agent via the proxy doesn't work.

      The /etc/zabbix/zabbix_agent2.conf on the test-zbx-fe2:
      Code:
      PidFile=/run/zabbix/zabbix_agent2.pid
      LogFile=/var/log/zabbix/zabbix_agent2.log
      LogFileSize=0
      Server=192.168.24.178
      ServerActive=127.0.0.1,192.168.24.178
      Hostname=test-zbx-fe2
      Include=/etc/zabbix/zabbix_agent2.d/*.conf
      ControlSocket=/tmp/agent.sock
      Include=./zabbix_agent2.d/plugins.d/*.conf
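      [Editor's note: as a quick first check, one can verify from the fe2 host that the proxy's trapper port 10051 accepts TCP connections at all. This is a generic reachability sketch, not a Zabbix tool; the address and port below are taken from this thread's setup.]

      ```python
      # Generic TCP reachability probe (illustrative sketch).
      import socket

      def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
          """Return True if a TCP connection to host:port succeeds."""
          try:
              with socket.create_connection((host, port), timeout=timeout):
                  return True
          except OSError:
              return False

      # e.g. on test-zbx-fe2, probe the proxy's trapper port:
      #   can_connect("192.168.24.178", 10051)
      ```

      Note that, per the agent log below, the TCP connect itself succeeds (the failure is an i/o timeout afterwards), so a passing probe here would narrow the problem to the data exchange rather than basic reachability.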
      On the Zabbix web interface, under Configuration, Hosts, Create host:
      Host: test-zbx-fe2
      Templates: Linux by Zabbix agent active
      Agent: 192.168.24.176
      Monitored by proxy: test-zbx-proxy1

      On the Zabbix web interface, under Monitoring, Hosts, test-zbx-fe2, Latest data:
      Only the "active agent availability" item is working; it shows "available".
      The other measurements are missing and do not indicate an error.

      The zabbix_proxy.log on the test-zbx-proxy1:
      Code:
      458384:20221025:132316.015 Starting Zabbix Proxy (active) [test-zbx-proxy1]. Zabbix 6.2.3 (revision 98ee88fc19d).
      458384:20221025:132316.015 **** Enabled features ****
      458384:20221025:132316.015 SNMP monitoring:       YES
      458384:20221025:132316.015 IPMI monitoring:       YES
      458384:20221025:132316.015 Web monitoring:        YES
      458384:20221025:132316.015 VMware monitoring:     YES
      458384:20221025:132316.015 ODBC:                  YES
      458384:20221025:132316.015 SSH support:           YES
      458384:20221025:132316.015 IPv6 support:          YES
      458384:20221025:132316.015 TLS support:           YES
      458384:20221025:132316.015 **************************
      458384:20221025:132316.015 using configuration file: /etc/zabbix/zabbix_proxy.conf
      458384:20221025:132316.029 current database version (mandatory/optional): 06020000/06020002
      458384:20221025:132316.029 required mandatory version: 06020000
      458384:20221025:132316.034 proxy #0 started [main process]
      458388:20221025:132316.034 proxy #1 started [configuration syncer #1]
      458388:20221025:132316.052 Unable to connect to [192.168.24.169]:10051 [cannot connect to [[192.168.24.169]:10051]: [111] Connection refused]
      458389:20221025:132316.052 proxy #2 started [trapper #1]
      458390:20221025:132316.052 proxy #3 started [trapper #2]
      458391:20221025:132316.053 proxy #4 started [trapper #3]
      458393:20221025:132316.055 proxy #6 started [trapper #5]
      458394:20221025:132316.058 proxy #7 started [preprocessing manager #1]
      458395:20221025:132316.058 proxy #8 started [preprocessing worker #1]
      458398:20221025:132316.059 proxy #11 started [heartbeat sender #1]
      458396:20221025:132316.059 proxy #9 started [preprocessing worker #2]
      458397:20221025:132316.059 proxy #10 started [preprocessing worker #3]
      458408:20221025:132316.059 proxy #21 started [history syncer #4]
      458392:20221025:132316.060 proxy #5 started [trapper #4]
      458403:20221025:132316.060 proxy #16 started [discoverer #1]
      458400:20221025:132316.061 proxy #13 started [housekeeper #1]
      458399:20221025:132316.061 proxy #12 started [data sender #1]
      458406:20221025:132316.063 proxy #19 started [history syncer #2]
      458401:20221025:132316.063 proxy #14 started [http poller #1]
      458405:20221025:132316.063 proxy #18 started [history syncer #1]
      458402:20221025:132316.064 proxy #15 started [http poller #2]
      458410:20221025:132316.066 proxy #23 started [task manager #1]
      458422:20221025:132316.066 proxy #35 started [icmp pinger #2]
      458423:20221025:132316.067 proxy #36 started [icmp pinger #3]
      458424:20221025:132316.067 proxy #37 started [icmp pinger #4]
      458420:20221025:132316.067 proxy #33 started [unreachable poller #5]
      458425:20221025:132316.067 proxy #38 started [icmp pinger #5]
      458409:20221025:132316.068 proxy #22 started [self-monitoring #1]
      458427:20221025:132316.068 proxy #40 started [odbc poller #1]
      458419:20221025:132316.068 proxy #32 started [unreachable poller #4]
      458413:20221025:132316.069 proxy #26 started [poller #3]
      458388:20221025:132316.069 received configuration data from server at "192.168.24.170", datalen 28880
      458421:20221025:132316.071 proxy #34 started [icmp pinger #1]
      458414:20221025:132316.072 proxy #27 started [poller #4]
      458417:20221025:132316.074 proxy #30 started [unreachable poller #2]
      458404:20221025:132316.074 proxy #17 started [discoverer #2]
      458418:20221025:132316.075 proxy #31 started [unreachable poller #3]
      458407:20221025:132316.075 proxy #20 started [history syncer #3]
      458412:20221025:132316.076 proxy #25 started [poller #2]
      458411:20221025:132316.076 proxy #24 started [poller #1]
      458415:20221025:132316.080 proxy #28 started [poller #5]
      458416:20221025:132316.083 proxy #29 started [unreachable poller #1]
      458426:20221025:132316.083 proxy #39 started [availability manager #1]
      458399:20221025:132316.186 Unable to connect to [192.168.24.169]:10051 [cannot connect to [[192.168.24.169]:10051]: [111] Connection refused]
      458388:20221025:132456.177 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:132636.277 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:132816.405 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:132956.448 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:133136.573 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:133316.658 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:133456.764 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:133636.860 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:133816.961 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:133957.085 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:134137.200 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:134317.260 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:134457.382 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:134637.413 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:134817.526 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:134957.578 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:135137.695 received configuration data from server at "192.168.24.170", datalen 28880
      458400:20221025:135316.160 executing housekeeper
      458400:20221025:135316.185 housekeeper [deleted 3986 records in 0.023747 sec, idle for 1 hour(s)]
      458388:20221025:135317.812 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:135457.880 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:135637.990 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:135818.070 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:135958.197 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:140138.322 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:140318.438 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:140458.568 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:140638.694 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:140818.738 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:140958.868 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:141138.905 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:141318.995 received configuration data from server at "192.168.24.170", datalen 28880
      458388:20221025:141459.106 received configuration data from server at "192.168.24.170", datalen 28880
      The zabbix_agent2.log on the test-zbx-fe2:
      Code:
      2022/10/25 13:23:25.064311 Starting Zabbix Agent 2 (6.2.3)
      2022/10/25 13:23:25.065763 OpenSSL library (OpenSSL 3.0.1 14 Dec 2021) initialized
      2022/10/25 13:23:25.065832 using configuration file: /etc/zabbix/zabbix_agent2.conf
      2022/10/25 13:23:25.065927 using plugin 'Agent' (built-in) providing following interfaces: exporter
      2022/10/25 13:23:25.065949 using plugin 'Ceph' (built-in) providing following interfaces: exporter, runner, configurator
      2022/10/25 13:23:25.065965 using plugin 'Cpu' (built-in) providing following interfaces: exporter, collector, runner
      2022/10/25 13:23:25.065978 using plugin 'DNS' (built-in) providing following interfaces: exporter
      2022/10/25 13:23:25.065989 using plugin 'Docker' (built-in) providing following interfaces: exporter, configurator
      2022/10/25 13:23:25.065998 using plugin 'File' (built-in) providing following interfaces: exporter, configurator
      2022/10/25 13:23:25.066006 using plugin 'Hw' (built-in) providing following interfaces: exporter, configurator
      2022/10/25 13:23:25.066013 using plugin 'Kernel' (built-in) providing following interfaces: exporter
      2022/10/25 13:23:25.066027 using plugin 'Log' (built-in) providing following interfaces: exporter, configurator
      2022/10/25 13:23:25.066032 using plugin 'MQTT' (built-in) providing following interfaces: watcher, configurator
      2022/10/25 13:23:25.066038 using plugin 'Memcached' (built-in) providing following interfaces: exporter, runner, configurator
      2022/10/25 13:23:25.066045 using plugin 'Memory' (built-in) providing following interfaces: exporter
      2022/10/25 13:23:25.066051 using plugin 'Modbus' (built-in) providing following interfaces: exporter, configurator
      2022/10/25 13:23:25.066056 using plugin 'Mysql' (built-in) providing following interfaces: exporter, runner, configurator
      2022/10/25 13:23:25.066067 using plugin 'NetIf' (built-in) providing following interfaces: exporter
      2022/10/25 13:23:25.066073 using plugin 'Oracle' (built-in) providing following interfaces: exporter, runner, configurator
      2022/10/25 13:23:25.066081 using plugin 'Postgres' (built-in) providing following interfaces: exporter, runner, configurator
      2022/10/25 13:23:25.066108 using plugin 'Proc' (built-in) providing following interfaces: exporter, collector
      2022/10/25 13:23:25.066116 using plugin 'ProcExporter' (built-in) providing following interfaces: exporter
      2022/10/25 13:23:25.066123 using plugin 'Redis' (built-in) providing following interfaces: exporter, runner, configurator
      2022/10/25 13:23:25.066135 using plugin 'Smart' (built-in) providing following interfaces: exporter, configurator
      2022/10/25 13:23:25.066146 using plugin 'Sw' (built-in) providing following interfaces: exporter, configurator
      2022/10/25 13:23:25.066154 using plugin 'Swap' (built-in) providing following interfaces: exporter
      2022/10/25 13:23:25.066161 using plugin 'SystemRun' (built-in) providing following interfaces: exporter, configurator
      2022/10/25 13:23:25.066167 using plugin 'Systemd' (built-in) providing following interfaces: exporter
      2022/10/25 13:23:25.066177 using plugin 'TCP' (built-in) providing following interfaces: exporter, configurator
      2022/10/25 13:23:25.066185 using plugin 'UDP' (built-in) providing following interfaces: exporter, configurator
      2022/10/25 13:23:25.066191 using plugin 'Uname' (built-in) providing following interfaces: exporter
      2022/10/25 13:23:25.066197 using plugin 'Uptime' (built-in) providing following interfaces: exporter
      2022/10/25 13:23:25.066203 using plugin 'Users' (built-in) providing following interfaces: exporter, configurator
      2022/10/25 13:23:25.066210 using plugin 'VFSDev' (built-in) providing following interfaces: exporter, collector
      2022/10/25 13:23:25.066216 using plugin 'VFSDir' (built-in) providing following interfaces: exporter
      2022/10/25 13:23:25.066223 using plugin 'VfsFs' (built-in) providing following interfaces: exporter
      2022/10/25 13:23:25.066229 using plugin 'WebCertificate' (built-in) providing following interfaces: exporter, configurator
      2022/10/25 13:23:25.066235 using plugin 'WebPage' (built-in) providing following interfaces: exporter, configurator
      2022/10/25 13:23:25.066241 using plugin 'ZabbixAsync' (built-in) providing following interfaces: exporter
      2022/10/25 13:23:25.066248 using plugin 'ZabbixStats' (built-in) providing following interfaces: exporter, configurator
      2022/10/25 13:23:25.066254 lowering the plugin ZabbixSync capacity to 1 as the configured capacity 100 exceeds limits
      2022/10/25 13:23:25.066260 using plugin 'ZabbixSync' (built-in) providing following interfaces: exporter
      2022/10/25 13:23:25.066408 Plugin support version 1.0
      2022/10/25 13:23:25.066462 Zabbix Agent2 hostname: [test-zbx-fe2]
      2022/10/25 13:23:26.067413 [101] cannot connect to [127.0.0.1:10051]: dial tcp :0->127.0.0.1:10051: connect: connection refused
      2022/10/25 13:23:26.067430 [101] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 13:23:26.067601 [101] cannot connect to [127.0.0.1:10051]: dial tcp :0->127.0.0.1:10051: connect: connection refused
      2022/10/25 13:23:26.067613 [101] sending of heartbeat message for [test-zbx-fe2] started to fail
      2022/10/25 13:23:29.068385 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:60073->192.168.24.178:10051: i/o timeout'
      2022/10/25 13:23:29.068415 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 13:25:33.068791 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:33369->192.168.24.178:10051: i/o timeout'
      2022/10/25 13:25:33.068833 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 13:27:37.068896 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:53541->192.168.24.178:10051: i/o timeout'
      2022/10/25 13:27:37.068959 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 13:29:41.068839 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:55379->192.168.24.178:10051: i/o timeout'
      2022/10/25 13:29:41.068877 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 13:31:45.068416 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:51043->192.168.24.178:10051: i/o timeout'
      2022/10/25 13:31:45.068453 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 13:33:49.068438 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:60513->192.168.24.178:10051: i/o timeout'
      2022/10/25 13:33:49.068491 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 13:35:53.068159 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:38761->192.168.24.178:10051: i/o timeout'
      2022/10/25 13:35:53.068209 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 13:37:57.069618 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:39929->192.168.24.178:10051: i/o timeout'
      2022/10/25 13:37:57.069678 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 13:40:01.068272 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:49175->192.168.24.178:10051: i/o timeout'
      2022/10/25 13:40:01.068311 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 13:42:05.068863 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:32781->192.168.24.178:10051: i/o timeout'
      2022/10/25 13:42:05.068933 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 13:44:09.068356 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:35413->192.168.24.178:10051: i/o timeout'
      2022/10/25 13:44:09.068398 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 13:46:13.068254 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:44967->192.168.24.178:10051: i/o timeout'
      2022/10/25 13:46:13.068300 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 13:48:17.068471 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:37899->192.168.24.178:10051: i/o timeout'
      2022/10/25 13:48:17.068509 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 13:50:21.068970 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:47819->192.168.24.178:10051: i/o timeout'
      2022/10/25 13:50:21.069014 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 13:52:25.068455 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:36241->192.168.24.178:10051: i/o timeout'
      2022/10/25 13:52:25.068498 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 13:54:29.068345 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:56217->192.168.24.178:10051: i/o timeout'
      2022/10/25 13:54:29.068385 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 13:56:33.068407 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:45463->192.168.24.178:10051: i/o timeout'
      2022/10/25 13:56:33.068445 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 13:58:37.068909 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:48389->192.168.24.178:10051: i/o timeout'
      2022/10/25 13:58:37.068959 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 14:00:41.068438 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:49349->192.168.24.178:10051: i/o timeout'
      2022/10/25 14:00:41.068486 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 14:02:45.068100 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:40641->192.168.24.178:10051: i/o timeout'
      2022/10/25 14:02:45.068143 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 14:04:49.068663 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:46557->192.168.24.178:10051: i/o timeout'
      2022/10/25 14:04:49.068719 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 14:06:53.068500 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:34423->192.168.24.178:10051: i/o timeout'
      2022/10/25 14:06:53.068549 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 14:08:57.068963 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:49617->192.168.24.178:10051: i/o timeout'
      2022/10/25 14:08:57.069008 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 14:11:01.069943 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:39431->192.168.24.178:10051: i/o timeout'
      2022/10/25 14:11:01.069996 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 14:13:05.068650 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:52549->192.168.24.178:10051: i/o timeout'
      2022/10/25 14:13:05.068788 [102] active check configuration update from host [test-zbx-fe2] started to fail
      2022/10/25 14:15:09.068872 [102] cannot receive data from [192.168.24.178:10051]: Cannot read message: 'read tcp 192.168.24.176:44599->192.168.24.178:10051: i/o timeout'
      And finally the zabbix_server.log on the active Zabbix server:
      Code:
        1064:20221025:131959.642 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1065:20221025:132139.677 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1062:20221025:132316.068 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1063:20221025:132456.176 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1061:20221025:132636.276 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1065:20221025:132816.404 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1062:20221025:132956.447 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1063:20221025:133136.572 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1062:20221025:133316.657 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1062:20221025:133456.763 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1063:20221025:133636.859 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1065:20221025:133816.960 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1061:20221025:133957.084 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1061:20221025:134137.199 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1063:20221025:134317.259 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1064:20221025:134457.382 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1065:20221025:134637.412 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1062:20221025:134817.525 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1064:20221025:134957.577 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1061:20221025:135137.694 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1064:20221025:135317.811 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1063:20221025:135457.879 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1061:20221025:135637.989 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1065:20221025:135818.069 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1061:20221025:135958.196 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1064:20221025:140138.321 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1064:20221025:140318.437 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1065:20221025:140458.567 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1064:20221025:140638.694 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1061:20221025:140818.737 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1062:20221025:140958.868 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1062:20221025:141138.905 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1043:20221025:141251.526 executing housekeeper
        1043:20221025:141251.568 housekeeper [deleted 0 hist/trends, 0 items/triggers, 0 events, 0 problems, 0 sessions, 0 alarms, 0 audit, 0 records in 0.019694 sec, idle for 1 hour(s)]
        1064:20221025:141318.995 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1061:20221025:141459.105 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1063:20221025:141639.235 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1063:20221025:141819.363 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1061:20221025:141959.427 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9
        1061:20221025:142139.487 sending configuration data to proxy "test-nw-zbx-kak-proxy1" at "10.36.24.178", datalen 28880, bytes 4874 with compression ratio 5.9

      After 30 minutes, this alert appears on the Zabbix interface (which is understandable, given everything described above):
      Zabbix agent is not available (or nodata for 30m)

      In theory, that's how this design should work, right?
      Did I configure something wrong?

      Comment

      • Markku
        Senior Member
        Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
        • Sep 2018
        • 1781

        #18
        Thanks again for the good data. The configs look fine. The only small glitch (though it doesn't affect the problem) is ServerActive=127.0.0.1 in the fe2 agent config; that's just wrong, but the agent falls back to the next address anyway, as it should.

        You have now demonstrated that
        - proxy-agent can connect to proxy (but that's a local connection in the server, proves anyway that the trapper process on proxy is working)
        - proxy can connect to server (trapper process on server is working, and connectivity from proxy to server is fine)
        - fe2-agent can NOT connect to proxy properly (connects but cannot communicate properly)
        - fe2-agent can NOT connect to server properly (connects but cannot communicate properly)

        To me it looks like there is something wrong with test-zbx-fe2, or its connectivity.

        Could you still take a tcpdump at the same time from the fe2 agent (filtering on the proxy's IP address) and from the proxy (filtering on the test-zbx-fe2 IP address) while fe2 is attempting to connect to the proxy? If possible, a raw capture file is preferred, but a textual dump (like you did earlier) may be enough. Also attach the agent log from the same time, to be sure. That will hopefully tell us more about the connection problem.

        Markku
        Last edited by Markku; 25-10-2022, 17:02. Reason: Fix typos


        • Spectator
          Member
          • Sep 2021
          • 71

          #19
          Dear Markku,

          Thank you very much for your informative answer.
          You are right; I removed 127.0.0.1 from the ServerActive= line of the test-zbx-fe2 agent configuration.

          So, as a reminder, I have these hosts in my Zabbix environment:​
          Code:
          test-nw-zbx-srv1 (192.168.24.169)
          test-nw-zbx-srv2 (192.168.24.170)​
          test-zbx-proxy1 (192.168.24.178)
          test-zbx-fe2 (192.168.24.176)​
          I restarted the zabbix-agent2 service on the test-zbx-fe2 host and the zabbix-proxy service on the test-zbx-proxy1 at the same time, while tcpdumping the traffic on the hosts for five minutes.

          On the test-zbx-proxy1 machine with this command (filtered to the IP address of test-zbx-fe2):
          tcpdump -nni any host 192.168.24.176 -vvv

          On the test-zbx-fe2 host with this command (filtered to the IP address of test-zbx-proxy1):
          tcpdump -nni any host 192.168.24.178 -vvv

          I attached the zabbix_agent2.log, the zabbix_proxy.log and the tcpdump outputs in one zipped file due to the size limit.
          The situation is the same as before: measurements of the test-zbx-fe2 host are still missing from the Zabbix web interface; only "Active agent availability" appears, with the value "available (1)".

          SELinux and firewalld are disabled on all hosts, of course.

          Thank you very much for your help in advance!​
          Attached Files


          • Markku
            Senior Member
            Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
            • Sep 2018
            • 1781

            #20
            Thanks for the data. The text-based capture files are very hard to read (pcap files are much easier), but this is what I found. Below is one TCP session; it looks the same from both the agent and proxy perspectives, so the excerpts are from the agent dump only.

            09:57:09.052663 ens192 Out IP (tos 0x0, ttl 64, id 20969, offset 0, flags [DF], proto TCP (6), length 60)
            192.168.24.176.33009 > 192.168.24.178.10051: Flags [S], cksum 0x45d8 (incorrect -> 0xfd6f), seq 939378532, win 64240, options [mss 1460,sackOK,TS val 2474233573 ecr 0,nop,wscale 7], length 0
            09:57:09.052917 ens192 In IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
            192.168.24.178.10051 > 192.168.24.176.33009: Flags [S.], cksum 0x8d5b (correct), seq 324344112, ack 939378533, win 65160, options [mss 1460,sackOK,TS val 2522917253 ecr 2474233573,nop,wscale 7], length 0
            09:57:09.052943 ens192 Out IP (tos 0x0, ttl 64, id 20970, offset 0, flags [DF], proto TCP (6), length 52)
            192.168.24.176.33009 > 192.168.24.178.10051: Flags [.], cksum 0x45d0 (incorrect -> 0xb8ba), seq 1, ack 1, win 502, options [nop,nop,TS val 2474233573 ecr 2522917253], length 0
            09:57:09.053281 ens192 Out IP (tos 0x0, ttl 64, id 20971, offset 0, flags [DF], proto TCP (6), length 142)
            192.168.24.176.33009 > 192.168.24.178.10051: Flags [P.], cksum 0x462a (incorrect -> 0x2bb7), seq 1:91, ack 1, win 502, options [nop,nop,TS val 2474233574 ecr 2522917253], length 90
            09:57:09.053407 ens192 In IP (tos 0x0, ttl 64, id 3489, offset 0, flags [DF], proto TCP (6), length 52)
            192.168.24.178.10051 > 192.168.24.176.33009: Flags [.], cksum 0xb858 (correct), seq 1, ack 91, win 509, options [nop,nop,TS val 2522917253 ecr 2474233574], length 0
            09:57:12.053305 ens192 Out IP (tos 0x0, ttl 64, id 20972, offset 0, flags [DF], proto TCP (6), length 52)
            192.168.24.176.33009 > 192.168.24.178.10051: Flags [F.], cksum 0x45d0 (incorrect -> 0xaca6), seq 91, ack 1, win 502, options [nop,nop,TS val 2474236574 ecr 2522917253], length 0
            09:57:12.094374 ens192 In IP (tos 0x0, ttl 64, id 3490, offset 0, flags [DF], proto TCP (6), length 52)
            192.168.24.178.10051 > 192.168.24.176.33009: Flags [.], cksum 0xa0be (correct), seq 1, ack 92, win 509, options [nop,nop,TS val 2522920294 ecr 2474236574], length 0

            Up to this point the agent has sent 90 bytes to the proxy, the proxy got them, and then comes the 3-second timeout. But after 7 more seconds (10 seconds in total) the proxy continues:

            09:57:19.068568 ens192 In IP (tos 0x0, ttl 64, id 3491, offset 0, flags [DF], proto TCP (6), length 616)
            192.168.24.178.10051 > 192.168.24.176.33009: Flags [P.], cksum 0x01c9 (correct), seq 1:565, ack 92, win 509, options [nop,nop,TS val 2522927268 ecr 2474236574], length 564
            09:57:19.068598 ens192 Out IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
            192.168.24.176.33009 > 192.168.24.178.10051: Flags [R], cksum 0xc244 (correct), seq 939378624, win 0, length 0
            09:57:19.068627 ens192 In IP (tos 0x0, ttl 64, id 3492, offset 0, flags [DF], proto TCP (6), length 52)
            192.168.24.178.10051 > 192.168.24.176.33009: Flags [F.], cksum 0x834b (correct), seq 565, ack 92, win 509, options [nop,nop,TS val 2522927268 ecr 2474236574], length 0
            09:57:19.068631 ens192 Out IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
            192.168.24.176.33009 > 192.168.24.178.10051: Flags [R], cksum 0xc244 (correct), seq 939378624, win 0, length 0

            Essentially it took the proxy 10 seconds to respond with 564 bytes, but by then it was too late: the agent was no longer there.

            Still no idea why the proxy holds the response. But if you could take a full packet capture this time (with the -w filename.pcap option in tcpdump), then we can see what the agent requested (90 bytes) and what the proxy responded (564 bytes). Maybe that tells us more.

            By all means, if someone else has some idea, go ahead.

            Markku


            • Spectator
              Member
              • Sep 2021
              • 71

              #21
              Dear Markku,

              Thank you very much for your help and answers.
              The problem seems very mystical.
              This Zabbix system actually consists of:
              3-node PostgreSQL cluster for the Zabbix database
              2-node Zabbix server setup with native HA
              2-node PCS cluster for the Zabbix web frontend
              1-node Zabbix proxy

              Please note that in my previous posts I masked the real IP addresses, which I cannot do in the pcap files. The real IP addresses of the hosts, which you will also find in the pcap files:

              fe2: 10.26.24.176
              proxy: 10.36.24.178
              zabbix-server1: 10.36.24.169
              zabbix-server2: 10.36.24.170

              Today, on my own test machine, in Virtualbox, I built the same Zabbix environment consisting of 8 nodes.
              I created the same (proxy and agent2) configuration settings for the proxy and one of the frontend hosts, and it worked right away! I did everything the same as in VMware, where it doesn't work: same OS, same Zabbix version...
              In VMware the machines are in the same subnet and the same VLAN; we don't see anything wrong with that. I welcome any further help or ideas!

              Thank you for your help!​
              Attached Files


              • Markku
                Senior Member
                Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
                • Sep 2018
                • 1781

                #22
                Looking at the capture taken at the proxy, at 19:12:37 the agent sent this:

                {"request":"active checks","host":"test-nw-zbx-fe2","version":"6.2"}

                Then again the agent closed the connection after the 3-second timeout, and the proxy responded 7 seconds later:

                {"response":"success","data":[{"key":"vfs.file.cksum[/etc/passwd,sha256]","itemid":44816,"delay":"1h","lastlogsize":0, "mti me":0},{"key":"agent.hostname","itemid":44817,"d el ay":"1h","lastlogsize":0,"mtime":0},{"key":"syst em .swap.size[,free]","itemid":44818,"delay":"1m","lastlogsize":0, "mti me":0},{"key":"system.cpu.util[,system]","itemid":44819,"delay":"1m","lastlogsize":0, "mti me":0},{"key":"system.cpu.util[,user]","itemid":44820,"delay":"1m","lastlogsize":0, "mti me":0},{"key":"system.hostname","itemid":44821," de lay":"1h","lastlogsize":0,"mtime":0},{"key":"syst e m.localtime","itemid":44822,"delay":"1m","lastlogs ize":0,"mtime":0},{"key":"system.sw.arch","itemid " :44823,"delay":"1h","lastlogsize":0,"mtime":0},{"k ey":"system.sw.os","itemid":44824,"delay":"1h"," la stlogsize":0,"mtime":0},{"key":"system.sw.packages ","itemid":44825,"delay":"1h","lastlogsize":0, "mti me":0},{"key":"system.swap.size[,pfree]","itemid":44826,"delay":"1m","lastlogsize":0, "mti me":0},{"key":"system.cpu.util[,softirq]","itemid":44827,"delay":"1m","lastlogsize":0, "mti me":0},{"key":"system.swap.size[,total]","itemid":44828,"delay":"1m","lastlogsize":0, "mti me":0},{"key":"system.uname","itemid":44829,"del ay ":"15m","lastlogsize":0,"mtime":0},{"key":"sys tem. 
uptime","itemid":44830,"delay":"30s","lastlogsize" :0,"mtime":0},{"key":"system.users.num","itemid": 4 4831,"delay":"1m","lastlogsize":0,"mtime":0},{"key ":"vm.memory.size[available]","itemid":44832,"delay":"1m","lastlogsize":0, "mti me":0},{"key":"vm.memory.size[pavailable]","itemid":44833,"delay":"1m","lastlogsize":0, "mti me":0},{"key":"vm.memory.size[total]","itemid":44834,"delay":"1m","lastlogsize":0, "mti me":0},{"key":"system.cpu.util[,steal]","itemid":44835,"delay":"1m","lastlogsize":0, "mti me":0},{"key":"system.cpu.util[,nice]","itemid":44836,"delay":"1m","lastlogsize":0, "mti me":0},{"key":"agent.ping","itemid":44837,"delay ": "1m","lastlogsize":0,"mtime":0},{"key":"system .cpu .load[all,avg1]","itemid":44838,"delay":"1m","lastlogsize":0, "mti me":0},{"key":"agent.version","itemid":44839,"de la y":"1h","lastlogsize":0,"mtime":0},{"key":"kern el. maxfiles","itemid":44840,"delay":"1h","lastlogsize ":0,"mtime":0},{"key":"kernel.maxproc","itemid ":44 841,"delay":"1h","lastlogsize":0,"mtime":0},{"key" :"proc.num","itemid":44842,"delay":"1m","lastlo gsi ze":0,"mtime":0},{"key":"proc.num[,,run]","itemid":44843,"delay":"1m","lastlogsize":0, "mti me":0},{"key":"system.boottime","itemid":44844," de lay":"15m","lastlogsize":0,"mtime":0},{"key":"sys t em.cpu.intr","itemid":44845,"delay":"1m","lastlogs ize":0,"mtime":0},{"key":"system.cpu.load[all,avg5]","itemid":44846,"delay":"1m","lastlogsize":0, "mti me":0},{"key":"system.cpu.util[,iowait]","itemid":44847,"delay":"1m","lastlogsize":0, "mti me":0},{"key":"system.cpu.load[all,avg15]","itemid":44848,"delay":"1m","lastlogsize":0, "mti me":0},{"key":"system.cpu.num","itemid":44849,"d el ay":"1m","lastlogsize":0,"mtime":0},{"key":"syst em .cpu.switches","itemid":44850,"delay":"1m","lastlo gsize":0,"mtime":0},{"key":"system.cpu.util[,guest]","itemid":44851,"delay":"1m","lastlogsize":0, "mti me":0},{"key":"system.cpu.util[,guest_nice]","itemid":44852,"delay":"1m","lastlogsize":0, "mti 
me":0},{"key":"system.cpu.util[,idle]","itemid":44853,"delay":"1m","lastlogsize":0, "mti me":0},{"key":"system.cpu.util[,interrupt]","itemid":44854,"delay":"1m","lastlogsize":0, "mti me":0},{"key":"net.if.discovery","itemid":44858, "d elay":"1h","lastlogsize":0,"mtime":0},{"key":"vfs. dev.discovery","itemid":44859,"delay":"1h","lastlo gsize":0,"mtime":0},{"key":"vfs.fs.discovery","ite mid":44860,"delay":"1h","lastlogsize":0,"mtime":0 }]}

                The data was compressed; that's why the 636-byte response actually carried that much data. And the data itself looks just fine.
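For anyone curious about the numbers: the shrink is plain zlib deflate. A small illustration with a made-up item list of the same shape (not the exact payload from the capture):

```python
import json
import zlib

# Hypothetical item list shaped like the proxy's "active checks" response.
items = [{"key": f"system.cpu.util[,mode{i}]", "itemid": 44816 + i,
          "delay": "1m", "lastlogsize": 0, "mtime": 0} for i in range(40)]
payload = json.dumps({"response": "success", "data": items}).encode()

compressed = zlib.compress(payload)
print(len(payload), "bytes ->", len(compressed), "bytes")

# The transfer is lossless: the receiver inflates it back to the same JSON.
assert zlib.decompress(compressed) == payload
```

The highly repetitive keys compress very well, which matches the roughly 6x compression ratio the server logged earlier in this thread.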

                I have no idea why the proxy (or the server, in your earlier tests) takes 10 seconds to send that response.

                Today, on my own test machine, in Virtualbox, I built the same Zabbix environment consisting of 8 nodes.
                I created the same (proxy and agent2) configuration settings for the proxy and one of the frontend hosts - and it worked right away!​
                Yeah, that's how it should go.

                What you could do is increase the agent timeout (Timeout=15 or something like that), but that's just a kludge, not a real solution.
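For reference, that kludge is a one-line change in /etc/zabbix/zabbix_agent2.conf (the agent's Timeout parameter accepts 1-30 seconds), followed by a restart of the zabbix-agent2 service:

```
Timeout=15
```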

                Markku


                • Markku
                  Senior Member
                  Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
                  • Sep 2018
                  • 1781

                  #23
                  In case someone is interested, I used the Wireshark dissector to see the uncompressed data: https://github.com/markkuleinio/wire...bix-dissectors

                  Markku


                  • Spectator
                    Member
                    • Sep 2021
                    • 71

                    #24
                    Dear Markku,

                    Increasing the agent timeout from 3 to 20 solved the problem.

                    We will look for the reason, which is almost certainly related to VMware ESXi. It is possible that establishing network communication between VMs within ESXi is that much slower, something that did not occur in my own VirtualBox environment.
                    I will definitely write back if we find out why it was necessary to increase the timeout value.

                    Thanks again for all your help so far!​


                    • Spectator
                      Member
                      • Sep 2021
                      • 71

                      #25
                      Dear Markku,

                      After a few days, my colleagues and I found out what the problem was.
                      For my Zabbix machines, the operators told me that the IP address of the DNS server was the same as the IP address of the gateway, so I entered that IP during installation.
                      In fact it was just an IP address acting as a DNS forwarder, which passed requests on to the real DNS servers with a small delay.
                      So I didn't have obvious DNS problems: I was able to install the necessary packages and so on. However, the delay was enough to cause a timeout in the Zabbix communication. That's why everything apparently worked once I increased the timeout values: by then the answer could make it through the forwarder from the real DNS servers.
                      It's interesting, since I had included all the Zabbix host names in the /etc/hosts files anyway, so I don't know why Zabbix turned to DNS at all.

                      In any case, it was quite a rare problem, but in the end we managed to find the reason.

                      Maybe this will help others in the future.
                      I think a slower DNS server would give the exact same error.
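If anyone wants to test this DNS theory on their own hosts, timing a lookup makes a slow forwarder visible immediately. A minimal Python sketch; the reverse lookup mirrors what a server process may attempt when handling a connecting peer, and 127.0.0.1 is just a safe example address:

```python
import socket
import time

def lookup_time(addr: str):
    """Time a reverse DNS lookup of an IP address."""
    start = time.monotonic()
    try:
        name = socket.gethostbyaddr(addr)[0]
    except OSError:  # lookup failure still tells us how long it took
        name = None
    return time.monotonic() - start, name

elapsed, name = lookup_time("127.0.0.1")
print(f"reverse lookup took {elapsed:.2f} s -> {name}")
```

Anything approaching the agent's 3-second Timeout here would break active checks exactly as described in this thread.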

                      Thank you again for your help!​

