I have a test setup with a Zabbix server and a Zabbix host running on a Raspberry Pi, using a MySQL database, and a second Zabbix host running on a Windows laptop. I can't get active checks working on the Windows laptop ("Active check got (ZAB_TCP_READ) time out") unless I increase the timeout value in the host config from the default of 3 seconds to 30 seconds. I would obviously prefer to understand what's going on and how to avoid this huge delay.
Debug logging on the server seems to indicate that the delay is not in the network communication but in the database query (?):
5472:20210428:143134.378 trapper got '{"request":"active checks","host":"teacherPC_61302"}'
5472:20210428:143134.378 In send_list_of_active_checks_json()
5472:20210428:143134.378 In get_hostid_by_host() host:'teacherPC_61302' metadata:''
5472:20210428:143134.378 query [txnlev:0] [select h.hostid,h.status,h.tls_accept,h.tls_issuer,h.tls_subject,h.tls_psk_identity,a.host_metadata,a.listen_ip,a.listen_dns,a.listen_port,a.flags from hosts h left join autoreg_host a on a.proxy_hostid is null and a.host=h.host where h.host='teacherPC_61302' and h.status in (0,1) and h.flags<>2 and h.proxy_hostid is null]
... [28 seconds] ...
5472:20210428:143202.416 End of get_hostid_by_host():SUCCEED
5472:20210428:143202.417 send_list_of_active_checks_json() sending [{"response":"success","data":[...]}]
5472:20210428:143202.418 End of send_list_of_active_checks_json():SUCCEED
5472:20210428:143202.418 zbx_setproctitle() title:'trapper #1 [processed data in 28.040228 sec, waiting for connection]'
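To confirm which step the time is spent in, the gap can be measured directly from the timestamp prefixes of the debug log. This is a minimal sketch, assuming every relevant line starts with the PID:YYYYMMDD:HHMMSS.mmm prefix seen above; parse_ts and largest_gap are hypothetical helper names, not part of Zabbix.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple


def parse_ts(line: str) -> datetime:
    """Parse the 'PID:YYYYMMDD:HHMMSS.mmm' prefix of a Zabbix debug log line.

    Raises ValueError if the line has no such prefix.
    """
    _pid, date, time_part = line.split(" ", 1)[0].split(":")
    return datetime.strptime(date + time_part, "%Y%m%d%H%M%S.%f")


def largest_gap(lines: Iterable[str]) -> Tuple[float, Optional[str], Optional[str]]:
    """Return (seconds, line_before, line_after) for the largest gap
    between two consecutive timestamped lines."""
    best: Tuple[float, Optional[str], Optional[str]] = (0.0, None, None)
    prev_line, prev_ts = None, None
    for line in lines:
        try:
            ts = parse_ts(line)
        except ValueError:
            continue  # skip lines without a timestamp prefix, e.g. "... [28 seconds] ..."
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap > best[0]:
                best = (gap, prev_line, line)
        prev_line, prev_ts = line, ts
    return best


if __name__ == "__main__":
    # Lines taken from the log excerpt above (query text shortened).
    log = [
        "5472:20210428:143134.378 In get_hostid_by_host() host:'teacherPC_61302' metadata:''",
        "5472:20210428:143134.378 query [txnlev:0] [select ...]",
        "... [28 seconds] ...",
        "5472:20210428:143202.416 End of get_hostid_by_host():SUCCEED",
    ]
    gap, before, after = largest_gap(log)
    print(f"{gap:.3f} s between:\n  {before}\n  {after}")
```

Run against the excerpt, this points at the gap between the query line and "End of get_hostid_by_host()", which is why the delay looks like it is in the database lookup rather than on the network.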
The host running on the Raspberry Pi has exactly the same configuration and has no delay when requesting the active checks from the server:
5472:20210428:143118.710 End of send_list_of_active_checks_json():SUCCEED
5472:20210428:143118.710 zbx_setproctitle() title:'trapper #1 [processed data in 0.005681 sec, waiting for connection]'
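To compare many requests rather than a single pair, the trapper's own "processed data in N sec" figure can be pulled out of the zbx_setproctitle() lines. A small sketch, assuming the title format shown in the two excerpts; processing_time is a hypothetical helper name.

```python
import re
from typing import Optional

# Matches the duration inside titles such as
# "trapper #1 [processed data in 28.040228 sec, waiting for connection]"
PROCESSED_RE = re.compile(r"processed data in ([0-9.]+) sec")


def processing_time(line: str) -> Optional[float]:
    """Return the trapper processing time in seconds, or None if absent."""
    match = PROCESSED_RE.search(line)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    windows = "5472:20210428:143202.418 zbx_setproctitle() title:'trapper #1 [processed data in 28.040228 sec, waiting for connection]'"
    pi = "5472:20210428:143118.710 zbx_setproctitle() title:'trapper #1 [processed data in 0.005681 sec, waiting for connection]'"
    print("Windows host:", processing_time(windows), "s")
    print("Raspberry Pi host:", processing_time(pi), "s")
```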
The issue is reproducible: the Windows host always has a connection delay of exactly 28.0 seconds, the local host on the Raspberry Pi a delay of 0.0 seconds.
Any hints?