Ad Widget

Collapse

refreshActiveChecks i/o timeout

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • tikkami
    Member
    • May 2018
    • 71

    #1

    refreshActiveChecks i/o timeout

    I'm dealing with Zabbix server 5.0.7 on top of Debian buster. Zabbix agent2 5.0.7 is used at server.

    Windows PC has agent2 5.0.7 without encryption.
    Linux PC has agent1 5.0.7 without encryption.


    Passive items works fine.

    Active checks are working with Zabbix server.

    When I try to use active items with Windows or Linux agents, It will end up to timeout. What would be the reason for this timeout?


    Agent2 log from WinPC:
    2021/01/20 16:29:43.695983 [101] In refreshActiveChecks() from [192.168.0.199:10051]
    2021/01/20 16:29:43.695983 connecting to [192.168.0.199:10051]
    2021/01/20 16:29:43.695983 sending [{"request":"active checks","host":"WINPC","version":"5.0"}] to [192.168.0.199:10051]
    2021/01/20 16:29:43.696959 receiving data from [192.168.0.199:10051]
    2021/01/20 16:29:46.696544 cannot receive data from [192.168.0.199:10051]: Cannot read message: 'read tcp 192.168.0.2:12439->192.168.0.199:10051: i/o timeout'
    2021/01/20 16:29:46.696544 [101] active check configuration update from [192.168.0.199:10051] started to fail (Cannot read message: 'read tcp 192.168.0.2:12439->192.168.0.199:10051: i/o timeout')
    2021/01/20 16:29:46.696544 [101] End of refreshActiveChecks() from [192.168.0.199:10051]


    Trapper log from Zabbix server:
    725:20210120:162943.697 trapper got '{"request":"active checks","host":"WINPC","version":"5.0"}'
    725:20210120:162943.697 In send_list_of_active_checks_json()
    725:20210120:162943.697 In is_ip4() ip:'192.168.0.2'
    725:20210120:162943.697 End of is_ip4():SUCCEED
    725:20210120:162943.697 In get_hostid_by_host() host:'WINPC' metadata:''
    725:20210120:162943.697 query [txnlev:0] [select h.hostid,h.status,h.tls_accept,h.tls_issuer,h.tls_ subject,h.tls_psk_identity,a.host_metadata,a.liste n_ip,a.listen_dns,a.listen_port,a.flags from hosts h left join autoreg_host a on a.proxy_hostid is null and a.host=h.host where h.host='WINPC' and h.status in (0,1) and h.flags<>2 and h.proxy_hostid is null]

    What's happening here??
    725:20210120:163003.710 query [txnlev:1] [begin;]
    725:20210120:163003.710 In DBregister_host_flush()
    725:20210120:163003.710 In DBregister_host_active()
    725:20210120:163003.710 query [txnlev:1] [select null from actions where eventsource=2 and status=0]
    725:20210120:163003.710 End of DBregister_host_active():FAIL
    725:20210120:163003.710 End of DBregister_host_flush()
    725:20210120:163003.710 query [txnlev:1] [commit;]
    725:20210120:163003.710 End of get_hostid_by_host():SUCCEED
    725:20210120:163003.710 query [txnlev:0] [select itemid from items where type=7 and flags<>2 and hostid=10353]
    725:20210120:163003.710 In substitute_key_macros_impl() data:'agent.hostname'
    725:20210120:163003.710 End of substitute_key_macros_impl():SUCCEED data:'agent.hostname'
    725:20210120:163003.710 send_list_of_active_checks_json() sending [{"response":"success","data":[{"key":"agent.hostname","itemid":32660,"delay": "1m ","lastlogsize":0,"mtime":0}],"refresh_unsupported":600}]
    725:20210120:163003.710 End of send_list_of_active_checks_json():SUCCEED
    725:20210120:163003.710 zbx_setproctitle() title:'trapper #2 [processed data in 20.013714 sec, waiting for connection]'



    Direct DB query does not take so much time:
    MariaDB [zabbix]> select h.hostid,h.status,h.tls_accept,h.tls_issuer,h.tls_ subject,h.tls_psk_identity,a.host_metadata,a.liste n_ip,a.listen_dns,a.listen_port,a.flags from hosts h left join autoreg_host a on a.proxy_hostid is null and a.host=h.host wherehost='WINPC' and h.status in (0,1) and h.flags<>2 and h.proxy_hostid is null;
    +--------+--------+------------+------------+-------------+------------------+---------------+-----------+------------+-------------+-------+
    | hostid | status | tls_accept | tls_issuer | tls_subject | tls_psk_identity | host_metadata | listen_ip | listen_dns | listen_port | flags |
    +--------+--------+------------+------------+-------------+------------------+---------------+-----------+------------+-------------+-------+
    | 10353 | 0 | 1 | | | | NULL | NULL | NULL | NULL | NULL |
    +--------+--------+------------+------------+-------------+------------------+---------------+-----------+------------+-------------+-------+
    1 row in set (0.000 sec)




    Trapper log from case when agent2 running on Zabbix server does not have that 20s pause after db query.
    728:20210120:163904.829 trapper got '{"request":"active checks","host":"Zabbix server","version":"5.0"}'
    728:20210120:163904.829 In send_list_of_active_checks_json()
    728:20210120:163904.829 In is_ip4() ip:'127.0.0.1'
    728:20210120:163904.829 End of is_ip4():SUCCEED
    728:20210120:163904.829 In get_hostid_by_host() host:'Zabbix server' metadata:''
    728:20210120:163904.829 query [txnlev:0] [select h.hostid,h.status,h.tls_accept,h.tls_issuer,h.tls_ subject,h.tls_psk_identity,a.host_metadata,a.liste n_ip,a.listen_dns,a.listen_port,a.flags from hosts h left join autoreg_host a on a.proxy_hostid is null and a.host=h.host where h.host='Zabbix server' and h.status in (0,1) and h.flags<>2 and h.proxy_hostid is null]
    728:20210120:163904.830 query [txnlev:1] [begin;]
    728:20210120:163904.830 In DBregister_host_flush()
    728:20210120:163904.830 In DBregister_host_active()
    728:20210120:163904.830 query [txnlev:1] [select null from actions where eventsource=2 and status=0]
    728:20210120:163904.830 End of DBregister_host_active():FAIL
    728:20210120:163904.830 End of DBregister_host_flush()
    728:20210120:163904.830 query [txnlev:1] [commit;]
    728:20210120:163904.830 End of get_hostid_by_host():SUCCEED
    728:20210120:163904.830 query [txnlev:0] [select itemid from items where type=7 and flags<>2 and hostid=10084]
    728:20210120:163904.831 In substitute_key_macros_impl() data:'vfs.file.contents[/etc/network/interfaces]'
    728:20210120:163904.831 End of substitute_key_macros_impl():SUCCEED data:'vfs.file.contents[/etc/network/interfaces]'
    728:20210120:163904.831 send_list_of_active_checks_json() sending [{"response":"success","data":[{"key":"vfs.file.contents[/etc/network/interfaces]","itemid":32670,"delay":"1m","lastlogsize":0, "mti me":0}],"refresh_unsupported":600}]
    728:20210120:163904.831 End of send_list_of_active_checks_json():SUCCEED
    728:20210120:163904.831 zbx_setproctitle() title:'trapper #5 [processed data in 0.001345 sec, waiting for connection]'
  • tikkami
    Member
    • May 2018
    • 71

    #2
    Dirty workaround would be increasing timeout value in agent.conf from 3 to 25.

    Still wondering what the server is doing during that 20 second delay?

    The delay is present even I have only one host with one item configured. So this should not be a performance issue.

    Comment

    • judgewolf
      Junior Member
      • Aug 2022
      • 8

      #3
      I have literally the same issue. Timeout value increase works, but that is not "clean".
      How did you get the trapper log?
      Last edited by judgewolf; 14-03-2023, 15:54.

      Comment

      • tikkami
        Member
        • May 2018
        • 71

        #4
        I just increased log level of trapper processes and then looked zabbix_server.log.

        Comment

        Working...