Zabbix agent becomes "not available" after some weeks

  • muelli
    Member
    • Jun 2021
    • 68

    #1

    Hello,

    I have a strange problem with an active agent that had been working fine for some weeks. It is configured with PSK encryption.
    This morning it became "not available" in the GUI, and I have not been able to fix it.

    The funny part is, the agent is running and I can query it with zabbix_get (e.g. zabbix_get -s **redacted** -k "system.cpu.load[all,avg1]" --tls-connect=psk --tls-psk-identity="ZBX-AGENT-PSK-ID" --tls-psk-file=/etc/zabbix/zabbix_agent.psk ) and it returns the correct values.
    If I raise the DebugLevel for the agent, I can see successful TLS connects:


    62932:20210614:105359.617 In zbx_tls_connect(): psk_identity:"ZBX-AGENT-PSK-ID"
    62932:20210614:105359.617 zbx_psk_client_cb() requested PSK identity "ZBX-AGENT-PSK-ID"
    62932:20210614:105359.618 End of zbx_tls_connect():SUCCEED (established TLSv1.2 PSK-AES128-CBC-SHA)
    62932:20210614:105359.618 JSON before sending [{"request":"agent data","session":"4995aa78ba164fc7cddac4a2a6c7d6c0","data":[{"host":"**redacted**","key":"net.if.in["eth0",dropped]","value":"169","id":1,"clock":1623660829,"ns":605890716}],"clock":1623660839,"ns":618254293}]
    62932:20210614:105359.618 JSON back [{"response":"success","info":"processed: 1; failed: 0; total: 1; seconds spent: 0.000034"}]
    62932:20210614:105359.618 In check_response() response:'{"response":"success","info":"processed: 1; failed: 0; total: 1; seconds spent: 0.000034"}'
    62932:20210614:105359.618 info from server: 'processed: 1; failed: 0; total: 1; seconds spent: 0.000034'
    62932:20210614:105359.618 End of check_response():SUCCEED
    62932:20210614:105359.618 OK
    62932:20210614:105359.618 End of send_buffer():SUCCEED

    In the Zabbix server log I can also see items changing state, for example becoming supported:

    1960140:20210614:104819.284 item "**redacted**:vfs.dev.read.await[sda]" became supported
    1960140:20210614:104821.290 item "**redacted**:vfs.dev.write.await[sda]" became supported

    So I guess the communication basically is OK and working.

    The Zabbix GUI shows a red availability symbol and the tooltip reads "Get value from agent failed: SSL_read() timed out"
    Deleting the host and setting it up again leads to the same problem. Re-installing the agent did not help either.
    I am just puzzled how this can happen after weeks of it working flawlessly. Last week I installed some updates on the servers, but no SSL package was touched.

    Can someone give me a hint how I can debug this problem further?
    Thanks!

    Stefan
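
As one way to dig further into logs like the excerpts above, here is a minimal Python sketch that splits each line into PID, timestamp, and message, assuming the `PID:YYYYMMDD:HHMMSS.mmm message` layout seen in the post. The function names `parse_log_line`, `parse_server_info`, and `failed_tls_connects` are hypothetical helpers made up for illustration and are not part of Zabbix. The last helper only looks for `End of zbx_tls_connect():` lines that do not report `SUCCEED`; what a failing line looks like exactly is an assumption.

```python
import re
from datetime import datetime

# Assumed layout, taken from the log excerpts in the post:
#   PID:YYYYMMDD:HHMMSS.mmm message
LOG_LINE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.(\d{3})\s+(.*)$")


def parse_log_line(line):
    """Return (pid, timestamp, message), or None if the line does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    pid, date, time, millis, message = match.groups()
    timestamp = datetime.strptime(date + time, "%Y%m%d%H%M%S")
    timestamp = timestamp.replace(microsecond=int(millis) * 1000)
    return int(pid), timestamp, message


def parse_server_info(info):
    """Turn 'processed: 1; failed: 0; ...' into a dict of floats."""
    result = {}
    for part in info.split(";"):
        key, _, value = part.partition(":")
        result[key.strip()] = float(value)
    return result


def failed_tls_connects(lines):
    """Yield parsed lines where zbx_tls_connect() ended without SUCCEED."""
    marker = "End of zbx_tls_connect():"
    for line in lines:
        parsed = parse_log_line(line)
        if parsed is None:
            continue
        message = parsed[2]
        if message.startswith(marker) and not message.startswith(marker + "SUCCEED"):
            yield parsed


if __name__ == "__main__":
    sample = ("62932:20210614:105359.618 End of zbx_tls_connect():SUCCEED "
              "(established TLSv1.2 PSK-AES128-CBC-SHA)")
    print(parse_log_line(sample))
    print(parse_server_info("processed: 1; failed: 0; total: 1; seconds spent: 0.000034"))
```

Feeding the agent log at a raised DebugLevel through `failed_tls_connects` would narrow the output to handshakes that did not succeed, together with their timestamps, which can then be compared against the times the GUI reports the timeout.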
  • Markku
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
    • Sep 2018
    • 1781

    #2
    Originally posted by muelli
    Last week I installed some updates on the servers, but no SSL package has been touched.
    Any chance you could reboot the server? I'm asking because I had a very similar problem some time ago, and after a reboot the problems were gone, even though all Zabbix components had already been restarted multiple times. There were some upgrades involved, but I don't remember the details anymore. Only TLS-configured agents were failing, and only occasionally.

    Markku

    • muelli
      Member
      • Jun 2021
      • 68

      #3
      Hi Markku,

      this morning the agent/connection suddenly started working again without any action taken beforehand. I rebooted the server anyway; thanks for the hint.

      Regards
      Stefan

      • mansourh12120
        Junior Member
        • Jun 2021
        • 10

        #4
        Hi everyone,
        I'm trying to understand how the macro {ICMP_RESPONSE_TIME_WARN} in Zabbix works and what it means.
