SSL_ERROR_SYSCALL & TLS handshake set result code to 5

This topic has been answered.
  • bjoern123
    Junior Member
    • Nov 2023
    • 8

    #1



    Hello all,
    first of all my setup:
    Zabbix Server and Proxy: v7.0.4
    OS: AlmaLinux 9
    OpenSSL Version: 3.0.7
    Zabbix Agent2 Versions: 7.0.4 and 6.0.x (Clients are Windows and Linux Servers)
    Agent Encryption with CERT

    Problem: The logs of the Zabbix agents and Zabbix proxies contain many SSL/TLS errors, which cause the trigger to fire an alert that the host is not available.

    Example Zabbix Proxy Log:
    Code:
    28839:20241001:063743.072 SSL_shutdown() with x.x.x.x set result code to 5 (SSL_ERROR_SYSCALL):[32] Broken pipe
    28839:20241001:063743.105 SSL_shutdown() with x.x.x.x set result code to 5 (SSL_ERROR_SYSCALL):[32] Broken pipe
    28839:20241001:063743.106 SSL_shutdown() with x.x.x.x set result code to 5 (SSL_ERROR_SYSCALL):[32] Broken pipe
    28839:20241001:063743.120 SSL_shutdown() with x.x.x.x set result code to 5 (SSL_ERROR_SYSCALL):[32] Broken pipe
    28839:20241001:063743.122 SSL_shutdown() with x.x.x.x set result code to 5 (SSL_ERROR_SYSCALL):[32] Broken pipe
    A few seconds later the proxy logs:
    Code:
    28839:20241001:063746.445 resuming Zabbix agent checks on host "abcdef": connection restored
    Example Zabbix Agent2 (7.0.4) Log from a Windows Server:
    Code:
    1164:20241001:063521.712 failed to accept an incoming connection: from x.x.x.x: TLS handshake set result code to 5:
    1940:20241001:063521.712 failed to accept an incoming connection: from x.x.x.x: TLS handshake set result code to 5:
    3404:20241001:063522.712 failed to accept an incoming connection: from x.x.x.x: TLS handshake set result code to 5:
    1940:20241001:063522.727 failed to accept an incoming connection: from x.x.x.x: TLS handshake set result code to 5:
    Example Zabbix Agent2 (7.0.4) Log from a Linux Server:
    Code:
    2024/10/01 05:59:37.838007 failed to process an incoming connection from x.x.x.x: EOF
    2024/10/01 05:59:37.841839 failed to process an incoming connection from x.x.x.x: EOF
    2024/10/01 05:59:37.848623 failed to process an incoming connection from x.x.x.x: EOF
    2024/10/01 05:59:37.855338 failed to process an incoming connection from x.x.x.x: EOF
    As a result (especially for Windows clients), the trigger fires because there has been no communication with the agent for 15 minutes (in my case):
    Code:
    Windows: Zabbix agent is not available
    Does anyone have an idea what could cause this problem?
    Please let me know if you need any additional information...

    Thanks!
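
To see whether the errors cluster in time, the proxy log can be bucketed per minute. The following is a minimal Python sketch, not a Zabbix tool: the function name `tls_errors_per_minute` and the marker list are illustrative assumptions, and the regular expression only covers the `PID:YYYYMMDD:HHMMSS.mmm` prefix seen in the proxy and Windows agent excerpts above (the Linux Agent2 excerpt uses a different timestamp format and is skipped).

```python
import re
from collections import Counter

# Line prefix as seen in the excerpts above: PID:YYYYMMDD:HHMMSS.mmm message
LINE_RE = re.compile(
    r"^\s*(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6})\.\d{3}\s+(?P<msg>.*)$"
)

# Substrings copied from the log excerpts in this thread (assumed to be sufficient).
TLS_ERROR_MARKERS = ("SSL_ERROR_SYSCALL", "TLS handshake set result code")


def tls_errors_per_minute(lines):
    """Count log lines containing a TLS error marker, grouped by minute."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not in the expected format, e.g. Agent2 on Linux
        if any(marker in match["msg"] for marker in TLS_ERROR_MARKERS):
            hhmmss = match["time"]
            minute = f"{match['date']} {hhmmss[:2]}:{hhmmss[2:4]}"
            counts[minute] += 1
    return counts
```

Fed with the lines of a log file (for example `tls_errors_per_minute(open(path))`, where `path` is wherever your proxy log lives), the result shows at a glance whether the failures are constant or come in bursts.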
  • Answer selected by bjoern123 at 16-10-2024, 16:42.
    bjoern123
    Junior Member
    • Nov 2023
    • 8

    The problem is solved for me.

    In the Zabbix proxy log there were many "Zabbix agent item on host "host123" failed: first network error, wait for 15 seconds" messages. The TLS handshake errors occurred during the same periods.
    In the end, the solution was not related to Zabbix at all. After I migrated the Zabbix proxy virtual machine to a different ESXi host, all problems were gone. The affected ESXi host was then rebooted and is now working properly again.

    So, if you encounter any of the error messages mentioned in this thread, include the network and hypervisor infrastructure in the scope of your troubleshooting.
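
One way to bring the network into scope is to probe the agent port with plain TCP, without any TLS, and watch for failed or slow connects. The sketch below is only an illustration under assumptions: `probe_tcp` and `summarize` are hypothetical helper names, and the host and port are whatever your agent listens on (10050 is the usual default for passive checks).

```python
import socket
import time


def probe_tcp(host, port, attempts=5, timeout=3.0, pause=0.0):
    """Open plain TCP connections and return a list of (ok, seconds, error_text)."""
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # Connect and close immediately; no TLS and no Zabbix protocol involved.
            with socket.create_connection((host, port), timeout=timeout):
                results.append((True, time.monotonic() - start, ""))
        except OSError as exc:  # refused, timed out, unreachable, ...
            results.append((False, time.monotonic() - start, str(exc)))
        if pause:
            time.sleep(pause)
    return results


def summarize(results):
    """Condense probe results into attempt count, failure count and slowest connect."""
    ok_times = [seconds for ok, seconds, _ in results if ok]
    return {
        "attempts": len(results),
        "failures": len(results) - len(ok_times),
        "max_connect_s": max(ok_times, default=None),
    }
```

If plain TCP connects from the proxy already fail or stall intermittently, the TLS errors are probably a symptom rather than the cause, which matches what happened in this thread.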


    • tim.mooney
      Senior Member
      • Dec 2012
      • 1427

      #2
      You've said that the clients are Windows and Linux, but what versions of either? Are these recent versions of Windows and Linux, or are they mostly older?

      On your Zabbix server, what does

      Code:
      sudo update-crypto-policies --show
      output? "DEFAULT"?


      I'm asking about OS versions because my first suspicion would be a failure to negotiate: your Zabbix server wants to use only recent, secure protocol versions and ciphers, and your clients may be just old enough that they don't support what the server wants to use. That's just a guess; it could be something completely different, but RHEL 9 (and respins like AlmaLinux) defaults to not supporting a lot of older "legacy" ciphers.
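
A negotiation problem of this kind can be checked from the proxy host by performing a handshake manually and printing what was agreed. This is a hedged sketch using Python's standard `ssl` module, not Zabbix's own TLS code: `build_context` and `negotiated_parameters` are hypothetical names, the file paths are placeholders you must supply, and hostname checking is disabled on the assumption that the agent certificates are not issued for the host name you connect to.

```python
import socket
import ssl


def build_context(ca_file=None, cert_file=None, key_file=None,
                  min_version=ssl.TLSVersion.TLSv1_2):
    """Build a client-side TLS context for a diagnostic handshake."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = min_version
    # Assumption: certificates are not matched against the host name here.
    ctx.check_hostname = False
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    else:
        # Diagnostic only: without a CA file the peer certificate is not verified.
        ctx.verify_mode = ssl.CERT_NONE
    if cert_file:
        # Client certificate, needed when the agent requires certificate encryption.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def negotiated_parameters(host, port, ctx, timeout=5.0):
    """Connect, complete the TLS handshake, return (protocol_version, cipher_name)."""
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw) as tls:
            return tls.version(), tls.cipher()[0]


# Example call (placeholders, adjust to your environment):
# ctx = build_context("ca.crt", "proxy.crt", "proxy.key")
# print(negotiated_parameters("agent.example.org", 10050, ctx))
```

A handshake that always fails with an `ssl.SSLError` points towards a protocol or cipher mismatch; a handshake that succeeds most of the time and fails occasionally with a connection error points elsewhere.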


      • bjoern123
        Junior Member
        • Nov 2023
        • 8

        #3
        Hi tim.mooney,
        thank you for your reply!

        Crypto Policy (on the Zabbix Proxy) is
        Code:
        DEFAULT:AD-SUPPORT-LEGACY
        The clients all run up-to-date operating systems: Windows Server 2019 & 2022, AlmaLinux 8/9, Ubuntu 22.04/24.04.


        • tim.mooney
          Senior Member
          • Dec 2012
          • 1427

          #4
          Originally posted by bjoern123
          Clients are all up to date operating systems. Windows Server 2019 & 2022, AlmaLinux 8/9, Ubuntu 22.04/24.04.
          Then my initial thought that it's a negotiation issue between new proxy or server and old client isn't likely the problem.

          If you increase the log_level on one of the agents, do you get anything additional in the logs that is useful to further narrow down what the problem might be?


          • cyber
            Senior Member
            Zabbix Certified SpecialistZabbix Certified Professional
            • Dec 2006
            • 4807

            #5
            I had some issues connecting v7 agents. I had to specify the TLSCipherAll13 parameter, and then it started to work. Of course, the error message did not mention the TLS version at all; it was something about handshakes (if I remember correctly).
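
Before setting a TLS 1.3 cipher parameter such as TLSCipherAll13, it can help to confirm which TLS 1.3 suites the local OpenSSL build offers at all. The sketch below uses Python's `ssl` module for that; `tls13_suites_available` is a hypothetical helper, and it reports on the OpenSSL library Python is linked against, which is only an approximation of what the Zabbix binaries use. `TLS_AES_256_GCM_SHA384` is used as the example because it is the suite that appears in the agent debug log later in this thread.

```python
import ssl


def tls13_suites_available(wanted):
    """Map each requested suite name to True if the local OpenSSL build offers it."""
    ctx = ssl.create_default_context()
    offered = {
        cipher["name"]
        for cipher in ctx.get_ciphers()
        if cipher.get("protocol") == "TLSv1.3"
    }
    return {name: name in offered for name in wanted}


# Example: check the suite seen in the agent debug log of this thread.
print(tls13_suites_available(["TLS_AES_256_GCM_SHA384"]))
```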


            • bjoern123
              Junior Member
              • Nov 2023
              • 8

              #6
              Originally posted by tim.mooney

              Then my initial thought that it's a negotiation issue between new proxy or server and old client isn't likely the problem.

              If you increase the log_level on one of the agents, do you get anything additional in the logs that is useful to further narrow down what the problem might be?
              I already did that, but it was not useful for me. It just shows that the connection sometimes works and sometimes does not, e.g.:
              Code:
              11996:20241014:102834.292 zbx_tls_accept() peer certificate issuer:"emailAddress=...,CN=...,OU=...,O=..." subject:"..."
              11996:20241014:102834.292 End of zbx_tls_accept():SUCCEED (established TLSv1.3 TLS_AES_256_GCM_SHA384)
              11996:20241014:102834.292 Requested [perf_counter_en["\PhysicalDisk(0 C:)\Disk Writes/sec",60]]
              11996:20241014:102834.293 In PERF_COUNTER_EN()
              11996:20241014:102834.293 In get_perf_counter_value_by_path() path:\PhysicalDisk(0 C:)\Disk Writes/sec interval:60 lang:1
              11996:20241014:102834.293 In add_perf_counter() counter:'\PhysicalDisk(0 C:)\Disk Writes/sec' interval:60
              11996:20241014:102834.293 add_perf_counter(): PerfCounter '\PhysicalDisk(0 C:)\Disk Writes/sec' successfully added
              11996:20241014:102834.293 End of add_perf_counter(): SUCCEED
              A few seconds later:
              Code:
              13924:20241014:102835.557 In zbx_tls_accept()
              13924:20241014:102835.558 End of zbx_tls_accept():FAIL error:'TLS handshake set result code to 5:'
              13924:20241014:102835.558 failed to accept an incoming connection: from x.x.x.x: TLS handshake set result code to 5:
              15216:20241014:102836.323 In collect_perfstat()
              11996:20241014:102836.323 In zbx_tls_accept()
              11996:20241014:102836.324 End of zbx_tls_accept():FAIL error:'TLS handshake set result code to 5:'
              11996:20241014:102836.324 failed to accept an incoming connection: from x.x.x.x: TLS handshake set result code to 5:
              15216:20241014:102836.326 End of collect_perfstat()
              11704:20241014:102836.401 In send_buffer() host:'zabbix proxy' port:10051 entries:0/100
              11704:20241014:102836.401 End of send_buffer():SUCCEED
              On the Zabbix proxy side, with debug level 4, I can see messages like this:
              Code:
              613773:20241014:104223.311 failed to accept an incoming connection: from x.x.x.x: TLS handshake set result code to 1: file ssl/record/rec_layer_s3.c line 320 func ssl3_read_n: error:0A000126:SSL routines::unexpected eof while reading: TLS write fatal alert "decode error"
              I already set up a new proxy to rule out any problems with the Zabbix proxy host itself, but the behaviour is the same.
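
The "sometimes works, sometimes not" pattern in the agent debug log can be quantified by extracting the outcome of every `zbx_tls_accept()` call. This is a minimal sketch under assumptions: `accept_outcomes` and `failure_ratio` are hypothetical helper names, and the regular expression is derived only from the `End of zbx_tls_accept():SUCCEED` and `End of zbx_tls_accept():FAIL` lines quoted above.

```python
import re
from datetime import datetime

# Matches e.g. "11996:20241014:102834.292 End of zbx_tls_accept():SUCCEED (...)"
ACCEPT_RE = re.compile(
    r"^\s*\d+:(?P<stamp>\d{8}:\d{6})\.\d{3}\s+"
    r"End of zbx_tls_accept\(\):(?P<outcome>SUCCEED|FAIL)"
)


def accept_outcomes(lines):
    """Return a list of (timestamp, outcome) for every finished TLS accept."""
    events = []
    for line in lines:
        match = ACCEPT_RE.match(line)
        if match:
            when = datetime.strptime(match["stamp"], "%Y%m%d:%H%M%S")
            events.append((when, match["outcome"]))
    return events


def failure_ratio(events):
    """Fraction of accepts that failed; 0.0 when there are no events."""
    if not events:
        return 0.0
    failed = sum(1 for _, outcome in events if outcome == "FAIL")
    return failed / len(events)
```

Comparing the timestamps of the failed accepts with events on the network or virtualization layer (migrations, host load, NIC errors) is one way to test whether the cause lies outside Zabbix.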

