Proxy briefly forgets secrets from HashiCorp Vault when config data update is larger

  • freiheit
    Junior Member
    • Dec 2022
    • 11

    #1

    This is what I see in the zabbix_proxy.log
    Code:
    1272715:20240912:170033.355 received configuration data from server at "1.2.3.4", datalen 497
    1272715:20240912:170043.795 received configuration data from server at "1.2.3.4", datalen 2784836
    1272715:20240912:170044.231 cannot get secrets for path "redacted/secrets/that/include/snmp/community/for/redacted.example.com": no data
    1272715:20240912:170057.531 received configuration data from server at "1.2.3.4", datalen 22357858
    1272715:20240912:170100.839 cannot get secrets for path "redacted/secrets/that/include/snmp/community/for/redacted.example.com": no data
    1272770:20240912:170101.970 SNMP response from host "redacted.example.com" contains too few variable bindings
    All I see in the zabbix_server.log is a "sending configuration data to proxy" message with an approximately matching timestamp and exactly matching "data len".

    The Vault server logs don't show anything useful, just the Zabbix server frequently reading those secrets with no failures. We have lots of other things that use Vault constantly, and we are unable to reproduce anything like this with the Vault CLI, curl, or other methods of accessing the Vault API, so it seems very unlikely to be a problem on the Vault side.

    (Server IP, device hostname and secret paths are redacted. The error shows up for basically all Vault secrets in use at the same time, not just a single secret.)

    At random, we get auth failures from things using those secrets (for a password or community string) and a corresponding gap in the item history. The "no data" errors seem to happen approximately every 60 seconds, but failures of items that use them are much less frequent.

    It seems like there's a brief window, while the Zabbix server is sending a configuration update to the Zabbix proxy, during which the Vault secrets are unavailable to the proxy. We seem to see auth failures only when that window happens to coincide with an item that uses a Vault secret being polled. The window also seems to grow with the size of the update, so the more configuration data is sent, the more likely an item that uses the macro fails.
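One way to test the hypothesis above (that the failure window grows with the size of the configuration update) is to pair each "received configuration data" line with the "cannot get secrets" lines that follow it. The sketch below is only an illustration: the function and variable names are made up, and it assumes the proxy log format shown in the excerpt in this post (PID:YYYYMMDD:HHMMSS.mmm message).

```python
import re
from datetime import datetime

# Assumed log layout: "<pid>:<YYYYMMDD>:<HHMMSS.mmm> <message>"
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)$")
SYNC_RE = re.compile(r'received configuration data from server at "[^"]*", datalen (\d+)')
FAIL_RE = re.compile(r'cannot get secrets for path "[^"]*": no data')


def parse_ts(date, time):
    """Turn the date and time fields of a log line into a datetime."""
    return datetime.strptime(date + time, "%Y%m%d%H%M%S.%f")


def correlate(lines):
    """Return one record per config sync, with the failures logged after it."""
    syncs = []
    for raw in lines:
        m = LINE_RE.match(raw)
        if not m:
            continue
        _pid, date, time, msg = m.groups()
        ts = parse_ts(date, time)
        sync = SYNC_RE.search(msg)
        if sync:
            syncs.append({"time": ts, "datalen": int(sync.group(1)),
                          "failures": 0, "first_failure_delay": None})
        elif FAIL_RE.search(msg) and syncs:
            # Attribute the failure to the most recent sync seen so far.
            cur = syncs[-1]
            cur["failures"] += 1
            if cur["first_failure_delay"] is None:
                cur["first_failure_delay"] = (ts - cur["time"]).total_seconds()
    return syncs


# Lines taken from the log excerpt in this post.
SAMPLE = [
    '1272715:20240912:170033.355 received configuration data from server at "1.2.3.4", datalen 497',
    '1272715:20240912:170043.795 received configuration data from server at "1.2.3.4", datalen 2784836',
    '1272715:20240912:170044.231 cannot get secrets for path "redacted/secrets/that/include/snmp/community/for/redacted.example.com": no data',
    '1272715:20240912:170057.531 received configuration data from server at "1.2.3.4", datalen 22357858',
    '1272715:20240912:170100.839 cannot get secrets for path "redacted/secrets/that/include/snmp/community/for/redacted.example.com": no data',
    '1272770:20240912:170101.970 SNMP response from host "redacted.example.com" contains too few variable bindings',
]

if __name__ == "__main__":
    for rec in correlate(SAMPLE):
        print(rec["time"], rec["datalen"], rec["failures"], rec["first_failure_delay"])
```

Run against a full zabbix_proxy.log (pass the open file object to correlate()), the datalen and first_failure_delay columns should show whether larger updates really line up with longer gaps before the "no data" errors.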

    Anybody else encountered this? Or have any ideas how to eliminate or mitigate the problem? Even just some specifics on how to track down more detail on this kind of failure might be useful.

    I scoured the documentation and there really don't seem to be any options to control caching of those secrets (except in the web frontend), which could be a significant mitigation.
  • GChmurka
    Junior Member
    • Jun 2024
    • 21

    #2
    I have the same problem.
    Are you using proxy groups to monitor your hosts?
    Do your proxy servers have Vault access data in their configuration files?
    Are you using Zabbix HA?

    In my case:
    1. I use a proxy group to monitor my hosts.
    2. I see cannot get secrets for path "XXXXX/host.example.com": no data in the proxy server log for every host assigned to it.
    3. The proxy does not have Vault access data in its configuration file (the documentation says it is mandatory, but it works without it, so for security I don't configure it on the proxy). Maybe the problem is here; I'll check it out.
    4. I use Zabbix HA.

    Proxy version 7.0.4, Server version 7.0.4
    Of course, despite the errors in the logs, all data is collected, connections work and the secret is used correctly (I use this for IPMI macros). I also noticed that after this error my hosts turn "GRAY" in the Availability column under Monitoring -> Hosts, even though all data is still collected; all I need to do is restart the proxy server and the hosts are "GREEN" again.
    Last edited by GChmurka; 24-10-2024, 07:31.

    • GChmurka
      Junior Member
      • Jun 2024
      • 21

      #3
      I added VaultToken and VaultURL to proxy.conf and still have the same problem... ;(

      • GChmurka
        Junior Member
        • Jun 2024
        • 21

        #4
        I investigated the issue and found in the Zabbix DebugLevel=4 logs that the token I'm using lacks permissions for:
        Code:
        # Allow tokens to look up their own properties
        path "auth/token/lookup-self" {
            capabilities = ["read"]
        }
        
        # Allow tokens to renew themselves
        path "auth/token/renew-self" {
            capabilities = ["update"]
        }
        In my setup, I created a token for Zabbix without the default policies:
        Code:
        vault token create -no-default-policy
        The HashiCorp default policies include these permissions. I added these two permissions, and now in DebugLevel=4 logs, I no longer see any Vault errors.
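To confirm that a given token really has these two permissions, one option is to ask Vault itself through the sys/capabilities-self endpoint of the HTTP API. The sketch below is an assumption-laden illustration, not something taken from Zabbix: the function names are invented, the Vault URL and token are supplied by the caller, and the endpoint is itself normally granted by the default policy, so a token created with -no-default-policy may simply get a 403 from it.

```python
import json
import urllib.request

# The two permissions listed above: path -> capability the token needs.
REQUIRED = {
    "auth/token/lookup-self": "read",
    "auth/token/renew-self": "update",
}


def missing_capabilities(response_body, required=REQUIRED):
    """Return {path: capability} for every required capability the token lacks."""
    # Vault may return the per-path lists at the top level and/or under "data".
    data = response_body.get("data") or response_body
    missing = {}
    for path, needed in required.items():
        granted = data.get(path, [])
        if needed not in granted and "root" not in granted:
            missing[path] = needed
    return missing


def fetch_capabilities(vault_url, token, paths=REQUIRED):
    """POST the paths to sys/capabilities-self and return the decoded JSON body."""
    req = urllib.request.Request(
        vault_url.rstrip("/") + "/v1/sys/capabilities-self",
        data=json.dumps({"paths": list(paths)}).encode(),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

Calling missing_capabilities(fetch_capabilities(url, token)) with the Zabbix token should return an empty dict once both policy rules are in place.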

        We'll see if this fix resolves:
        Code:
        cannot get secrets for path "XXXXX/host.example.com": no data
        These errors appear sporadically, making them difficult to debug, and DebugLevel=4 generates a large amount of data (with 300 monitored hosts, this results in many GBs).
        Last edited by GChmurka; 29-10-2024, 13:36.

        • freiheit
          Junior Member
          • Dec 2022
          • 11

          #5
          Originally posted by GChmurka
          I have the same problem.
          Are you using proxy groups to monitor your hosts?
          Do your proxy servers have Vault access data in their configuration files?
          Are you using Zabbix HA?
          1. We have some hosts assigned to a proxy group, but most of the hosts are directly assigned to a specific proxy.
          2. Yes, except that we set VAULT_TOKEN in /etc/sysconfig/zabbix-proxy. I can use that same token on the proxy server and, from the CLI, run
            Code:
            vault kv get ...
            against the same path, and it works.
          3. Yes, but not really. It's configured, but at the moment there is only a single server, so there are no HA changes happening.

          Originally posted by GChmurka
          Proxy version 7.0.4, Server version 7.0.4
          Of course, despite the errors in the logs, all data is collected, connections work and the secret is used correctly (I use this for IPMI macros). I also noticed that after this error my hosts turn "GRAY" in the Availability column under Monitoring -> Hosts, even though all data is still collected; all I need to do is restart the proxy server and the hosts are "GREEN" again.
          What we see is a single auth failure related to the Vault secret at random times. It happens not nearly as often as the error shows up in the proxy logs, but it definitely happens. For IPMI, that means a SEL log entry. The next check succeeds, so the impact on monitoring is minimal.

          • GChmurka
            Junior Member
            • Jun 2024
            • 21

            #6
            I caught where the problem is at DebugLevel=4:

            Code:
            13187:20241031:070056.946 In zbx_dc_sync_kvs_paths()
             13187:20241031:070056.946 cannot get secrets for path "XXXX/f379.example.com": no data
             13187:20241031:070056.946 cannot get secrets for path "XXXX/f44.example.com": no data
             13187:20241031:070056.946 cannot get secrets for path "XXXX/f279.example.com": no data
             13187:20241031:070056.946 cannot get secrets for path "XXXX/f131.example.com": no data
             13187:20241031:070056.946 cannot get secrets for path "XXXX/f65.example.com": no data
             13187:20241031:070056.946 cannot get secrets for path "XXXX/fa236.example.com": no data
             13187:20241031:070056.946 cannot get secrets for path "XXXX/fb51.example.com": no data
             13187:20241031:070056.946 cannot get secrets for path "XXXX/f317.example.com": no data
             13187:20241031:070056.946 cannot get secrets for path "XXXX/f144.example.com": no data
             13187:20241031:070056.946 cannot get secrets for path "XXXX/f329.example.com": no data
             13187:20241031:070056.946 cannot get secrets for path "XXXX/f88.example.com": no data
             13187:20241031:070056.946 cannot get secrets for path "XXXX/f94.example.com": no data
             13187:20241031:070056.946 cannot get secrets for path "XXXX/fa109.example.com": no data
             13187:20241031:070056.946 cannot get secrets for path "XXXX/f47.example.com": no data
            ...
             13187:20241031:070056.948 End of zbx_dc_sync_kvs_paths()
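Because DebugLevel=4 output runs to many GBs, it can help to stream the log and keep only the zbx_dc_sync_kvs_paths() calls that actually logged a failure. This is a rough sketch with invented helper names; it assumes the "In ..." / "End of ..." markers and the line format shown in the excerpt above, and it tracks blocks per PID so that lines from other processes are ignored.

```python
import re

# Assumed log layout: "<pid>:<YYYYMMDD>:<HHMMSS.mmm> <message>"
LINE_RE = re.compile(r"^\s*(\d+):\d{8}:\d{6}\.\d{3} (.*)$")
START = "In zbx_dc_sync_kvs_paths()"
END = "End of zbx_dc_sync_kvs_paths()"


def kvs_sync_blocks(lines):
    """Yield one list of log lines per zbx_dc_sync_kvs_paths() call."""
    open_blocks = {}  # pid -> lines collected since that pid's START marker
    for raw in lines:
        line = raw.rstrip("\n")
        m = LINE_RE.match(line)
        if not m:
            continue
        pid, msg = m.groups()
        if msg.startswith(START):
            open_blocks[pid] = [line]
        elif pid in open_blocks:
            open_blocks[pid].append(line)
            if msg.startswith(END):
                yield open_blocks.pop(pid)


def failing_blocks(lines):
    """Yield only the blocks that contain at least one secret lookup failure."""
    for block in kvs_sync_blocks(lines):
        if any("cannot get secrets for path" in entry for entry in block):
            yield block
```

Since both functions are generators, passing an open file object (for example one opened with errors="replace") processes the log line by line without loading it into memory.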

            • GChmurka
              Junior Member
              • Jun 2024
              • 21

              #7
              Additional question: do you have such a warning when starting the proxy:
              Code:
               19030:20241031:110605.577 connection with Zabbix proxy "inf-zabbix-krk-xxx02" should not be unencrypted when using Vault
               19030:20241031:110605.577 connection with Zabbix proxy "inf-zabbix-krk-xxx04" should not be unencrypted when using Vault
               19030:20241031:110605.577 connection with Zabbix proxy "inf-zabbix-krk-xxx03" should not be unencrypted when using Vault
               19030:20241031:110605.577 connection with Zabbix proxy "inf-zabbix-krk-xxx01" should not be unencrypted when using Vault
              I do, even though my proxies are running in active+psk mode and have the following configuration:
              Code:
              Hostname=inf-zabbix-krk-xxx01
              
              Server=10.0.0.1;10.0.0.2
              StatsAllowedIP=10.0.0.1,10.0.0.2
              TLSConnect=psk
              TLSPSKFile=/etc/zabbix/zabbix_proxy.psk
              TLSPSKIdentity=inf-zabbix-krk-xxx01
              
              DBName=/opt/zabbix-proxy/zabbix-proxy.sqlite
              DBUser=zabbix
              
              ProxyBufferMode=hybrid
              ProxyMemoryBufferSize=512M
              CacheSize=512M
              The only place in the entire Zabbix code where this warning is logged is:
              https://git.zabbix.com/projects/ZBX/...25ed2ac39459eed974#1482

              so the host->tls_accept setting must include the ZBX_TCP_SEC_UNENCRYPTED flag ;(
              All my proxies are configured in zabbix-server as Active+PSK (only).
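For anyone who wants to check what is actually stored for a proxy, tls_accept is a bitmask, so the condition described above can be reproduced with a simple bit test. The sketch below is illustrative only: the class and function names are invented, and the flag values 1, 2 and 4 are assumed to match the ZBX_TCP_SEC_* constants and the tls_accept values documented for the Zabbix API.

```python
from enum import IntFlag


class TcpSec(IntFlag):
    """Assumed values of the ZBX_TCP_SEC_* flags."""
    UNENCRYPTED = 1
    TLS_PSK = 2
    TLS_CERT = 4


def accepts_unencrypted(tls_accept):
    """True if the stored bitmask still allows unencrypted connections."""
    return bool(tls_accept & TcpSec.UNENCRYPTED)


def describe(tls_accept):
    """List the names of all flags set in the bitmask."""
    return [flag.name for flag in TcpSec if tls_accept & flag]


if __name__ == "__main__":
    for value in (1, 2, 3, 6):
        print(value, describe(value), accepts_unencrypted(value))
```

A proxy configured as PSK only would be expected to have tls_accept equal to 2; a value of 3 would mean unencrypted connections are accepted as well, which is the case the warning describes.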
              Last edited by GChmurka; 31-10-2024, 13:38.

              • freiheit
                Junior Member
                • Dec 2022
                • 11

                #8
                Originally posted by GChmurka
                Additional question: do you have such a warning when starting the proxy:
                Code:
                 19030:20241031:110605.577 connection with Zabbix proxy "inf-zabbix-krk-xxx02" should not be unencrypted when using Vault
                Yes, I see entries like that in the Zabbix proxy logfiles when the proxy is restarted. And we have all the proxies configured for encryption with PSK.

                • GChmurka
                  Junior Member
                  • Jun 2024
                  • 21

                  #9
                  Hi
                  I created 2 issues for these errors:

                  https://support.zabbix.com/browse/ZBX-25483 - Vault using proxy in PSK mode warning in startup

                  https://support.zabbix.com/browse/ZBX-25499 - Zabbix Proxy - Issues Retrieving Secrets from Vault

                  but nobody has fixed them yet.
