active proxy wont receive cfg via ipsec

  • SANDMAN
    Junior Member
    • Jun 2023
    • 14

    #1

    active proxy wont receive cfg via ipsec

    Having issues at a remote site post upgrade to Zabbix server 7.0.1 + Zabbix Proxy sqlite 7.0.1

    The active proxy connects to server via IPSEC tunnel and was working for last year+ over this tunnel on 6.4. And I use a unique PSK for proxy connection through that tunnel as well. Agents at that site use auto registration PSK.

    Now with v7 the proxy connects to the server and attempts to send configuration data, but the proxy does not receive config. Server shows write timeout.
    Reviewing the firewall logs, nothing is blocked between proxy and server, and there is full ICMP connectivity via IPSEC in both directions between the proxy container and the server container.

    Thinking this may be a packet size / fragmentation issue with IPSEC + PSK, I set up a new proxy with no encryption over the IPSEC tunnel and added 1 SNMP device to be monitored by that proxy.
    This new unencrypted proxy also shows green and the server acknowledges the connection just like with the PSK proxy. But the proxy still doesn't receive config.

    Proxy shows: cannot obtain configuration data from server at "x.x.x.x": read timeout

    Proxy last seen is never more than 5 seconds ago. Connectivity is there, WTF.

    So I temporarily set up external WAN access for the proxy using PSK. It works and the proxy receives configuration data from the server. WTF?!

    Because it worked once I updated the zbx_server_host= FQDN to resolve externally, I tried setting the proxy container to resolve the FQDN via the IPSEC tunnel.
    It still won't download config through the tunnel when using the FQDN....

    I do not believe there is any issue with this IPSEC tunnel. Reviewed every network device in path. Connectivity is here. No other issues over VPN.

    Got many hours into this - Please - any ideas?
  • vsergione
    Junior Member
    • Oct 2023
    • 28

    #2
    Have you tried running a tcpdump on the proxy to check for incoming packets from the server? Do this also when you are doing the ping.

    • SANDMAN
      Junior Member
      • Jun 2023
      • 14

      #3
      Originally posted by vsergione
      Have you tried running a tcpdump on the proxy to check for incoming packets from the server? Do this also when you are doing the ping.
      No, I didn't run tcpdump but I will tomorrow. But likely I won't see return packets based on my findings earlier.
      I did a packet capture on the firewalls' LAN interfaces, which led me to investigate the L3 switch ACLs further.

      I discovered that the outbound ACLs on switch cause issue with return traffic to proxy.
      If I temporarily remove the outbound ACL the proxy will receive config. And will continue doing so even after re-enabling ACL.
      So I do have site monitoring over IPsec right now. But if the proxy stops, I have to toggle the ACL.

      Not ideal - I don't understand why yet. This same ACL, which was actually stricter before (only allowing tcp 10051), works just fine when coming in via WAN, but not via IPSEC. I opened the ACL up to full IP but it still blocks the traffic.
      And at the server's site, the server is in a DMZ vlan. Another proxy which monitors local site is in a management vlan routed with same L3 switch. The exact same outbound ACL rules defined and that proxy receives config.
      Something with the IPSEC + Stateless ACLs on switch + How ZABBIX SVR & PRX handle TCP states.

      Opnsense IPSEC has tunable = net.inet.ipsec.dfbit (Do not fragment bit on encap)
      Wondering if this could cause issue with how zabbix is doing tcp states. Not sure - never ran into this on Zabbix 6.4.3 running for over a year.

      ACL-INBOUND

      80 permit ip ZBX_SVR_IP 0.0.0.0 ZBX_PRX_IP 0.0.0.0
      203 permit ip ZBX_SVR_IP 0.0.0.0 any
      300 deny every log

      ACL-OUTBOUND (PROBLEM ACL)

      1 permit tcp any DMZ_SUBNET 0.0.0.255 flag established
      16 permit ip ZBX_PRX_IP 0.0.0.0 ZBX_SVR_IP 0.0.0.0
      300 deny every log

      Topology:

      ZBX_PRX --> L3-Switch --> OPNSENSE FW --> IPSEC --> OPNSENSE FW --> FORTIGATE TRANSPARENT FW --> L3-Switch (ACLs above) --> ZBX_SVR

      • Markku
        Senior Member
        Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
        • Sep 2018
        • 1781

        #4
        The "established" keyword in a stateless ACL is only able to check the TCP flags in the packets. With fragmented packets there is no TCP header in the subsequent fragments, so it doesn't work in practice when fragmentation occurs. But obviously you now have the proxy-to-server IPs permitted unconditionally as well, so that should work. If it doesn't, that sounds strange. (It isn't unheard of that switch ASIC programming fails and causes weird things, but that shouldn't start just because Zabbix was upgraded.)
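
To illustrate the point about fragments, here is a minimal Python sketch of a stateless "established" check. It is a simplified model, not any switch vendor's implementation: the `IPv4Packet` type and both function names are made up for this example, "established" is assumed to mean "ACK or RST set", and a 1500-byte MTU (1480 payload bytes, i.e. 185 offset units per fragment) is assumed. Real platforms differ in how they treat non-initial fragments.

```python
from dataclasses import dataclass
from typing import List, Optional

ACK = 0x10  # TCP ACK flag bit
RST = 0x04  # TCP RST flag bit


@dataclass
class IPv4Packet:
    """Hypothetical, heavily simplified view of what a stateless ACL can see."""
    fragment_offset: int       # in 8-byte units; 0 for the first (or only) fragment
    tcp_flags: Optional[int]   # None when this packet carries no TCP header


def fragment_tcp_segment(tcp_flags: int, n_fragments: int) -> List[IPv4Packet]:
    """Split one TCP segment into IP fragments; only the first keeps the TCP header."""
    units_per_fragment = 1480 // 8  # assumed 1500-byte MTU minus 20-byte IP header
    return [
        IPv4Packet(
            fragment_offset=i * units_per_fragment,
            tcp_flags=tcp_flags if i == 0 else None,
        )
        for i in range(n_fragments)
    ]


def matches_established(pkt: IPv4Packet) -> bool:
    """Assumed semantics: match only if a TCP header is visible and ACK or RST is set."""
    if pkt.fragment_offset != 0 or pkt.tcp_flags is None:
        return False  # no TCP header to inspect in a non-initial fragment
    return bool(pkt.tcp_flags & (ACK | RST))


if __name__ == "__main__":
    for pkt in fragment_tcp_segment(ACK, 3):
        print(pkt.fragment_offset, matches_established(pkt))
```

In this model only the first fragment can match the "established" entry; whether the remaining fragments get through then depends on some other entry, such as a plain `permit ip` rule.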

        But yes, tcpdumps on both proxy and server at the same time (and on the middleboxes where you can) give you plenty of information about what's happening packet-wise (just be aware of the possible TCP offloading peculiarities when interpreting the dumps).

        Markku

        • vsergione
          Junior Member
          • Oct 2023
          • 28

          #5
          I would play around with the TCP packet size on the proxy to validate the assumption that fragmentation is the root cause.
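
As a rough aid for picking test sizes, the sketch below does the basic MSS arithmetic in Python. The function names and the 73-byte tunnel overhead are assumptions for illustration only; actual ESP overhead depends on the cipher, integrity algorithm, and NAT-T, so it is better to measure it than to trust this number.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options

# Hypothetical per-packet cost of the IPsec tunnel; NOT taken from the thread.
ASSUMED_ESP_OVERHEAD = 73


def max_tcp_payload(link_mtu: int, tunnel_overhead: int = 0) -> int:
    """Largest TCP payload that fits in one packet without IP fragmentation."""
    return link_mtu - tunnel_overhead - IPV4_HEADER - TCP_HEADER


def fits_without_fragmentation(tcp_payload: int, link_mtu: int,
                               tunnel_overhead: int = 0) -> bool:
    """True if a segment with this payload can cross the path in a single packet."""
    return tcp_payload <= max_tcp_payload(link_mtu, tunnel_overhead)


if __name__ == "__main__":
    print("plain Ethernet:", max_tcp_payload(1500))
    print("through tunnel (assumed overhead):",
          max_tcp_payload(1500, ASSUMED_ESP_OVERHEAD))
```

If lowering the proxy's segment size to something safely below the tunnel figure makes the config download work with the ACL enabled, that would support the fragmentation theory.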

          • Markku
            Senior Member
            Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
            • Sep 2018
            • 1781

            #6
            Yes, and if the OS (what is it?) was upgraded at the same time as Zabbix, it is well possible that some kernel/TCP setting was changed at the same time, causing slight change in the behavior.

            Markku

            • SANDMAN
              Junior Member
              • Jun 2023
              • 14

              #7
              Originally posted by Markku
              Yes, and if the OS (what is it?) was upgraded at the same time as Zabbix, it is well possible that some kernel/TCP setting was changed at the same time, causing slight change in the behavior.

              Markku
              Debian 12 running docker. I did a minor upgrade from 12.5 to 12.6 prior to zabbix 7.0.1
              Stack::
              postgres:16.3-alpine
              zabbix/zabbix-server-pgsql:alpine-7.0-latest
              zabbix/zabbix-web-nginx-pgsql:7.0-alpine-latest
              zabbix/zabbix-agent:alpine-7.0-latest
              zabbix/zabbix-java-gateway:alpine-7.0-latest

              After encountering the issues with the proxy yesterday, I also upgraded that Debian docker proxy host from 11.x to 12.6
              zabbix/zabbix-proxy-sqlite3:7.0-alpine-latest

              What I don't understand is that disabling the ACL allows the proxy to receive config and the session remains active after re-enabling ACL. Can see proxy continuously receiving config data (+ latest items current)
              That same outbound switch ACL does not need to be disabled if I come inbound via WAN. So I feel this behavior isolates it to being related to IPSEC.


              • SANDMAN
                Junior Member
                • Jun 2023
                • 14

                #8
                Edit:
                Outbound ACL rule used when coming inbound via WAN which works.
                permit tcp any ZBX_SVR_IP 0.0.0.0 eq 10051

                • tim.mooney
                  tim.mooney commented
                  Don't you also need to allow ports for IKE when doing IPSEC, or is that not needed because of where you're terminating IPSEC?
              • SANDMAN
                Junior Member
                • Jun 2023
                • 14

                #9
                Only each firewall's WAN interfaces need IKE ports + ESP rules as that is where IPSEC terminates. The other filtering devices in path toward server only require TCP 10051 (active proxy) to be allowed.
                I'm going to be re-visiting this week and will share any new developments. By toggling ACL off temporarily, then establishing connection, then re-enabling switch ACL, the proxy has remained functional (updating config) all weekend.
                Issue is a strange one...

                • SANDMAN
                  Junior Member
                  • Jun 2023
                  • 14

                  #10
                  I ran tcpdump as suggested and reviewed with wireshark. Difficult to interpret.

                  SERVER PCAP
                  sudo tcpdump -i ens192 -nn -s0 -v dst ZBX_PRX_IP and port 10051 -w server-cap.pcap

                  Proxy PCAP
                  sudo tcpdump -i ens192 -nn -s0 -v src ZBX_SVR_IP and port 10051 -w proxy-cap.pcap

                  Communication was there for the return traffic from the server hitting the proxy. Return traffic shows Zabbix server src port tcp 10051 --> Zabbix Proxy dst port 40000-50000

                  Many RST - reset packets when ACL enabled.
                  Upon disabling the switch ACL mentioned above (which allows a proper connection) there are 10 packets marked TCP out-of-order and a few TCP retransmissions.

                  So, knowing that the connection worked outside of the IPSEC tunnel with the switch ACL enabled, I decided to set the IPSEC tunable net.inet.ipsec.dfbit back to 0 (off / disabled) and re-establish the VPN.
                  And what do ya know...
                  It works with the switch ACL enabled (just like WAN access). So the IPSEC do-not-fragment flag was screwing with the TCP states / session. Triple combo of IPSEC with dfbit + stateless switch ACL + Zabbix 7.x

                  This site to site VPN connects 3 separate vlans between sites with many app services.
                  No other applications have this issue with df flag. And as mentioned I ran Zabbix 6.4.3 server + proxy for over a year. So IMO the combination of my environment + a change in zabbix 7.x caused this.
                  Zabbix 6.4.3 did not care about IPSEC DF flag. The VPN has been set with this df flag for ~2 years.

                  Any ideas?

                  Here is the Opnsense documentation regarding net.inet.ipsec.dfbit


                  Path MTU Discovery
                  When trying to enforce path mtu discovery (PMTU), you need to make sure packets leave the network with the DF set. The kernel offers a tunable net.inet.ipsec.dfbit which offers 3 options, 0, clear the bit on packets leaving the firewall (default), 1, set the DF bit or 2 to copy the bit from the inner header.
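
The three settings described in that excerpt can be summed up in a few lines of Python. This is only a restatement of the documentation text above, not OPNsense or FreeBSD code; the function name and parameters are invented for the example.

```python
def outer_df_bit(dfbit_setting: int, inner_df: bool) -> bool:
    """Return the DF bit of the outer (encapsulating) packet.

    dfbit_setting mirrors the documented values of net.inet.ipsec.dfbit:
      0 -> clear the bit (default)
      1 -> set the bit
      2 -> copy the bit from the inner header
    """
    if dfbit_setting == 0:
        return False
    if dfbit_setting == 1:
        return True
    if dfbit_setting == 2:
        return inner_df
    raise ValueError(f"unsupported dfbit setting: {dfbit_setting}")


if __name__ == "__main__":
    for setting in (0, 1, 2):
        for inner in (False, True):
            print(f"dfbit={setting} inner_df={inner} -> "
                  f"outer_df={outer_df_bit(setting, inner)}")
```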

                  • Markku
                    Senior Member
                    Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
                    • Sep 2018
                    • 1781

                    #11
                    For the packet captures to be useful, you'll need full TCP sessions, meaning bidirectional captures. I see your tcpdump filters only take one direction. Including ICMP packets is also needed, because the unreachables (caused by small MTU) should show up, and filtering on a TCP port does not match non-initial fragments. So I'd use "host PROXY_IP or icmp" on the server and "host SERVER_IP or icmp" on the proxy. Yes, it will capture all ICMP (pings etc.), but IP-filtering ICMP without missing some information is complex.
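
A small Python sketch that assembles capture commands with those filters, in case it helps keep both sides consistent. The helper name is made up; the interface name and output file names are reused from the earlier post, and the 192.0.2.x addresses are documentation placeholders standing in for the real proxy and server IPs. It only prints the commands and does not run tcpdump.

```python
import shlex
from typing import List


def capture_command(interface: str, peer_ip: str, outfile: str) -> List[str]:
    """Build a tcpdump argv that captures both directions to/from peer_ip plus all ICMP."""
    capture_filter = f"host {peer_ip} or icmp"
    return [
        "tcpdump",
        "-i", interface,  # capture interface
        "-nn",            # no name or port resolution
        "-s0",            # full packets, no truncation
        "-w", outfile,    # write raw packets to a pcap file
        capture_filter,
    ]


if __name__ == "__main__":
    PROXY_IP = "192.0.2.10"   # placeholder, replace with the real proxy IP
    SERVER_IP = "192.0.2.20"  # placeholder, replace with the real server IP
    # Run this one on the server:
    print(shlex.join(capture_command("ens192", PROXY_IP, "server-cap.pcap")))
    # Run this one on the proxy:
    print(shlex.join(capture_command("ens192", SERVER_IP, "proxy-cap.pcap")))
```

Passing the filter as a single argument avoids shell-quoting surprises around the "or".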

                    Otherwise, it is hard to say what happened during the Zabbix upgrade. It's not Zabbix but the OS networking stack that decides how the data is constructed into packets and frames on the wire. If time permits, maybe install a small 6.4 setup beside the 7.0 setup to compare the traffic. Yes, it requires some extra effort, so I understand if it is not so interesting to you.

                    Markku
