Ad Widget

Collapse

MQTT plugin in Zabbix Agent 2 - subscribe was broken by timeout

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • vchrizz
    Junior Member
    • Jul 2017
    • 18

    #1

    MQTT plugin in Zabbix Agent 2 - subscribe was broken by timeout

    Hi,

    I have some weird issue with the MQTT plugin in Zabbix Agent 2 not getting data from MQTT broker (mosquitto) via plain tcp.

    Some time ago I had this problem and the fix was to restart the agent, so I tried it this time too, but unfortunately it does not fix the issue.
    I was using Zabbix 6.3 until I got to this issue and now I tried upgrading to 6.4 but the issue persists.

    In /etc/zabbix/zabbix_agent2.d/plugins.d/mqtt.conf I tried setting Plugins.MQTT.Timeout=30 but this does not help.
    Even tried to debug with tcpdump, it seems there are packets going in both directions, so I don't understand why the agent tells there is a timeout.
    How could I debug why the agent tells, that the connection timed out, altough using mosquitto_sub it works perfectly fine?

    zabbix_agent2.conf DebugLevel=3
    Code:
    ​# tail -f zabbix_agent2.log
    ...
    2022/12/19 01:29:16.910583 using plugin 'MQTT' (built-in) providing following interfaces: watcher, configurator
    ...
    2022/12/19 01:29:16.911186 Plugin support version 1.1
    2022/12/19 01:29:16.911223 Zabbix Agent2 hostname: [Zabbix server]
    2022/12/19 01:29:48.016759 [MQTT] cannot subscribe topic 'revpi75099/io/PS_Pumpenanl2_Alarm' to [tcp://10.34.0.208:1883]: timed out while subscribing
    ^C
    I tried raising DebugLevel to 5 to see what causes the timeout but there it seems the agent is able to subscribe, but still gets no data?
    zabbix_agent2.conf DebugLevel=5
    Code:
    2022/12/19 01:31:54.493723 [MQTT] established connection to [tcp://10.34.0.208:1883]
    ...
    2022/12/19 01:31:54.495890 [MQTT] creating new subscriber on topic 'revpi75099/io/PS_Pumpenanl1_Status' for [tcp://10.34.0.208:1883]
    ...
    2022/12/19 01:31:54.505217 [MQTT] subscribing 'revpi75099/io/KOL_Kollektor2_Leckwarnung' to [tcp://10.34.0.208:1883]
    2022/12/19 01:31:54.548337 [MQTT] subscribed 'revpi75099/io/KOL_Kollektor2_Leckwarnung' to [tcp://10.34.0.208:1883]
    ​
    ​​
    Still in Zabbix Frontend all MQTT items have red [i] sign with error: "subscribe was broken by timeout". But still mosquitto_sub works fine:
    Code:
    ​​# mosquitto_sub -h 10.34.0.208 -p 1883 -v -t "revpi75099/io/PS_Pumpenanl2_Alarm" -u client -P r34dMyD4t4
    revpi75099/io/PS_Pumpenanl2_Alarm 1
    ​
    Is this a bug in the MQTT plugin of agent2 or how could I debug/fix this?

    Thank you and kind regards!
  • Markku
    Senior Member
    Zabbix Certified SpecialistZabbix Certified ProfessionalZabbix Certified Expert
    • Sep 2018
    • 1781

    #2
    Hi, an obvious first step is to use a version that is actually supported and publicly released, like 6.2.6 or 6.0.12 (you are using alpha/beta versions).

    If the problem persists, I'd still recommend capturing with tcpdump (sudo tcpdump port 1883 -w capture.pcap -v) and analyzing the capture file with Wireshark. If you need help with checking the capture, you can post the file here (provided that your security practices allow it).

    Markku

    Comment

    • vchrizz
      Junior Member
      • Jul 2017
      • 18

      #3
      Thank you for your response!
      I can not recall exactly, I might be wrong but I think the version I selected was because of zabbix-agent2 with MQTT plugin beeing available from this version on.
      Just checked and saw that this is available in 6.2 too.. However.. I downgraded to 6.2 and the issue persists.
      Looking with wireshark in the pcap file generated by tcpdump I notice a RST,ACK coming back from the MQTT broker (=10.34.0.208) to the agent (=10.34.0.211).
      But I don't understand why this happens, I noticed this happens reproduceable after I restart the agent, after 45 or 50 seconds, after that nothing more happens.
      Attached is the pcap file within a zip file as pcap files are not allowed as attachments here.
      Attached Files

      Comment

      • Markku
        Senior Member
        Zabbix Certified SpecialistZabbix Certified ProfessionalZabbix Certified Expert
        • Sep 2018
        • 1781

        #4
        Thanks for the capture. Let's see (t = time in seconds since start of the capture):

        - t = 0: agent connects and subscribes some topics
        - t = 0 .. 18: broker publishes stuff, agent ACKs everything ok
        - t = 30: one more subscribe
        - t = 35: ping request from agent, ping response from broker
        - t = .. 48: more publishes, looks fine this far
        - t = 50: agent just resets (RST) the connection, "I'm done with this"

        - t = 158: agent connects again, subscribes, publishes received
        - t = 193: ping request + ping response (session length = 35 s)
        - t = 208: agent RST (session length = 50 s)

        - t = 445: agent connects again, subscribes, publishes received
        - t = 480: ping request + ping response (session length = 35 s)
        - t = 490: agent RST (session length = 45 s)

        - t = 601: agent connects again, subscribes, publishes received
        - t = 636: ping request + ping response (session length = 35 s)
        - t = 646: agent RST (session length = 45 s)
        ​​
        Looks consistent, just like you said, 50, 50, 45 and 45 seconds sessions, agent resets them all.

        Hmm, why would the agent reset the connection at that point?

        Markku

        Comment

        • Markku
          Senior Member
          Zabbix Certified SpecialistZabbix Certified ProfessionalZabbix Certified Expert
          • Sep 2018
          • 1781

          #5
          [MQTT] cannot subscribe topic 'revpi75099/io/PS_Pumpenanl2_Alarm' to [tcp://10.34.0.208:1883]: timed out while subscribing
          Do you still get that error in the agent log? Can you correlate the log timestamp to the capture file?

          As far as I can tell, in the capture, in all four connections, I see 6 subscribe requests (mqtt.msgtype==8 in Wireshark filter) and 6 subscribe ACKs (mqtt.msgtype==9).

          Markku

          Comment

          • Markku
            Senior Member
            Zabbix Certified SpecialistZabbix Certified ProfessionalZabbix Certified Expert
            • Sep 2018
            • 1781

            #6
            Interesting detail btw:

            - the first 5 subscribes happen at once
            - but the 6th subscribe happens 30 seconds later

            Click image for larger version

Name:	image.png
Views:	1380
Size:	114.1 KB
ID:	456317

            What does it mean, I don't know

            Markku

            Comment

            • Markku
              Senior Member
              Zabbix Certified SpecialistZabbix Certified ProfessionalZabbix Certified Expert
              • Sep 2018
              • 1781

              #7
              FWIW, I tested with Emonpi MQTT feed, with the 6.4.0beta4 agent2, and didn't have any issues. MQTT pings+responses happened every 30 or 35 seconds (wonder what affects that), and received ~2500 messages in about 6 minutes. Just one TCP session, no resets.

              My item was a simple wildcard item mqtt.get[<hostname-of-mqtt-broker>,emon/#,<username>,<password>], did not do any configuration in the agent2.

              Markku

              Comment

              • vchrizz
                Junior Member
                • Jul 2017
                • 18

                #8
                Thank you very much for your efforts!
                Yes, indeed interesting behaviour, especially because it worked fine in the past, I also noticed from agent's debug-log that the first few items seem to successfully get subscribed (though no data of those items arrive at the server).
                My items are not any special: mqtt.get["{$MQTT_HOST}","{$MQTT_TOPIC_PATH}NSHV2_Abg_Ba ust2 _Res1_Status","{$MQTT_USERNAME}","{$MQTT_PASSWORD} "]
                Macros from the template with the MQTT items:
                {$MQTT_HOST} = 10.34.0.208
                {$MQTT_PASSWORD} = r3adMyData
                {$MQTT_TOPIC_PATH} = revpi75099/io/
                {$MQTT_USERNAME} = client
                I also did not configure anything special in agent config, other items seem to work fine though, just the MQTT items do not work..
                Regarding the 30 seconds, maybe this is because I had Plugins.MQTT.Timeout=30 in /etc/zabbix/zabbix_agent2.d/plugins.d/mqtt.conf configured? As this is the only option for this plugin, I thought I try that to see if it makes any difference.
                Attached another pcap without the timeout setting for comparison where I also restarted the agent a few times.
                Attached Files
                Last edited by vchrizz; 19-12-2022, 21:21.

                Comment

                • Markku
                  Senior Member
                  Zabbix Certified SpecialistZabbix Certified ProfessionalZabbix Certified Expert
                  • Sep 2018
                  • 1781

                  #9
                  How many MQTT items you have?

                  It's interesting that in the capture file those six subscribes (that are done before the reset) are different each time (and your NSHV2_Abg_Baust2_Res1_Status is not there at all). Why isn't agent subscribing to all the items?

                  Markku

                  Comment

                  • vchrizz
                    Junior Member
                    • Jul 2017
                    • 18

                    #10
                    one template with 108, another one with 7 and a third one with 44 items. Good point, maybe its just too many items? I'll try to clear the two larger templates and try if the smallest one works.

                    Comment

                    • Markku
                      Senior Member
                      Zabbix Certified SpecialistZabbix Certified ProfessionalZabbix Certified Expert
                      • Sep 2018
                      • 1781

                      #11
                      How about the timeouts in the agent log file, are they still coming, did you get some during either of the packet captures? It would be interesting to see the exact timing related to the packet captures.

                      Markku

                      Comment

                      • Markku
                        Senior Member
                        Zabbix Certified SpecialistZabbix Certified ProfessionalZabbix Certified Expert
                        • Sep 2018
                        • 1781

                        #12
                        In your second capture file the message identifiers (in the subscribe messages) behave very differently: using number 1 many times, then start increasing. After the reset it continues using the (same) next number again for a while, then increases it, and so on.

                        Again, I don't know what to make of it, just saying.

                        In the second capture it is clear that you first stopped the agent manually, the TCP session was cleanly closed with mutual FINs (I saw the same here). In the subsequent sessions they ended with client resets.

                        Yeah, interesting case from packet analysis point of view

                        If you can find out that this only happens with many MQTT items, you might be able to open an issue in https://support.zabbix.com .

                        Markku

                        Comment


                        • Markku
                          Markku commented
                          Editing a comment
                          Oops, only realized now that you changed the agent2, now it was 6.2.6 instead of the 6.4.0beta4(?) that was used earlier, maybe that explains the different message identifiers.

                          Markku

                        • Markku
                          Markku commented
                          Editing a comment
                          Also, the agent (6.2.6 in your capture) seems to increase the message identifier from 1 to 2 (and up from there) only when there is a small break, like 2-3 seconds. (Again, I don't know what it means, just saying)

                          Markku
                      • Markku
                        Senior Member
                        Zabbix Certified SpecialistZabbix Certified ProfessionalZabbix Certified Expert
                        • Sep 2018
                        • 1781

                        #13
                        Also, back to basics: you said that this worked earlier without problems, so maybe you will figure out if something was changed around the time of the problems starting.

                        Markku

                        Comment

                        • vchrizz
                          Junior Member
                          • Jul 2017
                          • 18

                          #14
                          So it is working again (now still using v6.2) but I don't understand why it suddenly works...
                          I cleared one by one the larger templates until I had only one template with only 7 items. Suddenly those 7 items got data again and I thought, ok well then there might be just too much items.
                          Then I wanted to reproduce the issue and added those templates back again one by one, looking if the item get data until all templates were added back and all items got data from the broker.
                          This is driving me nuts, not be able to reproduce the issue is worse than having the issue...
                          However, thank you very much for helping to debug this issue. As this is the second time this issue happened uncontrolled, I'll keep an eye on that and report here if it happens again.
                          Kind regards!

                          Comment

                          • vchrizz
                            Junior Member
                            • Jul 2017
                            • 18

                            #15
                            I'm sorry to say that, but I'm having the issue again; it worked after I posted last time on 22.12.2022 until 18.03.2023 and without changing anything, today I noticed it does not work...
                            All the items have a red [i] box which tells: "timed out while subscribing"
                            The log says (for all MQTT items): "[MQTT] cannot subscribe topic 'revpi75099/io/PS_Pumpenstube_Temperatur' to [tcp://10.34.0.208:1883]: subscribe was broken by timeout"
                            but getting data manually works:
                            Code:
                            # mosquitto_sub -h 10.34.0.208 -p 1883 -v -t "revpi75099/io/PS_Pumpenstube_Temperatur" -u client -P ************
                            revpi75099/io/PS_Pumpenstube_Temperatur 94
                            at the moment these are the current versions installed:
                            Code:
                            # dpkg -l | grep zabbix
                            ii zabbix-agent 1:6.4.0~beta4-1+debian11 amd64 Zabbix network monitoring solution - agent
                            ii zabbix-agent2 1:6.2.6-1+debian11 amd64 Zabbix network monitoring solution - agent
                            ii zabbix-apache-conf 1:6.4.0~beta4-1+debian11 all Zabbix network monitoring solution - apache configuration for front-end
                            ii zabbix-frontend-php 1:6.4.0~beta4-1+debian11 all Zabbix network monitoring solution - PHP front-end
                            ii zabbix-release 1:6.3-3+debian11 all Zabbix official repository configuration
                            ii zabbix-server-pgsql 1:6.4.0~beta4-1+debian11 amd64 Zabbix network monitoring solution - server (PostgreSQL)
                            ii zabbix-sql-scripts 1:6.4.0~beta4-1+debian11 all Zabbix network monitoring solution - sql-scripts

                            So I looked up the actual supported versions and saw, that currently 6.0 and 6.4 are supported, so I upgraded (and removed stale zabbix-agent because zabbix-agent2 is installed and in use):
                            Code:
                            # dpkg -l | grep zabbix
                            ii zabbix-agent2 1:6.4.1-1+debian11 amd64 Zabbix network monitoring solution - agent
                            ii zabbix-apache-conf 1:6.4.1-1+debian11 all Zabbix network monitoring solution - apache configuration for front-end
                            ii zabbix-frontend-php 1:6.4.1-1+debian11 all Zabbix network monitoring solution - PHP front-end
                            ii zabbix-release 1:6.4-1+debian11 all Zabbix official repository configuration
                            ii zabbix-server-pgsql 1:6.4.1-1+debian11 amd64 Zabbix network monitoring solution - server (PostgreSQL)
                            ii zabbix-sql-scripts 1:6.4.1-1+debian11 all Zabbix network monitoring solution - sql-scripts
                            Then rebooted the whole machine and hoped for any improvment/change but the same issue persists..
                            ​What should I do? It can not be that I have to clear/readd the templates every few months?

                            Comment

                            Working...