
zabbix Agent sending errors

  • Atoudam
    Junior Member
    • Feb 2019
    • 11

    #1

    zabbix Agent sending errors

    Hi,

    I wrote a custom version of the Zabbix agent for use on BrightScript (since there is no existing version).

    It works well and is deployed to 100+ devices.

    However, three of my devices have trouble sending in their data.

    Below is an example of what they send to the proxy (they are in active mode), followed by the response from the proxy.

    I can't figure out the issue with this.
    The issue only shows on three devices.
    It does not always show, i.e. they can send data for a few hours and then randomly start showing this error.
    Other devices running the same code (at least 30 units) send data to the same proxy without issue.
    It is always and only these three units.
    Is there someone more familiar with the Zabbix protocol who can see where I have gone wrong?
    Lifting the log level in the proxy does not show any more info.



    {"clock":1656646286,"data":[{"clock":1656646281,"host":"MB-AV-HO-04","id":6298,"key":"agent.ping","ns":77053975,"value":1},{"clock":1656646281,"host":"MB-AV-HO-04","id":6299,"key":"net.if.in[eth0]","ns":77053975,"value":25830628},{"clock":1656646281,"host":"MB-AV-HO-04","id":6300,"key":"net.if.ip4[eth0]","ns":77053975,"value":"172.30.104.53"},{"clock":1656646281,"host":"MB-AV-HO-04","id":6301,"key":"net.if.out[eth0]","ns":77053975,"value":4538921},{"clock":1656646281,"host":"MB-AV-HO-04","id":6302,"key":"proc.num[,,run]","ns":77053975,"value":"1"},{"clock":1656646281,"host":"MB-AV-HO-04","id":6303,"key":"proc.num[]","ns":77053975,"value":"224"},{"clock":1656646281,"host":"MB-AV-HO-04","id":6304,"key":"system.cpu.load[percpu,avg10]","ns":77053975,"value":"0.00"},{"clock":1656646281,"host":"MB-AV-HO-04","id":6305,"key":"system.cpu.load[percpu,avg1]","ns":77053975,"value":"0.00"},{"clock":1656646281,"host":"MB-AV-HO-04","id":6306,"key":"system.cpu.load[percpu,avg5]","ns":77053975,"value":"0.00"},{"clock":1656646281,"host":"MB-AV-HO-04","id":6307,"key":"system.localtime","ns":77053975,"value":1656646281},{"clock":1656646281,"host":"MB-AV-HO-04","id":6308,"key":"vfs.fs.size[SD:,free]","ns":77053975,"value":14652},{"clock":1656646281,"host":"MB-AV-HO-04","id":6309,"key":"vfs.fs.size[SD:,pfree]","ns":77053975,"value":96.49},{"clock":1656646281,"host":"MB-AV-HO-04","id":6310,"key":"vfs.fs.size[SD:,used]","ns":77053975,"value":533},{"clock":1656646281,"host":"MB-AV-HO-04","id":6311,"key":"vm.memory.size[available]","ns":77053975,"value":277860}],"ns":77053975,"request":"agent data","session":"MB-AV-HO-04000000000000946684838"}
    #TRACE (Zabbix) zabbixAgent response to agent data recieved 92 bytes

    #error (Zabbix) zabbixAgent agent data result processed: 0; failed: 14; total: 14; seconds spent: 0.000340
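For a custom agent that wants to react to the proxy's reply, the counters in that result line can be split out. This is only a minimal Python sketch (the agent itself is BrightScript), assuming the info text always has the shape shown in the log line above; `parse_info` is a hypothetical helper, not part of Zabbix.

```python
def parse_info(info: str) -> dict:
    """Split 'name: value; name: value; ...' into a dict of numbers."""
    fields = {}
    for part in info.split(";"):
        name, _, value = part.partition(":")
        # Values containing a dot are treated as floats, the rest as ints.
        fields[name.strip()] = float(value) if "." in value else int(value)
    return fields


info = "processed: 0; failed: 14; total: 14; seconds spent: 0.000340"
counts = parse_info(info)
if counts["failed"] > 0:
    print(f"proxy rejected {counts['failed']} of {counts['total']} values")
```

With the line above this reports 14 rejected values out of 14, which is the symptom described in this post.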

    cheers
    Adam
  • Markku
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
    • Sep 2018
    • 1781

    #2
    The data looks fine, and obviously the proxy was able to parse the data so that it found all 14 data items from it. But for some reason it figured out that the data should not be processed.

    One reason can be that it didn't know the host name, but I guess the name MB-AV-HO-04 is fine?

    Another reason can be that the item names are not correct for that host. But you said this is random, so how could those three hosts lose their items (according to the proxy) every now and then?

    1656646281 = Friday, 1st July 2022, 03.31 = over 11 days ago, is that the actual time you took that log? I mean, one reason for discarding the data could be (but I'm not sure) that it is too old.
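The conversion above can be reproduced with a few lines of Python. This is just a sketch for checking a `clock` value from the payload; it interprets the value as seconds since the Unix epoch in UTC.

```python
from datetime import datetime, timezone

clock = 1656646281  # "clock" value taken from the sample payload
stamp = datetime.fromtimestamp(clock, tz=timezone.utc)
print(stamp.isoformat())  # 2022-07-01T03:31:21+00:00
```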

    Can you figure out any common features/properties for these three hosts that could explain their similar behaviour? Do they fail at the same time?

    Markku


    • Markku
      Senior Member
      Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
      • Sep 2018
      • 1781

      #3
      Originally posted by Atoudam
      lifting the log level in the proxy does not show any more info
      Btw this is interesting, are you sure about this? I would have assumed that the proxy would log the reason for any failure.

      Markku


      • Atoudam
        Junior Member
        • Feb 2019
        • 11

        #4
        Thanks for your response, Markku.

        I have had a further look into your suggestions.

        It looks like both units fail at the same time, i.e. at the start of the day when they (and 30-odd other units that don't fail) are powered on.

        I have rechecked the proxy log and it does show more info; I may not have restarted the proxy when I first lifted the log level.

        Attached is a section from the log which I think shows everything to do with a failure to process data from host MB-AV-HO-03.

        Unfortunately I still can't see anything that stands out as the cause.

        Adam



        MB-AV-HO-03 fault.txt


        • Markku
          Senior Member
          Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
          • Sep 2018
          • 1781

          #5
          Thanks for the log. So:

          Code:
          1744:20220714:090946.387 In parse_history_data()
          1744:20220714:090946.387 End of parse_history_data():SUCCEED processed:14/14
          1744:20220714:090946.387 In process_history_data()
          1744:20220714:090946.387 End of process_history_data() processed:0
          It parsed the data, 14 items, then tried to process but didn't "find" anything to process.

          Can you take another log from a successful case about the same agent?

          Markku


          • Atoudam
            Junior Member
            • Feb 2019
            • 11

            #6
            Thanks.

            As attached.

            I have extracted two log entries. The sequence was:

            proxy (always on)
            host booted
            multiple host fail log entries (one attached)
            proxy rebooted
            host success entry as attached (there were a few in there)
            Just now, as I write this, I see it has started to fail again.

            Cheers
            Adam
            Attached Files


            • Markku
              Senior Member
              Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
              • Sep 2018
              • 1781

              #7
              I see that all your values have the same ns (nanoseconds) value (77053975). That is perhaps one of your design choices, and it shouldn't matter here.

              In the documentation (https://www.zabbix.com/documentation...collected-data) it says about the "id" field:

              This ID is used to discard duplicate values that might be sent in poor connectivity environments.
              Can it be that your agent reuses value IDs in some circumstances and the proxy therefore discards the values? (That could explain why the proxy reboot made it work again; maybe the proxy does not store the IDs permanently.)

              Markku


              • Atoudam
                Junior Member
                • Feb 2019
                • 11

                #8
                I've checked the agent code.

                The value IDs are incremented with each item, so they don't get reused until the agent reboots, at which point they reset to 0.

                Re the session (sessionID in v6): I only change this on boot of the agent (it is never reused). Should it be changed for each set of data sent in?

                I was under the understanding that the ID and the session were used to identify duplicate values?


                • Markku
                  Senior Member
                  Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
                  • Sep 2018
                  • 1781

                  #9
                  Originally posted by Atoudam
                  i was under the understanding that the ID and the session was used to identify duplicate values?
                  Yes, that's what the documentation says: sessionid+id together must be unique, so make sure that your agent uses a different sessionid each time it restarts. (Oops, I just realized that you maybe already said that by talking about agent code; I first thought you had checked Zabbix's agent code.)
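To make the uniqueness requirement concrete, here is a toy Python model of a receiver that discards duplicates by session and value id. It is a simplified illustration of the behaviour the documentation describes, not the proxy's actual code; `DuplicateFilter`, the session strings, and the `-1` starting sentinel are all hypothetical.

```python
class DuplicateFilter:
    """Toy model: remember the highest value id accepted per session token."""

    def __init__(self):
        self.last_id = {}  # session token -> highest id accepted so far

    def accept(self, session, values):
        # Hypothetical sentinel: nothing seen yet for an unknown session.
        last = self.last_id.get(session, -1)
        fresh = [v for v in values if v["id"] > last]
        if fresh:
            self.last_id[session] = max(v["id"] for v in fresh)
        return fresh


proxy = DuplicateFilter()
# First agent run: ids 0..2 under one session token.
first = proxy.accept("host-A-boot1", [{"id": i} for i in range(3)])
# Agent restarts and ids reset to 0, but the same session token is sent again.
reused = proxy.accept("host-A-boot1", [{"id": i} for i in range(3)])
# Agent restarts with a new session token.
renewed = proxy.accept("host-A-boot2", [{"id": i} for i in range(3)])
print(len(first), len(reused), len(renewed))  # 3 0 3
```

In this model, a restarted agent that repeats an old session token has every value treated as a duplicate until its ids climb past the highest id previously seen, while a new token is accepted immediately.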

                  Good catch, they really changed the session property to sessionid, according to the documentation.

                  Markku


                  • Markku
                    Senior Member
                    Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
                    • Sep 2018
                    • 1781

                    #10
                    Originally posted by Markku
                    Good catch, they really changed the session property to sessionid, according to the documentation.
                    ... but the reality is that Zabbix Agent and Zabbix Agent 2 still use session and not sessionid (I tested with 6.2.1 agents), so the documentation is incorrect, I've commented that in issue ZBX-21346.

                    Update: the documentation was promptly fixed by the Zabbix team: https://www.zabbix.com/documentation...collected-data

                    Markku
                    Last edited by Markku; 02-08-2022, 11:02.


                    • Atoudam
                      Junior Member
                      • Feb 2019
                      • 11

                      #11
                      I have figured it out.

                      It is the session value.

                      I create the session on agent boot.

                      I use the host name and the system boot time (secondsSince), as I assumed this would always be unique.

                      However, these embedded POS (pieces of ...) lose their time overnight because their poorly designed RTC backup power uses "super" caps rather than batteries.

                      So on a cold boot the time has sometimes been reset to the default (Saturday, January 1, 2000 12:00:38 AM), and my session value is then no longer unique.
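The session string in the sample payload from the first post shows this directly. The Python sketch below decodes it, assuming the token is the host name followed by zero-padded boot seconds (inferred from the sample, not confirmed), and reading the seconds as UTC. It also shows one possible clock-independent alternative: `new_session_token` is a hypothetical helper that draws a random token, and the 32-character length is an arbitrary choice here.

```python
import secrets
from datetime import datetime, timezone

host = "MB-AV-HO-04"
session = "MB-AV-HO-04000000000000946684838"  # from the sample payload

# Assumption: token = host name + zero-padded boot time in epoch seconds.
boot_seconds = int(session[len(host):])
boot_time = datetime.fromtimestamp(boot_seconds, tz=timezone.utc)
print(boot_time.isoformat())  # 2000-01-01T00:00:38+00:00


def new_session_token() -> str:
    """Return 32 random hex characters that do not depend on the device clock."""
    return secrets.token_hex(16)


print(new_session_token())
```

A random token avoids the collision even when the RTC falls back to its default, because it is generated afresh on every agent boot.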

                      Thanks for your help with this.


                      • Markku
                        Senior Member
                        Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
                        • Sep 2018
                        • 1781

                        #12
                        Nice, thanks for reporting back. This is valuable information for anyone having to develop their own agents.

                        Markku
