How to disable combining SNMP varbinds (getbulk)?

  • jpka
    Junior Member
    • Sep 2013
    • 24

    #1

    How to disable combining SNMP varbinds (getbulk)?

    Hi!
    I have noticed that Zabbix sometimes combines (concatenates) several SNMP requests to the same host into one large packet.
    It looks like this happens because of that mechanism.

    Low-RAM devices such as microcontrollers are not compatible with it.
    How can I completely disable this feature and make Zabbix use exactly one packet per value?
    Thank you so much.
  • BDiE8VNy
    Senior Member
    • Apr 2010
    • 680

    #2
    I have encountered a similar issue.
    With bulk retrieval, CPU utilization on NetApps rose by a factor of ~50 (besides other issues).

    Unfortunately, the only way out is to go back to release 2.2.2.

    • tatapoum
      Senior Member
      • Jan 2014
      • 185

      #3
      According to the docs, Zabbix should be able to detect when a host cannot return multiple values per request:

      However, there is a technical issue that not all devices are capable of returning 128 values per request. Some always return a proper response, but others either respond with a “tooBig(1)” error or do not respond at all once the potential response is over a certain limit.

      In order to find an optimal number of objects to query for a given device, Zabbix uses the following strategy. It starts cautiously with querying 1 value in a request. If that is successful, it queries 2 values in a request. If that is successful again, it queries 3 values in a request and continues similarly by multiplying the number of queried objects by 1.5, resulting in the following sequence of request sizes: 1, 2, 3, 4, 6, 9, 13, 19, 28, 42, 63, 94, 128.
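The growth rule quoted above can be reproduced in a few lines. This is only a sketch of the arithmetic, not Zabbix's actual code: the function name `request_sizes` is made up for illustration, and rounding down after multiplying by 1.5 is an assumption that happens to reproduce the documented sequence.

```python
MAX_VARS = 128  # upper bound on variables per request, as quoted above


def request_sizes(limit=MAX_VARS):
    """Yield successive request sizes: 1, 2, 3, then roughly x1.5 up to limit."""
    size = 1
    while True:
        yield size
        if size >= limit:
            return
        # Integer x1.5 growth (rounded down); max() guarantees progress from 1 to 2.
        size = min(limit, max(size + 1, size * 3 // 2))


print(list(request_sizes()))
# [1, 2, 3, 4, 6, 9, 13, 19, 28, 42, 63, 94, 128]
```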

      However, once a device refuses to give a proper response (for example, for 42 variables), Zabbix does two things.

      First, for the current item batch it halves the number of objects in a single request and queries 21 variables. If the device is alive, then the query should work in the vast majority of cases, because 28 variables were known to work and 21 is significantly less than that. However, if that still fails, then Zabbix falls back to querying values one by one. If it still fails at this point, then the device is definitely not responding and request size is not an issue.

      The second thing Zabbix does for subsequent item batches is it starts with the last successful number of variables (28 in our example) and continues incrementing request sizes by 1 until the limit is hit. For example, assuming the largest response size is 32 variables, the subsequent requests will be of sizes 29, 30, 31, 32, and 33. The last request will fail and Zabbix will never issue a request of size 33 again. From that point on, Zabbix will always query 32 variables for this device.
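The whole strategy (fast growth, halving on failure, then creeping up by one) can be simulated against a pretend device. This is a rough model under stated assumptions, not the Zabbix implementation: `next_size`, `simulate` and `device_limit` are hypothetical names, and the device is assumed to fail cleanly whenever a request exceeds its limit.

```python
MAX_VARS = 128  # upper bound on variables per request, as quoted above


def next_size(max_ok, min_fail):
    """Pick the first request size for the next item batch."""
    if min_fail > MAX_VARS:  # no failure seen yet: grow by roughly x1.5
        if max_ok == 0:
            return 1
        return min(MAX_VARS, max(max_ok + 1, max_ok * 3 // 2))
    # After a failure: creep up by 1, but never reach a size known to fail.
    return max(1, min(max_ok + 1, min_fail - 1))


def simulate(device_limit, batches):
    """Return every request size attempted over a number of item batches."""
    max_ok, min_fail = 0, MAX_VARS + 1
    attempts = []
    for _ in range(batches):
        size = next_size(max_ok, min_fail)
        attempts.append(size)
        if size <= device_limit:
            max_ok = max(max_ok, size)
            continue
        min_fail = min(min_fail, size)
        if size == 1:
            continue  # nothing smaller to fall back to
        half = size // 2  # same batch: retry with half as many variables
        attempts.append(half)
        if half > device_limit and half > 1:
            attempts.append(1)  # last resort: one value per request
    return attempts


print(simulate(device_limit=32, batches=17))
```

With `device_limit=32` the simulated sizes settle at 32, matching the example in the quoted text. With `device_limit=1` they settle at 1, but only because the modelled device fails outright; a device that answers with a partial response never triggers the fallback.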

      • jpka
        Junior Member
        • Sep 2013
        • 24

        #4
        "If that is successful, it queries 2 values in a request. If that is successful again..."
        In that case, my particular device returns only one (the first) value. Is that successful or not from Zabbix's point of view?
        And how should such a partial answer be treated in general: as successful or not?
        In my case Zabbix keeps sending combined requests, so half of the data is lost.
        It does not look too difficult to make this feature optional (controlled via the config file), as I commented in the ZBXNEXT-98 ticket.
        Thanks!

        • f.koch
          Member
          Zabbix Certified Specialist
          • Feb 2010
          • 85

          #5
          Originally posted by BDiE8VNy
          I have encountered a similar issue.
          With bulk retrieval, CPU utilization on NetApps rose by a factor of ~50 (besides other issues).

          Unfortunately, the only way out is to go back to release 2.2.2.
          Hm, normally CPU utilization should be lower with SNMP bulk requests, and I am not sure that memory usage is much higher.

          When a value is requested, the device has to parse the OID tree to find it, and it does so for every request. If you combine the requests, the OID tree needs to be parsed only once, which should be much cheaper. Sure, it depends on the CPU power of the device, but for that Zabbix has the built-in detection method described by tatapoum.

          Regards, f.koch
          Last edited by f.koch; 18-04-2014, 20:07.

          • asaveljevs
            Zabbix developer
            • Feb 2010
            • 36

            #6
            jpka, would it be possible to attach some tcpdump of the traffic that goes between Zabbix and your device?

            • jpka
              Junior Member
              • Sep 2013
              • 24

              #7
              1.1.1.98 is the server; 1.1.1.37 is the microcontroller device. The interval is 5 s for both items. The capture runs for 15 s, so three cycles are captured. Sometimes the order of the items in the combined packet is reversed; the device then answers with the second item (the one that comes first in the packet).
              sudo tcpdump -i eth0 "udp and (port 161 or 162)" -f -n -XX
              Code:
              sudo tcpdump -i eth0 "udp and (port 161 or 162)" -f -n -XX
              tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
              listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
              18:52:03.280553 IP 1.1.1.98.49830 > 1.1.1.37.161:  C=passwd GetRequest(42)  .1.3.6.1.2.1.1.1.17 .1.3.6.1.2.1.1.1.15
              	0x0000:  0008 dce7 3a49 ac22 0b50 ba3e 0800 4500  ....:I.".P.>..E.
              	0x0010:  0055 b38d 4000 4011 8282 0101 0162 0101  .U..@.@......b..
              	0x0020:  0125 c2a6 00a1 0041 ec6a 3037 0201 0104  .%.....A.j07....
              	0x0030:  0670 6173 7377 64a0 2a02 0430 05c2 2a02  .passwd.*..0..*.
              	0x0040:  0100 0201 0030 1c30 0c06 082b 0601 0201  .....0.0...+....
              	0x0050:  0101 1105 0030 0c06 082b 0601 0201 0101  .....0...+......
              	0x0060:  0f05 00                                  ...
              18:52:03.281486 IP 1.1.1.37.161 > 1.1.1.98.49830:  C=passwd GetResponse(29)  .1.3.6.1.2.1.1.1.17=1
              	0x0000:  ac22 0b50 ba3e 0008 dce7 3a49 0800 4500  .".P.>....:I..E.
              	0x0010:  0048 0019 4000 8011 f603 0101 0125 0101  .H..@........%..
              	0x0020:  0162 00a1 c2a6 0034 30fb 302a 0201 0104  .b.....40.0*....
              	0x0030:  0670 6173 7377 64a2 1d02 0430 05c2 2a02  .passwd....0..*.
              	0x0040:  0100 0201 0030 0f30 0d06 082b 0601 0201  .....0.0...+....
              	0x0050:  0101 1102 0101                           ......
              18:52:08.281280 IP 1.1.1.98.46967 > 1.1.1.37.161:  C=passwd GetRequest(42)  .1.3.6.1.2.1.1.1.17 .1.3.6.1.2.1.1.1.15
              	0x0000:  0008 dce7 3a49 ac22 0b50 ba3e 0800 4500  ....:I.".P.>..E.
              	0x0010:  0055 b38e 4000 4011 8281 0101 0162 0101  .U..@.@......b..
              	0x0020:  0125 b777 00a1 0041 0e81 3037 0201 0104  .%.w...A..07....
              	0x0030:  0670 6173 7377 64a0 2a02 0439 99d1 7f02  .passwd.*..9....
              	0x0040:  0100 0201 0030 1c30 0c06 082b 0601 0201  .....0.0...+....
              	0x0050:  0101 1105 0030 0c06 082b 0601 0201 0101  .....0...+......
              	0x0060:  0f05 00                                  ...
              18:52:08.282236 IP 1.1.1.37.161 > 1.1.1.98.46967:  C=passwd GetResponse(29)  .1.3.6.1.2.1.1.1.17=1
              	0x0000:  ac22 0b50 ba3e 0008 dce7 3a49 0800 4500  .".P.>....:I..E.
              	0x0010:  0048 001a 4000 8011 f602 0101 0125 0101  .H..@........%..
              	0x0020:  0162 00a1 b777 0034 5311 302a 0201 0104  .b...w.4S.0*....
              	0x0030:  0670 6173 7377 64a2 1d02 0439 99d1 7f02  .passwd....9....
              	0x0040:  0100 0201 0030 0f30 0d06 082b 0601 0201  .....0.0...+....
              	0x0050:  0101 1102 0101                           ......
              18:52:13.282146 IP 1.1.1.98.39084 > 1.1.1.37.161:  C=passwd GetRequest(42)  .1.3.6.1.2.1.1.1.17 .1.3.6.1.2.1.1.1.15
              	0x0000:  0008 dce7 3a49 ac22 0b50 ba3e 0800 4500  ....:I.".P.>..E.
              	0x0010:  0055 b38f 4000 4011 8280 0101 0162 0101  .U..@.@......b..
              	0x0020:  0125 98ac 00a1 0041 1565 3037 0201 0104  .%.....A.e07....
              	0x0030:  0670 6173 7377 64a0 2a02 0430 05c2 2b02  .passwd.*..0..+.
              	0x0040:  0100 0201 0030 1c30 0c06 082b 0601 0201  .....0.0...+....
              	0x0050:  0101 1105 0030 0c06 082b 0601 0201 0101  .....0...+......
              	0x0060:  0f05 00                                  ...
              18:52:13.283096 IP 1.1.1.37.161 > 1.1.1.98.39084:  C=passwd GetResponse(29)  .1.3.6.1.2.1.1.1.17=1
              	0x0000:  ac22 0b50 ba3e 0008 dce7 3a49 0800 4500  .".P.>....:I..E.
              	0x0010:  0048 001b 4000 8011 f601 0101 0125 0101  .H..@........%..
              	0x0020:  0162 00a1 98ac 0034 59f5 302a 0201 0104  .b.....4Y.0*....
              	0x0030:  0670 6173 7377 64a2 1d02 0430 05c2 2b02  .passwd....0..+.
              	0x0040:  0100 0201 0030 0f30 0d06 082b 0601 0201  .....0.0...+....
              	0x0050:  0101 1102 0101                           ......
              ^C
              6 packets captured
              6 packets received by filter
              0 packets dropped by kernel
              sudo zabbix_server --version
              Code:
              Zabbix server v2.2.3 (revision 44105) (7 April 2014)
              Compilation time: Apr 10 2014 05:41:41
              Thanks.
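The mismatch can also be checked mechanically. The sketch below decodes the SNMP payload of the first request/response pair in the capture above (the bytes after the 42-byte Ethernet, IP and UDP headers) and lists the OIDs in each. It is a minimal BER reader written for this capture only: all helper names are made up, and it assumes short-form lengths (under 128 bytes), which holds for these packets.

```python
def read_tlv(buf, pos):
    """Return (tag, value, next_pos) for one BER element; short-form lengths only."""
    tag, length = buf[pos], buf[pos + 1]
    if length & 0x80:
        raise ValueError("long-form length is not handled in this sketch")
    start = pos + 2
    return tag, buf[start:start + length], start + length


def children(value):
    """Split the contents of a constructed element into (tag, value) pairs."""
    pos, out = 0, []
    while pos < len(value):
        tag, val, pos = read_tlv(value, pos)
        out.append((tag, val))
    return out


def decode_oid(value):
    """Decode BER OID contents into dotted notation."""
    arcs, acc = [value[0] // 40, value[0] % 40], 0
    for byte in value[1:]:
        acc = (acc << 7) | (byte & 0x7F)
        if not byte & 0x80:
            arcs.append(acc)
            acc = 0
    return ".".join(str(a) for a in arcs)


def varbind_oids(message):
    """List the OIDs carried in the variable bindings of an SNMP message."""
    _, body, _ = read_tlv(message, 0)              # outer SEQUENCE
    _version, _community, pdu = children(body)
    _req_id, _err_status, _err_index, vb_list = children(pdu[1])
    return [decode_oid(children(vb)[0][1]) for _, vb in children(vb_list[1])]


# Payload bytes copied from the 18:52:03 GetRequest and GetResponse above.
REQUEST = bytes.fromhex(
    "3037 020101 0406706173737764 a02a 02043005c22a 020100 020100 301c"
    " 300c 06082b06010201010111 0500"
    " 300c 06082b0601020101010f 0500"
)
RESPONSE = bytes.fromhex(
    "302a 020101 0406706173737764 a21d 02043005c22a 020100 020100 300f"
    " 300d 06082b06010201010111 020101"
)

asked, answered = varbind_oids(REQUEST), varbind_oids(RESPONSE)
print("requested:", asked)
print("answered: ", answered)
print("missing:  ", [oid for oid in asked if oid not in answered])
```

The error-status field in the response is 0 (noError), so nothing in the packet itself signals that the second variable binding was dropped.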

              • tatapoum
                Senior Member
                • Jan 2014
                • 185

                #8
                jpka, I think it would be better to capture the whole packets into a PCAP file and attach this file to this forum thread.

                • jpka
                  Junior Member
                  • Sep 2013
                  • 24

                  #9
                  Pcap file

                  PCAP file attached, but it contains new data (not exactly the same as in the previous post).
                  Attached Files

                  • asaveljevs
                    Zabbix developer
                    • Feb 2010
                    • 36

                    #10
                    jpka, indeed, your device returns only one variable binding, even though the request has two. Could you please register a new ZBX issue? It would also be wonderful if you could specify the type and model of the device.

                    • jpka
                      Junior Member
                      • Sep 2013
                      • 24

                      #11
                      Zbx-8145

                      I have created ZBX-8145.

                      Thank you.
                      P.S. My devices are OSCaR-based smart-home and industrial sensors. They use a W5100, an ATmega32A, and a real-time SNMP stack written in assembler. It is open source, and the code is here: http://bdyssh.ru/?p=988

                      • roby
                        Junior Member
                        • Feb 2013
                        • 13

                        #12
                        NetApp problem visualised.

                        Values arrive like this (although they should actually be steady at around 50%):
                        Timestamp Value
                        2014.Apr.29 12:17:55 0
                        2014.Apr.29 12:16:54 79
                        2014.Apr.29 12:15:54 0
                        2014.Apr.29 12:14:55 0
                        2014.Apr.29 12:13:54 0
                        2014.Apr.29 12:12:54 0
                        2014.Apr.29 12:11:54 60
                        2014.Apr.29 12:10:54 71
                        2014.Apr.29 12:09:54 76
                        2014.Apr.29 12:08:54 76
                        2014.Apr.29 12:07:54 66
                        2014.Apr.29 12:06:55 99
                        2014.Apr.29 12:05:55 71
                        2014.Apr.29 12:04:54 0
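To put numbers on the sample above, here is a small sketch that summarises the fourteen readings. Whether the zeros are genuine readings or values lost in transit is not established here, so the split into zero and non-zero samples is only an assumption made for illustration.

```python
from statistics import mean

# CPU busy samples (percent) copied from the table above, newest first.
samples = [0, 79, 0, 0, 0, 0, 60, 71, 76, 76, 66, 99, 71, 0]

zeros = samples.count(0)
nonzero = [v for v in samples if v != 0]

print(f"mean over all samples:   {mean(samples):.1f}")
print(f"mean over non-zero only: {mean(nonzero):.1f}")
print(f"zero readings:           {zeros} of {len(samples)}")
```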
                        Attached Files

                        • roby
                          Junior Member
                          • Feb 2013
                          • 13

                          #13
                          Here is a tcpdump of the communication between Zabbix and the NetApp.
                          The SNMP OID that corresponds to CPU busy is .1.3.6.1.4.1.789.1.2.1.3.0
                          Attached Files

                          • asaveljevs
                            Zabbix developer
                            • Feb 2010
                            • 36

                            #14
                            I have rearranged the netapp.tcpdump.txt attachment a bit. It clearly shows that the number of variables in the responses is the same as in the requests, so this is not the same problem as the one reported in ZBX-8145.
                            Attached Files

                            • richlv
                              Senior Member
                              Zabbix Certified Trainer
                              Zabbix Certified Specialist
                              Zabbix Certified Professional
                              • Oct 2005
                              • 3112

                              #15
                              As for the NetApp device, it would be interesting to gather data for an hour with each version and compare the averages.
                              Wild guess: preparing an answer for getbulk makes the CPU very busy for a short period of time, but the overall busy rate might be lower. The load is more concentrated, but the device also spends more time doing nothing.
                              Zabbix 3.0 Network Monitoring book
