Too much data from discovery so items become unavailable

This topic has been answered.
  • MikeJensen
    Junior Member
    • Apr 2024
    • 19

    #1

    Too much data from discovery so items become unavailable

    I have created a template to discover the access points on my WLC with SNMP; I discover the AP-NAME, AP-LOCATION and AP-SITETAG. What I want it to do is tell me as soon as one of the APs is down on the WLC, and then tell me which AP is down, at which SITE and in which ROOM.
    I got that covered and got the output I wanted.
    But as there is a lot of data (a bit over 900 access points on one of my WLCs), the data cannot keep up, and I then get loads of triggers since it loses data.

    So, what can I do differently to get this to work the way I want? And is it even possible with over 900 APs?

    I have attached screenshots of the Discovery rule, Item prototype, Trigger prototype and the output of when hell breaks loose.
    Attached Files
  • Answer selected by MikeJensen at 26-07-2024, 11:39.
    Markku
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
    • Sep 2018
    • 1781

    I can only suggest increasing the SNMP agent timeout to test whether it has an effect, and testing the new walk[] item (with dependent items) in Zabbix 7: https://www.zabbix.com/documentation...itemtypes/snmp

    Markku


    • MikeJensen
      Junior Member
      • Apr 2024
      • 19

      #2
      I have tried splitting my discovery rule into 3 discovery rules and filtering each rule so that each one gets less data. Still the same problem.


      • Markku
        Senior Member
        Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
        • Sep 2018
        • 1781

        #3
        But as there is a lot of data (a bit over 900 access points on one of my WLCs), the data cannot keep up, and I then get loads of triggers since it loses data.
        Can you be more specific, how did you come to this conclusion?

        Markku


        • MikeJensen
          Junior Member
          • Apr 2024
          • 19

          #4
          Pure speculation, as when I try to do an snmpwalk on the OIDs from a Linux machine, it takes some time to display it all... and also ChatGPT.
          I've had the discovery rule running on a WLC with only 53 APs (a virtual appliance we use for flex-connected APs), and there I don't get any AP triggers. But as soon as I enable the discovery rule on my physical WLC, the items are discovered and added as items under the WLC host in Zabbix, and then after the first update interval half of them appear to be offline and I get spammed with my triggers.


          • Markku
            Senior Member
            Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
            • Sep 2018
            • 1781

            #5
            I can only suggest increasing the SNMP agent timeout to test whether it has an effect, and testing the new walk[] item (with dependent items) in Zabbix 7: https://www.zabbix.com/documentation...itemtypes/snmp

            Markku


            • Markku
              Senior Member
              Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
              • Sep 2018
              • 1781

              #6
              Also, keep an eye on the poller utilization.

              If you have 900 APs, is it necessary to check their names and locations every 1 minute? (That's 900x3 = 2700 requests every minute = 45 SNMP requests per second just for those APs)
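
As a rough sketch of that arithmetic: the snippet below assumes, like the estimate above, one SNMP request per item per update interval. The function name is made up for illustration, and real pollers may batch requests, so treat the numbers as an upper-bound guess rather than a measurement.

```python
def snmp_requests_per_second(ap_count: int, items_per_ap: int, interval_seconds: float) -> float:
    """Rough request rate, assuming one SNMP request per item per interval."""
    return ap_count * items_per_ap / interval_seconds


# 900 APs x 3 items (name, location, site tag), polled every minute
print(snmp_requests_per_second(900, 3, 60))    # 45.0
# The same items polled every 2 minutes
print(snmp_requests_per_second(900, 3, 120))   # 22.5
```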

              Markku


              • MikeJensen
                Junior Member
                • Apr 2024
                • 19

                #7
                Markku

                No, it is not. What I want my rule to do is discover the APs on the WLC (it doesn't even have to be every hour, as we rarely change their configuration), but trigger when there is no data for an AP on the WLC.
                If only I could find an OID that returns the APs in the "NOT JOINED" state from the WLC, it would be much better, but I can't seem to find one.

                I have tried setting my discovery rule update interval to 1 hour, the item prototype to 2m and the trigger prototype to 5m. In my head, that means it checks whether the items are there every hour, updates the items every 2 minutes, and if there has been no response after 5 minutes, it should fire a trigger.


                • Markku
                  Senior Member
                  Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
                  • Sep 2018
                  • 1781

                  #8
                  Originally posted by MikeJensen
                  I have tried setting my discovery rule update interval to 1 hour, the item prototype to 2m and the trigger prototype to 5m. In my head, that means it checks whether the items are there every hour, updates the items every 2 minutes, and if there has been no response after 5 minutes, it should fire a trigger.
                  Correct.

                  You can even set the NAME update to that 2m (to be used with the nodata() trigger), and the LOCATION and SITETAG updates to something like 1 hour.
                  (Update: or was it that you only had an item for the name? OK.)
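
To illustrate the timing discussed above (item updated every 2m, trigger after 5m without data), here is a simplified model of a nodata-style check. It is only a sketch under assumptions: the function name and timestamps are hypothetical, and this is not how Zabbix implements nodata() internally.

```python
def no_data_for(last_value_time: float, now: float, period_seconds: float) -> bool:
    """Return True when the newest value is older than `period_seconds`."""
    return (now - last_value_time) > period_seconds


PERIOD = 5 * 60     # trigger window: 5m
INTERVAL = 2 * 60   # item update interval: 2m

# The AP last answered at t=0 and every later poll has failed.
# After two intervals the last value is 240 s old: still inside the window.
print(no_data_for(last_value_time=0, now=2 * INTERVAL, period_seconds=PERIOD))  # False
# After three intervals it is 360 s old: outside the window, the check fires.
print(no_data_for(last_value_time=0, now=3 * INTERVAL, period_seconds=PERIOD))  # True
```

In this model an AP has to miss two consecutive polls before the check can fire, so values that are lost or time out repeatedly could raise triggers even for APs that are still joined.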

                  Did you find anything in the poller utilization? I take it that you are monitoring the WLC from the server as you didn't mention any proxies.

                  Markku
                  Last edited by Markku; 24-07-2024, 14:57.


                  • MikeJensen
                    Junior Member
                    • Apr 2024
                    • 19

                    #9
                    Markku
                    I have not looked at the poller yet. I'm about to leave for today but will look at it again tomorrow. I'm running Zabbix server -> WLC without a proxy.
                    I'm not able to try the new walk[] item, as I'm still on Zabbix 6.0.


                    • MikeJensen
                      Junior Member
                      • Apr 2024
                      • 19

                      #10
                      Markku
                      Utilization of the poller data collector processes is at about 10% without the discovery rule; with it enabled it goes to 20-25%. Much higher than usual, but not critical.


                      • markosa
                        Senior Member
                        Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
                        • Aug 2022
                        • 104

                        #11
                        Have you tried polling some of those APs from the server using snmpget/snmpwalk? I mean, are they responding properly and in time?


                        • MikeJensen
                          Junior Member
                          • Apr 2024
                          • 19

                          #12
                          markosa
                          Yes, that is what I was referring to in post #4 with "it takes some time to display it all". I just timed it as best I could with snmpwalk from the Zabbix server's Ubuntu CLI: it takes around 2.5-3 seconds to display all the AP names. But I get all of the connected APs, with no data missing.
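
For a more repeatable measurement than timing by hand, something like the following sketch could wrap the walk. The helper name is hypothetical, and the community string, host and OID in the comment are placeholders that are not taken from this thread; the demo falls back to a harmless Python one-liner when no command is given.

```python
import subprocess
import sys
import time


def time_command(argv: list[str]) -> tuple[float, int]:
    """Run a command and return (elapsed seconds, number of output lines)."""
    start = time.perf_counter()
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    elapsed = time.perf_counter() - start
    return elapsed, len(result.stdout.splitlines())


if __name__ == "__main__":
    # Placeholders, replace with your own values, e.g.:
    #   python time_walk.py snmpwalk -v2c -c COMMUNITY WLC_HOST AP_NAME_OID
    argv = sys.argv[1:] or [sys.executable, "-c", "print('demo')"]
    elapsed, rows = time_command(argv)
    print(f"{rows} rows in {elapsed:.2f} s")
```

Comparing the elapsed time with the configured Timeout value gives a first hint of whether the walk is close to the limit.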


                          • Markku
                            Senior Member
                            Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
                            • Sep 2018
                            • 1781

                            #13
                            Originally posted by Markku
                            I can only suggest increasing the SNMP agent timeout to test whether it has an effect, [...]
                            Did you try this?

                            Markku


                            • MikeJensen
                              Junior Member
                              • Apr 2024
                              • 19

                              #14
                              Markku
                              I have now, and it might actually have fixed my problem! (This, or perhaps a reboot of the server.)
                              Thank you!

                              In the server config it was already at Timeout=4; I increased it to 5 and now I'm no longer being spammed about down APs. I've also tested it by taking an AP down and connecting it again, with no problem.

                              I noticed that my config contained two Timeout=4 lines; could this have been causing problems somehow? What I mean is this:

                              ### Option: Timeout
                              # Specifies how long we wait for agent, SNMP device or external check (in seconds).
                              #
                              # Mandatory: no
                              # Range: 1-30
                              # Default:
                              Timeout=4
                              Timeout=4

                              Also, what does this do exactly? With Timeout=5, does it wait 5 seconds to get all the data before it stops the process?
                              Last edited by MikeJensen; 26-07-2024, 11:44.


                              • Markku
                                Senior Member
                                Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
                                • Sep 2018
                                • 1781

                                #15
                                Great! I don't believe a duplicate line is a problem; the latest one takes precedence anyway (think about include files: the one seen later in the include process takes effect).
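
The "latest one takes precedence" behaviour described above can be pictured with a toy parser. This is only an illustration of the last-one-wins idea, assuming simple KEY=VALUE lines; it is not Zabbix's actual config parser, and the function name is made up.

```python
def parse_config(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines; a later duplicate overwrites an earlier one."""
    settings: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


sample = """
# Range: 1-30
Timeout=4
Timeout=5
"""
print(parse_config(sample))  # {'Timeout': '5'}
```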

                                Yeah, in 6.0 that's how the Timeout setting works. In 7.0 the settings are done in the GUI, and the Timeout directive is just a "connect timeout", not the completion timeout.

                                Markku
                                Last edited by Markku; 26-07-2024, 11:59.
