How to manage many endpoints

  • IronG
    Junior Member
    • Jun 2024
    • 3

    #1

    How to manage many endpoints

    Hi,
    I am new to Zabbix but have spent the last week trying to understand and use it.
    I work at a small MSP company and we are testing Zabbix because we intend to move from another software that we have used for years now.

    Today we monitor about:

    100 pcs GSM/3G/4G modems using SNMP.
    50 pcs Synology (DSM6 and DSM7) using SNMP
    20 pcs QNAP (many variants) using SNMP
    20 pcs TrueNAS Core or Scale
    100 pcs Servers (Dell, HP and Supermicro)
    150 pcs Printers
    75 pcs UPS (Eaton, APC and Power Walker)
    200 pcs Ubiquiti (routers, switches, APs, etc.)

    However, all this equipment is divided among a number of customers, and today (using our current system) we have a separate Windows agent installed on a VM in each customer's network that handles the local SNMP traffic and relays it to the main server.

    In Zabbix I have created my own template with about 40 SNMP measuring points for the GSM/3G/4G modems, and it works well with both triggers and graphs. I have also played around with maps to be able to create a direct HTTPS link to the modem.
    As of now I think I have figured out the technical part of Zabbix, so that is fine.

    However, I need some insight and help on best practice for handling a large number of endpoints, and on how to divide everything between customers most efficiently so that it is easy to work with.
    I am used to working in hierarchical views and I think I am missing this here. I have installed a TreeView plugin I found, but it is not enough.
    I have played around with dashboards but can't get them the way I want.

    So my question is: how do I manage a large number of customers' units and keep them apart, so I don't just get one big pile of attached equipment?

    I know this is a fuzzy question, but I find many good things in Zabbix, so I would really like to start using it.

    Thanks
    Marcus
  • cyber
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Dec 2006
    • 4807

    #2
    Host group(s) per customer...
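One way to get closer to a hierarchy with host groups is to nest them: Zabbix treats a `/` in a host group name as a level separator, so `Customers/Acme/Printers` reads as a three-level path. The sketch below only builds JSON-RPC request bodies for the `hostgroup.create` API method. The customer and device-type names are made up for illustration, and authentication and actually sending the requests are left out.

```python
import json
from itertools import count


def group_paths(customer, device_types, root="Customers"):
    """Return nested host group names for one customer, parent first."""
    base = f"{root}/{customer}"
    return [base] + [f"{base}/{dtype}" for dtype in device_types]


def hostgroup_create_requests(paths, start_id=1):
    """Build one hostgroup.create JSON-RPC body per group name."""
    ids = count(start_id)
    return [
        {
            "jsonrpc": "2.0",
            "method": "hostgroup.create",
            "params": {"name": path},
            "id": next(ids),
        }
        for path in paths
    ]


# Hypothetical customer and device types, for illustration only.
paths = group_paths("Acme", ["Modems", "Printers", "UPS"])
bodies = hostgroup_create_requests(paths)
print(json.dumps(bodies[1], indent=2))
```

Each body would be POSTed to the frontend's `api_jsonrpc.php` endpoint with whatever authentication your Zabbix version expects.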

    • IronG
      IronG commented
      I have done this, but when I bring up the host list using the Hosts menu under Monitoring it becomes a very long list. Completely unusable for us.
      That's why I installed the TreeView plugin (HOST TREE). I get a better view, but the links (Latest data, Problems) don't work correctly.
  • markosa
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
    • Aug 2022
    • 104

    #3
    Since you already have a separate server/VM for each customer network(?), you could replace that with a Zabbix proxy which monitors that specific customer. If they have servers, those could also run the Zabbix agent, if possible. The Zabbix proxy does all the polling, communicates with the Zabbix agents (if used) and returns the data to the Zabbix server.

    • cyber
      Senior Member
      Zabbix Certified Specialist, Zabbix Certified Professional
      • Dec 2006
      • 4807

      #4
      True, by default and without a filter that menu can be long... but you can create filters and save them for future use... A dashboard per client can also be an option...

      • PeterZielony
        Senior Member
        • Nov 2022
        • 146

        #5
        You can use tags for filtering.

        Also, as others mentioned: proxies.
        Last edited by PeterZielony; 04-06-2024, 11:04.

        Hiring in the UK? Drop a message

        • cyber
          Senior Member
          Zabbix Certified Specialist, Zabbix Certified Professional
          • Dec 2006
          • 4807

          #6
          Proxy is just a way to collect data, it does not change anything from "viewing the data" perspective...
          Tagging things can be useful in many places...

          • IronG
            Junior Member
            • Jun 2024
            • 3

            #7
            Thanks for your answers.

            I am not fond of tags, since they don't give me a good overview, at all times, of how well ordered my system is. Sure, tags can be useful when you want to filter something out, unless you forgot to tag one or two units. It is the Google way: "let's throw everything in a big box and hope we find it when we search for it".
            The problem with this philosophy is that you don't know what you missed until it might be too late.
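The forgotten-tag risk can be checked for directly. Below is a minimal sketch of such an audit, assuming the host list has already been fetched (for example with the `host.get` API method and `selectTags`) and that every host is supposed to carry a tag named `customer`. The tag name and the sample hosts are hypothetical.

```python
def hosts_missing_tag(hosts, required_tag):
    """Return the names of hosts that carry no tag called `required_tag`."""
    missing = []
    for host in hosts:
        tag_names = {t["tag"] for t in host.get("tags", [])}
        if required_tag not in tag_names:
            missing.append(host["host"])
    return sorted(missing)


# Hypothetical sample, shaped like host objects with their tags attached.
sample_hosts = [
    {"host": "acme-modem-01", "tags": [{"tag": "customer", "value": "acme"}]},
    {"host": "acme-printer-07", "tags": [{"tag": "type", "value": "printer"}]},
    {"host": "globex-ups-02", "tags": []},
]

untagged = hosts_missing_tag(sample_hosts, "customer")
print(untagged)  # ['acme-printer-07', 'globex-ups-02']
```

Run on a schedule, a check like this turns "I hope nothing was missed" into a list of units that still need attention.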

            I guess I am the old-school engineer who likes to keep things in a hierarchical order from the beginning. When I add something, I always know exactly where it belongs, or I create a place for it.

            For example, we monitor toner levels in the printers, and there will always be many printers with low toner levels. We also monitor GSM/3G/4G modems, and there will always be many modems with temporary issues.
            Imagine that we have everything we monitor in one big list, about 500 units, and there are a hundred units with a warning. Mentally, it becomes overwhelming. At least for me.

            For my testing I have added about 50 units to my Zabbix, and already this is challenging to follow. The TreeView plugin helps me a lot, but I would like more levels.

            I have tried to find how to suppress a triggered problem until it changes again. How do I do that in the “update” menu? Can I do it?
            Last edited by IronG; 04-06-2024, 16:56.

            • PeterZielony
              Senior Member
              • Nov 2022
              • 146

              #8
              I'm not sure anything Zabbix can offer will really help you; I'm also struggling to understand the problem. If there is a problem, why is it not resolved?

              One thing that might help you is an external ticketing system (connected via the API) to manage those problems with some SLA and a better overview of outstanding issues. In an ideal world there isn't a single error in Zabbix; its job is to highlight problems that need to be resolved relatively quickly. If 500 out of 1k devices have a problem, that is a massive problem from a management point of view. Once Zabbix detects a change to OK or to failed, it will close or open the ticket for you. In my opinion a ticket system is what you really need, alongside Zabbix as the "observer and actioner".
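The open-on-problem, close-on-recovery idea can be sketched in a few lines. This is only an illustration with an in-memory dictionary standing in for the ticket system. The event fields (`problem_id`, `status`, `name`) are hypothetical and assume the recovery notification carries the ID of the original problem.

```python
def apply_event(tickets, event):
    """Open a ticket on a problem event, close it on the matching recovery."""
    key = event["problem_id"]
    if event["status"] == "PROBLEM":
        # setdefault avoids a duplicate ticket if the same problem is sent twice.
        tickets.setdefault(key, {"title": event["name"], "state": "open"})
    elif event["status"] == "RESOLVED" and key in tickets:
        tickets[key]["state"] = "closed"
    return tickets


# Hypothetical event stream, in arrival order.
events = [
    {"problem_id": "101", "status": "PROBLEM", "name": "Toner low on printer-12"},
    {"problem_id": "102", "status": "PROBLEM", "name": "Modem unreachable: modem-33"},
    {"problem_id": "102", "status": "RESOLVED", "name": "Modem unreachable: modem-33"},
]

tickets = {}
for event in events:
    apply_event(tickets, event)

open_tickets = [t["title"] for t in tickets.values() if t["state"] == "open"]
print(open_tickets)  # ['Toner low on printer-12']
```

The temporary modem issue opens and closes on its own, so only the toner ticket is left waiting for a person.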

              As for the toner situation, you can assign different problem severities based on the level, so you will know which ones need replacing ASAP (and update the ticket accordingly).
              Last edited by PeterZielony; 04-06-2024, 18:19.

              • markosa
                Senior Member
                Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
                • Aug 2022
                • 104

                #9
                So, do I understand correctly: you want to hide most of your warnings until a problem reaches a certain severity or has been active too long? What do you mean by "suppress a triggered problem until it changes again"? In our environment we don't show unclassified or information-level events. If a problem at info level has been active longer than, say, 100 min, that event is changed to warning (via an action), and then it pops up in the dashboard's problems section. A great tool for sorting and doing "complicated" displays is Grafana, which might be good for you. Of course you can also create dashboards with multiple problem widgets where the content is filtered based on your needs: one would show printer-related stuff, another would show modems, etc. And as cyber said, host groups and tags are useful.
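The "hide info-level events, promote them after about 100 minutes" rule can be written out as plain logic. This is not how Zabbix implements it (there it is done with an action, as described above). It is only a sketch of the rule, with the 100-minute threshold and the "warning" display floor taken from the description.

```python
from datetime import timedelta

# Zabbix severity names, lowest to highest.
SEVERITIES = ["not classified", "information", "warning", "average", "high", "disaster"]


def displayed_severity(severity, age, escalate_after=timedelta(minutes=100)):
    """Promote a long-running information-level problem to warning."""
    if severity == "information" and age >= escalate_after:
        return "warning"
    return severity


def visible_on_dashboard(severity, age, minimum="warning"):
    """True if the (possibly promoted) severity reaches the display floor."""
    shown = displayed_severity(severity, age)
    return SEVERITIES.index(shown) >= SEVERITIES.index(minimum)


print(visible_on_dashboard("information", timedelta(minutes=30)))   # False
print(visible_on_dashboard("information", timedelta(minutes=120)))  # True
print(visible_on_dashboard("high", timedelta(minutes=0)))           # True
```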

                • IronG
                  IronG commented
                  Here is one example.
                  We monitor a lot of mobile modems and all of them have an RTC battery inside. It has 3 values: 0 = bad, 1 = exchange, 2 = OK.
                  When it goes from OK to Exchange, I have set up a trigger that raises a warning. When I get this I click the problem and then "Update".

                  There it is possible to suppress the problem, but only "Indefinitely" or "Until". I would like to be able to suppress until "Change".

                  So when the RTC battery changes from "Exchange" to "Bad", the problem warning will reappear.
              • cyber
                Senior Member
                Zabbix Certified Specialist, Zabbix Certified Professional
                • Dec 2006
                • 4807

                #10
                I guess the main question has been how to display your things so it would be easy to view them "per customer". Right?

                • markosa
                  Senior Member
                  Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
                  • Aug 2022
                  • 104

                  #11
                  IronG We've done a dashboard with a problem display that doesn't show acknowledged problems. So you would ack the Exchange event, and you should have a different trigger, with a different or the same severity, for BAD. That one would show up when the corresponding value is received, and when the condition is OK both events should be closed.
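The same "unacknowledged only" view can also be requested through the API. A minimal sketch that builds a `problem.get` request body limited to one customer's host group, unacknowledged problems, and warning severity or above. The group ID `"42"` is a placeholder, and authentication and sending the request are left out.

```python
import json

# Zabbix numeric severities.
SEVERITY = {
    "not_classified": 0,
    "information": 1,
    "warning": 2,
    "average": 3,
    "high": 4,
    "disaster": 5,
}


def unacked_problems_request(group_ids, min_severity="warning", request_id=1):
    """Build a problem.get body for unacknowledged problems in given groups."""
    floor = SEVERITY[min_severity]
    return {
        "jsonrpc": "2.0",
        "method": "problem.get",
        "params": {
            "output": ["eventid", "name", "severity", "clock"],
            "groupids": list(group_ids),
            "acknowledged": False,
            "severities": [s for s in SEVERITY.values() if s >= floor],
            "sortfield": ["eventid"],
            "sortorder": "DESC",
        },
        "id": request_id,
    }


# "42" is a placeholder host group ID for one customer.
request = unacked_problems_request(["42"])
print(json.dumps(request, indent=2))
```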

                  • PeterZielony
                    Senior Member
                    • Nov 2022
                    • 146

                    #12
                    Originally posted by IronG

                    Here is one example. We monitor a lot of mobile modems and all of them have an RTC battery inside. It has 3 values: 0 = bad, 1 = exchange, 2 = OK.
                    When it goes from OK to Exchange, I have set up a trigger that raises a warning. When I get this I click the problem and then "Update".

                    There it is possible to suppress the problem, but only "Indefinitely" or "Until". I would like to be able to suppress until "Change".

                    So when the RTC battery changes from "Exchange" to "Bad", the problem warning will reappear.

                    There is no need to suppress those.

                    You just need an additional trigger with a higher severity for this scenario. This way it will generate an additional notification, and the warning on that device will be replaced with a high (the warning condition will no longer be true, so it will be resolved, but the new "high" trigger will fire).

                    Then, if you want to display only high, just choose high-severity problems for the host group in Problems, which will display everything that needs action.

                    Additional tags will be helpful if you need further filtering based on device type and host group.
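The two-trigger setup for the RTC battery can be sketched as follows. The first function builds parameter sets in the shape the `trigger.create` API method takes, using the `last(/host/key)` expression syntax of recent Zabbix versions. The template name and item key are hypothetical. The second function only mimics, in plain Python, which of the two triggers would be in a problem state for a given value (0 = bad, 1 = exchange, 2 = OK).

```python
RTC_BAD, RTC_EXCHANGE, RTC_OK = 0, 1, 2


def rtc_trigger_definitions(template, item_key):
    """Return parameters for a warning trigger and a high trigger."""
    return [
        {
            "description": "RTC battery should be exchanged",
            "expression": f"last(/{template}/{item_key})={RTC_EXCHANGE}",
            "priority": 2,  # warning
        },
        {
            "description": "RTC battery is bad",
            "expression": f"last(/{template}/{item_key})={RTC_BAD}",
            "priority": 4,  # high
        },
    ]


def active_problem(value):
    """Mimic which trigger is in a problem state for a battery value."""
    return {RTC_EXCHANGE: "warning", RTC_BAD: "high"}.get(value)


# Hypothetical template name and item key.
definitions = rtc_trigger_definitions("Template Modem SNMP", "rtc.battery.status")
for d in definitions:
    print(d["priority"], d["expression"])

# Battery goes OK -> Exchange -> Bad: warning appears, then high replaces it.
print([active_problem(v) for v in (RTC_OK, RTC_EXCHANGE, RTC_BAD)])
# [None, 'warning', 'high']
```

Because the warning expression stops being true when the value drops to 0, the acknowledged warning resolves by itself and the new high problem shows up unacknowledged.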
                    Last edited by PeterZielony; 07-06-2024, 10:00.
