Thoughts from a Nagios user

  • dminstrel
    Member
    • Apr 2005
    • 72

    #1

    Thoughts from a Nagios user

    I'm a long-time open source NMS user. I've been using Nagios since it was called Netsaint, and I've dabbled in Big Brother.

    We're currently monitoring over 80 hosts and 200 services with Nagios and it's been working great.

    BUT...I'm migrating to Zabbix!

    The main reason is that Nagios is mainly a real-time monitoring solution, and it's very good at it. Once you want to introduce a trending aspect, you need to slap in 3rd-party apps like APAN, Perfparse, nagiosgraph, etc. I've even tried an unholy integration with Cacti. It works, but since we're a small IT team at my company, I want to minimize the number of systems we use and manage. We also want to do SLA-related trending, and it's a pain to do that in Nagios. Another strength of Zabbix is the network map feature: it's a nightmare to create a useful map of more than 20 hosts in Nagios.

    One aspect that's the most confusing for a Nagios admin coming to Zabbix is the Hosts/Items/Triggers/Actions aspect versus the Hosts/Services aspect in Nagios. I find that the tight coupling of hosts and services in Nagios is much simpler to understand, set-up and manage than in Zabbix.

    A suggestion I could make would be to add a Monitor option to each Item added to a Host that would do a basic trigger setup (a Wizard-type dialog maybe?).

    I've also been looking for the equivalent of the Nagios Service Detail screen in Zabbix; the Overview screen is not quite there yet.

    Does anybody else have migration stories to share?

    Cheers,

    Jonathan
  • Alexei
    Founder, CEO
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Sep 2004
    • 5654

    #2
    Hi Jonathan,

    Thanks for sharing your experience! I appreciate it. May I ask you to tell me what, in your opinion, is missing in the Overview screen and how it can be improved? What do you expect from an ideal Overview screen? It would be very useful!
    Alexei Vladishev
    Creator of Zabbix, Product manager
    New York | Tokyo | Riga


    • dminstrel
      Member
      • Apr 2005
      • 72

      #3
      Here's a quick mock-up of what would be great.
      (39kb per image is really low!)

      What's useful about this screen is that the Host/Service relationships are clear and easy to understand. What I don't like about Alpha 7 is that I don't have a quick, in-your-face status screen as soon as I get into Zabbix.

      It could be customized to show only what's wrong (triggered items) or all items/triggers for all hosts. As another poster suggested, the "Top 10" view of what's wrong could also be shown here. You could have a "Nagios compatibility mode" or a "Top 10 mode" or a "Zabbix mode", etc. This has a different purpose from Screens as Screens is used more for trending purposes (display graphs) than real-time monitoring purposes.

      You would get this screen instead of the little "..." you get when you click View. An admin could then drill down with the rightmost drop-down lists to display groups, triggers, etc.

      Cheers,

      Jonathan
      Attached Files


      • Alexei
        Founder, CEO
        Zabbix Certified Trainer
        Zabbix Certified Specialist
        Zabbix Certified Professional
        • Sep 2004
        • 5654

        #4
        Thanks for the screenshot! Actually, I think a combination of the "Latest values" screen with some sort of trigger status display is what you want.

        For example, if there is no free disk space on volume /var, the item "Free Disk Space on /var" could be shown in a different color depending on the severity of the problem. Does that make sense?
        Alexei Vladishev
        Creator of Zabbix, Product manager
        New York | Tokyo | Riga


        • dminstrel
          Member
          • Apr 2005
          • 72

          #5
          Yeah, something like that.

          An even more compact display could be

          |HOST|Last check|Latest values|

          The latest values field background (cell background) could be white if the latest value of the item is not linked to a trigger, pale green if linked to a trigger but OK, red if trigger is activated, etc.
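The coloring rule above could be sketched roughly as follows. This is only an illustration: the function name `cell_color` and the state names `none`, `ok`, and `problem` are made up for the sketch and are not part of Zabbix.

```shell
#!/bin/sh
# Hypothetical helper: map an item's trigger state to a cell background color.
# The state names (none, ok, problem) are assumptions made for this sketch.
cell_color() {
    case "$1" in
        none)    echo "white" ;;      # latest value is not linked to any trigger
        ok)      echo "palegreen" ;;  # linked to a trigger, and the trigger is OK
        problem) echo "red" ;;        # linked trigger is activated
        *)       echo "white" ;;      # unknown state: fall back to the neutral color
    esac
}
```

Calling `cell_color problem` prints `red`; further states (for example per-severity colors) would just be extra `case` branches.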

          What are the thoughts of other board members on this?

          Cheers,

          Jonathan


          • Tonyb
            Junior Member
            • Apr 2005
            • 4

            #6
            Another Nagios User

            I am also a netsaint->nagios user thinking about the move to zabbix.

            I think that would be a great addition to zabbix.

            One other problem I see with Zabbix is its lack of external command support.

            We use several custom-written plugins for Nagios, which are just external programs that exit with a specific exit code. For example, we have one plugin that sends an email through one of our mail servers the first time it is run, and the next time it is run it checks the specified account via POP3, which tests our entire email system.
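For readers who haven't written one, a plugin of this kind can be sketched as below. The exit codes follow the usual Nagios convention (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN); the check itself (`check_queue_depth` and its thresholds) is a hypothetical stand-in, not the mail round-trip plugin described above.

```shell
#!/bin/sh
# Sketch of a Nagios-style check: the status is reported through the exit code
# (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) plus one line of text.
# check_queue_depth and its arguments are hypothetical.
check_queue_depth() {
    depth=$1; warn=$2; crit=$3
    case "$depth" in
        # empty or non-numeric input cannot be judged
        ''|*[!0-9]*) echo "UNKNOWN - depth is not a number"; return 3 ;;
    esac
    if [ "$depth" -ge "$crit" ]; then
        echo "CRITICAL - $depth messages queued"; return 2
    elif [ "$depth" -ge "$warn" ]; then
        echo "WARNING - $depth messages queued"; return 1
    fi
    echo "OK - $depth messages queued"; return 0
}
```

In a standalone plugin script the function's return value would simply be passed on with `exit $?`.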

            I think if zabbix had a way to define external Item Types it would take it one step higher.

            I don't think it would be very hard. There would need to be a new configuration section for "Item Types" where you could set the path to the external program and define common parameters (like hostname/IP). Then, when Zabbix runs the command, it could also pass the item key as a parameter.

            Maybe that's outside the bounds of Zabbix, but I really think it would make Zabbix much more customizable.


            • charles
              Member
              • Sep 2004
              • 54

              #7
              I already made the move to Zabbix from Nagios a while ago. No regrets, but I have not used Zabbix to its full potential either.

              External Item Types already exist in Zabbix; they are called User Parameters. Search for UserParameter in the docs.

              hth
              charles


              • Tonyb
                Junior Member
                • Apr 2005
                • 4

                #8
                One key advantage over Nagios is that the web-based configuration and monitoring are all in one.

                What if you want to monitor the status of a host from the perspective of the NMS server? It would be much more beneficial to be able to define external commands from the web interface, which could then be built into templates, than to add them manually to the zabbix_agent.

                If you have to rely on zabbix_agent to run external commands from the same server that zabbix_server runs on, you're going to get into the same situation that Nagios is currently in, where the only way to simplify the use of external commands is a third-party program. This would greatly reduce the benefits of Zabbix's all-in-one solution.


                • charles
                  Member
                  • Sep 2004
                  • 54

                  #9
                  Yes, it would be nice to be able to define them in the gui, but in many/most cases you still have to configure or install something on the server to be monitored anyway.

                  The script is run on the server monitored, not the monitoring server btw

                  charles


                  • Tonyb
                    Junior Member
                    • Apr 2005
                    • 4

                    #10
                    Many times you want the monitoring server to run the script. For example, when checking DNS servers, instead of just checking whether the process is running, the monitoring server could query the DNS server and check that a valid response was received.

                    With a plug-in type architecture you would just have to copy the plugin into a specified directory and make sure it is executable. Then the entire configuration would be done through the web interface.

                    I’m not proposing this as a replacement to using agentd to run commands on the remote host, but rather to extend the number of checks the monitoring server can do.


                    • dminstrel
                      Member
                      • Apr 2005
                      • 72

                      #11
                      I second that. This is an aspect where Nagios shines compared to Zabbix, as it allows you to run external scripts on the monitoring server. So, for instance, if I want more detail from my ping checks (to do something similar to SmokePing), I'd just run my own script instead of recompiling Zabbix.

                      Alexei, could the Simple Checks parameter be extended to allow an external_script(/path/to/my/script) Key?

                      Cheers,

                      Jonathan


                      • jyoung
                        Junior Member
                        • Mar 2005
                        • 13

                        #12
                        Originally posted by Tonyb
                        Many times you want the monitoring server to run the script. For example, when checking DNS servers, instead of just checking whether the process is running, the monitoring server could query the DNS server and check that a valid response was received.
                        I currently do this with Zabbix, and I think it is what you're asking for.

                        I am a Nagios user as well and love many of the Nagios checks that are not yet provided in Zabbix (NTP, DNS checks as you've described, HTTPS cert checks).

                        I was running Zabbix 1.0 and just went to 1.1alpha7. I can't recall exactly which alpha this was included in (perhaps it was 7), but you can pass arguments in User Parameters. The downside is that they must be passed in order, e.g.:
                        UserParameter=cust_dns_check[*],/usr/local/zabbix/bin/check_dns
                        (yeah, you might recognize that 'check_dns' script, it's the one you use with Nagios.)

                        Now when I add my items I can add,
                        cust_dns_check[ns1.mynameserver.com]
                        and it's executed as
                        /usr/local/zabbix/bin/check_dns ns1.mynameserver.com

                        likewise I could add the item as
                        cust_dns_check[ns1.mynameserver.com 10.30.0.4]
                        and it would be executed as
                        /usr/local/zabbix/bin/check_dns ns1.mynameserver.com 10.30.0.4
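The expansion described above can be mimicked with a few lines of shell. This is an illustration of the behaviour as described in this post only, not how the agent actually implements it; `expand_key` is a hypothetical helper.

```shell
#!/bin/sh
# Illustration only: append whatever sits between [ and ] in an item key to
# the command configured for the UserParameter, in the same order.
# expand_key is a hypothetical helper, not part of Zabbix.
expand_key() {
    command=$1
    key=$2
    case "$key" in
        *\[*\])
            args=${key#*\[}   # drop everything up to and including the first "["
            args=${args%\]}   # drop the trailing "]"
            echo "$command $args"
            ;;
        *)
            echo "$command"   # key without arguments: run the command as-is
            ;;
    esac
}
```

For example, `expand_key /usr/local/zabbix/bin/check_dns 'cust_dns_check[ns1.mynameserver.com 10.30.0.4]'` prints `/usr/local/zabbix/bin/check_dns ns1.mynameserver.com 10.30.0.4`.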

                        Now you say, "Wait, that does nothing for me, it's not even a valid argument nor would it return anything I can use."

                        Here is what I use for my "custom" dns check:

                        UserParameter=cust_check_dns[ns1],/usr/local/nagios/libexec/check_dns -H site.tocheck.net -s ns1.nameserver.com -a 10.30.0.3 |awk '{if ($2=="ok") print "1"; else print"0"};'

                        If you use the nagios check_dns command you know what this does.
                        I'm asking it to check the DNS entry of site.tocheck.net from nameserver ns1.nameserver.com for the address 10.30.0.3. This returns a line of text that we can run through awk to pick out the good results.
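To see what the awk filter does on its own, here is a small sketch that feeds it two sample lines. The sample lines are assumptions made for the demonstration, not output captured from a real check_dns run; the only thing that matters to the filter is whether the second whitespace-separated field is exactly "ok".

```shell
#!/bin/sh
# Turn one line of plugin text into 1 (good) or 0 (bad), using the same test
# as the UserParameter above: the second field must be exactly "ok".
to_zabbix_value() {
    awk '{ if ($2 == "ok") print "1"; else print "0" }'
}

# Sample lines below are assumed for illustration only.
echo "DNS ok - 0 seconds response time" | to_zabbix_value        # prints 1
echo "DNS critical - no response from server" | to_zabbix_value  # prints 0
```

Note that if the plugin prints nothing at all, awk prints nothing either, so the item would receive no value rather than a 0.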

                        But this is completely different from what I was talking about before, right?
                        Yes, because passing many arguments in a UserParameter gets VERY UGLY in the WebUI. (Perhaps this is something that can be looked into.)


                        Now to combine the two examples I've just given. When passing just one argument, the Zabbix UI looks good. I MUST check my NTP servers because time is crucial. If I cannot access an NTP server, I want to know right away so that I can ensure my time is correct.

                        /etc/zabbix/zabbix_agentd.conf:
                        UserParameter=cust_check_ntp[*],/usr/local/zabbix/bin/check_ntp.sh

                        /usr/local/zabbix/bin/check_ntp.sh:
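                        #!/bin/bash
                        # The first argument is the NTP server taken from the item key.
                        SERVER="$1"

                        # Print 1 if the second field of the check_ntp output is "OK:", otherwise 0.
                        /usr/local/zabbix/bin/check_ntp "$SERVER" | awk '{if ($2 == "OK:") print "1"; else print "0"}'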

                        --//--
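A possible variation on the wrapper above, offered as a sketch rather than a replacement: since Nagios-style plugins also report their status through the exit code, the 1/0 value can be derived from that instead of from the output text. `status_from_exit` is a hypothetical helper, and the sketch assumes the wrapped command exits 0 only when the check is OK.

```shell
#!/bin/sh
# Alternative sketch: derive the 1/0 value from the command's exit status
# instead of parsing its text output. Assumes the wrapped command follows the
# Nagios convention of exiting 0 only when the check is OK.
status_from_exit() {
    if "$@" >/dev/null 2>&1; then
        echo 1   # command succeeded: report "up"
    else
        echo 0   # any non-zero exit (warning, critical, missing binary): "down"
    fi
}
```

Inside a wrapper script this would be called as, for example, `status_from_exit /usr/local/zabbix/bin/check_ntp "$1"`. The trade-off is that a WARNING state is reported as 0, which the text-matching version also does.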

                        Now I can add custom checks for different NTP servers without even editing the agentd.conf file again. All I do is add an item for "cust_check_ntp[ip.addr.here]" and I'm ready to roll. If this one goes down and out for some odd reason, I simply wipe out the item and add a new one for the NTP server I have replaced it with.

                        Of course, with NTP servers you want to be nice, so don't check whether you have access more often than every 5 minutes. Your clock won't stray too far in 5 minutes anyway.

                        Ick. I just looked at all that and it's messy. Messy messy.

                        I'm too lazy to edit, it's Friday and time to head home from work. Ask if this isn't clear and I'll try to re-explain it all.

                        Side note: I do all my NTP, DNS and HTTPS checks from one server. Then each of the servers themselves check if the service is up as well.

                        Jesse


                        • klavs
                          Junior Member
                          • Apr 2005
                          • 18

                          #13
                          But doing the checks in the agent means you have to "assign" the check to a host (it could of course be the server's agent) that is not really hosting the service, i.e. have a "scriptsserver". That's pretty ugly IMHO.
                          With HTTPS response times, for example, I'd like to check remotely and attach the check to the server which actually hosts the HTTP site. This can only be done if it's a test run on the server, like simple_check is.


                          • jyoung
                            Junior Member
                            • Mar 2005
                            • 13

                            #14
                            Originally posted by klavs
                            But doing the checks in the agent means you have to "assign" the check to a host (it could of course be the server's agent) that is not really hosting the service, i.e. have a "scriptsserver". That's pretty ugly IMHO.
                            Partially ugly, just because it requires a bit of work. Zabbix has not been around nearly as long as Netsaint/Nagios, so it must still build on its offerings. An NTP check or HTTPS cert check could easily be added to Zabbix at a later date, once we devise a way to do it. Then it becomes less ugly.

                            IMHO it is NOT very ugly as it stands. If I wanted to do the same thing in Nagios, I would be required to add the service checks to one of my servers. Most of my UserParameter checks are one line in zabbix_agent.conf, and I can add many checks off that single UserParameter to check the service status of multiple servers.

                            All you're really saying is Zabbix needs a few plug-in packages like Nagios and it won't be ugly anymore.

                            Originally posted by klavs
                            With HTTPS response times, for example, I'd like to check remotely and attach the check to the server which actually hosts the HTTP site. This can only be done if it's a test run on the server, like simple_check is.
                            How would you even propose to get this? If you want to check the response time, you HAVE to have a Zabbix Agent running on the remote side. If you don't, your "response" time will not be qualified. Your HTTPS server does not have any idea when the remote site issued its request for service. Your HTTPS site will only know two-thirds of the story: that it has acknowledged the connection and the remote host is now requesting a page.
                            Last edited by jyoung; 09-04-2005, 21:30.


                            • klavs
                              Junior Member
                              • Apr 2005
                              • 18

                              #15
                              Well - this patch: http://www.zabbix.com/forum/showthread.php?t=445 seems to support it.

                              With regard to measuring HTTP response times, it is actually rather common to do it from the server. Usually it is on the same LAN, so there's no network delay, and as such the response time is an accurate measurement. Then, of course, one should check connectivity outwards, but that's another story.

                              Big Brother, Big Sister, Nagios, etc. all have checks (or items, as they are called in Zabbix) that the server performs directly, not through a local agent.

                              I can see no argument against Zabbix having a patch for adding checks to zabbix_server.conf, the way user parameters are added to zabbix_agentd.conf. Preferably, it could be the same code that agentd has for this, reused in the server.
