Thoughts from a Nagios user

  • Tonyb
    Junior Member
    • Apr 2005
    • 4

    #16
    Originally posted by jyoung
    All you're really saying is Zabbix needs a few plug-in packages like Nagios and it won't be ugly anymore.
    That’s not what we are saying at all. It is ugly because if we want to run a script on the monitoring server to monitor a remote host, the checks are listed as items on the monitoring server and not on the host that you are checking. The items for the monitoring server will quickly grow into the hundreds and become difficult to manage.

    It would be much cleaner if there were a way in Zabbix to define external commands. For example, you could define an external command on the server like this:
    ServerCommand=check_dns[*],/usr/local/nagios/libexec/check_dns -H $1 -s $HOSTNAME -a $2

    This way you can attach the item to the actual host that it is checking and you don't have to write a shell script for each external plug-in.
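A rough sketch of how the proposed ServerCommand line could be expanded, under stated assumptions: ServerCommand is only the syntax proposed in this thread (not an existing Zabbix directive), and parse_server_command, split_item_key and expand_command are hypothetical helpers. The bracketed item-key parameters fill $1, $2, ... and the host the item is attached to fills $HOSTNAME, with each value shell-quoted.

```python
import re
import shlex


# Hypothetical: "ServerCommand" is the directive proposed in this thread,
# not an existing Zabbix configuration option.
def parse_server_command(line: str) -> tuple[str, str]:
    """Split 'ServerCommand=key[*],command' into (key_name, command_template)."""
    directive, _, rest = line.partition("=")
    if directive.strip() != "ServerCommand":
        raise ValueError("not a ServerCommand line")
    key, _, template = rest.partition(",")
    if not key.endswith("[*]"):
        raise ValueError("expected a key of the form name[*]")
    return key[:-3], template


def split_item_key(key: str) -> tuple[str, list[str]]:
    """Split an item key such as 'check_dns[a,b]' into ('check_dns', ['a', 'b'])."""
    name, _, rest = key.partition("[")
    params = rest[:-1].split(",") if rest.endswith("]") else []
    return name, params


def expand_command(template: str, params: list[str], hostname: str) -> str:
    """Fill $HOSTNAME and $1..$N in a single pass, shell-quoting each value."""
    def substitute(match):
        token = match.group(1)
        if token == "HOSTNAME":
            return shlex.quote(hostname)
        index = int(token)
        # Positional parameters that were not supplied expand to an empty string.
        return shlex.quote(params[index - 1]) if 0 < index <= len(params) else ""

    return re.sub(r"\$(HOSTNAME|\d+)", substitute, template)


if __name__ == "__main__":
    name, template = parse_server_command(
        "ServerCommand=check_dns[*],/usr/local/nagios/libexec/check_dns -H $1 -s $HOSTNAME -a $2"
    )
    # Item attached to the DNS server host itself (example host name).
    key_name, params = split_item_key("check_dns[www.example.com,192.0.2.10]")
    assert key_name == name
    print(expand_command(template, params, "ns1.example.com"))
```

Substituting in one regex pass (rather than repeated string replaces) keeps a parameter value that happens to contain "$1" from being expanded a second time.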

    This does pose one other problem, though. Nagios plug-ins (for example) let the plug-in decide whether the check should be OK, WARNING, CRITICAL, or UNKNOWN. One plug-in might check more than one thing; for example, the DNS plug-in checks whether the server responds at all and also whether it responds with the correct address. I don't know how you could use that with the historical monitoring features of Zabbix. If you wanted to graph DNS server response time, then you would have to have the plug-in return the response time. You could then of course set up a trigger to check whether the response time was under a specific amount of time, but what happens if the server doesn't reply at all?
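One possible answer is to keep the plug-in's own verdict as a separate numeric item next to any measured value. The sketch below relies on the standard Nagios plug-in exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN); the run_plugin helper and the choice to report timeouts and missing executables as UNKNOWN are assumptions for illustration. With such a status item, a trigger could fire on a status of 2 or more even when the server never replies and no response time is recorded.

```python
import subprocess
import sys

# Standard Nagios plug-in exit codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(argv: list[str], timeout: float = 10.0) -> tuple[int, str]:
    """Run a plug-in and return (state_code, first line of its output).

    Timeouts, missing executables and out-of-range exit codes are all
    reported as UNKNOWN (3), so the status item always gets a value.
    """
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return 3, f"UNKNOWN: {exc}"
    code = proc.returncode if proc.returncode in NAGIOS_STATES else 3
    lines = proc.stdout.splitlines()
    return code, lines[0] if lines else ""


if __name__ == "__main__":
    # Stand-in plug-in: a Python one-liner that prints a message and exits CRITICAL.
    fake = [sys.executable, "-c", "print('DNS CRITICAL - no answer'); raise SystemExit(2)"]
    code, text = run_plugin(fake)
    print(code, NAGIOS_STATES[code], text)
```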


    • charles
      Member
      • Sep 2004
      • 54

      #17
      I see your point, guys, and it would be a good addition to Zabbix, IMO.


      • jyoung
        Junior Member
        • Mar 2005
        • 13

        #18
        Originally posted by Tonyb
        That’s not what we are saying at all. It is ugly because if we want to run a script on the monitoring server to monitor a remote host, the checks are listed as items on the monitoring server and not on the host that you are checking. The items for the monitoring server will quickly grow into the hundreds and become difficult to manage.
        Okay, I understand now. I deal with a relatively small cluster, so the grouping has not exceeded 100 checks on the main monitoring server. For a cluster of fewer than 30-35 machines, I can see having all checks on the monitoring server being easier to monitor, although when the number of servers is greater than that, the page would get rather ugly and hard to manage.

        Originally posted by Tonyb
        It would be much cleaner if there were a way in Zabbix to define external commands. For example, you could define an external command on the server like this:
        ServerCommand=check_dns[*],/usr/local/nagios/libexec/check_dns -H $1 -s $HOSTNAME -a $2

        This way you can attach the item to the actual host that it is checking and you don't have to write a shell script for each external plug-in.
        Agreed. I wrote the shell scripts as a hack for something that has not yet been implemented.



        Per post #4, it looks like Alexei has plans for this implementation; let's hope he's able to get it into the final 1.1 release.

        Originally posted by Tonyb
        This does pose one other problem, though. Nagios plug-ins (for example) let the plug-in decide whether the check should be OK, WARNING, CRITICAL, or UNKNOWN. One plug-in might check more than one thing; for example, the DNS plug-in checks whether the server responds at all and also whether it responds with the correct address. I don't know how you could use that with the historical monitoring features of Zabbix. If you wanted to graph DNS server response time, then you would have to have the plug-in return the response time. You could then of course set up a trigger to check whether the response time was under a specific amount of time, but what happens if the server doesn't reply at all?
        Indeed, I ran into a snag here as well. The Nagios NTP check monitors both access to the NTP server and the offset in relation to that server (among other things). To monitor both of these, I had to make two shell scripts, one for each.
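A single wrapper could feed both items if it also reads the plug-in's performance data, the part of the output after '|' in the form label=value[UOM];warn;crit;min;max. The sketch below keeps only each label and value; parse_perfdata is a hypothetical helper, and the sample line is illustrative rather than captured from a real check_ntp run.

```python
import re

# One perfdata entry: 'label'=value[UOM];warn;crit;min;max. Only label and value are kept.
_PERF_ENTRY = re.compile(r"('[^']+'|[^\s=]+)=([-+]?\d+(?:\.\d+)?)([a-zA-Z%]*)")


def parse_perfdata(output: str) -> dict[str, float]:
    """Map each perfdata label after the first '|' to its numeric value."""
    _, sep, perfdata = output.partition("|")
    if not sep:
        return {}  # the plug-in printed no performance data
    return {m.group(1).strip("'"): float(m.group(2)) for m in _PERF_ENTRY.finditer(perfdata)}


if __name__ == "__main__":
    # Illustrative output line, not captured from a real check_ntp run.
    sample = "NTP OK: Offset 0.0021 secs|offset=0.002100s;60.000000;120.000000;"
    print(parse_perfdata(sample))
```

From one plug-in run, the exit status could then go to a reachability item and the "offset" value to a second, graphable item.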

        No response would be marked as a '-', would it not? I'm unsure how this is (or could be) analyzed within triggers. Is it analyzed as check.last(0)=0?
        Last edited by jyoung; 10-04-2005, 04:59.


        • jyoung
          Junior Member
          • Mar 2005
          • 13

          #19
          Originally posted by klavs
          Well - this patch:
          In regards to measuring HTTP response times, it is actually rather common to do it from the server. Usually it is on the same LAN, so there's no network delay, and as such the response time is an accurate measurement. Then of course, one should check connectivity outwards, but that's another story.

          BigBrother, BigSister, Nagios, etc. all have checks (or items, as they are called in Zabbix) that the server performs directly, not through a local agent.

          I can see no argument against Zabbix having a patch for adding checks to zabbix_server.conf, the way userparams are added to zabbix_agentd.conf. Preferably, the server could reuse the same code that agentd has for this.
          Okay, I believe I just did not see what you wanted. You want the remote agent to ask the server to carry out the action. Thus the agent on host#2 asks the server on host#1 to do an HTTP GET. The server then stores this response time in the DB under a trigger owned by host#2. Do I have that right?

          That indeed would be very useful. My apologies for not understanding earlier.
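A server-side HTTP response-time check of the kind klavs describes might look roughly like this. It is a sketch, not Zabbix code: http_response_time is a hypothetical helper that times a complete GET from the monitoring server and returns None when the request fails, so the "no reply" case still needs its own status item or trigger.

```python
import http.client
import time
import urllib.request
from typing import Optional

# An empty ProxyHandler ignores any *_proxy environment variables,
# so the probe talks to the target directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def http_response_time(url: str, timeout: float = 10.0) -> Optional[float]:
    """Seconds taken to fetch `url` completely, or None if the request fails."""
    start = time.monotonic()
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            resp.read()  # include the body transfer in the measurement
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (error status), timeouts, refused connections, bad responses.
        return None
    return time.monotonic() - start
```

For example, http_response_time("http://www.example.com/") would return the elapsed seconds as a float, which the server could store under the web host's item.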


          • jyoung
            Junior Member
            • Mar 2005
            • 13

            #20
            Repeated alert

            As a Nagios user, another "feature" I wish were around is repeated alerting after an allotted amount of time. I found a reference to the action here:

            but it seems like an ugly, though workable, hack to me. This would be excellent if it were built in. It was nice having Nagios re-alert me after 4 hours if the problem had not been dealt with yet.

            In the case of HTTPS cert checking, this re-alerted every 24 hours as a constant reminder that the cert was about to expire and that I should have started the re-issuing process already.
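The re-alerting rule described here (Nagios calls the setting notification_interval) boils down to a small decision. The sketch below is a generic illustration, not Zabbix or Nagios code; should_notify and its parameters are hypothetical, and acknowledgements are ignored.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_notify(problem_active: bool, last_sent: Optional[datetime],
                  now: datetime, interval: timedelta) -> bool:
    """Return True if a (re)notification should be sent for an unresolved problem."""
    if not problem_active:
        return False                    # problem cleared: stop reminding
    if last_sent is None:
        return True                     # first notification for this problem
    return now - last_sent >= interval  # remind again once the interval has passed


if __name__ == "__main__":
    start = datetime(2005, 4, 10, 8, 0)
    every_4h = timedelta(hours=4)
    last = None
    # Simulate an hourly check over ten hours while the problem stays unresolved.
    for hour in range(10):
        now = start + timedelta(hours=hour)
        if should_notify(True, last, now, every_4h):
            print("alert at", now.time())
            last = now
```

A 24-hour interval for the certificate check is the same call with timedelta(hours=24).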

            Do any other ex-Nagios/waning-Nagios users miss this functionality as well?


            • klavs
              Junior Member
              • Apr 2005
              • 18

              #21
              Originally posted by jyoung
              You want the remote agent to ask the server to carry out the action. Thus the agent on host#2 asks the server on host#1 to do an HTTP GET. The server then stores this response time in the DB under a trigger owned by host#2. Do I have that right?
              Almost. I want to be able to set userparams in the server, so I can add checks that the server does not even try to retrieve from an agent but executes itself, with the result owned by #2 as you say, but with #1 being the server, not an agent. Perhaps a dedicated "serveragent" that is used for "remotechecks".
              Originally posted by jyoung
              That indeed would be very useful. My apologies for not understanding earlier.
              No apologies needed.


              • hrabbit
                Junior Member
                • Jul 2005
                • 1

                #22
                I realise this thread is rather old now and possibly outdated, but after recently having a good nose around Zabbix, I have decided to give it a go at the large and ugly task of monitoring our current network.

                Nagios handles the network at the moment. One centralised server does everything in one hit. We use plugins for everything:

                HTTP
                DNS
                FTP
                SSH

                These run from the central Nagios server and connect to the given service's port, which gives the status of the service the plugin is trying to access.
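At their simplest, such checks just open a connection to the service's port from the central server. A minimal sketch of that idea, with tcp_port_open as a hypothetical helper; real plugins such as check_http or check_ssh also speak the protocol and inspect the reply, which this does not attempt.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port is accepted within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False


if __name__ == "__main__":
    for port in (21, 22, 53, 80):
        print(port, tcp_port_open("127.0.0.1", port, timeout=2.0))
```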

                This may seem like a problem from the perspective of timing differences, but in the real world, can a user trying to access a web page or get a DNS result from one of your servers see the difference in speed and timing? Of course they can, but they see it "across" the network, not from localhost.

                We monitor 250 hosts and over 1500 services across these hosts. Configuring monitoring agents on all these machines is a very large headache.

                We use NRPE to allow collection of details on load, memory and disk space to be brought back to the central Nagios server.

                Nagios allows us to have thresholding of services as well. (E.g. we have a web server that may be getting clobbered, so we see that it timed out once. Does this mean that the service is unavailable? Of course not, although it has a problem. So we set up some rules that say: if the server still has a problem after 5 minutes and 5 checks over this period, notify somebody about it.)
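In Nagios this is what max_check_attempts does: a problem only becomes a "hard" state, and is notified, after that many consecutive failed checks. A sketch of the counting logic, assuming one check per minute so that five failures span roughly five minutes; CheckState is a hypothetical class, and recovery notifications are left out.

```python
from dataclasses import dataclass


@dataclass
class CheckState:
    """Counts consecutive failures; a problem turns "hard" after max_attempts of them."""
    max_attempts: int = 5
    failures: int = 0
    hard_problem: bool = False

    def record(self, ok: bool) -> bool:
        """Record one check result; return True only when a notification should be sent."""
        if ok:
            # Any success resets the count (recovery notifications omitted here).
            self.failures = 0
            self.hard_problem = False
            return False
        self.failures += 1
        if self.failures >= self.max_attempts and not self.hard_problem:
            self.hard_problem = True  # soft -> hard transition: notify once
            return True
        return False


if __name__ == "__main__":
    state = CheckState(max_attempts=5)
    # One timeout, a recovery, then five straight failures (one check per minute).
    for ok in [False, True, False, False, False, False, False]:
        print(ok, state.record(ok))
```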

                We have plugins for notifications, allowing us to set up multiple SMS gateways (e.g. a local GSM modem, or a cheap international HTTP gateway for low-priority notifications), Jabber and ICQ, to name a few.

                Nagios also allows for large-scale templating while active. E.g. I can specify by default that every host has to be pinged: I have a rule set that says ping * and it's all done.
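The "ping *" style of rule can be illustrated with shell-style wildcards. This is a sketch of the idea only, not Nagios's configuration syntax; the RULES table and checks_for helper are hypothetical. fnmatchcase is used so matching is case-sensitive on every platform.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table: (host name pattern, check to apply).
RULES = [
    ("*", "ping"),  # every host gets pinged
    ("web*", "http"),
    ("ns*", "dns"),
]


def checks_for(host: str, rules: list[tuple[str, str]] = RULES) -> list[str]:
    """Return every check whose pattern matches the host name, in rule order."""
    return [check for pattern, check in rules if fnmatchcase(host, pattern)]


if __name__ == "__main__":
    for host in ("web01", "ns1", "db01"):
        print(host, checks_for(host))
```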

                Hostgroups allow notifications to be set for all hosts inside a group, so someone can get a page about a particular subset of hosts that have issues.

                We have on average about 60 plugins and of these about 45 are custom written and maintained. This number only includes the Nagios local plugins and does not include the many that are written to be run from NRPE itself.

                Having said all this, I have quite taken to Zabbix due to the MySQL backend and the web interface for configuration (plus the added benefit of the Screens and Maps features), but on the grand scale, without the use of mass plugins, I can't see how to implement this without some fairly major headaches.

                I suppose the real reason for this post is that unless I can get Zabbix to handle the multitude of plugins I have written to handle the network at present (this includes NRPE client -> server connectivity), I would have to pass on this project.

                The main feature that Nagios has that I cannot live without is:
                Tactical Overview of problem only hosts and services
                We have a projector (wallboard) that displays this information 24/7 and without something with as much simplicity I would simply go crazy.

                Anyway, I may be jumping the gun on some of these details, so please blatantly shoot me down in a pile of dust.
                I do love the Zabbix project and love the concept overall, so please don't anybody take this as a flame war about Nagios vs Zabbix at all.
                Last edited by hrabbit; 29-07-2005, 09:10.


                • Alexei
                  Founder, CEO
                  Zabbix Certified Trainer
                  Zabbix Certified Specialist
                  Zabbix Certified Professional
                  • Sep 2004
                  • 5654

                  #23
                  Thanks for the post. I appreciate it.

                  I have a couple of comments though:

                  1. A Tactical Overview screen will be introduced in 1.1. I already have the design; it just has to be coded.

                  2. You're saying that configuring ZABBIX agents is difficult for a large number of hosts. I've never used Nagios, so I'm just curious: does Nagios require configuration and setup of the plugins on monitored servers? How does it work?

                  3. In ZABBIX you may define that an event will be triggered if a web server is unavailable for, say, 5 minutes. Use the trigger expression {host:http.max(300)}=0.

                  4. ZABBIX does provide an interface to SMS, pagers, Windows messaging, whatever. Just write your own shell or Perl script, configure a new media type, and the script will be used for notifications. Easy!
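What the trigger expression in point 3 tests can be illustrated as follows. This is a rough sketch, not the Zabbix server's implementation: it assumes the http item stores 1 when the service answers and 0 when it does not, that samples are (unix_time, value) pairs, and max_over_window / http_down_for are hypothetical helpers. The condition holds only when the largest value in the last 300 seconds is 0, i.e. every check in that window failed; how the real server treats a window with no data may differ from the None used here.

```python
from typing import Optional


def max_over_window(samples: list[tuple[float, float]], now: float,
                    window: float) -> Optional[float]:
    """Largest value among samples with timestamps in (now - window, now], or None."""
    values = [value for ts, value in samples if now - window < ts <= now]
    return max(values) if values else None


def http_down_for(samples: list[tuple[float, float]], now: float,
                  window: float = 300) -> bool:
    """Rough equivalent of {host:http.max(300)}=0: every sample in the window is 0."""
    window_max = max_over_window(samples, now, window)
    return window_max is not None and window_max == 0


if __name__ == "__main__":
    # One check per minute: up at t=0, down from t=60 onwards.
    history = [(0, 1), (60, 0), (120, 0), (180, 0), (240, 0), (300, 0)]
    print(http_down_for(history, now=250))  # window still contains the 1 at t=0
    print(http_down_for(history, now=330))  # only zeros in (30, 330]
```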
                  Alexei Vladishev
                  Creator of Zabbix, Product manager
                  New York | Tokyo | Riga


                  • Jon
                    Junior Member
                    • Aug 2005
                    • 1

                    #24
                    I followed this thread because I also need to use an external script for extended monitoring of remote services, e.g. to check that an HTTP response contains a certain string.

                    I came up with this very small patch (attached) to allow simple checks to be defined in the web GUI using syntax such as ext[/usr/bin/myscript], which runs myscript (in zabbix_server) and uses the floating-point result on stdout.

                    I'm new to Zabbix and don't know the code at all, and I've done limited testing on this patch, so USE WITH CAUTION. I just thought it might be useful to put the patch out there so that those who know the code can comment on the wisdom or otherwise of what I've done.

                    (In addition to ext[/usr/bin/myscript], I threw in ext_str[...] for string values, but a quick test suggests the latter is not working.)
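The behaviour described in this post (run a command, use the number it prints on stdout) can be sketched like this. It is an illustration only, not the attached patch: run_ext_check is hypothetical, the command is split with shlex and run without a shell, and non-numeric output for an ext[...] key is returned as None, i.e. treated as unsupported.

```python
import re
import shlex
import subprocess
import sys
from typing import Union

# ext[command] -> float item, ext_str[command] -> string item (syntax from the post above).
_EXT_KEY = re.compile(r"^(ext|ext_str)\[(.+)\]$")


def run_ext_check(key: str, timeout: float = 30.0) -> Union[float, str, None]:
    """Run the command inside an ext[...] or ext_str[...] key and return its stdout value."""
    match = _EXT_KEY.match(key)
    if match is None:
        raise ValueError(f"not an ext key: {key!r}")
    kind, command = match.groups()
    try:
        proc = subprocess.run(shlex.split(command), capture_output=True,
                              text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return None
    output = proc.stdout.strip()
    if kind == "ext_str":
        return output
    try:
        return float(output)
    except ValueError:
        return None  # non-numeric output: treat the item as unsupported


if __name__ == "__main__":
    python = shlex.quote(sys.executable)
    print(run_ext_check(f"ext[{python} -c 'print(3.5)']"))
```

Running the command without a shell avoids quoting surprises in keys typed into the web GUI, at the cost of not supporting pipes or redirection inside the key.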


                    • primos
                      Member
                      • Jul 2005
                      • 61

                      #25
                      Well-known topic; by now, I hope, resolved by the addition of external checks in beta1 (I haven't seen the beta yet).
