**Tonyb** · 09-04-2005, 22:21

Originally posted by jyoung

All you're really saying is Zabbix needs a few plug-in packages like Nagios and it won't be ugly anymore.

That’s not what we are saying at all. It is ugly because if we want to run a script on the monitoring server to monitor a remote host, the checks are listed as items on the monitoring server and not on the host that you are checking. The items for the monitoring server will quickly grow into the hundreds and become difficult to manage.

It would be much cleaner if there were a way in Zabbix to define external commands. For example, you could define an external command at the server like:
ServerCommand=check_dns[*],/usr/local/nagios/libexec/check_dns -H $1 -s $HOSTNAME -a $2

This way you can attach the item to the actual host that it is checking and you don't have to write a shell script for each external plug-in.
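A rough sketch of how the server could expand such a definition, just to illustrate the idea. Nothing here exists in Zabbix: `ServerCommand`, the `SERVER_COMMANDS` table and `expand_command` are all hypothetical names, and the example hostnames are made up.

```python
import shlex

# Hypothetical server-side command table, keyed by item name.
# This mirrors the proposed ServerCommand= line; it is not a real Zabbix setting.
SERVER_COMMANDS = {
    "check_dns": "/usr/local/nagios/libexec/check_dns -H $1 -s $HOSTNAME -a $2",
}


def expand_command(item_key, hostname):
    """Turn an item key such as 'check_dns[a,b]' into an argv list.

    $1, $2, ... are replaced by the parameters inside the brackets and
    $HOSTNAME by the host the item is attached to.
    """
    name, _, rest = item_key.partition("[")
    params = rest.rstrip("]").split(",") if rest else []
    argv = []
    for token in shlex.split(SERVER_COMMANDS[name]):
        if token == "$HOSTNAME":
            argv.append(hostname)
        elif token.startswith("$") and token[1:].isdigit():
            argv.append(params[int(token[1:]) - 1])
        else:
            argv.append(token)
    return argv
```

With this, the item `check_dns[www.example.com,192.0.2.10]` attached to host `ns1.example.com` would run the plug-in against that host, and the result would belong to the host rather than to the monitoring server.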

This does pose one other problem though. Nagios plug-ins (for example) allow the plug-in to decide whether the check should be OK, WARNING, CRITICAL, or UNKNOWN. One plug-in might check more than one thing; for example, the DNS plug-in checks whether the server responds at all and also checks whether it responds with the correct address. I don't know how you could use that with the historical monitoring features of Zabbix. If you wanted to graph DNS server response time then you would have to have the plug-in return the response time. You could then of course set up a trigger to check whether the response time was under a specific amount of time, but what happens if the server doesn't reply at all?
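One possible way around this, sketched below, would be to split a single plug-in run into two values: the exit code as a state item and any numbers from the performance data as separate numeric items. The exit codes 0 to 3 and the `text|label=value` output layout are the standard Nagios plug-in conventions; the function name and the returned dictionary are only assumptions for illustration, and quoted perfdata labels are not handled.

```python
import re

# Standard Nagios plug-in exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

# Matches 'label=number' pairs in the perfdata part (simplified).
PERF_PAIR = re.compile(r"([^\s=;]+)=(-?\d+(?:\.\d+)?)")


def parse_plugin_result(exit_code, output):
    """Split one plug-in run into a state and a dict of numeric values."""
    _text, _, perfdata = output.partition("|")
    values = {m.group(1): float(m.group(2)) for m in PERF_PAIR.finditer(perfdata)}
    return {
        "state": exit_code,
        "state_name": STATES.get(exit_code, "UNKNOWN"),
        "values": values,
    }
```

If the server doesn't reply at all, the plug-in would exit CRITICAL with no usable timing, so `values` comes back empty: the state item still gets a value a trigger can test, while the response-time graph simply has a gap.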

**charles** · 10-04-2005, 04:17

I see your point, guys, and it would be a good addition to Zabbix, imo.

**jyoung** · 10-04-2005, 04:40

Originally posted by Tonyb

That’s not what we are saying at all. It is ugly because if we want to run a script on the monitoring server to monitor a remote host, the checks are listed as items on the monitoring server and not on the host that you are checking. The items for the monitoring server will quickly grow into the hundreds and become difficult to manage.

Okay, I understand now. I deal with a relatively small cluster, so the grouping has not exceeded 100 checks on the main monitoring server. For a cluster of fewer than 30-35 machines I can see having all checks on the monitoring server being easier to monitor, although when the number of servers is greater than that the page would get rather ugly and hard to manage.

Originally posted by Tonyb

It would be much cleaner if there were a way in Zabbix to define external commands. For example, you could define an external command at the server like:
ServerCommand=check_dns[*],/usr/local/nagios/libexec/check_dns -H $1 -s $HOSTNAME -a $2

This way you can attach the item to the actual host that it is checking and you don't have to write a shell script for each external plug-in.

Agreed. I wrote the shell scripts as a hack for something that has not yet been implemented.

http://www.zabbix.com/forum/showthread.php?t=419&highlight=UserParameter

See post #4: it looks like Alexei has plans for this implementation. Let's hope he's able to get it into the final 1.1 release.

Originally posted by Tonyb

This does pose one other problem though. Nagios plug-ins (for example) allow the plug-in to decide whether the check should be OK, WARNING, CRITICAL, or UNKNOWN. One plug-in might check more than one thing; for example, the DNS plug-in checks whether the server responds at all and also checks whether it responds with the correct address. I don't know how you could use that with the historical monitoring features of Zabbix. If you wanted to graph DNS server response time then you would have to have the plug-in return the response time. You could then of course set up a trigger to check whether the response time was under a specific amount of time, but what happens if the server doesn't reply at all?

Indeed, I ran into a snag here as well. The Nagios NTP check monitors both access to the NTP server and the offset in relation to that server (among other things). To monitor both of these I had to write two shell scripts, one for each.

No response would be marked as a '-', would it not? I'm unsure how this is (or could be) analyzed within triggers. Is it analyzed as a check.last(0)=0?

**jyoung** · 10-04-2005, 05:03

Originally posted by klavs

Well - this patch:
With regard to measuring HTTP response times, it is actually rather common to do it from the server. Usually it is on the same LAN, so there's no network delay, and as such the response time is an accurate measurement. Then of course, one should check connectivity outwards, but that's another story.

BigBrother, BigSister, Nagios, etc. all have checks (or items, as they are called in Zabbix) which the server performs directly, and not through a local agent.

I can see no argument for Zabbix not having a patch, for adding checks to the zabbix_server.conf - like userparams are added to zabbix-agentd.conf. Pref. it could be the same code that agentd has for this, reused in the server.

Okay, I believe I just did not see what you were wanting. You want the remote agent to ask the server to carry out the action. Thus the agent on host#2 asks the server on host#1 to do an HTTP GET. The server then stores this response time in the DB under a trigger owned by host#2. Do I have that right?

That indeed would be very useful. My apologies for not understanding earlier.

**jyoung** · 10-04-2005, 05:08

Repeated alert

As a Nagios user, another "feature" I wish were around is repeated alerting after an allotted amount of time. I found a reference to a way of doing this here:

http://www.zabbix.com/forum/showthread.php?t=309&highlight=cron

but it seems like an ugly, however workable, hack to me. This would be excellent if it were built in. It was nice having Nagios re-alert me after 4 hours if the problem had not been dealt with yet.

In the case of HTTPS Cert checking this re-alerted every 24 hours just as a constant reminder that the cert was about to expire and that I should have started the re-issuing process already.
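To make the behaviour I mean concrete, here is a small sketch of the re-alert decision. It is not how Nagios or Zabbix implements it; the function name and arguments are made up for illustration, and the 4-hour default is just the interval I used.

```python
from datetime import datetime, timedelta


def should_realert(problem_active, last_alert, now, interval=timedelta(hours=4)):
    """Return True when an unresolved problem is due for another notification.

    last_alert is the time of the previous notification, or None if no
    notification has been sent for this problem yet.
    """
    if not problem_active:
        return False
    if last_alert is None:
        return True
    return now - last_alert >= interval
```

For the certificate case the same check would simply be called with `interval=timedelta(hours=24)`.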

Do any other ex-Nagios/waning-Nagios users miss this functionality as well?

**klavs** · 10-04-2005, 11:48

Originally posted by jyoung

You want the remote agent to ask the server to carry out the action. Thus the agent on host#2 asks the server on host#1 to do an HTTP GET. The server then stores this response time in the DB under a trigger owned by host#2. Do I have that right?

Almost. I want to be able to set userparams in the server - so I can add checks, which the server does not even try to retrieve from an agent - but executes itself - with the result owned by #2 as you say - but with #1 being the server - not an agent. Perhaps a dedicated "serveragent" - which is used for "remotechecks".

Originally posted by jyoung

That indeed would be very useful. My apologies for not understanding earlier.

No apologies needed

**hrabbit** · 29-07-2005, 09:07

I realise this thread is rather old now and possibly outdated but recently after having a good nose around Zabbix I have decided to give it a go at the large and ugly task of monitoring our current network.

Nagios handles the network at the moment.. One centralised server does everything in one hit. We use plugins for everything....

HTTP
DNS
FTP
SSH

These run from the central Nagios server and request access on the given service's port, which gives the status of the service the plugin is trying to access.

This may seem like a problem from the perspective of time differences but in the real world, can a user trying to access a web page from one of your servers or get a DNS result from your server see the difference in speed and time variants? Of course they can but they see it "across" the network, not from localhost.

We monitor 250 hosts and over 1500 services across these hosts. Configuring monitoring agents on all these machines is a very large headache.

We use NRPE to allow collection of details on load, memory and disk space to be brought back to the central Nagios server.

Nagios allows us to have thresholding of services as well.. (eg. We have a web server that may be getting clobbered so we see that it timed out once. Does this mean that the service is unavailable? of course not.. it has a problem however.. we set up some rules that say if after 5 minutes and 5 checks over this period that the server still has a problem, notify somebody about it.)

We have plugins for notifications, allowing us to set up multiple sms gateways (eg. GSM modem locally, cheap international http gateway for low priority notifications). Jabber and ICQ to name a few.

Nagios also allows for large scale templating while active. E.g. I can specify by default that every host has to be pinged. I have a rule set that says ping * and it's all done.

Hostgroups allow for notification of all hosts inside this group to get a page about a particular subset of hosts that have issues.

We have on average about 60 plugins and of these about 45 are custom written and maintained. This number only includes the Nagios local plugins and does not include the many that are written to be run from NRPE itself.

Saying all this, I have quite taken to Zabbix due to the MySQL backend and Web Interface for configuration (+ the added benefit of the Screens Feature + Map features) but in the grand scale.. without the use of mass plugins I can't see how to implement this without some fairly major headaches.

I suppose the real reason for this post is that unless I can get Zabbix to handle the multitude of plugins I have written to handle the network at present (this includes NRPE client -> server connectivity), I would have to pass on this project.

The main feature that Nagios has that I cannot live without is;
Tactical Overview of problem only hosts and services
We have a projector (wallboard) that displays this information 24/7 and without something with as much simplicity I would simply go crazy.

Anyway, I may be jumping the gun on some of these details and as such, please blatantly shoot me down in a pile of dust.
I do love the Zabbix project and love the concept overall so please don't anybody take this as a flame war about Nagios vs Zabbix at all.

**Alexei** · 20-08-2005, 09:44

Thanks for the post. I appreciate it.

I have a couple of comments though:

1. A Tactical Overview Screen will be introduced in 1.1. I already have a design, it just has to be coded.

2. You're saying that configuration of ZABBIX agents is difficult for a large number of hosts. I've never used Nagios, so just curious: does Nagios require configuration and setup of the plugins on monitored servers? How does it work?

3. In ZABBIX you may define that an event will be triggered if a WEB server is unavailable for, say, 5 minutes. Use trigger expression {host:http.max(300)}=0.

4. ZABBIX does provide interface to SMS, pager, Windows messaging, whatever. Just write your own shell or Perl script, configure new media, and the script will be used for notifications. Easy!
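To spell out what the expression in point 3 means: it is true only when the largest value of the `http` item over the last 300 seconds is 0, i.e. every check in that window failed. The sketch below illustrates that logic only; it is not ZABBIX code, and the function names and the `(timestamp, value)` sample format are assumptions.

```python
def max_over(samples, now, seconds):
    """Largest value among (timestamp, value) samples from the last `seconds`."""
    window = [value for ts, value in samples if now - seconds <= ts <= now]
    return max(window) if window else None


def web_server_down(samples, now):
    """Mirror of {host:http.max(300)}=0, where the item is 1 when up and 0 when down."""
    return max_over(samples, now, 300) == 0
```

A single successful check inside the window makes the maximum 1, so one timeout on a busy web server does not raise the event.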

**Jon** · 21-08-2005, 15:31

I followed this thread through because I also have a need to use an external script to do extended monitoring of remote services, e.g. to check that an HTTP response contains a certain string.

I came up with this very small patch (attached) to allow simple checks to be defined in the web GUI using the syntax e.g. ext[/usr/bin/myscript], which runs myscript (in zabbix_server) and uses the floating point result on stdout.

I'm new to zabbix and don't know the code at all and I've done limited testing on this patch, so USE WITH CAUTION. I just thought it might be useful to put the patch out there so that those who know the code can comment on the wisdom or otherwise of what I've done.

(In addition to ext[/usr/bin/myscript] I threw in ext_str[...] for string values but a quick test suggests the latter is not working).
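For anyone who doesn't want to read the C, the idea behind `ext[...]` is roughly the following. This is a Python illustration only, not the patch itself: the function name, the timeout, and the handling of extra arguments inside the brackets are assumptions.

```python
import re
import shlex
import subprocess

# Item keys of the form ext[/path/to/script optional-args]
EXT_KEY = re.compile(r"^ext\[(.+)\]$")


def run_ext_check(item_key, timeout=10):
    """Run the script named in an 'ext[...]' key and return its stdout as a float."""
    match = EXT_KEY.match(item_key)
    if match is None:
        raise ValueError(f"not an ext[] key: {item_key!r}")
    argv = shlex.split(match.group(1))
    completed = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    # float() raises ValueError if the script printed something non-numeric.
    return float(completed.stdout.strip())
```

A script that prints anything other than a single number makes the conversion fail, which is one reason a separate string variant is needed.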

Attached Files

patch.txt (1.3 KB, 518 views)

**primos** · 21-08-2005, 15:39

Well-known topic, by now I hope resolved by adding external checks to beta1 (haven't seen the beta yet).

Thoughts from a Nagios user
