Hi all!
We've decided to make the switch to Zabbix and I'm currently looking at recreating everything we have running in Nagios. I've spent the last few weeks looking at different parts of Zabbix and I really like what I see in most areas, except one: the "checks".
I've googled like a mad man for information regarding how to create checks "the Zabbix way" and I've come to the conclusion that I'm either missing something really important (very likely, hence this post) or that we're gonna have huge difficulties in recreating all our current checks.
An example:
I've got this Nagios-check that connects to our VMware Virtual Center-machines and looks at every datastore, determining (using warning and critical thresholds) whether we need to alert the techies or not. This check is run every X minutes against each of our VC-servers, regardless of how many datastores exist on each VC.
Now, AFAICT, the Zabbix way consists of creating some kind of "magic" autodiscovery of datastores on each VC, which creates the appropriate items/triggers/whatnot for _each_ enumerated datastore.
So, assuming I have a script on the Zabbix-server which takes hostname/IP, credentials and datastore as arguments, it ought to run this check for each datastore, on each VC. Is this correct? If so, it's a terrible, terrible solution which must perform catastrophically, and I'm sincerely hoping there's a smarter solution which you might help me out with!
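To make the discovery part concrete, here's a rough sketch of how I understand the enumeration step: a script that runs once per VC and prints the datastore list as low-level discovery JSON. The macro name {#DSNAME} and the function name are made up by me, and the datastore names are passed in by hand here because the actual vCenter lookup is left out. Whether the per-datastore items then have to hit the VC one by one is exactly what I'm asking about.

```python
import json


def build_discovery_payload(datastore_names):
    """Return low-level discovery JSON for a list of datastore names.

    {#DSNAME} is a hypothetical macro name chosen for this sketch; the
    item and trigger prototypes would have to use the same name.
    """
    entries = [{"{#DSNAME}": name} for name in datastore_names]
    # Wrap the list in a "data" key, the classic discovery rule format.
    return json.dumps({"data": entries})


if __name__ == "__main__":
    # Placeholder names; a real script would ask the VC for its datastores.
    print(build_discovery_payload(["datastore1", "datastore2"]))
```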
The other thing which I'm missing from Nagios big-time is the fact that a check can return both a message and a return-code. Is there any way to do the same in Zabbix? I'll show you an example:
Say that I want to check whether a host is NTP-synchronized or not. Issuing a remote script which in turn runs, for example, "ntpdate -q server" and extracting the offset is easy enough, and that can be used as an item. But say that something unexpected happens, maybe the NTP-server cannot be reached for the time being. ntpdate will return offset 0.00000, so potentially this could be interpreted as OK. In Nagios, I would of course check the exitcode from ntpdate and use that to compose the correct check-returncode and message, but how would I correctly recreate this check in Zabbix? I would like the trigger to alert us if the time difference is too big or if the NTP-server is unreachable. One really ugly way would be for the script to return predetermined bogus item data if ntpdate fails (say -666 or anything else unlikely to happen IRL) and use that in the trigger somehow.
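For reference, the ugly workaround would look roughly like this. It's only a sketch: the function names are mine, the sentinel -666 is the bogus value mentioned above, and I'm assuming the ntpdate output contains "offset <number>" somewhere, which may differ between versions.

```python
import re
import subprocess
import sys

SENTINEL = -666.0  # bogus value meaning "ntpdate failed", not a real offset
OFFSET_RE = re.compile(r"offset\s+([-+]?\d+(?:\.\d+)?)")


def offset_or_sentinel(returncode, output, sentinel=SENTINEL):
    """Map an ntpdate exit code and its output to a single numeric value."""
    if returncode != 0:
        # A non-zero exit code wins, even if the output shows offset 0.000000.
        return sentinel
    matches = OFFSET_RE.findall(output)
    if not matches:
        # Output did not look the way this sketch assumes.
        return sentinel
    return float(matches[-1])


def check_ntp(server):
    """Run 'ntpdate -q <server>' and return the offset or the sentinel."""
    try:
        result = subprocess.run(
            ["ntpdate", "-q", server],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return SENTINEL
    return offset_or_sentinel(result.returncode, result.stdout)


if __name__ == "__main__" and len(sys.argv) == 2:
    print(check_ntp(sys.argv[1]))
```

The trigger would then have to fire both when the value equals the sentinel and when its absolute value exceeds the threshold, which is what makes this feel like a hack compared to a proper return code plus message.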
Cheers!
-- Andy