THIS VERSION SUPERSEDED - SEE PHP VERSION BELOW
I plan to put this on the Wiki. I just wanted comments and suggestions prior to doing so.
Complete recipe for monitoring DNS and NTP on your Network
This article assumes that you are using a single *nix Zabbix server to monitor distributed DNS and/or NTP services running on, or accessible from, your network. Since I have not figured out a way to run external scripts tied to Items and Triggers from the server, I use zabbix_agentd to run these external checks. The checks are bash scripts built on the host (for DNS) and ntpq (for NTP) commands.
The scripts
You may place these scripts wherever it is easiest for you. I placed them in /usr/local/www/data/zabbix/scripts.
Run which bash to find the path to bash on your system, and use that path in the first line of each script. In my example the path is /usr/local/bin/bash. Your system may be different.
DNS
Code:
#!/usr/local/bin/bash
# dnslookup
# DNS lookup script for Zabbix monitor. Conditional return
# of 1=success | 0=failed
DNS_SERVER=$1
HOST_QUERY=$2
# Count the "has address" lines in the answer from the given DNS server
if [ "$(host "$HOST_QUERY" "$DNS_SERVER" | grep -c "has address")" -eq 0 ]; then
    # lookup failed, bad DNS lookup
    echo "0"
else
    echo "1"
fi
NTP
Code:
#!/usr/local/bin/bash
# ntptest
# NTP test script for Zabbix monitor. Conditional return
# of 1=success | 0=failed response
HOST_QUERY=$1
# A peer line starting with "*" means the server is synced to a source
if [ "$(ntpq -pn "$HOST_QUERY" | grep -E -c '^\*')" -eq 1 ]; then
    # Sync responded, OK
    echo "1"
else
    echo "0"
fi
Be sure to set your scripts to the proper owner and use chmod +x to make them executable.
These two lookups may take many seconds to complete and return a value. While they generally respond in less than a second for a successful query, a timeout response may take more than 15 seconds. Before we complete the setup we will extend the timeout period for both zabbix_server and zabbix_agentd so that we always get some sort of response under normal circumstances.
Note: In my initial testing with these scripts, a timeout response would fail to return any value. The result of this is that the Trigger would not trip, as there were no new samples to evaluate. The timeout message appeared in the /var/log/zabbix_agentd.log file. Increasing the agent timeout resolved this problem.
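Another way to reduce missing samples is to bound the lookup time inside the script itself, so that it answers well within the agent timeout. The following is only a minimal sketch of that idea, not one of the scripts above: the function name dns_check and the WAIT_SECS default of 5 are assumptions for illustration, and it relies on the -W (wait seconds) option of the BIND host command, so check man host on your system before using it.

```shell
#!/bin/sh
# Sketch: DNS check with an explicit per-query wait limit.
# dns_check and WAIT_SECS are illustrative names, not part of the original scripts.
WAIT_SECS=${WAIT_SECS:-5}

dns_check() {
    dns_server=$1
    host_query=$2
    # -W sets how long host waits for a reply (assumes BIND's host command).
    # host may still retry, so the total time can exceed WAIT_SECS.
    if host -W "$WAIT_SECS" "$host_query" "$dns_server" 2>/dev/null | grep -q "has address"; then
        echo "1"    # at least one address was returned
    else
        echo "0"    # lookup failed or timed out
    fi
}

# Example call (same arguments as dnslookup):
# dns_check 192.168.1.10 helpdesk
```

Raising the Zabbix timeouts as described here is still needed if the bounded lookup can take longer than the default agent timeout.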
Zabbix_agentd configuration
Edit /etc/zabbix/zabbix_agentd.conf. Set Timeout=30.
Add the following, assuming your path to the scripts:
Code:
UserParameter=DNSbr1,/usr/local/www/data/zabbix/scripts/dnslookup 192.168.1.10 helpdesk
UserParameter=DNSbr6,/usr/local/www/data/zabbix/scripts/dnslookup 192.168.6.10 helpdesk
UserParameter=DNSbr4,/usr/local/www/data/zabbix/scripts/dnslookup 192.168.4.10 helpdesk
UserParameter=DNSbr10,/usr/local/www/data/zabbix/scripts/dnslookup 192.168.10.10 helpdesk
UserParameter=DNSbr2,/usr/local/www/data/zabbix/scripts/dnslookup 192.168.2.10 helpdesk
Note: "helpdesk" is a defined hostname in DNS on my network.
And for NTP:
Code:
UserParameter=NTPs1,/usr/local/www/data/zabbix/scripts/ntptest 192.168.1.68
You may have as many such tests as you want. Just keep track of the names for when you set up your Triggers in Zabbix.
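If you have many DNS servers to check, typing the UserParameter lines by hand gets error-prone. As a rough sketch, the lines could be generated from a list of name=address pairs and pasted into zabbix_agentd.conf. The function name gen_dns_params and the variables SCRIPT_DIR and LOOKUP_NAME are made up for this example; adjust the path and the hostname to your own setup.

```shell
#!/bin/sh
# Sketch: print UserParameter lines for a list of DNS servers.
# gen_dns_params, SCRIPT_DIR and LOOKUP_NAME are illustrative names.
SCRIPT_DIR=/usr/local/www/data/zabbix/scripts
LOOKUP_NAME=helpdesk

gen_dns_params() {
    # Each argument is a pair of the form name=ip, e.g. br1=192.168.1.10
    for pair in "$@"; do
        name=${pair%%=*}    # text before the first "="
        ip=${pair#*=}       # text after the first "="
        printf 'UserParameter=DNS%s,%s/dnslookup %s %s\n' \
            "$name" "$SCRIPT_DIR" "$ip" "$LOOKUP_NAME"
    done
}

gen_dns_params br1=192.168.1.10 br6=192.168.6.10
# prints:
# UserParameter=DNSbr1,/usr/local/www/data/zabbix/scripts/dnslookup 192.168.1.10 helpdesk
# UserParameter=DNSbr6,/usr/local/www/data/zabbix/scripts/dnslookup 192.168.6.10 helpdesk
```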
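To load your new agent configuration, use ps aux to find the PID of the "zabbix_agentd: main process" entry and kill it. Then start the agent again:
Code:
cd /usr/local/bin
./zabbix_agentd
(If you need to troubleshoot the agent process for any reason, make sure the log path, owner and permissions allow the agent to write to /var/log/zabbix_agent.log.)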
Before you set up triggers, you must also change the default timeout for zabbix_server. It was set to 3 (seconds) here, so when lookups failed I was getting nothing (instead of 0) in my triggers. Edit /etc/zabbix/zabbix_server.conf and set Timeout=30.
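Before restarting anything, it can be worth confirming that both files really contain the timeout you intended. A small sketch, assuming the config paths used in this article; the function name show_timeout is hypothetical, and it only prints the last uncommented Timeout line it finds in each file.

```shell
#!/bin/sh
# Sketch: show the uncommented Timeout setting in one or more config files.
# show_timeout is an illustrative name.
show_timeout() {
    for conf in "$@"; do
        if [ -r "$conf" ]; then
            # Match "Timeout=..." that is not commented out with "#"
            value=$(grep -E '^[[:space:]]*Timeout[[:space:]]*=' "$conf" | tail -n 1)
            echo "$conf: ${value:-Timeout not set, default applies}"
        else
            echo "$conf: not readable"
        fi
    done
}

# Example call:
# show_timeout /etc/zabbix/zabbix_agentd.conf /etc/zabbix/zabbix_server.conf
```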
Kill zabbix_server (sleeping...) and then use ./zabbix_server to start it again with your new values.
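Once the agent is running with the new configuration, you can confirm that each key answers before building triggers on it. This is a sketch only: it assumes the zabbix_get utility is installed and on your PATH, that the agent listens on 127.0.0.1 port 10050, and that the agent accepts queries from localhost. The function name check_key is hypothetical.

```shell
#!/bin/sh
# Sketch: query one agent key and interpret the 1/0 convention used by the scripts.
# check_key is an illustrative name; zabbix_get must be installed.
check_key() {
    key=$1
    # -s agent address, -p agent port, -k item key
    value=$(zabbix_get -s 127.0.0.1 -p 10050 -k "$key" 2>/dev/null) || value=""
    case "$value" in
        1) echo "$key: OK" ;;
        0) echo "$key: check failed" ;;
        *) echo "$key: no usable value (timeout, unknown key, or agent not running?)" ;;
    esac
}

# Example calls:
# check_key DNSbr1
# check_key NTPs1
```

An empty or unexpected answer here usually points at the same problem described above: the script took longer than the agent Timeout.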
Set up the Triggers
The Host
Time to switch to the Zabbix web interface. Log in as an administrator, then go to Configuration -> Hosts and create a host. I suggest the name "ExternalTests". I typed in a new group, "External", selected Use IP address, and entered localhost with port 10050 (or your configured port).
Triggers
Go to (Configuration) Triggers and Create Trigger. Give your new Trigger a name like BR1 DNS Server and an expression like
Code:
{ExternalTests:DNSbr1.last(0)}=0
I set the Severity to Warning.
Repeat this for each check you set up in the agentd configuration.
Actions
Now set up an Action to warn you when one of your services goes down. Go to (Configuration) Actions and Create Action. Select the Action type you want (I use Send message, and the media is a cell phone SMS service). The Source must be a Trigger. I set two Conditions:
Host group = External
Trigger name like DNS Server
Set your other options as desired. Since I am sending an email or SMS message I set the subject to {TRIGGER.NAME} Problem, and the message to {TRIGGER.NAME} may be down as of {DATE}-{TIME}.
Once created, this action will trip when any of your monitored services returns a "0".
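You may wish to make your triggers a bit smarter or a bit less sensitive depending on your environment or the load on the servers. E.g. a trigger of:
Code:
{ExternalTests:DNSbr1.sum(#3)}<2
will trip after two out of the last three tests failed. Each successful test adds 1 to the sum, so a sum below 2 means at least two failures.
Code:
{ExternalTests:DNSbr1.min(120)}=0
will trip if any test in the last 120 seconds failed. I think that last one will stay tripped until 120 seconds have passed since the last failed test.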
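To check the arithmetic behind a sum-based trigger, remember that the scripts return 1 for success and 0 for failure, so the sum of the last three samples is the number of successes. Two or more failures therefore means a sum of 1 or less. The sketch below only mimics that comparison in shell; it is not how Zabbix evaluates triggers, and the function name two_of_three_failed is hypothetical.

```shell
#!/bin/sh
# Sketch: mimic a "two of the last three tests failed" comparison.
# Arguments are the last three samples (1=success, 0=failed).
two_of_three_failed() {
    sum=$(( $1 + $2 + $3 ))      # number of successful tests
    if [ "$sum" -lt 2 ]; then
        echo "trip"              # at least two failures
    else
        echo "ok"                # at most one failure
    fi
}

two_of_three_failed 1 1 0   # prints: ok
two_of_three_failed 0 1 0   # prints: trip
```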


