A Complete Recipe for Monitoring DNS and NTP on Your Network

This article assumes that you are using a single *nix Zabbix server to monitor distributed DNS and/or NTP services running on, or accessible from, your network. Since I have not figured out a way to run external scripts tied to Items and Triggers from the server, I use zabbix_agentd to run these external checks. The checks are PHP scripts built on the host command (for DNS) and the ntpq command (for NTP).

The scripts

You may place these scripts wherever it is easiest for you. I placed them in /usr/local/www/data/zabbix/scripts, which is the path used in the agent configuration further down.

dnschk.php

  #!/usr/local/bin/php
  <?php
    // The DNS server to query is the first command-line argument
    if(!empty($_SERVER["argv"][1]))
    {
        $ns_server = $_SERVER["argv"][1];
    } else {
        echo "You need to supply a DNS server to check. Quitting.\n";
        exit(1);
    }
 
    // Names to look up on the server under test
    $hosts = array("helpdesk",
                "ns1.nmsu.edu");
 
    // Do query: count the lookups that return at least one address
    $resolved=0;
    foreach($hosts as $host)
    {
        $cmd = "host ".escapeshellarg($host)." ".escapeshellarg($ns_server)." | grep -c 'has address'";
        if((int)trim((string)shell_exec($cmd)) > 0)
        {
            $resolved++; // success
        }
    }
 
    // Report 1 (up) if any lookup succeeded, 0 (down) otherwise
    echo ($resolved > 0) ? 1 : 0;
 
  ?>

dnschk.sh

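The same check as a bash script. It takes the DNS server as its first argument and, optionally, a name to look up as its second.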

#!/bin/bash
# Usage: dnschk.sh <dns-server> [name-to-resolve]
timeout=2
host="/usr/bin/host"
if test -z "$1" ; then
    echo "You need to supply a DNS server to check. Quitting"
    exit 1
fi
SERVER=$1
 
# Optional second argument: the name to look up
if test -n "$2" ; then
    Q=$2
else
    Q="yandex.ru"
fi
 
# host exits with 0 when the lookup succeeds
if "$host" -s -W "$timeout" "$Q" "$SERVER" > /dev/null 2>&1 ; then
    echo 1
else
    echo 0
fi

(Two or more lookups may be used to test for various DNS lookup scenarios, e.g. referrals, reverse lookups.)
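
As a rough illustration of testing several lookup scenarios in one check, the sketch below runs every query it is given against one server and reports 1 only if all of them succeed. The function name dns_check_all and the HOST_CMD override are my own inventions for this example, and requiring every lookup to pass is an assumption (the PHP script above is satisfied by a single successful lookup). Given an IP address, host performs a reverse lookup by itself.

```shell
#!/bin/bash
# Sketch: run several lookups against one DNS server; print 1 only if all succeed.
# HOST_CMD is an assumed override hook so the logic can be tried without a live server.
HOST_CMD="${HOST_CMD:-/usr/bin/host}"

dns_check_all() {
    local server=$1
    shift
    local query
    for query in "$@"; do
        # host does a reverse (PTR) lookup when the query is an IP address
        if ! "$HOST_CMD" -W 2 "$query" "$server" > /dev/null 2>&1; then
            echo 0
            return
        fi
    done
    echo 1
}

# Example call (forward lookup plus a reverse lookup, addresses are illustrative):
# dns_check_all 192.168.1.10 ns1.nmsu.edu 192.168.1.68
```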

ntpchk.php

  #!/usr/local/bin/php
  <?php
    // The NTP server to query is the first command-line argument
    if(!empty($_SERVER["argv"][1]))
    {
        $ntp_server = $_SERVER["argv"][1];
    } else {
        echo "You need to supply an NTP server to check. Quitting.\n";
        exit(1);
    }
 
    // Do query: ntpq marks the peer it is synchronized to with a leading "*"
    $cmd = "ntpq -pn ".escapeshellarg($ntp_server)." | grep -E -c '^\*'";
    if((int)trim((string)shell_exec($cmd)) == 1)
    {
        $result = 1; // success
    } else {
        $result = 0; // failure
    }
 
    echo $result;
 
  ?>
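
For those who prefer bash here as well, the following is a minimal sketch of the same NTP test. It relies on the fact that ntpq -p marks the peer the server is currently synchronized to with a leading asterisk. The helper name has_sys_peer is hypothetical; it reads the ntpq output on standard input so the parsing can be tried without a live NTP server.

```shell
#!/bin/bash
# Sketch: bash counterpart of the ntpq test. Reads "ntpq -pn" output on stdin
# and prints 1 if exactly one line starts with "*" (the selected peer), else 0.
# has_sys_peer is a hypothetical helper name.
has_sys_peer() {
    local count
    # grep -c prints 0 and exits non-zero when nothing matches; ignore the status
    count=$(grep -c '^\*' || true)
    if [ "$count" -eq 1 ]; then
        echo 1
    else
        echo 0
    fi
}

# Example call against a live server (address is illustrative):
# ntpq -pn 192.168.1.68 | has_sys_peer
```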

Be sure to set your scripts to the proper owner (e.g. www) and use chmod +x to make them executable.

These lookups generally return in less than a second when the query succeeds, but a query that times out may take more than 6 seconds to return. Before we complete the setup, we will extend the timeout period for both zabbix_server and zabbix_agentd so that we always get some sort of response under normal circumstances.

Note: In my initial testing with these scripts, a timeout response would fail to return any value. The result of this is that the Trigger would not trip, as there were no new samples to evaluate. The timeout message appeared in the /var/log/zabbix_agentd.log file. Increasing the agent timeout resolved this problem.
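
Another way to guard against a check that returns nothing is to wrap it so that it always prints 0 or 1 within a fixed time. The sketch below is only an illustration of that idea: run_check is a hypothetical helper, and it assumes the timeout command (GNU coreutils, also present on recent FreeBSD) is installed. The limit you pass must stay below the agent Timeout for this to help.

```shell
#!/bin/bash
# Sketch: always print 0 or 1, even when the underlying check hangs.
# run_check is a hypothetical helper; "timeout" is assumed to be installed.
run_check() {
    local limit=$1
    shift
    local out
    # timeout kills the command after $limit seconds; a killed or failed
    # command leaves $out empty or unexpected, which is reported as 0
    out=$(timeout "$limit" "$@" 2>/dev/null)
    if [ "$out" = "1" ]; then
        echo 1
    else
        echo 0
    fi
}

# Example call (script path as used in this article):
# run_check 10 php /usr/local/www/data/zabbix/scripts/dnschk.php 192.168.1.10
```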

Zabbix_agentd configuration

Edit /etc/zabbix/zabbix_agentd.conf. Set Timeout=20.

Add the following DNS tests, assuming your path to the scripts:

UserParameter=DNSbr1,php /usr/local/www/data/zabbix/scripts/dnschk.php 192.168.1.10
UserParameter=DNSbr6,php /usr/local/www/data/zabbix/scripts/dnschk.php 192.168.6.10
UserParameter=DNSbr4,php /usr/local/www/data/zabbix/scripts/dnschk.php 192.168.4.10

And for NTP:

UserParameter=NTPs1,php /usr/local/www/data/zabbix/scripts/ntpchk.php 192.168.1.68

You may have as many such tests as you want. Just keep track of the names for when you set up your Triggers in Zabbix.
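
Before restarting the agent, it can be worth confirming that every UserParameter command really prints a 0 or a 1 when run by hand. The following is a rough sketch of such a check; check_userparams is a hypothetical helper, and it assumes each entry is written on one line as UserParameter=key,command, as in the examples above.

```shell
#!/bin/bash
# Sketch: run every UserParameter command in an agent config and flag any
# that does not print exactly 0 or 1. check_userparams is a hypothetical helper.
check_userparams() {
    local conf=$1 line key cmd out
    while IFS= read -r line; do
        case $line in
            UserParameter=*) ;;
            *) continue ;;
        esac
        line=${line#UserParameter=}
        key=${line%%,*}      # text before the first comma
        cmd=${line#*,}       # everything after the first comma
        # stdin is redirected so the command cannot swallow the config file
        out=$(sh -c "$cmd" 2>/dev/null < /dev/null)
        case $out in
            0|1) echo "$key ok $out" ;;
            *)   echo "$key bad" ;;
        esac
    done < "$conf"
}

# Example call:
# check_userparams /etc/zabbix/zabbix_agentd.conf
```

Run it as the same user the agent runs as, so that ownership and permission problems on the scripts show up here rather than in Zabbix.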

In order to load your new agent configuration, use ps aux to find the PID of your zabbix_agentd: main process and kill it. Then start the agent again:

>cd /usr/local/bin
>./zabbix_agentd

(If you need to troubleshoot the agent process for any reason, make sure the log path is set and that its owner and permissions allow the agent to write to /var/log/zabbix_agentd.log.)

Before you set up triggers, you must also change the default timeout for zabbix_server. It was set to 3 (seconds) here, so when lookups failed I was getting nothing (instead of 0) in my triggers. Edit /etc/zabbix/zabbix_server.conf to set Timeout=20.

Kill zabbix_server (sleeping…|main…) and then use ./zabbix_server to start it again with your new values.

Set up the Triggers

The Host

Time to switch to the Zabbix web interface. Log in as an administrator, then go to Configuration → Hosts and create a host. I suggest the name “ExternalTests”. I typed in a new group, “External”, set Use IP address, and typed in localhost. The port is 10050 (or whatever port you configured for the agent).

Items

Having created a host entry for your client, from the Configuration, Hosts screen, click the Items link next to your new host. Click Create Item and give your test a name (e.g. DNS Branch 1). Select the type as Zabbix Agent. This is where you use the Name, or Key, that you configured in your agent for your tests. The first item in your UserParameter= statement is the “Key” used to query your agent. Type DNSbr1 (my example from above) as your key. Set the type to Numeric (Integer), and set your preferences for the other parameters.

Your Trigger(s) will be based on values recorded by this Item.

Triggers

Go to (Configuration) Triggers and Create Trigger. Give your new Trigger a name like BR1 DNS Server and an expression like

{ExternalTests:DNSbr1.last(0)}=0

I set the Severity to Warning.

Repeat this for as many checks as you set up in the agentd configuration.

Actions

Now set up an Action to warn you in case one of your services goes down. Go to (Configuration) Actions and Create Action. Select the Action type you want (I use Send message, and the media is a cell phone SMS service). The Source must be a Trigger. I set two Conditions:

Host group = External
Trigger name like "DNS Server"

Set your other options as desired. Since I am sending an email or SMS message, I set the subject to “{TRIGGER.NAME} Problem” and the message to “{TRIGGER.NAME} may be down as of {DATE} at {TIME}”.

Once created, this action will trip when any of your monitored services returns a “0”.

You may wish to make your triggers a bit smarter or a bit less sensitive depending on your environment or the load on the servers. E.g. a trigger of:

{ExternalTests:DNSbr1.sum(#3)}<2

will trip when at least two of the last three tests failed. Each passing test contributes 1 to the sum, so a sum below 2 means two or more failures.
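
To make the arithmetic concrete: since every sample is 1 (check passed) or 0 (check failed), the sum of the last three samples is simply the number of passes, and two or more failures leave a sum of 0 or 1. The sketch below works through that rule outside Zabbix; the function names are hypothetical and exist only for this illustration.

```shell
#!/bin/bash
# Sketch: evaluate the "two of the last three failed" rule on sample values.
# Samples are 1 (check passed) or 0 (check failed); function names are hypothetical.
sum_samples() {
    local total=0 v
    for v in "$@"; do
        total=$((total + v))
    done
    echo "$total"
}

# Prints PROBLEM when fewer than two of the given three samples are 1
two_of_three_failed() {
    if [ "$(sum_samples "$@")" -lt 2 ]; then
        echo PROBLEM
    else
        echo OK
    fi
}

# Example calls:
# two_of_three_failed 1 0 0    (two failures)
# two_of_three_failed 1 1 0    (one failure)
```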

{ExternalTests:DNSbr1.min(120)}=0

will trip if any test in the last 120 seconds failed, since a single 0 among the samples makes the minimum 0. I think this one will stay tripped until 120 seconds have passed without a failed test.

 
howto/monitor/services/monitor_dns_and_ntp_services_on_your_network.txt · Last modified: 2011/10/13 00:31 by joe630
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported