Okay, I was reading the "wish list" (Zabbix 1.6 discussion) and someone wanted something like what this script does. I run it as a daemon, it has been in production for about a year, and it works well for what I intended it to do.
What it does:
This script pings network interfaces with fping and sends the results to Zabbix items with zabbix_sender. When a host shows more than 10% packet loss, the script reports packetloss = 1 and keeps reporting it for the next 5 minutes, even if the pings recover in the meantime. That way other triggers can depend on this item and "chill out" for 5 minutes until the network stabilizes. It also sends fping's average latency; if you compare those graphs with the ones from Zabbix's own icmpping checks, they look noticeably different (not better, not worse, just different). Finally, whenever it finds packet loss it runs traceroute and saves the output in /tmp, with a timestamp in the file name.
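The 5-minute hold-down is really a tiny per-host state machine. Below is a minimal sketch of just that logic, pulled out into a hypothetical `packetloss_flag` sub so you can try it without fping or zabbix_sender. The 10% threshold and 5-minute window match the script; the sub name and the idea of passing the current time in are only there for illustration.

```perl
use strict;
use warnings;

# Hold-down window in seconds (5 minutes, same as the script).
my $HOLD_SECONDS = 5 * 60;

# Per-host epoch time at which the hold-down expires
# (plays the role of %hostLossEndTime in the script).
my %hold_until;

# Hypothetical helper: returns the packetloss value (1 or 0) to report
# for one fping result. $loss_pct is fping's %loss, $now is epoch time.
sub packetloss_flag {
    my ($host, $loss_pct, $now) = @_;
    if ($loss_pct > 10) {
        # More than 10% loss: report 1 and (re)start the hold-down window.
        $hold_until{$host} = $now + $HOLD_SECONDS;
        return 1;
    }
    if (exists $hold_until{$host} && $now < $hold_until{$host}) {
        # Pings look clean again, but we are still inside the window.
        return 1;
    }
    # Window expired (or never started): report the host as clean.
    delete $hold_until{$host};
    return 0;
}
```

For example, a host that loses 20% of its pings at t=1000 keeps getting 1 at t=1100 even with 0% loss, and only drops back to 0 from t=1300 on. Every new burst of loss restarts the full 5 minutes.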
History:
This script was originally designed to work around bugs in Zabbix so I wouldn't get as many false positives. As of 1.4.4, most of the bugs it worked around are fixed, but I know of at least one that is still open.
Improvements:
I imagine a better implementation would read the items to "ping" and their IP addresses straight from the Zabbix MySQL database, but I never took it that far. Instead, you have to list the Zabbix host names and their IPs in a text file (/usr/local/bin/zabbix-pinger2.dat), one host per line: the host name as defined in Zabbix, then tabs or spaces, then the IP.
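Here is a minimal sketch of reading that file into the IP-to-host-name table the script uses to pick zabbix_sender's `-s` value from fping's output. `parse_pinger_cfg` is a hypothetical name. Skipping blank and `#` comment lines is my addition, not something the original format defines. Because fields are split on whitespace, Zabbix host names containing spaces won't work in this file.

```perl
use strict;
use warnings;

# Hypothetical helper: parses lines of the form "<zabbix host>  <IP>"
# (tabs or spaces) and returns a hash ref mapping IP => Zabbix host name.
sub parse_pinger_cfg {
    my @lines = @_;
    my %ip_to_host;
    for my $line (@lines) {
        next if $line =~ /^\s*(#|$)/;          # skip blank lines and comments (an addition)
        my ($host, $ip) = split ' ', $line;    # awk-style split; also drops the newline
        next unless defined $ip;               # ignore lines with only one field
        $ip_to_host{$ip} = $host;
    }
    return \%ip_to_host;
}

my $table = parse_pinger_cfg("web01\t192.0.2.10\n", "# comment\n", "\n", "db01   192.0.2.20\n");
print join(' ', sort keys %$table), "\n";    # prints "192.0.2.10 192.0.2.20"
```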
Script:
Code:
#!/usr/bin/perl
#
# Designed to be pinger v2 with zabbix_sender
#
use strict;
use warnings;
use POSIX qw(setsid);
use Time::HiRes qw(sleep);    # core sleep() only does whole seconds, so 0.5 would become 0

my $zabbix_server = "zabbix.myco.com";
my $cfg_file      = "/usr/local/bin/zabbix-pinger2.dat";
my @fping_cmd     = qw(/usr/bin/fping -B1 -q -c 10);

my %hostLossEndTime;    # IP => epoch time until which packetloss stays at 1
my %dns_table;          # IP => Zabbix host name

# read the cfg file before daemonizing, so a missing file is reported on the terminal
open(my $cfg_fh, '<', $cfg_file) or die("unable to read $cfg_file: $!\n");
while (my $line = <$cfg_fh>) {
    next if $line =~ /^\s*(#|$)/;                  # skip blank and comment lines
    my ($zabbix_host, $ip) = split ' ', $line;     # "<zabbix host>  <IP>", tabs or spaces
    next unless defined $ip;
    $dns_table{$ip} = $zabbix_host;
}
close($cfg_fh);
my @nodes = sort keys %dns_table;
die("no hosts found in $cfg_file\n") unless @nodes;

# daemonize the program
daemonize();
debug("nodes: @nodes");

while (1) {
    # fping -q -c prints its per-host summary on stderr, hence 2>&1
    open(my $in, '-|', join(' ', @fping_cmd, @nodes) . ' 2>&1')
        or die "could not open pipe to fping: $!\n";
    while (my $line = <$in>) {
        chomp $line;
        # Fping returns two different strings depending on whether there is 100% packet loss:
        #   10.0.0.1 : xmt/rcv/%loss = 10/10/0%, min/avg/max = 0.21/0.30/0.45
        #   10.0.0.2 : xmt/rcv/%loss = 10/0/100%
        # hostname, packet loss % and avg latency are what we're after
        my ($hostname, $packetloss, $rest) = $line =~ m{^(\S+).*/(\d+)%(.*)$}
            or next;    # skip anything that isn't a summary line (ICMP errors etc.)
        next unless exists $dns_table{$hostname};
        my $zabbix_host = $dns_table{$hostname};
        sleep 0.5;    # don't overload zabbix_server with zabbix_sender

        my $latency;
        if ($packetloss == 100) {
            debug("$hostname: 100 % packet loss");
        } else {
            ($latency) = $rest =~ m{/([\d.]+)/};    # middle value of min/avg/max
            if (defined $latency) {
                debug("latency: $latency");
                send_item($zabbix_host, 'latency', $latency);
            }
        }

        if ($packetloss > 10) {    # if packet loss > 10 %
            send_item($zabbix_host, 'packetloss', 1);
            # packets are lost, host marked as down for the next 5 minutes
            $hostLossEndTime{$hostname} = time() + (60 * 5);
            debug("$hostname has packet loss");
            # the count includes the sh and grep processes themselves,
            # so this allows roughly three traceroutes at a time
            if (`ps aux | grep traceroute | wc -l` < 5) {
                # "> file 2>&1 &" rather than "&>": system() runs /bin/sh, which may not support &>
                system("traceroute -I $hostname > /tmp/$hostname:trace:" . time() . " 2>&1 &");
            }
        } elsif ($hostLossEndTime{$hostname} && time() < $hostLossEndTime{$hostname}) {
            # no packet loss now, but still mark the host as down
            debug("$hostname: no packet loss, but still mark as down");
            send_item($zabbix_host, 'packetloss', 1);
        } else {
            delete $hostLossEndTime{$hostname};
            send_item($zabbix_host, 'packetloss', 0);
            debug("$hostname: no packet loss and latency of "
                . (defined $latency ? $latency : 'n/a'));
        }
    }
    close($in);
    debug("sleeping ...");
    # sleep a random 60-69 seconds
    sleep(int(rand(10)) + 60);
}

sub send_item {
    my ($host, $key, $value) = @_;
    # list form of system(): no shell, so Zabbix host names with spaces are passed intact
    system('zabbix_sender', '-z', $zabbix_server, '-s', $host, '-k', $key, '-o', $value);
}

sub debug {
    my $msg = shift @_;
    print "$msg\n";
}

sub daemonize {
    chdir '/' or die "Can't chdir to /: $!";
    open STDIN, '<', '/dev/null' or die "Can't read /dev/null: $!";
    open STDOUT, '>>', '/dev/null' or die "Can't write to /dev/null: $!";
    open STDERR, '>>', '/dev/null' or die "Can't write to /dev/null: $!";
    defined(my $pid = fork) or die "Can't fork: $!";
    exit if $pid;
    setsid or die "Can't start a new session: $!";
    umask 0;
}
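The parsing step in the loop above depends on the layout of fping's `-q -c` summary lines. Here is a standalone sketch of that parse as a hypothetical `parse_fping_summary` sub, so you can check it against your own fping's output before running the daemon. It assumes lines like `192.0.2.10 : xmt/rcv/%loss = 10/10/0%, min/avg/max = 0.21/0.30/0.45`, and `... = 10/0/100%` with no min/avg/max part when every packet is lost; other fping versions may format this differently.

```perl
use strict;
use warnings;

# Hypothetical helper: parses one fping -q -c summary line.
# Returns (target, loss_pct, avg_ms); avg_ms is undef when all packets were lost.
# Returns an empty list for lines that are not summaries (e.g. ICMP errors).
sub parse_fping_summary {
    my ($line) = @_;
    my ($target, $loss, $rest) = $line =~ m{^(\S+)\s*:.*?/(\d+)%(.*)$}
        or return;
    # $rest looks like ", min/avg/max = 0.21/0.30/0.45"; take the middle number
    my ($avg) = $rest =~ m{=\s*[\d.]+/([\d.]+)/};
    return ($target, $loss, $avg);
}

for my $line ("192.0.2.10 : xmt/rcv/%loss = 10/10/0%, min/avg/max = 0.21/0.30/0.45",
              "192.0.2.20 : xmt/rcv/%loss = 10/0/100%") {
    my ($target, $loss, $avg) = parse_fping_summary($line) or next;
    print "$target loss=$loss% avg=", (defined $avg ? $avg : 'n/a'), "\n";
}
```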
