Dear all,
Our Zabbix server stopped working sometime early this morning, and we have been trying to get it up and running all day without success. This makes on-call rather problematic because we won't know when systems are down.
The configuration files have not been changed since the 28th of September, and the server has been bounced since that date but before this morning. Just to rule out oddities, we have also rebooted the server.
Linux t2nl-mgt004 2.6.25.20-0.5-default #1 SMP 2009-08-14 01:48:11 +0200 x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/*relea*
openSUSE 11.0 (X86-64)
Below is the server.log output covering the startup and subsequent crash of the server, followed by the server and agent configuration files.
I would be very grateful if someone might tell me what to investigate so that we can get it back.
Yours faithfully, P,
########################
###### server.log #####
########################
Code:
9605:20091012:140915 Starting zabbix_server. ZABBIX 1.6.3.
9605:20091012:140915 **** Enabled features ****
9605:20091012:140915 SNMP monitoring: YES
9605:20091012:140915 WEB monitoring: NO
9605:20091012:140915 Jabber notifications: NO
9605:20091012:140915 ODBC: NO
9605:20091012:140915 IPv6 support: NO
9605:20091012:140915 **************************
9605:20091012:140920 ZABBIX semaphores already exist, trying to recreate.
9605:20091012:140920 ZABBIX semaphores already exist, trying to recreate.
9605:20091012:140920 ZABBIX semaphores already exist, trying to recreate.
9622:20091012:140920 server #17 started [Poller. SNMP:YES]
9618:20091012:140920 server #13 started [Poller. SNMP:YES]
9606:20091012:140920 server #1 started [Poller. SNMP:YES]
9608:20091012:140920 server #3 started [Poller. SNMP:YES]
9609:20091012:140920 server #4 started [Poller. SNMP:YES]
9623:20091012:140920 server #18 started [Poller. SNMP:YES]
9616:20091012:140920 server #11 started [Poller. SNMP:YES]
9617:20091012:140920 server #12 started [Poller. SNMP:YES]
9607:20091012:140920 server #2 started [Poller. SNMP:YES]
9611:20091012:140920 server #6 started [Poller. SNMP:YES]
9610:20091012:140920 server #5 started [Poller. SNMP:YES]
9621:20091012:140920 server #16 started [Poller. SNMP:YES]
9619:20091012:140920 server #14 started [Poller. SNMP:YES]
9615:20091012:140920 server #10 started [Poller. SNMP:YES]
9635:20091012:140920 server #26 started [Trapper]
9625:20091012:140920 server #20 started [Poller. SNMP:YES]
9638:20091012:140920 server #27 started [Trapper]
9640:20091012:140920 server #28 started [Trapper]
9642:20091012:140920 server #29 started [Trapper]
9644:20091012:140920 server #30 started [Trapper]
9646:20091012:140920 server #31 started [ICMP pinger]
9648:20091012:140920 server #32 started [Alerter]
9650:20091012:140920 server #33 started [Housekeeper]
9620:20091012:140920 server #15 started [Poller. SNMP:YES]
9650:20091012:140920 Executing housekeeper
9652:20091012:140920 server #34 started [Timer]
9624:20091012:140920 server #19 started [Poller. SNMP:YES]
9632:20091012:140920 server #23 started [Poller. SNMP:YES]
9658:20091012:140920 server #36 started [Node watcher. Node ID:0]
9630:20091012:140920 server #21 started [Poller. SNMP:YES]
9612:20091012:140921 server #7 started [Poller. SNMP:YES]
9655:20091012:140921 server #35 started [Poller for unreachable hosts. SNMP:YES]
9613:20091012:140921 server #8 started [Poller. SNMP:YES]
9614:20091012:140921 server #9 started [Poller. SNMP:YES]
9631:20091012:140921 server #22 started [Poller. SNMP:YES]
9633:20091012:140921 server #24 started [Poller. SNMP:YES]
9668:20091012:140921 server #39 started [Escalator]
9663:20091012:140921 server #37 started [Discoverer. SNMP:YES]
9605:20091012:140921 server #0 started [Watchdog]
9605:20091012:140921 In main_watchdog_loop()
9667:20091012:140921 server #38 started [DB Syncer]
9634:20091012:140921 server #25 started [Poller. SNMP:YES]
9622:20091012:140921 Item [amshqb-bob01:system[procrunning]] error: Not supported by ZABBIX agent
9617:20091012:140921 Item [amshqb-dbase04:system[procrunning]] error: Not supported by ZABBIX agent
9619:20091012:140921 Item [t2nl-mgt001:agent.ping] error: Get value from agent failed: Cannot connect to [t2nl-mgt001:10050] [Connection refused]
9619:20091012:140921 Host [t2nl-mgt001]: first network error, wait for 15 seconds
9623:20091012:140921 Item [t2nl-app106:vfs.fs.size[/var,pfree]] error: Got empty string from [t2nl-app106]. Assuming that agent dropped connection because of access permissions
9619:20091012:140921 Parameter [agent.ping] will be checked after 240 seconds on host [t2nl-mgt001]
9623:20091012:140921 Host [t2nl-app106]: first network error, wait for 15 seconds
9605:20091012:140921 One child process died. Exiting ...
9605:20091012:140923 ZABBIX Server stopped. ZABBIX 1.6.3.
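In case it helps anyone reading the log, here is a minimal Python sketch for splitting it into entries and grouping them by process ID. It assumes every entry begins with the PID:YYYYMMDD:HHMMSS prefix seen in the log above; the names split_entries and group_by_pid are made up for illustration and are not part of Zabbix.

```python
import re
from collections import defaultdict

# Assumed entry prefix, inferred from the log above: PID:YYYYMMDD:HHMMSS
# followed by a space. The lookbehind keeps us from matching mid-token.
ENTRY_START = re.compile(r"(?<!\S)(\d+):(\d{8}):(\d{6}) ")


def split_entries(raw):
    """Split run-together log text into (pid, date, time, message) tuples."""
    matches = list(ENTRY_START.finditer(raw))
    entries = []
    for i, m in enumerate(matches):
        # An entry runs until the next prefix, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw)
        # Collapse whitespace so entries broken across lines are rejoined.
        message = " ".join(raw[m.end():end].split())
        entries.append((int(m.group(1)), m.group(2), m.group(3), message))
    return entries


def group_by_pid(entries):
    """Collect the messages written by each process, in log order."""
    grouped = defaultdict(list)
    for pid, _date, _time, message in entries:
        grouped[pid].append(message)
    return dict(grouped)


# A few entries copied from the log above, run together on purpose.
sample = (
    "9605:20091012:140915 Starting zabbix_server. ZABBIX 1.6.3. "
    "9622:20091012:140920 server #17 started [Poller. SNMP:YES] "
    "9605:20091012:140921 One child process died. Exiting ... "
    "9605:20091012:140923 ZABBIX Server stopped. ZABBIX 1.6.3."
)

if __name__ == "__main__":
    for pid, messages in group_by_pid(split_entries(sample)).items():
        print(pid, messages)
```

Grouping by PID makes it easier to see what each child last wrote before the watchdog (PID 9605) reported that a child had died. At DebugLevel=3 the log does not name the child that died, so this only narrows the search.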
########################
###### zabbix_server.conf #####
########################
Code:
# This is config file for ZABBIX server process
# To get more information about ZABBIX,
# go http://www.zabbix.com

############ GENERAL PARAMETERS #################

# Number of pre-forked instances of pollers
# Default value is 5
# This parameter must be between 0 and 255
StartPollers=25

# How often ZABBIX will perform housekeeping procedure
# (in hours)
# Default value is 1 hour
# Housekeeping is removing unnecessary information from
# tables history, alert, and alarms
# This parameter must be between 1 and 24
HousekeepingFrequency=1

# How often ZABBIX will try to send unsent alerts
# (in seconds)
# Default value is 30 seconds
SenderFrequency=30

# Uncomment this line to disable housekeeping procedure
#DisableHousekeeping=1

# Specifies debug level
# 0 - debug is not created
# 1 - critical information
# 2 - error information
# 3 - warnings (default)
# 4 - for debugging (produces lots of information)
DebugLevel=3

# Specifies how long we wait for agent response (in sec)
# Must be between 1 and 30
Timeout=5

# Specifies how many seconds trapper may spend processing new data
# Must be between 1 and 30
#TrapperTimeout=5

# After how many seconds of unreachability treat a host as unavailable
UnreachablePeriod=45

# How ofter check host for availability during the unreachability period
#UnavailableDelay=15

# How ofter check host for availability during the unavailability period
#UnavailableDelay=60

# Name of PID file
PidFile=/var/tmp/zabbix_server.pid

# Name of log file
# If not set, syslog is used
LogFile=/home/zabbix/server.log

# Maximum size of log file in MB. Set to 0 to disable automatic log rotation.
LogFileSize=10

# Location for custom alert scripts
AlertScriptsPath=/home/zabbix/bin/

# Location of external scripts
ExternalScripts=/etc/zabbix/externalscripts

# Location of fping. Default is /usr/sbin/fping
# Make sure that fping binary has root permissions and SUID flag set
#FpingLocation=/usr/sbin/fping

# Location of fping6. Default is /usr/sbin/fping6
# Make sure that fping binary has root permissions and SUID flag set
#Fping6Location=/usr/sbin/fping6

# Frequency of ICMP pings (item keys 'icmpping' and 'icmppingsec'). Defauls is 60 seconds.
#PingerFrequency=60

# Database host name
# Default is localhost
#DBHost=t2nl-mgt102

# Database name
# SQLite3 note: path to database file must be provided. DBUser and DBPassword are ignored.
DBName=zabbix

# Database user
DBUser=zabbix

# Database password
# Comment this line if no password used
DBPassword=zabbix

# Connect to MySQL using Unix socket?
#DBSocket=/tmp/mysql.sock

StartDBSyncers=
########################
###### zabbix_agentd.conf #####
########################
Code:
Server=127.0.0.1
Hostname=t2nl-mgt004
BufferSend=20
ListenIP=127.0.0.1
EnableRemoteCommands=1
PidFile=/var/tmp/zabbix_agentd.pid
LogFile=/var/tmp/zabbix_agentd.log
LogFileSize=1
Timeout=20
StartAgents=10
UserParameter=status.sshd,/home/zabbix/scripts/sshd_up.sh
UserParameter=testdb[*],/home/zabbix/scripts/wrapper.sh $1
#UserParameter=proccount[*],ps -e | grep $1 | grep -v grep | wc -l
#UserParameter=disk_check[*],/home/zabbix/scripts/check-dual-connected -m $1 -s $1 -d $1 -z
UserParameter=system.temp[*],/etc/zabbix/externalscripts/temp.py --host $1
UserParameter=database.check_db[*],/home/zabbix/scripts/check_db.py $1 $2
UserParameter=database.check_refresh[*],/home/zabbix/scripts/check_db_refresh_date.py $1 $2