Zabbix 1.8.11 Suddenly Stops Sending Notifications
-
Ok, so if those values in your zabbix_server.conf are commented out, then you are going with all of the default values. 93 hosts may not be a lot, but if you are doing SNMP on a lot of interfaces/volumes, that could be putting stress on the Zabbix application if you don't adjust some of those parameters, which in turn may be affecting its ability to generate proper alerts.
I would suggest these adjustments to you and see if it provides any improvements. In zabbix_server.conf:
(Leave the commented one in place and just put a new line without the comment)
StartPollers=20
StartPollersUnreachable=4
StartTrappers=15
Timeout=10
You then need to restart your Zabbix Server process.
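To confirm that the new lines really are active (and not still commented out), here is a minimal sketch in Python. It assumes the file uses plain Key=Value lines with # comments; the function name and the sample text are made up for illustration and are not part of Zabbix.

```python
def active_settings(conf_text):
    """Return {name: value} for uncommented Key=Value lines only."""
    settings = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Blank and commented-out lines are skipped; those parameters
        # fall back to the built-in defaults.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# Hypothetical excerpt: the commented line stays, the new line goes below it.
SAMPLE = """\
# StartPollers=
StartPollers=20
# StartPollersUnreachable=
StartPollersUnreachable=4
# StartTrappers=
StartTrappers=15
# Timeout=
Timeout=10
"""

# The values suggested in the post above.
WANTED = {
    "StartPollers": "20",
    "StartPollersUnreachable": "4",
    "StartTrappers": "15",
    "Timeout": "10",
}

found = active_settings(SAMPLE)
missing = {k: v for k, v in WANTED.items() if found.get(k) != v}
print("all set" if not missing else f"not applied: {missing}")
```

Point it at the real file by reading its contents into a string instead of using SAMPLE. The server still has to be restarted before the values take effect.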
Regarding all of the "Sending list of active checks failed" error messages: it looks like you are also monitoring standard (non-SNMP) items on your Exchange servers? Do you have any of those items set as type "Zabbix Agent (Active)"? It appears to not be translating system.uname to your host name. For Active items to work, they rely on exactly matching the hostname as you have it in the Zabbix GUI as compared to what you have in the Hostname= field on your hosts in zabbix_agentd.conf.
Last edited by tchjts1; 02-07-2013, 20:24.
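To see which names the server is rejecting, the log can be tallied with a short script. This is only a sketch: the sample log lines are hypothetical and the exact message layout in 1.8.11 may differ, so the pattern matches just the phrase quoted above and collects whatever appears in square brackets on the same line.

```python
import re
from collections import Counter

# Matches the phrase quoted in this thread, nothing more specific.
FAILED = re.compile(r"Sending list of active checks.*failed", re.IGNORECASE)
# Anything in [brackets] on a matching line (assumed to be an IP or host name).
BRACKETED = re.compile(r"\[([^\]]+)\]")


def tally_failures(log_lines):
    """Count failure lines, grouped by the bracketed tokens they contain."""
    counts = Counter()
    for line in log_lines:
        if FAILED.search(line):
            counts[tuple(BRACKETED.findall(line))] += 1
    return counts


# Hypothetical lines, for illustration only.
SAMPLE_LOG = [
    "1234:20130702:101500.000 Sending list of active checks to [10.0.0.5] failed: host [system.uname] not found",
    "1234:20130702:101700.000 Sending list of active checks to [10.0.0.5] failed: host [system.uname] not found",
    "1234:20130702:101800.000 some unrelated message",
]

counts = tally_failures(SAMPLE_LOG)
for tokens, n in counts.most_common():
    print(n, tokens)
```

If one name dominates the tally, that name is what the agents are sending as their Hostname.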
-
Ok.. after changing those settings I did see them get set during boot in the Zabbix_server.log file. I immediately received the alerts from the same 2 servers I received before and there is one other one in the list that I should have received a low disk space alert from.

As for your last comment... "For Active items to work, they rely on exactly matching the hostname as you have it in the Zabbix GUI as compared to what you have in the Hostname= field on your hosts in zabbix_agentd.conf"
I guess I don't understand this one... In my zabbix_agentd.conf, the Hostname= field has the host name of my Zabbix server. If that was wrong, nothing would work. Correct?
Also, yes... most of what I monitor is Agent Active stuff.
-
On your hosts in the zabbix_agentd.conf file for the Hostname= field, if you have the name of your Zabbix server, that is certainly an issue. And if these are the servers where you are not receiving disk space alerts, and the items are set as Active agents, that would explain it. That field needs to be the name of your host, and it must be an exact match (including case) with the name that is in the Zabbix GUI.
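To illustrate what "exact match" means here, a small Python sketch follows. The function and the host names are hypothetical; the list of frontend names would have to be collected by hand or exported from the GUI.

```python
def check_hostname(agent_hostname, frontend_names):
    """Classify an agent's Hostname= value against the names in the GUI."""
    if agent_hostname in frontend_names:
        return "match"
    # Same letters but different case still counts as a mismatch.
    lowered = {name.lower(): name for name in frontend_names}
    if agent_hostname.lower() in lowered:
        return f"case mismatch (frontend has {lowered[agent_hostname.lower()]!r})"
    return "not found"


# Hypothetical names as they might appear in the Zabbix GUI.
frontend = {"EXCH01", "WEB02"}

print(check_hostname("EXCH01", frontend))
print(check_hostname("exch01", frontend))
print(check_hostname("MyZabbixServer", frontend))
```

Only the first case would let active checks work; the other two are the situations described above.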
So here's a quick review of how your settings should be. If you make changes to this file, you must restart your Zabbix agent service.
On your hosts, in zabbix_agentd.conf, the DNS name or IP of your Zabbix server needs to be in the Server= field, such as this:
Code:
##### Passive checks related
### Option: Server
# List of comma delimited IP addresses (or hostnames) of Zabbix servers.
# Incoming connections will be accepted only from the hosts listed here.
# No spaces allowed.
# If IPv6 support is enabled then '127.0.0.1', '::127.0.0.1', '::ffff:127.0.0.1' are treated equally.
#
# Mandatory: no
# Default:
# Server=
Server=12.34.56.789
And, depending on the version you are running, this next field may also be part of the zabbix_agentd.conf file. This is the one which tells your host which server to send its active check data to. If this doesn't exist, then the "Server=" one above will handle it.
Code:
##### Active checks related
### Option: ServerActive
# List of comma delimited IP:port (or hostname:port) pairs of Zabbix servers for active checks.
# If port is not specified, default port is used.
# IPv6 addresses must be enclosed in square brackets if port for that host is specified.
# If port is not specified, square brackets for IPv6 addresses are optional.
# If this parameter is not specified, active checks are disabled.
# Example: ServerActive=127.0.0.1:20051,zabbix.domain,[::1]:30051,::1,[12fc::1]
#
# Mandatory: no
# Default:
# ServerActive=
ServerActive=12.34.56.789
Now these next 2 variables, you can leave them commented out and let Zabbix grab the Hostname from system.hostname, or you can manually edit the Hostname= field and put in the name that you want, but again, it must match exactly as in the Zabbix GUI frontend:
Code:
### Option: Hostname
# Unique, case sensitive hostname.
# Required for active checks and must match hostname as configured on the server.
# Value is acquired from HostnameItem if undefined.
#
# Mandatory: no
# Default:
# Hostname=
### Option: HostnameItem
# Item used for generating Hostname if it is undefined.
# Ignored if Hostname is defined.
#
# Mandatory: no
# Default:
# HostnameItem=system.hostname
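The rule in those comments (Hostname wins if defined, otherwise HostnameItem is evaluated) can be sketched like this. It is an illustration of the documented precedence only, not the agent's actual code; evaluate_item is a hypothetical stand-in for the agent evaluating an item key, and MACHINE-A is a made-up name.

```python
def effective_hostname(settings, evaluate_item):
    """Return the name the agent would report, per the config comments.

    settings: dict of uncommented Key=Value pairs from zabbix_agentd.conf.
    evaluate_item: hypothetical callback that resolves an item key.
    """
    hostname = settings.get("Hostname", "").strip()
    if hostname:
        # HostnameItem is ignored when Hostname is defined.
        return hostname
    item = settings.get("HostnameItem", "system.hostname")
    return evaluate_item(item)


def fake_evaluate(item_key):
    # Stand-in for the agent: pretend system.hostname resolves to MACHINE-A.
    return {"system.hostname": "MACHINE-A"}.get(item_key, item_key)


print(effective_hostname({"Hostname": "MyWindowsServerName"}, fake_evaluate))
print(effective_hostname({}, fake_evaluate))
```

Whatever name comes out of this has to be the same string, character for character, as the host name in the GUI.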
If your zabbix_agentd.conf file fields are a different version than what I am showing, just copy and paste it into a post here and put it between these tags. Surround the words below with brackets [], one pair around each word.
code
/code
-
Regarding this... if you have Hostname= set to your Zabbix server name, then any item that you have set as an Active Agent item is reporting up to your Zabbix server and identifying itself as data for your Zabbix server. That means the data you see for your Zabbix server is probably totally skewed and all over the place if you have 20 or 30 servers reporting in as your Zabbix server.
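A quick way to spot that situation is to group machines by the Hostname= value they claim. This is a minimal sketch with made-up machine names; the mapping would have to be gathered from each agent's conf file.

```python
from collections import defaultdict


def shared_identities(agent_hostnames):
    """agent_hostnames maps machine -> its Hostname= value.

    Returns only the names claimed by more than one machine.
    """
    by_name = defaultdict(list)
    for machine, claimed in agent_hostnames.items():
        by_name[claimed].append(machine)
    return {name: sorted(ms) for name, ms in by_name.items() if len(ms) > 1}


# Hypothetical inventory: two machines both claim to be the Zabbix server.
inventory = {
    "exch01": "MyZabbixServer",
    "exch02": "MyZabbixServer",
    "web02": "WEB02",
}

print(shared_identities(inventory))
```

Any name that shows up here is receiving active-check data from several machines at once.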
-
Wow.. now I am really confused because I haven't been in that conf file since I built this thing about 1 year ago. So you are telling me that the entry "Hostname=" in my /etc/zabbix/zabbix_agentd.conf should NOT be the name of my Zabbix server???
Ok, if not that, then what should it be? I haven't touched that file and certainly did not touch it on 6/20 when all of this started. Since then I have had 3 major outages that I was not alerted to in time. I need help.
I really appreciate the attempts at helping me on this BTW.
-
It would be the name of your Zabbix server - only on your Zabbix server.
In zabbix_agentd.conf
Server= <-- IP or DNS name of your Zabbix server
Hostname= <-- Name of your machine or device that is reporting data to Zabbix server
Let's talk about your Windows Exchange servers. See this in the Wiki:
You have the option to leave Hostname= blank and let Zabbix populate the host name into the GUI or you have the option to manually put the host name in that field.
It would simply be easier if you posted one of your Windows zabbix_agentd.conf files on here so I could look at it.
Last edited by tchjts1; 03-07-2013, 15:54.
-
I have some concerns since I think we're getting off track here. My everyday job is to troubleshoot issues so I have questions:
You said:
"It would be the name of your Zabbix server - only on your Zabbix server.
In zabbix_agentd.conf
Server= <-- IP or DNS name of your Zabbix server
Hostname= <-- Name of your machine or device that is reporting data to Zabbix server"
I assume you mean that the zabbix_agentd.conf on the clients should have something different in them other than system.uname because that is what is on all 93 of my hosts and that did not change on 6/20. I checked and the system that is reporting correctly has Hostname=system.uname in the zabbix_agentd.conf file.
"Let's talk about your Windows Exchange servers. See this in the Wiki:"
I have read this many times and it refers to "server" and doesn't make it clear as to whether it is referring to the monitored server or the Zabbix server.
"You have the option to leave Hostname= blank and let Zabbix populate the host name into the GUI or you have the option to manually put the host name in that field.
It would simply be easier if you posted one of your Windows zabbix_agentd.conf files on here so I could look at it."
############ GENERAL PARAMETERS #################
### Option: DebugLevel
# Specifies debug level
# 0 - no debug
# 1 - critical information
# 2 - error information
# 3 - warnings
# 4 - for debugging (produces lots of information)
#
# Mandatory: no
# Default:
DebugLevel=0
### Option: LogFile
# Name of log file.
#
# Mandatory: no
# Default:
# LogFile=
LogFile=C:\Program Files\Zabbix Agent\Zabbix_agentd.log
### Option: LogFileSize
# Maximum size of log file in MB.
# 0 - disable automatic log rotation.
#
# Mandatory: no
# Range: 1-1024
# Default:
# LogFileSize=1
### Option: SourceIP
# Source IP address for outgoing connections.
#
# Mandatory: no
# Default:
# SourceIP=
### Option: EnableRemoteCommands
# Whether remote commands from Zabbix server are allowed.
# 0 - not allowed
# 1 - allowed
#
# Mandatory: no
# Default:
EnableRemoteCommands=1
##### Passive checks related
### Option: Server
# List of comma delimited IP addresses (or hostnames) of Zabbix servers.
# No spaces allowed. First entry is used for receiving list of and sending active checks.
# Note that hostnames must resolve hostname->IP address and IP address->hostname.
#
# Mandatory: yes
# Default:
# Server=
Server=MyZabbixServer.domain.net
### Option: Hostname
# Unique hostname.
# Required for active checks and must match hostname as configured on the server.
#
# Default:
# Hostname=system.uname
Hostname=system.uname
### Option: ListenPort
# Agent will listen on this port for connections from the server.
#
# Mandatory: no
# Range: 1024-32767
# Default:
ListenPort=10050
### Option: ListenIP
# Agent will listen on the specified interface.
#
# Mandatory: no
# Default:
# ListenIP=0.0.0.0
# ListenIP=127.0.0.1
### Option: DisablePassive
# Disable passive checks. The agent will not listen on any TCP port.
# Only active checks will be processed.
# 0 - do not disable
# 1 - disable
#
# Mandatory: no
# Default:
# DisablePassive=0
##### Active checks related
### Option: DisableActive
# Disable active checks. The agent will work in passive mode listening for server.
#
# Mandatory: no
# Default:
# DisableActive=0
# DisableActive=1
### Option: ServerPort
# Server port for retrieving list of and sending active checks.
#
# Mandatory: no
# Default:
# ServerPort=10051
### Option: RefreshActiveChecks
# How often list of active checks is refreshed, in seconds.
#
# Mandatory: no
# Range: 60-3600
# Default:
# RefreshActiveChecks=120
### Option: BufferSend
# Do not keep data longer than N seconds in buffer.
#
# Mandatory: no
# Range: 1-3600
# Default:
# BufferSend=5
### Option: BufferSize
# Maximum number of values in a memory buffer. The agent will send
# all collected data to Zabbix Server or Proxy if the buffer is full.
#
# Mandatory: no
# Range: 1-65535
# Default:
# BufferSize=100
### Option: MaxLinesPerSecond
# Maximum number of new lines the agent will send per second to Zabbix Server
# or Proxy processing 'log' and 'eventlog' active checks.
# The provided value will be overridden by the parameter 'maxlines',
# provided in 'log' or 'eventlog' item key.
#
# Mandatory: no
# Range: 1-1000
# Default:
# MaxLinesPerSecond=100
############ ADVANCED PARAMETERS #################
### Option: StartAgents
# Number of pre-forked instances of zabbix_agentd that process passive checks.
#
# Mandatory: no
# Range: 1-16
# Default:
StartAgents=5
### Option: Timeout
# Spend no more than Timeout seconds on processing
#
# Mandatory: no
# Range: 1-30
# Default:
Timeout=5
### Option: Include
# You may include individual files or all files in a directory in the configuration file.
#
# Mandatory: no
# Default:
# Include=
# Include=c:\zabbix\zabbix_agent.userparams.conf
# Include=c:\zabbix\zabbix_agentd\
####### USER-DEFINED MONITORED PARAMETERS #######
### Option: UserParameter
# User-defined parameter to monitor. There can be several user-defined parameters.
# Format: UserParameter=<key>,<shell command>
# Note that shell command must not return empty string or EOL only.
# Example: UserParameter=system.test,echo 1
#UserParameter=system.test,echo 1
-
All I can say is that your setup just totally baffles me.
"I assume you mean that the zabbix_agentd.conf on the clients should have something different in them other than system.uname because that is what is on all 93 of my hosts and that did not change on 6/20. I checked and the system that is reporting correctly has Hostname=system.uname in the zabbix_agentd.conf file."
First, let me clearly state that I don't have a 1.8.11 system to test with...
So, if it is "mostly" working for you as is, then take my following information with a grain of salt.
I don't understand how you are getting any reliable data. The system.uname variable doesn't even return a clean host name. It returns a string of data like: "Windows MachineXXX 5.2.3790 Microsoft Windows Server 2003 R2 Enterprise Edition Service Pack 2 x86"
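To show the difference in shape between a uname-style string and a plain host name, here is a rough Python illustration. It uses Python's own platform and socket modules, not the Zabbix agent, so the exact text will not match what system.uname returns; the point is only that one is a multi-field string and the other is a single name.

```python
import platform
import socket

info = platform.uname()
# Join the fields the way a uname-style string usually reads.
uname_like = " ".join(
    [info.system, info.node, info.release, info.version, info.machine]
)
hostname = socket.gethostname()

print("uname-like string:", uname_like)
print("hostname:         ", hostname)
# A multi-field string can never equal a bare host name.
print("usable as Hostname=?", uname_like == hostname)
```

A string like the first one cannot match a host name entered in the GUI, which is why the literal value matters so much for active checks.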
In that particular case, they are referring to the host you are monitoring. The subject in that link is discussing the agent on a Windows (Server) host.
How did your host names get entered into the Zabbix GUI? Were they manually entered?
You mentioned this in a previous post:
This is because any item you have set to an active check, is trying to be sent out to servers that have the actual name of "system.uname" in the Zabbix GUI, and you don't have any.
I truly don't know how you are getting any data from the hosts you monitor. You are on 1.8.11, right? This includes both your Zabbix server and Zabbix agents? It almost looks to me like an install of 1.8.11 was done, or an upgrade to 1.8.11 was done, but older zabbix_agentd.conf files were left in use on your hosts.
On your agents, the appropriate use for Hostname= and HostnameItem= (I don't even see HostnameItem= in your conf file) would be this:
Scenario #1 of manually populating Hostname= in the conf and manually entering the host name in the Zabbix GUI
Code:
### Option: Hostname
# Unique, case sensitive hostname.
# Required for active checks and must match hostname as configured on the Zabbix server.
# Value is acquired from HostnameItem if undefined.
#
# Mandatory: no
# Default:
# Hostname=
Hostname=MyWindowsServerName
### Option: HostnameItem
# Item used for generating Hostname if it is undefined.
# Ignored if Hostname is defined.
# This option is supported in version 1.8.6 and higher.
#
# Mandatory: no
# Default:
# HostnameItem=system.hostname
Scenario #2 of Zabbix automatically picking up the host server name with you manually entering the host name in the Zabbix GUI
Code:
### Option: Hostname
# Unique, case sensitive hostname.
# Required for active checks and must match hostname as configured on the server.
# Value is acquired from HostnameItem if undefined.
#
# Mandatory: no
# Default:
# Hostname=
### Option: HostnameItem
# Item used for generating Hostname if it is undefined.
# Ignored if Hostname is defined.
# This option is supported in version 1.8.6 and higher.
#
# Mandatory: no
# Default:
# HostnameItem=system.hostname
So, unless release 1.8.11 is a one-off, and allows the usage of system.uname to populate the Hostname= field, then I have no clue how your setup is working.
If I were you, I would apply either scenario #1 or #2 above to one of your Windows servers where you are missing data and alerts, and see if that corrects the issue.
Or you can disregard this post altogether... because you have me fairly confused now.
-
First of all I appreciate all of the attempted help and I am going to get back on this today. I do have an additional interesting thing. Yesterday we added 2 new servers using the same process to get them setup to be monitored. We got 2 different results.
1. Both servers started passing good data to Zabbix
2. Both triggered an alert due to an ISO being mounted (0 free space on D)
3. After 2.5 hours I have it set to send an alert to my cell phone. Only 1 did.
4. All historical data is working fine as of this morning on these and all other servers
The bigger issue here is that there are 2 escalations in the actions before it sends anything to my cell. Those did NOT happen. This is so weird!!!
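To make the oddity concrete, here is a simplified model of step-based escalation in Python. The step duration, operation names and step numbers are all hypothetical (chosen so that step 3 starts at 2.5 hours), and this is not how the Zabbix server is implemented internally.

```python
def operations_due(elapsed_seconds, operations, step_duration):
    """Return names of operations whose first step has already started.

    operations: list of (name, from_step) tuples.
    Step 1 starts at t=0; step n starts at (n - 1) * step_duration.
    """
    current_step = elapsed_seconds // step_duration + 1
    return [name for name, from_step in operations if from_step <= current_step]


# Hypothetical action: 75-minute steps, cell phone alert on step 3.
STEP = 75 * 60
OPS = [("email on-call", 1), ("email team", 2), ("SMS to cell", 3)]

print(operations_due(0, OPS, STEP))
print(operations_due(STEP, OPS, STEP))
print(operations_due(2 * STEP, OPS, STEP))
```

In this model the two earlier operations always become due before the cell phone step, so seeing only the last one fire points away from the step timing itself.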
So all in all... something happened on 6/20 that affected all but 2 or 3 of my 95 monitored devices. So you are right I am not getting reliable data but why? I did not change 92 devices. So that means there is something on the Zabbix server that changed but I will take your advice and pick 1 monitored device and start messing with the CONF file. I don't believe this is the issue unless there is an incompatibility between the Zabbix server and the agents I am running all of a sudden.
I will get back to this thread today
Also a clarification... If I said anywhere in this thread that I was using Type: Zabbix Agent (Active), I apologize. I am using the standard template... Type: Zabbix Agent
Something else I found.. I manually added an item and trigger to the host (not the template) and I received consistent data and triggering from that item.
-
Ok.. since most of this thread has centered around data collection I would like to refocus. Data collection is not the issue here. My real issue is inconsistent Actions. Triggers are triggered. I can see this on the console. What is not happening is the actions I have configured. They all worked consistently up until 6/20. So what I need help with and I can't find in this forum is:
How do Actions stop working all of a sudden when triggers ARE triggered?
I don't want to muddy the waters with looking into zabbix_agentd.conf on one client. This affected almost all of my hosts all at once, and the only mandatory setting with non-Active checks in place is "Server=" and that is set correctly.
My only other option in my environment at this point is Tivoli and I do NOT want to go down that road.
-
I guess I've upset the forum. Any help would be appreciated.
Now I think this may stem from a database corruption. I say that because on some of my hosts I am seeing a ghost (greyed out) entry under Latest Data for drives that these particular hosts have never had. Not sure why this would stop triggering actions but maybe it's something.
-

I have found the root cause.
Database corruption
This corruption caused me to have to unlink and clear every host with an agent installed. As I did this and linked them back to the template I started receiving alerts immediately. I have lots of clean up work to do.