How to roll-back "Not supported" items
My SNMP agent has a number of items that are sometimes not available (they are dynamic). For example, when a process is down I don't see its variables.
When this happens, Zabbix changes the state of the item to "Not supported", and I need to manually change it back to "Monitored".
Is there any way to avoid this?
Nate Bell
22-06-2005, 15:54
When I run into values that are sometimes unsupported, I have been making scripts that check the value before giving it to Zabbix, and change the value to a bogus one that Zabbix will accept. However, these have all been executed with the Zabbix_Agent and UserParameters in the agent's config file. I'm not sure about doing this with the SNMP_Agent.
Perhaps there is a cleaner way to do this. I'd certainly be interested in hearing about it.
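For a numeric item, the check-before-reporting idea described above could look roughly like this. It is a minimal sketch in C, not code from Zabbix: the helper name sanitize_value and the idea of passing the bogus value as a fallback argument are assumptions made for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, not part of Zabbix: parse the raw output of a check
 * and return `fallback` when the text is not a complete, valid number. */
double sanitize_value(const char *raw, double fallback)
{
    char *end = NULL;
    double value;

    if (raw == NULL || *raw == '\0')
        return fallback;

    errno = 0;
    value = strtod(raw, &end);

    /* Nothing was parsed, or the number was out of range. */
    if (end == raw || errno == ERANGE)
        return fallback;

    /* Allow trailing whitespace (such as a newline), nothing else. */
    while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
        end++;

    return (*end == '\0') ? value : fallback;
}
```

A wrapper would print the result of something like sanitize_value(output, -1.0), so Zabbix always receives a number. The fallback should be a value the item can never legitimately take, so triggers can tell it apart from real data.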
Nate
I've added a new configuration parameter, NoStatusChange, to zabbix_server.conf.
Setting this parameter to 1 blocks this behaviour of Zabbix. It has been working for me since yesterday.
This is the patch to server.c (1.1alpha10):
=================================
79d78
< int CONFIG_NO_STATUS_CHANGE = 0; // Added by JT 2005.06.22
249d247
< {"NoStatusChange",&CONFIG_NO_STATUS_CHANGE,0,TYPE_INT,PARM_OPT,0,1},
474,475c472
< if( CONFIG_NO_STATUS_CHANGE == 0 ) // Added by JT 2005.06.22
< DBupdate_item_status_to_notsupported(item.itemid, error);
---
> DBupdate_item_status_to_notsupported(item.itemid, error);
=================================
I plan to introduce an additional status for monitored parameters: "Not available". "Not available" items will be periodically rechecked.
For example, if you try to get filesize[/var/log/messages] and the file does not exist, the status will be set to "Not available" (recoverable). However, if the agent doesn't support filesize[], the status will be set to "Not supported" (unrecoverable).
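The distinction drawn above can be sketched as a small decision function. This is only an illustration of the proposal, not the Zabbix implementation; the enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical status names for illustration; not taken from the Zabbix source. */
typedef enum {
    ITEM_STATUS_MONITORED,
    ITEM_STATUS_NOT_AVAILABLE,  /* recoverable: recheck periodically */
    ITEM_STATUS_NOT_SUPPORTED   /* unrecoverable: the agent does not know the key */
} item_status_t;

/* key_supported:  the agent recognises the key, e.g. filesize[]
 * value_obtained: the check actually produced a value, e.g. the file exists */
item_status_t classify_check(bool key_supported, bool value_obtained)
{
    if (!key_supported)
        return ITEM_STATUS_NOT_SUPPORTED;
    if (!value_obtained)
        return ITEM_STATUS_NOT_AVAILABLE;
    return ITEM_STATUS_MONITORED;
}
```

With this split, filesize[/var/log/messages] on a host where the file is missing gives classify_check(true, false), which is the recoverable state.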
Great idea, but the question is what "Not supported" means in the case of SNMP. If I have some dynamic values which appear when some process is up but don't exist at all when the process is down, are they unsupported or unavailable?
I think a better solution is to add an additional parameter to each item which tells Zabbix that the item shouldn't be blocked.
I'm using Zabbix to monitor systems operated by non-technical people. I want to give them a tool which has an absolutely fixed configuration. I don't even want to give them the administration password.
Could you tell me more about the dynamic SNMP items? Are they related to network interfaces?
Generally, almost every value in SNMP can be dynamic. The best example is the list of processes running on a machine. Every process has a number of attributes like CPU usage, allocated memory, priority etc.
Of course this set of values exists only while the process is running; when the process terminates, the values are no longer available. What's more, when the process is restarted they will appear with different OIDs because the process number will be different.
In my case, I have a set of values which have fixed OIDs, but they exist only when the process is running.
Generally, most attributes supplied by SNMP depend on the state of the system. Network interfaces, mounted filesystems and TCP connections may not exist at boot time, but can appear afterwards.
This means that you cannot distinguish whether an attribute is not supported or just currently not available.
I think the only case where an attribute may be treated as unsupported is when SNMP cannot resolve its name or the OID is invalid (message: Unknown Object Identifier).
Your idea of rechecking "not available" values looks very good, but in the case of SNMP, item status should be changed to "unsupported" only manually.
I also see another way: the change of status could be triggered by an SNMP trap, but this is risky, as traps are delivered without confirmation.
By the way, the handling of SNMP traps by Zabbix should also be changed. I have an idea how, and if you agree I can do it.
best regards
Jarek.
This is a very important issue for me. Right now I have to activate around 100 items every time someone restarts a server or, even worse, whenever there are problems I lose important data because the items turn into "Not supported". A new status "Not available" would solve this problem.
It seems that you have a solution to this problem Jarek. It would be really nice to see this implemented in the next release of Zabbix.
Thanks
Robert
I see this problem every now and then, and it remains in ZABBIX 1.1beta2. Sometimes items turn "Not supported". I've noticed that this happens if I stop the SNMP agent for a short time, or even if I have intermittent problems in the network (the Internet in my case).
Basically we have three options.
1. Make sure the Internet is stable at all times. In my case that involves one ISP in Europe and one in South America, plus a number of unknown carriers in between. In addition I have to make sure the server I monitor is up and running at all times. Not even a restart can be allowed.
2. Change the items back to "Monitored" every time Zabbix turns them "Not supported".
3. For items of type SNMP, change the status to "Not supported" only manually, as previously suggested in this thread.
I would vote for option number three above :)
Thanks
Robert
I like the idea of having an extra option so that I can enable forced checking (not a delayed check) for specific items...
Otherwise there could be a lot of junk for items that aren't actually supported but are in my zabbix_agentd.conf.
I too would prefer a forced checking option. I have problems with SQL Server performance counters. Sometimes the counters completely disappear, Zabbix marks them unsupported, and I have to manually re-enable them. I've been bitten by it more than once :(
If I could choose a manual override for that, it would be excellent. Or if ALL unsupported and not available items attempted to reconnect periodically (at a configurable period!), it would be excellent.
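The "retry at a configurable period" wish above boils down to a simple time comparison per item. Here is a minimal sketch, assuming a hypothetical helper recheck_due and a period given in seconds; it does not reflect how Zabbix schedules checks internally.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: decide whether a disabled item is due for another try.
 * A period of zero or less turns automatic rechecking off. */
bool recheck_due(time_t now, time_t last_attempt, int recheck_period_sec)
{
    if (recheck_period_sec <= 0)
        return false;
    return difftime(now, last_attempt) >= (double)recheck_period_sec;
}
```

A poller loop would call this for every unsupported or unavailable item and retry only those for which it returns true, which keeps the extra load bounded by the chosen period.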
I'm eagerly awaiting a resolution to this :)
I too vote that this absolutely must be fixed somehow ASAP. Zabbix has been extremely solid for us (other than a strange occasional crash of the suckerd and/or agentd), and I use it to monitor a nationwide VOIP ITSP.
However, the bosses really really love the screens, and get extremely mad when a graph that shows the current number of active calls on a server suddenly gets no data and the item shows as "Not Supported".
I've tried to explain that it's because of occasional unreadable data passed by the SIP server, but they don't care, and are actually moving to remove the monitoring because of that ONE thing.
Manually fixing it is not a problem, except the other admin patently refuses to remember how to do it, and calls me ANY time it goes down....
Please Please Please, let's get that one fixed.
(also, the stability of the daemons, as I get a crashed sucker or agent at least once a week)
cameronsto
02-12-2005, 16:04
Maybe in the meantime, you could write a quick cron job to just update all disabled checks to enabled via the database every 5 minutes or so. Something like:
update zabbix.items set status = 0 where status = 3;
Note: The SQL should be correct, but I haven't tested it. Be sure to test it before using it on your production systems.
-cameron
Automatic rechecking of unsupported parameters is in CVS. Will be released as part of 1.1beta3.
I'm soooo glad to hear it!
Oh and yes, I did have a cron job running every 30 minutes to roll back those not supported items....
I've got a temp probe in the server room that responds to SNMP requests. I've got it in Zabbix, and several times a day any of the three OIDs goes "unsupported". Now I know the beta3 product will address this by retrying, but I'm trying to debug WHY they randomly turn unsupported, and I don't see anything in the zabbix_server.log. I tried debug level logging but I still don't see anything relating to when an SNMP request for an OID fails. The temp probe responds very fast when I try it with snmpget or snmpwalk, but I continue to see "SNMP error [(noSuchName) There is no such variable name in this MIB.]" as the error in the interface when it goes "unsupported". Can anyone suggest where Zabbix might tell me something about this?
thanks
-zac
I'm also very interested in understanding why this happens. I believe that this may happen during initialisation (shutdown/startup) of an SNMP agent.
Perhaps a better solution would be to set the item status to Not Available first, and then, if this is still the case after, say, 30 minutes, to Not Supported.
What do you think?
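The two-stage idea above (Not Available first, Not Supported only after a grace period) can be sketched as a small state machine. All names here are hypothetical, and the 30-minute constant is just the figure floated in the post.

```c
#include <time.h>

typedef enum {
    PROBE_OK,
    PROBE_NOT_AVAILABLE,
    PROBE_NOT_SUPPORTED
} probe_state_t;

typedef struct {
    probe_state_t state;
    time_t        failing_since;  /* meaningful only while state != PROBE_OK */
} probe_t;

#define GRACE_PERIOD_SEC (30 * 60)  /* the "say 30 minutes" from the post */

/* Feed the result of one check into the state machine. */
void probe_update(probe_t *p, int check_ok, time_t now)
{
    if (check_ok) {
        p->state = PROBE_OK;  /* assumption: any successful check recovers the item */
        return;
    }
    if (p->state == PROBE_OK) {
        /* First failure: only mark the item unavailable and note the time. */
        p->state = PROBE_NOT_AVAILABLE;
        p->failing_since = now;
    } else if (p->state == PROBE_NOT_AVAILABLE &&
               difftime(now, p->failing_since) >= GRACE_PERIOD_SEC) {
        /* Still failing after the grace period: escalate. */
        p->state = PROBE_NOT_SUPPORTED;
    }
}
```

An agent restart that takes a few minutes would then only ever reach PROBE_NOT_AVAILABLE and recover on the next successful check.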
Given the nature of SNMP, it's my opinion that the default non-active state should be "not available" with "not supported" only available as a user selected state (which kind of makes it moot as you can always change the item to not monitored).
This of course does leave the application open to being overloaded with configured items that it must retry and may never become available, but it puts the burden on the user to watch their individual configuration more carefully if they choose to utilize monitoring via SNMP.
As it stands now, there are just too many ways that any particular SNMP item can drop into "not supported" and thus be in need of a manual restart; changing the inactive state to "not available" solves most, if not all, of these issues.
I agree. This will hopefully be addressed in 1.1beta4.
I'm more interested in not changing the status at all. While the addition of Not Available and a retry is good, it's really a bandage, not a solution to the failed query. I'm not getting a "failed to connect" but a "There is no such variable name in this MIB", which indicates that Zabbix asked for the wrong thing or I'm dropping bytes in the request. I'd rather have a monitor go to "Not Available" or "Unsupported" only when it really is, but the error doesn't indicate that; it indicates a transmission error. At the moment I'm of the opinion that finding the cause of the state change is more important than what we change it to. Maybe an "inError" state to indicate that the value was not returned but we did make a connection.
Aha, in the time it took to write this, I got a state change.
007405:20051208:143319 Error in packet
Reason: (noSuchName) There is no such variable name in this MIB.
007405:20051208:143319 Parameter [Sensor3-F] is not supported by agent on host [avtech]
007405:20051208:143323 Error in packet
Reason: (noSuchName) There is no such variable name in this MIB.
007405:20051208:143323 Parameter [Sensor2-H] is not supported by agent on host [avtech]
So from this I see that checks_snmp.c has the code, but I don't understand the logic.
if (status == STAT_SUCCESS)
{
    zabbix_log( LOG_LEVEL_WARNING, "Error in packet\nReason: %s\n",
        snmp_errstring(response->errstat));
I'm reading this as "if status == STAT_SUCCESS", but I can't find where that is being set... oh hell, is there a dev list I should be using now?
thanks
-zac
That code is in the else branch after:
if (status == STAT_SUCCESS && response->errstat == SNMP_ERR_NOERROR)
so basically it will be true when the request itself succeeded but errstat is not equal to SNMP_ERR_NOERROR
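The branch order can be sketched on its own to show when the "Error in packet" message gets logged. The constants below are stand-ins defined only for this sketch (the real code uses the net-snmp names STAT_SUCCESS and SNMP_ERR_NOERROR), and the final else branch is an assumption, since the thread does not show it.

```c
/* Stand-in constants for this sketch only; values mirror net-snmp's. */
enum { SK_STAT_SUCCESS = 0, SK_STAT_ERROR = 1, SK_STAT_TIMEOUT = 2 };
enum { SK_ERR_NOERROR = 0, SK_ERR_NOSUCHNAME = 2 };

typedef enum {
    OUTCOME_VALUE,            /* reply received, no SNMP error */
    OUTCOME_ERROR_IN_PACKET,  /* reply received, but it carries an SNMP error */
    OUTCOME_NO_REPLY          /* timeout or local failure (assumed third branch) */
} outcome_t;

outcome_t classify_reply(int status, long errstat)
{
    if (status == SK_STAT_SUCCESS && errstat == SK_ERR_NOERROR)
        return OUTCOME_VALUE;
    else if (status == SK_STAT_SUCCESS)
        /* Reached only when the exchange worked but errstat is non-zero,
         * for example noSuchName. This is the "Error in packet" case. */
        return OUTCOME_ERROR_IN_PACKET;
    else
        return OUTCOME_NO_REPLY;
}
```

So a noSuchName in the log means the device did answer the request and the answer itself said the OID was unknown; it is not a connection failure.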