**tchjts1** · 17-11-2014, 23:08

Graphs can "appear" to be populating fine, depending on the time period you are looking at. Compare 1 hour versus, say, 7 days.

If that were happening to me, I would look at my Zabbix internal processes and see whether you need to allocate some additional pollers or make some other config adjustment. To see the data for internal processes, see the last paragraph and the graphs that follow it here: https://www.zabbix.com/forum/showthread.php?t=41219

**chrisw** · 17-11-2014, 23:57

Hey tchjts1, Thanks for the reply!

I do have them defaulted to 1 hour views, and they are still populating as we speak.

I resolved the one host's issue (duplicate IP) and it no longer shows as Unreachable, but the other hosts aren't experiencing even remotely similar symptoms (the host that had the duplicate IP was practically inaccessible and disconnecting me every few minutes, while all my other hosts are active and stable).

Literally all my hosts fired off that alert yesterday at 15:15:30, with the exception of one that fired at 15:16:00. I have confirmed via our security panel that no one was even in the building at that time to cause anything physical (and the duplicate IP issue only started this morning around 8AM).
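
To put a number on how tightly those alerts cluster, here is a minimal Python sketch. The list of fire times is a hypothetical stand-in for the hosts described above (only the two timestamps come from this post), and `fire_time_spread` is an illustrative helper, not anything from Zabbix.

```python
from datetime import datetime, timedelta

def fire_time_spread(times):
    """Return the gap between the earliest and latest alert time (HH:MM:SS strings)."""
    parsed = [datetime.strptime(t, "%H:%M:%S") for t in times]
    return max(parsed) - min(parsed)

# Hypothetical sample: most hosts at 15:15:30, one at 15:16:00.
fire_times = ["15:15:30", "15:15:30", "15:15:30", "15:16:00"]
spread = fire_time_spread(fire_times)
print(spread)  # 0:00:30
```

A 30 second spread across every host is hard to explain by independent client failures, which is part of why the server side looks like the more likely place to dig.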

There is a small gap from my restart, but literally every other host that is alerting for Unreachable is populating up-to-date data on the graphs. Even Latest Data is showing values and changes for the last check, which at this time occurred just under 1 minute ago (Nov 17th 16:55 EST).

I had those graphs set up on a screen, but everything is okay now. It leads me to some other potential issues though, as it was pretty hairy prior to my restart (CPU load and busy% were quite a bit higher than they are now). With the exception of the recovery spike, however, the past 12 hours have been steady and clear at 30% busy or below and a CPU load of 2 or below.

I'm just really baffled that the checks and graphs keep updating while the hosts are reported as unreachable for over 24 hours.

I will keep digging around the Zabbix server though, as it doesn't seem to be a problem with the clients at least.

Thanks again!

**chrisw** · 18-11-2014, 00:21

I've resolved a few other one-off problems now and cleared up my list quite a bit, and all but 3 of the remaining hosts have one thing in common that I did not notice before:

Code:

Cannot evaluate function "ServerName:agent.ping.nodata(5m)"

Where ServerName is different for each server affected.

This doesn't sound particularly good, but I'll keep digging.

**tchjts1** · 18-11-2014, 00:59

What version of Zabbix server are you using?

**tchjts1** · 18-11-2014, 01:12

Is this your trigger expression at the template level?

Code:

{Template OS Windows:agent.ping.nodata(5m)}=1

**chrisw** · 18-11-2014, 01:25

The latest currently available in Gentoo's Portage repo, 2.2.7. I recently did an upgrade, I believe from 2.2.2, but that was over 2 weeks ago and things have been running fine up until yesterday.

I was having some issues installing sudo but I managed to get that going, and I confirmed the Zabbix user can use ping, for argument's sake.

Yes to your trigger question, for both the Windows and Linux templates (3 Linux hosts and 5 Windows hosts remain in this state, with the same errors for both OSes).

Code:

	{Template OS Windows:agent.ping.nodata(5m)}=1
	{Template OS Linux:agent.ping.nodata(5m)}=1
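
For anyone following along, here is a rough Python sketch of what a `nodata(5m)` check is meant to express: the result is 1 when no value arrived within the period and 0 otherwise. This is a simplified model for illustration only, not how the Zabbix server implements it, and the `nodata` function, its parameters, and the sample timestamps are all assumptions.

```python
from datetime import datetime, timedelta

def nodata(last_value_time, now, period=timedelta(minutes=5)):
    """Simplified model: 1 if no value was received within `period`, else 0."""
    if last_value_time is None:
        return 1  # nothing has ever been received
    return 1 if now - last_value_time > period else 0

# Assumed sample times, loosely based on the "last check under 1 minute ago" above.
now = datetime(2014, 11, 17, 16, 55)
fresh = nodata(now - timedelta(minutes=1), now)   # value seen 1 minute ago
stale = nodata(now - timedelta(hours=24), now)    # nothing for 24 hours
print(fresh, stale)  # 0 1
```

Under this model a host whose last `agent.ping` value is a minute old evaluates to 0, so a trigger that stays in the Unreachable state while data keeps arriving points at the expression not being evaluated at all rather than at the agents.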

There is also no firewall between this machine and the clients, nor anything like GRSEC or SELINUX to get in the way.

Also strange: these "cannot evaluate function" errors don't appear to be showing up in the logs. The only other similar instances I have relate to one of our MFCs, but I know that's down to the differences between a ColorQube and a WorkCentre. I was hoping there would be something more in the logs related to this, but I can't find anything.

Also just confirmed there are a few agents still at 2.2.2, but it's affecting some on 2.2.2 and some on 2.2.7, so the version difference doesn't appear to be the root cause.

**chrisw** · 22-11-2014, 01:33

Well I finally gave up, as I don't have a backup to Zabbix and there's only so long you can go without your monitoring system.

I went to do an update to 2.4.1, and upon trying to back up the database, I found there were quite a few corrupted tables. Not sure if this was related to the issues I was seeing or not.

I blew away the database and started fresh with 2.4.1. Was able to export everything, and everything imported but the Screens, which will be a pain. Also of course lost all my historical data, but at least I've got my Zabbix back up and running properly.

All Agents "Unreachable" At Same Time - Graphs Still Populating
