**andris** · 18-08-2016, 10:44

It seems that somewhere the uptime is stored as a 32-bit number, which is too small for long uptimes. Going to 64 bits could be a solution.
Does your monitored SNMP device report the correct uptime after 497 days with other tools? Or is it only 32-bit aware? Is your Zabbix 64-bit?

**kloczek** · 18-08-2016, 10:52

Originally posted by Ultrasonic

2016-08-16 18:43:14 42949650

$ echo 42949530; echo "2^32" | bc
42949530
4294967296

From http://www.alvestrand.no/objectid/1.3.6.1.2.1.1.3.html

Code:

 OID description:

sysUpTime OBJECT-TYPE
              SYNTAX  TimeTicks
              ACCESS  read-only
              STATUS  mandatory
              DESCRIPTION
                      "The time (in hundredths of a second) since the
                      network management portion of the system was last
                      re-initialized."
              ::= { system 3 }

So it looks like everything is OK.
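For reference, the 497-day figure follows from the definition quoted above: TimeTicks is an unsigned 32-bit count of hundredths of a second. A minimal Python sketch of that arithmetic (the function and constant names are only illustrative, not from any MIB or tool):

```python
# TimeTicks: unsigned 32-bit counter, one tick = 1/100 of a second.
TICKS_PER_SECOND = 100
WRAP_TICKS = 2 ** 32          # counter wraps back to 0 after this many ticks
SECONDS_PER_DAY = 86400


def wrap_seconds():
    """Seconds of uptime a 32-bit TimeTicks counter can hold before wrapping."""
    return WRAP_TICKS / TICKS_PER_SECOND


def wrap_days():
    """The same limit expressed in days."""
    return wrap_seconds() / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"{wrap_seconds():.2f} s, {wrap_days():.1f} days")
    # prints: 42949672.96 s, 497.1 days
```

So a value close to 42949672 seconds is simply the largest uptime sysUpTime can report before it starts again from zero.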

**Ultrasonic** · 18-08-2016, 11:18

I don't know whether this Zabbix platform is 64-bit or 32-bit; I don't have administration privileges on the platform cluster. The Zabbix version is 2.4.7.

It looks like a 32-bit value problem, but what can we do about it?

I see only "numeric unsigned", "numeric float", "log", or "text" value types in the Zabbix item configuration.

These are mostly Cisco devices.

**kloczek** · 18-08-2016, 15:26

Originally posted by Ultrasonic

I don't know whether this Zabbix platform is 64-bit or 32-bit; I don't have administration privileges on the platform cluster. The Zabbix version is 2.4.7.

It looks like a 32-bit value problem, but what can we do about it?

I see only "numeric unsigned", "numeric float", "log", or "text" value types in the Zabbix item configuration.

These are mostly Cisco devices.

It has nothing to do with Zabbix.
I've quoted the definition of sysUpTime from the SNMPv2 MIB. You cannot read data from an SNMP agent that the agent does not provide.
If the uptime of another device monitored over SNMP is not affected by the 497.1-day maximum, it means that your monitoring is not reading the SNMPv2-MIB::sysUpTime OID there. Which one exactly? You can check it in your monitoring.
IIRC another Cisco-specific MIB defines an OID that stores the uptime in a 64-bit counter as a number of seconds. Just try to google it.

**Ultrasonic** · 18-08-2016, 15:58

You are right, I came to the same conclusion.

Cisco has another OID that stores uptime in seconds (instead of hundredths of a second), but it requires the device to support SNMP-FRAMEWORK-MIB (snmpEngineTime at OID .1.3.6.1.6.3.10.2.1.3).

Unfortunately my devices don't support SNMP-FRAMEWORK-MIB.

**kloczek** · 18-08-2016, 17:12

Since this device has not been rebooted, and probably has not had any firmware upgrades either, you should check the latest firmware versions; maybe they've added support for this MIB.

BTW: I would be very worried about devices that have gone so long without a restart. After such a long time, the probability that the device will for some reason fail to boot correctly could be above an acceptable level.
Sometimes even an almost-failed cooling fan, once stopped and restarted by a power cycle, turns out to be completely failed.

It is always better to find such problems during working hours than to be woken up in the middle of the night :P
If you have full redundancy in the network infrastructure, failing over to the standby devices from time to time to perform a full reboot with a power cycle should be part of normal operating procedures. In all my templates I have uptime mapped to inventory records, which gives a single simple view for identifying the systems/devices with the longest uptimes.

**Ultrasonic** · 19-08-2016, 07:53

I could reboot the routers if they were mine; unfortunately they are owned by a government agency, and to lift a finger there you need a council meeting, approval, etc.

In fact, I don't need to track uptime for these devices, but it is important to know that a restart happened. Currently I am looking for another indicator that is available on every Cisco device and reflects a restart.

**andris** · 19-08-2016, 09:43

A device reboot exactly after 497.1 days is extremely unlikely.
So, as a workaround, you could compare the 2 latest uptime values.
If uptime has decreased AND it was less than (497.1 days minus a small value that depends on how often you poll the device) before the decrease, then a reboot took place.
Otherwise there was no reboot, just a counter overflow.
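The comparison could look roughly like this in Python. This is only a sketch of the idea, not Zabbix code: it assumes uptime values are already in seconds, and the margin of two poll intervals is my own guess at the "small value", so adjust it to your polling.

```python
# Uptime in seconds at which a 32-bit TimeTicks counter (1/100 s) wraps.
WRAP_SECONDS = 2 ** 32 // 100   # 42949672


def classify(prev_uptime, last_uptime, poll_interval):
    """Compare the two latest uptime samples (seconds).

    Returns "ok" if uptime did not decrease, "overflow" if it decreased
    while the previous sample was close to the 32-bit limit, and
    "reboot" otherwise.
    """
    if last_uptime >= prev_uptime:
        return "ok"
    # Assumption: two poll intervals is a reasonable "small value".
    margin = 2 * poll_interval
    if prev_uptime < WRAP_SECONDS - margin:
        return "reboot"
    return "overflow"


if __name__ == "__main__":
    print(classify(1000, 50, 60))        # decrease far from the limit
    print(classify(42949650, 20, 60))    # decrease right at the limit
```

A reboot that happens inside the margin would be reported as an overflow, which is the unlikely case mentioned above.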

**Ultrasonic** · 19-08-2016, 14:36

Good idea.
I made a small modification to the standard SNMP trigger "{HOST.NAME} has just been restarted":

Expression:

{hostname_xxx:sysUpTime.change(0)}<0 and {hostname_xxx:sysUpTime.prev()}<42949000

42949000 is near the maximum allowed uptime value.

I hope this works...

**syntax53** · 04-06-2018, 14:10

Stumbled onto this post researching a similar issue with a device. The default template, "Template Module Generic SNMPv2" has a trigger for "{HOST.NAME} has been restarted" with a value of "{Template Module Generic SNMPv2:system.uptime.last()}<10m". So it's not looking at .change, but rather an uptime of less than 10 minutes. I have modified the trigger as follows:

Code:

{Template Module Generic SNMPv2:system.uptime.last()}<10m and ({Template Module Generic SNMPv2:system.uptime.max(660)}<4294307 or {Template Module Generic SNMPv2:system.uptime.max(660)}>4294997)

I believe this will stop the false alerts for 32-bit values but still allow them on 64-bit values. The upper limit of a 32-bit unsigned int is 4,294,967,295. The last three digits of that number (0-999) are used as the fractional part of a second, so the upper limit in seconds is 4,294,967. Minus 600 seconds (10 minutes), that is 4,294,367. I subtracted an extra 60 seconds for wiggle room, which is where the 4294307 comes from. Likewise, I added 30 seconds to the max of 4294967 to get 4294997 for the upper limit. So a reboot would only be missed if the device happens to reboot within that 11-minute-and-30-second window.

I haven't actually tested this, but it looks good
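To double-check the numbers, here is the same arithmetic as a small Python sketch. It only mirrors the trigger logic outside Zabbix; the function names and the default slack values are just how I would write it down, not anything from the template.

```python
def suppression_window(rollover_s, recent_s=600, slack_before_s=60, slack_after_s=30):
    """Range of max-uptime values (seconds) treated as a rollover, not a restart."""
    lower = rollover_s - recent_s - slack_before_s
    upper = rollover_s + slack_after_s
    return lower, upper


def restart_alert(last_s, max_recent_s, rollover_s=4294967):
    """Mimic the trigger: uptime under 10 minutes AND the recent maximum
    (max over the last 660 s in the trigger) outside the rollover window."""
    lower, upper = suppression_window(rollover_s)
    return last_s < 600 and (max_recent_s < lower or max_recent_s > upper)


if __name__ == "__main__":
    lower, upper = suppression_window(4294967)
    print(lower, upper, upper - lower)   # 4294307 4294997 690
    print(restart_alert(30, 100000))     # True: genuine restart
    print(restart_alert(30, 4294900))    # False: looks like a rollover
```

The window is 690 seconds wide, which is the 11 minutes and 30 seconds mentioned above.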

**kloczek** · 05-06-2018, 15:47

More than a year without any firmware upgrades ...

**syntax53** · 05-06-2018, 22:52

If it's not insecure and it ain't broke, don't fix it.

**kloczek** · 06-06-2018, 00:49

Originally posted by syntax53

If it's not insecure and it ain't broke, don't fix it.

Doesn't matter.
Keeping long uptimes is nothing more than asking for trouble.
Too many times in the past I have seen computers, routers or even switches that were not able to work after a full power cycle (usually because of the final failure of the bearings in spinning parts like disks or cooling fans).
In many environments people use monitoring to watch uptimes and take action (automatic, semi-automatic or manual) to perform at least a full power cycle, if not a system reinstall, so that whatever is going to fail does so when all ops are around to handle the failure quickly, rather than in the middle of the night or when some people are on holiday.

**syntax53** · 13-06-2018, 15:39

Originally posted by syntax53

Stumbled onto this post researching a similar issue with a device. The default template, "Template Module Generic SNMPv2" has a trigger for "{HOST.NAME} has been restarted" with a value of "{Template Module Generic SNMPv2:system.uptime.last()}<10m". So it's not looking at .change, but rather an uptime of less than 10 minutes. I have modified the trigger as follows:

Code:

{Template Module Generic SNMPv2:system.uptime.last()}<10m and ({Template Module Generic SNMPv2:system.uptime.max(660)}<4294307 or {Template Module Generic SNMPv2:system.uptime.max(660)}>4294997)

I believe this will stop the false alerts for 32-bit values but still allow them on 64-bit values. The upper limit of a 32-bit unsigned int is 4,294,967,295. The last three digits of that number (0-999) are used as the fractional part of a second (e.g. milliseconds), so the upper limit in seconds is 4,294,967. Minus 600 seconds (10 minutes), that is 4,294,367. I subtracted an extra 60 seconds for wiggle room, which is where the 4294307 comes from. Likewise, I added 30 seconds to the max of 4294967 to get 4294997 for the upper limit. So a reboot would only be missed if the device happens to reboot within that 11-minute-and-30-second window.

I haven't actually tested this, but it looks good

I had to modify this trigger because I found one device that seems to use only the last 2 digits of the integer for fractions of a second, so it rolled over at 42949672 instead of 4294967. Modified trigger as follows:

Code:

{Template Module Generic SNMPv2:system.uptime.last()}<10m
and ({Template Module Generic SNMPv2:system.uptime.max(660)}<4294307 or {Template Module Generic SNMPv2:system.uptime.max(660)}>4294967)
and ({Template Module Generic SNMPv2:system.uptime.max(660)}<42949012 or {Template Module Generic SNMPv2:system.uptime.max(660)}>42949672)
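For anyone who wants to sanity-check the two-window version outside Zabbix, this is roughly what it evaluates, written as a Python sketch. The window values are copied from the trigger above; the function name and the list are just for illustration.

```python
# (lower, upper) bounds in seconds around each observed rollover point,
# taken from the trigger expression above.
ROLLOVER_WINDOWS = [
    (4294307, 4294967),     # devices that roll over at 4294967
    (42949012, 42949672),   # devices that roll over at 42949672
]


def restart_alert(last_s, max_recent_s):
    """True when uptime is under 10 minutes and the recent maximum
    does not fall inside any of the rollover windows."""
    if last_s >= 600:
        return False
    return all(max_recent_s < lo or max_recent_s > hi
               for lo, hi in ROLLOVER_WINDOWS)


if __name__ == "__main__":
    print(restart_alert(30, 86400))      # True: restart after one day of uptime
    print(restart_alert(30, 4294900))    # False: first rollover point
    print(restart_alert(30, 42949600))   # False: second rollover point
```

Adding another device type with a different rollover point would just mean one more pair in the list, or one more "and (...)" clause in the trigger.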

SNMP uptime overflow after 497 days
