[bug?] arithmetic overflow error in v1.1.4 to v1.1.7

pdwalker

Senior Member

Joined: Dec 2005
Posts: 166


30-04-2007, 10:14

Zabbix version 1.1.7, both server and agent
OS: 32-bit CentOS 4.4 (server)
OS: 64-bit CentOS 4.4 (clients)

I'm seeing what I believe to be an overflow error in the Zabbix server.

One server has been up for over 5 months and has sent a fair amount of data during that time.

The item I am monitoring is configured as follows:

Code:

Description:         * Net: eth2: Out Bytes
Type:                Zabbix Agent
Key:                 net.if.out[eth2]
Type Of Information: numeric(float)
Units:               B/s
Use Multiplier:      Do not use
Update Interval:     30
Keep History:         7
Keep Trends:         30
Status:              Monitored
Store Value:         Delta (speed per second)

The statistics from an ifconfig command on two successive invocations are as follows:

Code:

eth2      Link encap:Ethernet  HWaddr 00:E0:81:42:A2:8A  
          inet addr:66.237.62.70  Bcast:66.237.62.95  Mask:255.255.255.224
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6206970074 errors:0 dropped:0 overruns:2 frame:2
          TX packets:5969353011 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1381256643448 (1.2 TiB)  TX bytes:1959423311117 (1.7 TiB)

Code:

eth2      Link encap:Ethernet  HWaddr 00:E0:81:42:A2:8A  
          inet addr:66.237.62.70  Bcast:66.237.62.95  Mask:255.255.255.224
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6206970561 errors:0 dropped:0 overruns:2 frame:2
          TX packets:5969353459 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1381256741733 (1.2 TiB)  TX bytes:1959423430756 (1.7 TiB)
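
As a sanity check, the difference between the two samples can be worked out directly. This is only a minimal sketch: the time between the two ifconfig runs is not recorded above, so `elapsed_seconds` is a placeholder assumption, and the helper names are made up for illustration.

```python
# Byte counters copied from the two ifconfig outputs above.
first = {"rx": 1381256643448, "tx": 1959423311117}
second = {"rx": 1381256741733, "tx": 1959423430756}


def counter_delta(before, after):
    """Return the per-direction byte difference between two samples."""
    return {name: after[name] - before[name] for name in before}


def rate(delta_bytes, elapsed_seconds):
    """Bytes per second for a given byte delta."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return delta_bytes / elapsed_seconds


delta = counter_delta(first, second)

# Both counters are far past what an unsigned 32-bit integer can hold.
exceeds_32_bit = all(value > 2**32 - 1 for value in second.values())

elapsed_seconds = 1.0  # ASSUMPTION: the real gap between the two runs is unknown
print(delta, exceeds_32_bit, rate(delta["tx"], elapsed_seconds))
```

Whatever the real gap was, the raw byte deltas are tiny compared with the counters themselves, so a sane per-second figure should be nowhere near the billions.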

The deltas being stored in the database for this item look like this:

Code:

mysql> select * from history where itemid = 17631 limit 10;
+--------+------------+------------------+
| itemid | clock      | value            |
+--------+------------+------------------+
|  17631 | 1177312225 | 25380538891.4118 |
|  17631 | 1177312252 | 31960873547.0000 |
|  17631 | 1177312285 | 26149985370.9091 |
|  17631 | 1177312312 | 31961285623.2593 |
|  17631 | 1177312345 | 26150311678.3636 |
|  17631 | 1177312372 | 31961687237.2222 |
|  17631 | 1177312405 | 26150659855.6970 |
|  17631 | 1177312432 | 31962095095.3333 |
|  17631 | 1177312465 | 26150981869.8788 |
|  17631 | 1177312492 | 31962497902.9630 |
+--------+------------+------------------+
10 rows in set (0.00 sec)
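
One way to probe these rows (a sketch, not a diagnosis): the stored values alternate with the gap between clock readings, so multiplying each value by the gap to the previous row undoes the per-second division. The function names are hypothetical, and using the `clock` column as the sampling time is an assumption.

```python
# (clock, value) rows copied from the history query above.
rows = [
    (1177312225, 25380538891.4118),
    (1177312252, 31960873547.0000),
    (1177312285, 26149985370.9091),
    (1177312312, 31961285623.2593),
    (1177312345, 26150311678.3636),
    (1177312372, 31961687237.2222),
    (1177312405, 26150659855.6970),
    (1177312432, 31962095095.3333),
    (1177312465, 26150981869.8788),
    (1177312492, 31962497902.9630),
]


def reconstructed_deltas(rows):
    """Undo the per-second division: value * seconds since the previous row."""
    return [
        (clock, value * (clock - prev_clock))
        for (prev_clock, _), (clock, value) in zip(rows, rows[1:])
    ]


def implied_rates(deltas):
    """Rate implied by how much the reconstructed delta grows between rows."""
    return [
        (d - prev_d) / (clock - prev_clock)
        for (prev_clock, prev_d), (clock, d) in zip(deltas, deltas[1:])
    ]


deltas = reconstructed_deltas(rows)
rates = implied_rates(deltas)

for clock, d in deltas:
    print(clock, f"{d:.4e}")
print([round(r) for r in rates])
```

The reconstructed deltas all come out at roughly 8.63e11 bytes and creep up only slightly from row to row, while the implied rates are on the order of 10^5 B/s. That pattern is what one would expect if each new reading were being compared against a previous value that no longer changes, although nothing in the data itself proves that is what happens.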

If I look at the trends from early on in the history (sorry, I've had this problem for a long time and the data has moved out of the history table), I see the values I "expect" to see:

Code:

mysql> select * from trends where itemid = 17631 order by clock limit 5;
+--------+------------+-----+-----------+-----------+-----------+
| itemid | clock      | num | value_min | value_avg | value_max |
+--------+------------+-----+-----------+-----------+-----------+
|  17631 | 1154732400 |  51 |  107.6923 |  182.2010 |  582.5000 |
|  17631 | 1154736000 | 120 |   42.2800 |  174.7061 |  290.6757 |
|  17631 | 1154739600 | 120 |   34.3333 |  173.9369 |  291.3333 |
|  17631 | 1154743200 | 120 |   57.7778 |  171.0022 |  253.8974 |
|  17631 | 1154746800 | 120 |   34.9259 |  168.4361 |  274.3939 |
+--------+------------+-----+-----------+-----------+-----------+
5 rows in set (0.00 sec)

When I look at the latest trends, I see grossly exaggerated values:

Code:

mysql> select * from trends where itemid = 17631 order by clock desc limit 5;
+--------+------------+-----+------------------+------------------+-------------------+
| itemid | clock      | num | value_min        | value_avg        | value_max         |
+--------+------------+-----+------------------+------------------+-------------------+
|  17631 | 1177920000 |   5 | 29074763141.8485 | 31659416580.5771 |  35536204022.2222 |
|  17631 | 1177916400 | 117 |  8197531095.2222 | 32206601441.0039 |  43611314467.7273 |
|  17631 | 1177912800 | 121 | 21794439748.1818 | 33813439024.3971 | 159800714695.1667 |
|  17631 | 1177909200 | 120 | 27386203382.8286 | 32353125326.6845 |  38343269614.5200 |
|  17631 | 1177905600 | 120 | 24570471886.8462 | 32440411990.4769 |  47912528382.3500 |
+--------+------------+-----+------------------+------------------+-------------------+
5 rows in set (0.00 sec)

Chart 2c shows the weekly traffic graphs just before the problem appeared.

Chart 2d is from an hour later, when the problem occurs on the outgoing traffic.

Chart 2e is from a week later. You can see the network traffic showing obviously bad values (if only my network card got faster the longer I used it!).

Chart 2b shows when the problem started occurring on the incoming traffic.

Finally, Chart 2a shows the last month of traffic, with my outgoing traffic listed as almost 30 GB/s.

My guess is that the numbers being returned to the Zabbix server are overflowing whatever is used to store them.

I've verified that the Zabbix agent is returning the correct values, i.e. the same large numbers that I see from the ifconfig command (1,959,423,311,117).
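
To make the guess concrete, here is a toy model of how a capped "previous value" would distort a delta-per-second calculation. Everything in it is an assumption for illustration: the ceiling is a hypothetical limit (12 integer digits, 4 decimals), the 30-second gap is simply the item's update interval, and none of the names come from the Zabbix source.

```python
# ASSUMPTION: hypothetical ceiling for the stored previous value. The thread
# does not say which field or limit is actually involved.
HYPOTHETICAL_CEILING = 999_999_999_999.9999


def store(value, ceiling=HYPOTHETICAL_CEILING):
    """Model a storage field that silently clamps values above its ceiling."""
    return min(value, ceiling)


def delta_per_second(current, stored_previous, elapsed_seconds):
    """Per-second delta between the current reading and the stored one."""
    return (current - stored_previous) / elapsed_seconds


# TX byte counters from the two ifconfig samples.
prev_raw, cur_raw = 1959423311117, 1959423430756
elapsed = 30  # ASSUMPTION: equal to the item's update interval

healthy = delta_per_second(cur_raw, prev_raw, elapsed)
saturated = delta_per_second(cur_raw, store(prev_raw), elapsed)

print(f"previous value stored intact:  {healthy:,.1f} B/s")
print(f"previous value clamped:        {saturated:,.1f} B/s")
```

With the previous value stored intact, the result is a few thousand B/s. With the clamped value, it lands around 3.2e10 B/s, the same order of magnitude as the bad trend values. That is suggestive at best, since the ceiling was picked by hand.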

Has anyone seen this problem before, or does anyone have any suggestions on how to work around the problem? (yes, I could reboot the server, but I'd rather not since they are production servers).

Thanks in advance for your help.

- Paul

Attached Files

Last edited by pdwalker; 30-04-2007, 10:21.


pdwalker

Senior Member

Joined: Dec 2005

Posts: 166
#2

08-05-2007, 13:12

*bump*

Anyone seen this problem?
Alexei

Founder, CEO

Joined: Sep 2004

Posts: 5654
#3

08-05-2007, 13:26

Yes, this is a known issue. It won't be fixed in 1.1.x because a fix would require changes to the database structure.

Alexei Vladishev
Creator of Zabbix, Product manager
New York | Tokyo | Riga
pdwalker

Senior Member

Joined: Dec 2005

Posts: 166
#4

10-05-2007, 06:09

Thanks for the update Alexei!