[bug?] arithmetic overflow error in v1.1.4 to v1.1.7

  • pdwalker
    Senior Member
    • Dec 2005
    • 166

    #1

    Zabbix Version 1.1.7, both server and client
    OS: 32 bit Centos 4.4 (server)
    OS: 64 bit Centos 4.4 (clients)

    I'm seeing what I believe to be an overflow error in the Zabbix server.

    What I have is one server that has been up for a period of over 5 months and has sent a fair amount of data during that time.

    The item I am monitoring is configured as follows:

    Code:
    Description:         * Net: eth2: Out Bytes
    Type:                Zabbix Agent
    Key:                 net.if.out[eth2]
    Type Of Information: numeric(float)
    Units:               B/s
    Use Multiplier:      Do not use
    Update Interval:     30
    Keep History:         7
    Keep Trends:         30
    Status:              Monitored
    Store Value:         Delta (speed per second)
    The statistics from an ifconfig command on two successive invocations are as follows:

    Code:
    eth2      Link encap:Ethernet  HWaddr 00:E0:81:42:A2:8A  
              inet addr:66.237.62.70  Bcast:66.237.62.95  Mask:255.255.255.224
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:6206970074 errors:0 dropped:0 overruns:2 frame:2
              TX packets:5969353011 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:1381256643448 (1.2 TiB)  TX bytes:1959423311117 (1.7 TiB)
    Code:
    eth2      Link encap:Ethernet  HWaddr 00:E0:81:42:A2:8A  
              inet addr:66.237.62.70  Bcast:66.237.62.95  Mask:255.255.255.224
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:6206970561 errors:0 dropped:0 overruns:2 frame:2
              TX packets:5969353459 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:1381256741733 (1.2 TiB)  TX bytes:1959423430756 (1.7 TiB)
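
As a quick sanity check on the two snapshots above, the counters can simply be subtracted. The sketch below hard-codes the values from the ifconfig output; the time between the two invocations is not given in the post, so it computes raw differences only, not rates. The dictionary keys are illustrative names, not anything Zabbix uses.

```python
# Counter values copied from the two ifconfig snapshots above.
first = {
    "rx_packets": 6206970074,
    "tx_packets": 5969353011,
    "rx_bytes": 1381256643448,
    "tx_bytes": 1959423311117,
}
second = {
    "rx_packets": 6206970561,
    "tx_packets": 5969353459,
    "rx_bytes": 1381256741733,
    "tx_bytes": 1959423430756,
}

# Raw counter movement between the two invocations.
deltas = {name: second[name] - first[name] for name in first}

for name, delta in deltas.items():
    print(f"{name}: +{delta}")
```

The counters themselves advance by only about 100 KB between the two snapshots.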
    The deltas being stored in the database for this item look like this:

    Code:
    mysql> select * from history where itemid = 17631 limit 10;
    +--------+------------+------------------+
    | itemid | clock      | value            |
    +--------+------------+------------------+
    |  17631 | 1177312225 | 25380538891.4118 |
    |  17631 | 1177312252 | 31960873547.0000 |
    |  17631 | 1177312285 | 26149985370.9091 |
    |  17631 | 1177312312 | 31961285623.2593 |
    |  17631 | 1177312345 | 26150311678.3636 |
    |  17631 | 1177312372 | 31961687237.2222 |
    |  17631 | 1177312405 | 26150659855.6970 |
    |  17631 | 1177312432 | 31962095095.3333 |
    |  17631 | 1177312465 | 26150981869.8788 |
    |  17631 | 1177312492 | 31962497902.9630 |
    +--------+------------+------------------+
    10 rows in set (0.00 sec)
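
One way to read these rows: if each stored value is (current counter minus previously stored counter) divided by the elapsed seconds, then multiplying a value by the gap to the preceding row recovers the difference the server apparently computed. The sketch below does that with the rows above. Treating the `clock` gap as the elapsed time is an assumption, as is the delta formula itself.

```python
# (clock, value) pairs copied from the history query above.
rows = [
    (1177312225, 25380538891.4118),
    (1177312252, 31960873547.0000),
    (1177312285, 26149985370.9091),
    (1177312312, 31961285623.2593),
    (1177312345, 26150311678.3636),
    (1177312372, 31961687237.2222),
    (1177312405, 26150659855.6970),
    (1177312432, 31962095095.3333),
    (1177312465, 26150981869.8788),
    (1177312492, 31962497902.9630),
]

# For every row after the first: (assumed elapsed seconds, implied byte difference).
implied_diffs = []
for (prev_clock, _), (clock, value) in zip(rows, rows[1:]):
    elapsed = clock - prev_clock  # assumed sampling gap in seconds
    implied_diffs.append((elapsed, value * elapsed))

# How fast the implied difference itself grows, in bytes per second.
growth_rates = [
    (cur - prev) / elapsed
    for (_, prev), (elapsed, cur) in zip(implied_diffs, implied_diffs[1:])
]

for elapsed, diff in implied_diffs:
    print(f"gap {elapsed:2d}s  implied difference {diff:,.0f} bytes")
print("growth of the difference (B/s):", [round(r) for r in growth_rates])
```

Under those assumptions every row implies a difference of roughly 8.63e11 bytes that creeps upward at a few hundred kilobytes per second at most. That would be consistent with the current counter being compared against a previous value that is no longer advancing, but this is a hypothesis drawn from the numbers, not something confirmed in the thread.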
    If I look at the trends from early in the item's history (sorry, I've had this problem for a long time and the data has since aged out of the history table), I see the values I "expect" to see:

    Code:
    mysql> select * from trends where itemid = 17631 order by clock limit 5;
    +--------+------------+-----+-----------+-----------+-----------+
    | itemid | clock      | num | value_min | value_avg | value_max |
    +--------+------------+-----+-----------+-----------+-----------+
    |  17631 | 1154732400 |  51 |  107.6923 |  182.2010 |  582.5000 |
    |  17631 | 1154736000 | 120 |   42.2800 |  174.7061 |  290.6757 |
    |  17631 | 1154739600 | 120 |   34.3333 |  173.9369 |  291.3333 |
    |  17631 | 1154743200 | 120 |   57.7778 |  171.0022 |  253.8974 |
    |  17631 | 1154746800 | 120 |   34.9259 |  168.4361 |  274.3939 |
    +--------+------------+-----+-----------+-----------+-----------+
    5 rows in set (0.00 sec)
    When I look at the latest trends, I see the grossly exaggerated values:

    Code:
    mysql> select * from trends where itemid = 17631 order by clock desc limit 5;
    +--------+------------+-----+------------------+------------------+-------------------+
    | itemid | clock      | num | value_min        | value_avg        | value_max         |
    +--------+------------+-----+------------------+------------------+-------------------+
    |  17631 | 1177920000 |   5 | 29074763141.8485 | 31659416580.5771 |  35536204022.2222 |
    |  17631 | 1177916400 | 117 |  8197531095.2222 | 32206601441.0039 |  43611314467.7273 |
    |  17631 | 1177912800 | 121 | 21794439748.1818 | 33813439024.3971 | 159800714695.1667 |
    |  17631 | 1177909200 | 120 | 27386203382.8286 | 32353125326.6845 |  38343269614.5200 |
    |  17631 | 1177905600 | 120 | 24570471886.8462 | 32440411990.4769 |  47912528382.3500 |
    +--------+------------+-----+------------------+------------------+-------------------+
    5 rows in set (0.00 sec)
    Chart 2c shows the weekly traffic graph just before the problem appeared.

    Chart 2d is from an hour later, when the problem first shows up on the outgoing traffic.

    Chart 2e is from a week later; the network traffic shows obviously bad values (if only my network card got faster the longer I used it!).

    Chart 2b shows when the problem started occurring on the incoming traffic.

    Finally, Chart 2a shows the last month of traffic, with my outgoing traffic listed as almost 30 GB/s.


    My guess is that the numbers being returned to the Zabbix server are overflowing the counter that is used to store them.

    I've verified that the Zabbix client is returning the correct values, i.e. the same big numbers that I see from the ifconfig command (1,959,423,311,117).
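
To narrow down which limit might be involved, the counter value can be compared against a few common ones. The sketch below is only arithmetic: which of these limits (if any) the server actually hits is not stated in this thread, and the 12-digit entry is a hypothetical fixed-width ceiling included purely for comparison.

```python
tx_bytes = 1959423311117  # TX byte counter from the first ifconfig snapshot

# Candidate limits, smallest first. The 12-digit ceiling is hypothetical.
limits = {
    "unsigned 32-bit max": 2**32 - 1,
    "hypothetical 12-digit ceiling": 10**12 - 1,
    "contiguous exact integers in a double": 2**53,
    "unsigned 64-bit max": 2**64 - 1,
}

# Limits the counter has already passed.
exceeded = [name for name, limit in limits.items() if tx_bytes > limit]

# True if the counter survives a round trip through a 64-bit float unchanged.
exact_as_double = int(float(tx_bytes)) == tx_bytes

print("exceeded:", exceeded)
print("exactly representable as a double:", exact_as_double)
```

The counter is far past the 32-bit range but still fits comfortably in a 64-bit integer and in a double without loss, so a plain 64-bit or floating-point overflow of the raw value alone would not explain the readings.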

    Has anyone seen this problem before, or does anyone have any suggestions on how to work around the problem? (yes, I could reboot the server, but I'd rather not since they are production servers).

    Thanks in advance for your help.

    - Paul
    Last edited by pdwalker; 30-04-2007, 10:21.
  • pdwalker
    Senior Member
    • Dec 2005
    • 166

    #2
    *bump*

    Anyone seen this problem?

    • Alexei
      Founder, CEO
      Zabbix Certified Trainer
      Zabbix Certified Specialist
      Zabbix Certified Professional
      • Sep 2004
      • 5654

      #3
      Yes, this is a known issue. It won't be fixed in 1.1.x because a fix would require changes to the database structure.
      Alexei Vladishev
      Creator of Zabbix, Product manager
      New York | Tokyo | Riga

      • pdwalker
        Senior Member
        • Dec 2005
        • 166

        #4
        Thanks for the update Alexei!
