Wrong numbers in reports

  • cioris
    Member
    • Oct 2008
    • 30

    #1

    Wrong numbers in reports

    Hi all,

    Did you see any strange numbers in availability reports, such as values above 100%? I see this kind of problem quite often. I tried to debug it, and it looks like we have a problem in the database itself (or in some of the queries we run for reports or for event details).

    Here is the problem I saw:

    When a report is run, or when we display the details of a specific event, a query finds the events that fall within the report period (or the last 20 events on the event log page). After this, a computation (calculate_availability() in triggers.php) adds up the time spent in the TRUE, FALSE and UNKNOWN states. Everything seems fine except that the algorithm assumes that eventid and clock increase together. In other words, we cannot have an event with a bigger eventid that happened earlier than the previous event, e.g.:

    event 1 --> id evid1 --> time t1
    event 2 --> id evid2 --> time t2

    The algorithm assumes that if evid2 > evid1 then t2 MUST BE > t1.

    In my case, for some reason I cannot explain, the time is inverted. I know this should not be possible, as the eventid is incremented over time, so the assumption above seems logical.
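To illustrate why the inversion matters, here is a minimal sketch with made-up numbers. It is not the code from triggers.php; the tuple layout (eventid, clock, value) and the function names are assumptions used only for this example. Walking the events in eventid order produces a negative interval and an availability above 100% as soon as two clocks are inverted.

```python
# Hypothetical, simplified events: (eventid, clock, value); value 0 = OK, 1 = PROBLEM.
# This only illustrates the ordering assumption; it is not the Zabbix implementation.

def time_in_state(events, period_end):
    """Sum the seconds spent in each state, walking events in eventid order."""
    totals = {0: 0, 1: 0}
    ordered = sorted(events, key=lambda e: e[0])  # order by eventid only
    for current, nxt in zip(ordered, ordered[1:]):
        # Duration of a state = clock of the next event minus clock of this one.
        totals[current[2]] += nxt[1] - current[1]
    last = ordered[-1]
    totals[last[2]] += period_end - last[1]
    return totals


def percent_ok(totals):
    """Share of the total time spent in the OK state, in percent."""
    return 100.0 * totals[0] / (totals[0] + totals[1])


good = [(1, 1000, 0), (2, 1600, 1), (3, 1900, 0)]
swapped = [(1, 1000, 0), (2, 1900, 1), (3, 1600, 0)]  # clocks of events 2 and 3 inverted

print(time_in_state(good, 2000), percent_ok(time_in_state(good, 2000)))
print(time_in_state(swapped, 2000), percent_ok(time_in_state(swapped, 2000)))
```

With the swapped clocks the PROBLEM time comes out negative and the OK percentage exceeds 100, which matches the kind of numbers seen in the reports.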

    Here are the numbers pulled from my database:

    Time | Status | Duration | Age | Ack | Actions
    2009.Jul.05 10:45:41 | OK | 2w 1d 4h | 2w 1d 4h | - | -
    2009.Jul.05 10:47:54 | PROBLEM | 2m 13s | 2w 1d 4h | - | -
    2009.Jun.17 00:47:22 | OK | 2w 4d 10h | 4w 5d 14h | - | -
    2009.Jun.17 00:52:21 | PROBLEM | 4m 59s | 4w 5d 14h | - | -


    If you match the times with the durations, you will see that the first problem in fact appeared at 00:47:22 and went away at 00:52:21, a total duration of 4m 59s. For some reason the time is "switched" between the 2 consecutive events.

    The duration is computed correctly, which means that some of the information is stored properly, but there seems to be a "link" issue in the database causing the query to "fail".

    Any ideas how to fix this issue?

    Thanks.
  • cioris
    Member
    • Oct 2008
    • 30

    #2
    more clear

    To be clearer:
    - For an unknown reason, the event table can contain 2 records with eventid1 > eventid2 but clock1 < clock2.

    Does anybody have any idea why this is happening?

    This "inversion" generates wrong values in reports.
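One way to look for such records could be a self-join on the table. The sketch below runs against an in-memory SQLite database with a simplified, assumed schema (only eventid, objectid and clock); the real table has more columns, and find_inversions is a hypothetical helper name, so treat this as an illustration rather than a ready-made query.

```python
import sqlite3


def find_inversions(conn):
    """Return (eventid1, eventid2) pairs with eventid1 > eventid2 but clock1 < clock2."""
    query = """
        SELECT a.eventid, b.eventid
        FROM events AS a
        JOIN events AS b
          ON a.objectid = b.objectid   -- compare events of the same object only
         AND a.eventid > b.eventid
         AND a.clock < b.clock
        ORDER BY a.eventid, b.eventid
    """
    return conn.execute(query).fetchall()


# Simplified stand-in for the event table (assumed columns, made-up data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (eventid INTEGER PRIMARY KEY, objectid INTEGER, clock INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, 10, 1000), (2, 10, 1900), (3, 10, 1600), (4, 11, 1200)],
)

inversions = find_inversions(conn)
print(inversions)
```

On a large table a self-join like this can be slow, so restricting it to one object or one time window would probably be needed.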

    Thanks


    • richlv
      Senior Member
      Zabbix Certified Trainer
      Zabbix Certified Specialist
      Zabbix Certified Professional
      • Oct 2005
      • 3112

      #3
      hmm. i just spotted _negative_ time in my availability report, which might be caused by the same reason, so add that as a "me too".
      Zabbix 3.0 Network Monitoring book


      • cioris
        Member
        • Oct 2008
        • 30

        #4
        We have to look in the code and see why the Zabbix server adds events in the "wrong order". My guess is that if one message is delayed (queued), a time inversion becomes possible. I don't know how to correct this, but I hope somebody can answer. I am sure everybody has this issue, but it is difficult to detect because nobody spends time manually computing the availability and comparing the result with the number reported in Zabbix. The problem becomes obvious only when we get negative numbers.

        So for now I just modified the zbx_date2age() function in func.php.inc. In line 200 I replaced the abs function with "-". This should be OK when the data is properly recorded in the database. In case of a time inversion, I can detect the wrong record because the duration in the event list will be a negative number. After this I use my "brain" and manually update the database with the proper time values (swap the clock field between the 2 consecutive events).
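The idea behind that change can be sketched as follows. This is not the PHP code of zbx_date2age(); the function names and the (eventid, clock) tuples are assumptions made for the example. It only shows that an absolute difference hides the inversion while a signed difference exposes it.

```python
def age_abs(start_clock, end_clock):
    """Duration with abs(): always non-negative, so an inversion goes unnoticed."""
    return abs(end_clock - start_clock)


def age_signed(start_clock, end_clock):
    """Duration without abs(): negative when the next event has an earlier clock."""
    return end_clock - start_clock


def suspicious_events(events):
    """Given (eventid, clock) pairs sorted by eventid, return the eventids
    whose following event carries an earlier clock."""
    flagged = []
    for (eventid, clock), (_, next_clock) in zip(events, events[1:]):
        if age_signed(clock, next_clock) < 0:
            flagged.append(eventid)
    return flagged


events = [(1, 1000), (2, 1900), (3, 1600)]  # made-up data, clocks of 2 and 3 inverted
print(age_abs(1900, 1600), age_signed(1900, 1600))
print(suspicious_events(events))
```

The flagged eventids are the candidates for the manual clock swap.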

        This manual procedure is OK for 1 or maybe 2 machines, but it is impossible for large sites. And you have to understand that messages can be delayed exactly when you monitor big sites with a lot of machines.

        Let me know if somebody has a fix.

        Thanks


        • richlv
          Senior Member
          Zabbix Certified Trainer
          Zabbix Certified Specialist
          Zabbix Certified Professional
          • Oct 2005
          • 3112

          #5
          my issue with negative percentages has been fixed for the 1.6 branch in revision 7723 (didn't notice a patch for trunk, so maybe a 1.6 branch issue only?).
          the change seems to add some more ordering when retrieving data for the calculation.
          i'd suggest you look at the change & test it in your environment.
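For illustration only, this is roughly what "more ordering" could mean. It does not reproduce revision 7723; the (eventid, clock, value) tuples and the function names are assumptions. The events are sorted by clock, with eventid as a tie-breaker, before the durations are summed.

```python
# Hypothetical, simplified events: (eventid, clock, value); value 0 = OK, 1 = PROBLEM.

def order_for_calculation(events):
    """Sort by clock first, then by eventid to break ties."""
    return sorted(events, key=lambda e: (e[1], e[0]))


def state_durations(events, period_end):
    """Sum the seconds spent in each state over clock-ordered events."""
    totals = {0: 0, 1: 0}
    ordered = order_for_calculation(events)
    for current, nxt in zip(ordered, ordered[1:]):
        totals[current[2]] += nxt[1] - current[1]
    last = ordered[-1]
    totals[last[2]] += period_end - last[1]
    return totals


swapped = [(1, 1000, 0), (2, 1900, 1), (3, 1600, 0)]  # made-up data with an inversion
print(order_for_calculation(swapped))
print(state_durations(swapped, 2000))
```

Ordering by clock keeps every interval non-negative, but it does not repair records whose clock values were stored against the wrong event, so the resulting percentages may still be off.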