1.4.1 graph shows gaps

  • BusteR81
    Senior Member
    • Apr 2007
    • 150

    #1

    1.4.1 graph shows gaps

    After upgrading from 1.4 to 1.4.1 while keeping zabbix_agentd 1.4 on monitored hosts:

    Zabbix server seems to stop collecting data for "some" items during these gaps (please refer to the image). When accessing "500 last graph values", data is missing for the same periods, which would explain why the graphs show gaps.

    My conclusion: Zabbix server 1.4.1 is at fault, because the gaps in these 2 graphs occur over the SAME period of time!

    The "funny" thing is that the other graphs are working fine; only these few graphs show gaps. All these gaps appeared with 1.4.1 and NOT with 1.4.

    So what is happening?

    My system: CentOS, MySQL 4.x.x
  • BusteR81
    Senior Member
    • Apr 2007
    • 150

    #2
    updates on this problem

    After a while, I discovered that this "no data = no graph" problem occurs hourly. Please see the attached image.

    item used: system.cpu.util[,system,avg1], update interval 10, history 7
    trigger used: {HOST_1:system.cpu.util[,system,avg1].avg(300)}>90

    500 last values: Red denotes missing data

    2007.Jul.04 16:48:03 38.9667
    2007.Jul.04 16:47:50 37.1667
    2007.Jul.04 16:47:31 37.9833
    2007.Jul.04 16:43:51 41.3333

    2007.Jul.04 16:41:55 37.2500
    2007.Jul.04 16:41:46 36.9000
    2007.Jul.04 16:41:36 38.5833
    2007.Jul.04 16:41:25 38.7167
    2007.Jul.04 16:41:15 37.9833
    2007.Jul.04 16:41:05 35.0500
    2007.Jul.04 16:40:55 35.8500
    2007.Jul.04 16:40:46 36.6667
    2007.Jul.04 16:40:36 37.8833
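
The pauses in a list like the one above can be picked out mechanically instead of by eye. The following is only a rough sketch: it assumes the update interval of 10 is in seconds (the spacing of the samples suggests so), and the names `find_gaps` and `tolerance` are made up for this illustration, not part of Zabbix.

```python
from datetime import datetime, timedelta

# Month abbreviations mapped by hand so parsing does not depend on the locale.
MONTHS = {m: i for i, m in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Timestamps copied from the "500 last values" list in this post.
SAMPLES = [
    "2007.Jul.04 16:48:03", "2007.Jul.04 16:47:50", "2007.Jul.04 16:47:31",
    "2007.Jul.04 16:43:51", "2007.Jul.04 16:41:55", "2007.Jul.04 16:41:46",
    "2007.Jul.04 16:41:36", "2007.Jul.04 16:41:25", "2007.Jul.04 16:41:15",
    "2007.Jul.04 16:41:05", "2007.Jul.04 16:40:55", "2007.Jul.04 16:40:46",
    "2007.Jul.04 16:40:36",
]


def parse(stamp):
    """Turn '2007.Jul.04 16:48:03' into a datetime."""
    date, clock = stamp.split()
    year, mon, day = date.split(".")
    hour, minute, second = (int(x) for x in clock.split(":"))
    return datetime(int(year), MONTHS[mon], int(day), hour, minute, second)


def find_gaps(stamps, interval_s=10, tolerance=3):
    """Return (start, end, seconds) for each pause longer than tolerance * interval_s."""
    times = sorted(parse(s) for s in stamps)
    limit = timedelta(seconds=interval_s * tolerance)
    return [(a, b, int((b - a).total_seconds()))
            for a, b in zip(times, times[1:]) if b - a > limit]


for start, end, seconds in find_gaps(SAMPLES):
    print(f"no data from {start:%H:%M:%S} to {end:%H:%M:%S} ({seconds} s)")
```

With a tolerance of three intervals, the ordinary 9 to 19 second spacing is ignored and only the two long pauses are reported.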

    • StanZoid
      Member
      • Oct 2005
      • 47

      #3
      I am having the same problem in my system. It seems to be caused by the 1.4.1 housekeeper process taking huge chunks of cpu and disk resource every hour, enough that the data recording process can't compete and gets skipped. I also see locking on the history tables during the housekeeper runs.

      I understand that a problem with the housekeeper was fixed in 1.4.1, and that it is now actually deleting data. I have seen the resource drain drop over a couple of days, and I see from the logs that the number of deleted records is dropping slowly, so I hope that the housekeeper is simply finishing the job it started out to do.

      StanZoid

      • robertsonstudios
        Junior Member
        • Jul 2007
        • 10

        #4
        I am seeing this as well... Any suggestions on how to control how much CPU, disk, memory the housekeeping is allowed to consume?

        --edit/add--
        I have now disabled housekeeping for the time being, it is a relatively new install and I am keeping everything for at least 90 days.
        Last edited by robertsonstudios; 03-08-2007, 02:59.

        • vikty
          Senior Member
          • Jul 2007
          • 104

          #5
          Hi,

          I have disabled the housekeeper, but on some machines the graphs still show gaps from time to time.

          I thought about packet loss (which is possible for some of these machines), but I have seen that Zabbix works over TCP, so lost segments are retransmitted.

          The strangest thing is that one of these machines is connected to the Zabbix machine directly through a switch...
          I don't know what to think...

          Does anyone have any idea?
          Last edited by vikty; 26-09-2007, 15:17.

          • BusteR81
            Senior Member
            • Apr 2007
            • 150

            #6
            hi

            OK, after upgrading to 1.4.2, my graph problems went away. My deduction is that the cause was the initial housekeeping run. 'Somehow' the housekeeper deletes history data bit by bit EVEN when you have a HUGE chunk of data to remove; it does not delete the whole chunk in one shot, even if all of it is outdated.
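
To picture why a backlog can take several runs to clear, here is a minimal sketch of batch-limited deletion. It is not Zabbix's actual housekeeper code: the in-memory SQLite table is a simplified stand-in, and the function name `housekeep` and the batch size of 500 are assumptions made for this example.

```python
import sqlite3


def housekeep(conn, cutoff, batch=500):
    """Delete at most `batch` rows older than `cutoff`; return the number removed."""
    cur = conn.execute(
        "DELETE FROM history WHERE rowid IN "
        "(SELECT rowid FROM history WHERE clock < ? LIMIT ?)",
        (cutoff, batch),
    )
    conn.commit()
    return cur.rowcount


# Simplified stand-in for a history table: 2000 rows with clock values 0..1999.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (itemid INTEGER, clock INTEGER, value REAL)")
conn.executemany("INSERT INTO history VALUES (1, ?, 0.0)",
                 [(t,) for t in range(2000)])

# Each pass stands for one scheduled run; the backlog shrinks one batch at a time.
deleted_per_run = []
while True:
    removed = housekeep(conn, cutoff=1800)
    if removed == 0:
        break
    deleted_per_run.append(removed)

remaining = conn.execute("SELECT COUNT(*) FROM history").fetchone()[0]
print(deleted_per_run, remaining)
```

The per-run counts fall off once the backlog is nearly gone, which is the same pattern as the slowly dropping "deleted records" numbers reported from the logs earlier in the thread.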

            My advice is to wait a few days (4-7) and see how it goes. Also, as others have commented, it might be due to high CPU utilization on the local zabbix_server, OR even high bandwidth utilization on the server and/or client blocking data from being sent in or out.

            Check tail -f /tmp/zabbix_server.log on the server AND tail -f /tmp/zabbix_agentd.log on the client.


            Cheers

            • vikty
              Senior Member
              • Jul 2007
              • 104

              #7
              If I send so many requests to zabbix-agent that all the agent processes on a single machine are busy and the total number of processes is very high,

              could I lose data?

              Is that a possible cause of the gaps in the graphs?

              • BusteR81
                Senior Member
                • Apr 2007
                • 150

                #8
                check the timeouts

                There are 2 timeout settings, one each in the respective server.conf and agentd.conf... My guess is that if the CPU is unable to keep up with the server's requests, the timeout (in seconds) takes effect and what you get is "NOT Supported" on your items, simply because the server and/or agentd fail to get the required monitored data within the timeout period.
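
As a generic illustration of that guess (not the Zabbix protocol or its code), the sketch below shows how a reader with a timeout ends up with no value when the other side answers too late. The function name `read_value`, the 0.2 second timeout, and the use of a local socket pair as a stand-in for the server/agent link are all assumptions for this example.

```python
import socket


def read_value(sock, timeout_s):
    """Return the reply as text, or None if nothing arrives within timeout_s."""
    sock.settimeout(timeout_s)
    try:
        data = sock.recv(1024)
    except socket.timeout:
        return None
    return data.decode().strip() or None


# A connected socket pair stands in for the server <-> agent connection.
server_side, agent_side = socket.socketpair()

# Case 1: the "agent" answers in time, so the value is recorded.
agent_side.sendall(b"38.9667\n")
fast = read_value(server_side, 0.2)

# Case 2: the "agent" is too busy to answer, so the read times out
# and there is simply no sample for this interval.
slow = read_value(server_side, 0.2)

print(fast, slow)
server_side.close()
agent_side.close()
```

Whether a missed sample shows up as a gap or as a not-supported item depends on the monitoring software; the sketch only shows the timeout mechanics.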

                • tobiasly
                  Junior Member
                  • Nov 2007
                  • 7

                  #9
                  This issue is still occurring with me for 1.4.2. Some of my graphs would only ever show history for about an hour. The CPU Load graph showed load1 just fine, but load5 and load15 kept getting wiped out after an hour.

                  I checked the housekeeping table and there were indeed numerous entries there, even though I'd only been using Zabbix for a couple days and nothing should have needed housekeeping yet.

                  I deleted the entries and the issue went away. The only thing I can think of is that I had exported, deleted, then imported my templates (I wanted to do some mass updates to them which weren't available in the UI) and maybe the entries got put there but weren't cleaned out.
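
For anyone wanting to repeat that check, here is a rough sketch of counting pending housekeeping entries per target table before clearing them. It runs against an in-memory SQLite stand-in, not a real Zabbix database: the column names and the sample rows are assumptions for illustration, so check the schema of your own installation first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in table; the column names are an assumption, not taken from this thread.
conn.execute(
    "CREATE TABLE housekeeper ("
    "housekeeperid INTEGER PRIMARY KEY, tablename TEXT, field TEXT, value INTEGER)"
)
# Made-up rows, only here so the queries have something to count.
conn.executemany(
    "INSERT INTO housekeeper (tablename, field, value) VALUES (?, ?, ?)",
    [("history", "itemid", 101), ("history", "itemid", 102), ("trends", "itemid", 101)],
)


def pending(conn):
    """Return {tablename: number of pending housekeeping entries}."""
    rows = conn.execute(
        "SELECT tablename, COUNT(*) FROM housekeeper "
        "GROUP BY tablename ORDER BY tablename"
    )
    return dict(rows.fetchall())


before = pending(conn)                   # inspect first
conn.execute("DELETE FROM housekeeper")  # the manual cleanup described above
conn.commit()
after = pending(conn)
print(before, after)
```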

                  Anyhoo, it's working now after the manual cleanup of the table, just posting this in case anyone else is facing the same issue.

                  • araw
                    Member
                    • May 2005
                    • 31

                    #10
                    Not sure if anyone mentioned whether their system was virtualised, but that patchy graph issue is a common problem when the system clock is slipping and being pulled forward again by something else (ntpd, ntpdate run from cron, virtualisation watchdog/s, etc).

                    This has been my experience with VMware ESX/VI3 anyway; Workstation and Server seem to be pretty reliable out of the box. VMware has a big in-depth doc that goes into virtual time keeping and recommended steps to fix it.

                    But either way double check your systems, even if it's physical hardware, and make sure your clock is keeping time correctly.
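
One simple way to double check is to compare the wall clock against a monotonic clock, which is not stepped when something resets the system time. This is only a sketch: the helper names `sample` and `clock_step` and the 1 second threshold are made up for this example, and on a badly behaved guest the monotonic clock may itself be unreliable.

```python
import time


def sample():
    """Take one (wall clock, monotonic clock) reading in seconds."""
    return (time.time(), time.monotonic())


def clock_step(prev, now):
    """Return how far the wall clock moved beyond the elapsed monotonic time.

    A positive result means the wall clock was pulled forward,
    a negative one means it was set back.
    """
    wall_delta = now[0] - prev[0]
    mono_delta = now[1] - prev[1]
    return wall_delta - mono_delta


# Take two readings a short time apart; in practice this would run in a loop.
prev = sample()
time.sleep(0.05)
now = sample()

step = clock_step(prev, now)
jumped = abs(step) > 1.0  # threshold chosen arbitrarily for this sketch
print(f"wall clock step: {step:+.3f} s, jumped: {jumped}")
```

Run in a loop on the monitored host, steps that line up with the gaps in the graphs would point at the clock rather than at the server.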

                    2 cents.

                    • tobiasly
                      Junior Member
                      • Nov 2007
                      • 7

                      #11
                      It sounds like my problem may be different from the problems with the gaps; I am also on a virtualized system (OpenVz) but the clock is not the issue with mine; it would simply lose any data that was over an hour old.

                      The clock didn't seem to drift much if at all, but even some graphs with multiple items on them (such as load average) would be fine for one item such as load1 while load5 was gone so I don't think clock drift could have been the culprit.
