1.1beta4 gaps in graphs

  • volkerjaenisch
    Junior Member
    • Dec 2005
    • 7

    #1

    1.1beta4 gaps in graphs

    Hi Zabbix users!

    We see strange behavior in the graphs when monitoring a cluster of several servers. Approximately every hour, a time span of 5 minutes is not recorded.


    The paradoxical thing is that data for, e.g., CPU load and network load is not recorded, while data for the MySQL parameters is recorded properly.



    This picture is taken from the zabbix-server (slave7) itself. There is no error in the Zabbix logs, the MySQL logs, or the system logs on the Zabbix server. We logged the
    CPU load separately. It rises to a maximum of 2-3 during the gaps, but this cannot be the reason for such a data loss.

    Any ideas welcome

    regards
    volker
  • pdwalker
    Senior Member
    • Dec 2005
    • 166

    #2
    how often are you sampling the data for each of the monitored things?


    • volkerjaenisch
      Junior Member
      • Dec 2005
      • 7

      #3
      Originally posted by pdwalker
      how often are you sampling the data for each of the monitored things?
      All sampling rates are at their default values. Nothing shorter than 1/min.
      Maybe it's a database issue. We count 300 qps, most of which are inserts.
      But our Zabbix server running the MySQL DB is a 2-CPU Xeon HT (running as 4 virtual CPUs) with 2.8 GHz and 1 GB RAM. This machine should have enough power to handle the Zabbix server and a MySQL DB, come what may. There are no other jobs running on this machine, _only_ Zabbix and its DB.

      Maybe it's the Hyperthreading?
      We have Zabbix (1.0) running in production on a single-CPU machine with no performance problems at all. But that production system monitors only 4 machines, not 15 as in the current (problem) case.

      Best regards and a happy new year,

      Volker


      • pdwalker
        Senior Member
        • Dec 2005
        • 166

        #4
        1/ recheck your sampling rates.

        I can see more than one peak and trough in any 2-minute interval on your processor load chart, which tells me you are sampling much more often than once a minute.

        2/ What version of zabbix are you using?

        3/ 300 QPS on mysql is nothing. A healthy system won't even notice it. That's not your problem. What version of mysql?

        4/ What is your mysql database type? Is it innodb? or myisam? or something else?

        5/ It's not hyperthreading. You should be using one of the newer 2.6 kernels rather than the 2.4 kernels though for best cpu and process performance.
        (but this is still not your problem)

        The holes in your processor load charts look like the holes I see when I sample the data at too high a rate. Either I am losing values when it happens, or there is a problem with the chart drawing library when too much data is presented to it. (It's easy enough to check for data loss by querying the history table and seeing whether the data is there or missing. I couldn't be bothered at the time, I just lowered my sampling rate.)
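The check described above (is the data really missing, or only undrawn?) can be sketched in a few lines of Python. This is only an illustration, assuming the timestamps for one item have already been fetched from the history table as Unix-epoch seconds. The function name `find_gaps`, the `slack` factor, and the sample values are hypothetical and not part of Zabbix.

```python
from typing import List, Tuple


def find_gaps(clocks: List[int], expected_interval: int,
              slack: float = 2.0) -> List[Tuple[int, int]]:
    """Return (last_seen, next_seen) pairs where two consecutive samples
    are further apart than slack * expected_interval seconds."""
    ordered = sorted(clocks)
    threshold = expected_interval * slack
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > threshold:
            gaps.append((previous, current))
    return gaps


# Hypothetical timestamps: one sample every 10 s, with a hole of about 5 minutes.
samples = list(range(0, 100, 10)) + list(range(400, 500, 10))
print(find_gaps(samples, expected_interval=10))  # [(90, 400)]
```

If a list of real timestamps yields gaps that line up with the holes in the chart, the values never reached the database and the drawing library is not to blame.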

        If your database is using the myisam type, it is possible that there is a periodic process that is locking the tables on that hourly basis. (Anyone know when and how often the housekeeping is done?). If you are using myisam tables, then any action on a table will lock the entire table. Convert the table format to innodb format and those locking issues go away.

        - Paul


        • volkerjaenisch
          Junior Member
          • Dec 2005
          • 7

          #5
          Hello Paul!

          Thanks a lot for this detailed discussion.
          Originally posted by pdwalker
          1/ recheck your sampling rates.
          Sampling rate for cpu load is indeed 5/10/20 sec for cpuload1/5/15 min respectively.
          2/ What version of zabbix are you using?
          1.1b4 compiled from source on debian sarge gcc-3.3.5.

          3/ 300 QPS on mysql is nothing. A healthy system won't even notice it. That's not your problem. What version of mysql?
          4.0.24 stock debian.

          4/ What is your mysql database type? Is it innodb? or myisam? or something else?
          myisam

          5/ It's not hyperthreading. You should be using one of the newer 2.6 kernels rather than the 2.4 kernels though for best cpu and process performance.
          (but this is still not your problem)
          We recommend the 2.6.12-smp stock Debian kernels, since we have had problems with older 2.6 kernels (but not with Zabbix).

          The holes in your processor load charts look like the holes I see when I sample the data at too high a rate.
          "too high" is a relative measure. load5 is sampled at 10 seconds - the default rate. Is this "too high"? Unlikely.

          I couldn't be bothered at the time, I just lowered my sampling rate.)
          To what sampling rate?

          If your database is using the myisam type, it is possible that there is a periodic process that is locking the tables on that hourly basis. (Anyone know when and how often the housekeeping is done?). If you are using myisam tables, then any action on a table will lock the entire table. Convert the table format to innodb format and those locking issues go away.
          This is a very good hint. I will check this. The gaps are not exactly periodic, so a cron job is unlikely.
          I will have a look at the MySQL documentation.

          Best regards,

          Volker


          • volkerjaenisch
            Junior Member
            • Dec 2005
            • 7

            #6
            Originally posted by pdwalker
            What is your MySQL table type?
            Originally posted by volkerjaenisch
            myisam
            Sorry, wrong guess. We already use InnoDB tables.
            So we will now look at the sampling rate and at whether the sampled data reaches the DB or not.

            Best regards,

            Volker


            • pdwalker
              Senior Member
              • Dec 2005
              • 166

              #7
              humour me, and recheck your table status, and what format they are in (use database zabbix, show table status and eyeball each table to be sure)

              Change your CPU sampling intervals to no shorter than 30 seconds. In fact, I would make it a minute.
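To get a feel for what this change means in stored values, here is a rough back-of-the-envelope sketch in Python. It uses the intervals reported earlier in this thread (5/10/20 seconds for cpuload1/5/15) and the 15 monitored machines, and it assumes, purely for illustration, that every host uses the same three intervals. The helper `samples_per_hour` is hypothetical.

```python
def samples_per_hour(intervals_seconds, hosts=1):
    """Values written per hour for items polled at the given intervals (in seconds)."""
    return hosts * sum(3600 // interval for interval in intervals_seconds)


# Intervals reported in this thread for cpuload1/5/15: 5, 10 and 20 seconds.
current = samples_per_hour([5, 10, 20], hosts=15)
# Suggested alternative: each of the three load items once a minute.
relaxed = samples_per_hour([60, 60, 60], hosts=15)
print(current, relaxed)  # 18900 2700
```

Under these assumptions the CPU load items alone drop from 18900 to 2700 rows per hour, a factor of seven.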

              When I say the sampling rates are "too high", I mean that it appears to cause chart drawing problems (whether because of data loss or a bug in the chart handling libraries, I don't know, as I have not bothered to track it down).

              In all the instances where I had the problem, reducing the sampling rate solved it.

              - Paul


              • pdwalker
                Senior Member
                • Dec 2005
                • 166

                #8
                I just noticed this thread: http://www.zabbix.com/forum/showthread.php?t=1364
                Try that fix first and see if it corrects your problem to any degree.

                Last edited by pdwalker; 31-12-2005, 06:33.


                • volkerjaenisch
                  Junior Member
                  • Dec 2005
                  • 7

                  #9
                  Originally posted by pdwalker
                  humour me, and recheck your table status, and what format they are in (use database zabbix, show table status and eyeball each table to be sure)
                  InnoDB on all tables, confirmed.
                  Originally posted by pdwalker
                  Change your CPU sampling intervals to no shorter than 30 seconds. In fact, I would make it a minute.
                  Will do so.

                  Originally posted by pdwalker
                  When I say the sampling rates are "too high", I mean that it appears to cause chart drawing problems (whether because of data loss or a bug in the chart handling libraries, I don't know, as I have not bothered to track it down).
                  I will try this.

                  Originally posted by pdwalker
                  In all the instances where I had the problem, reducing the sampling rate solved it.
                  But why? That is the interesting question.

                  Is it the DB load
                  (I learned yesterday that our Zabbix tablespace is 2.6 GB by now)?
                  Or a wrong SQL statement? See your pointer below.

                  Best regards,
                  Volker


                  • volkerjaenisch
                    Junior Member
                    • Dec 2005
                    • 7

                    #10
                    Originally posted by pdwalker
                    I just noticed this thread. Try that fix first and see if it corrects your problem to any degree.

                    http://www.zabbix.com/forum/showthread.php?t=1364
                    I found this thread independently yesterday.
                    I patched the code according to Elkar's finding and discovered that the gaps have vanished, but only because they are now filled with straight lines.
                    So this masks the real problem but does not solve it.

                    But it indicates two things:
                    First, the data may never have reached the database, so there is a real data gap. This shifts the focus to the sampling rate.
                    [I've looked into the data, but the timestamp column clock is in ticks and it is unclear how to convert them (UTC, MESZ) to local time. It's not easy to find an answer.]
                    Does someone more familiar with the Zabbix DB schema have an SQL statement at hand?

                    Second, it is not the fault of the drawing lib due to "too many points", since after the patch there are more points to draw and they were drawn without problems.

                    Best regards,

                    Volker


                    • Alexei
                      Founder, CEO
                      Zabbix Certified Trainer
                      Zabbix Certified Specialist
                      Zabbix Certified Professional
                      • Sep 2004
                      • 5654

                      #11
                      The problem is obviously related to missing data in the database. I suspect that, because of unreachable hosts, the ZABBIX poller runs into too many timeouts, thus causing delays in data retrieval.

                      select from_unixtime(clock) from history where ...;
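The use of `from_unixtime` suggests that `clock` holds Unix-epoch seconds, which are counted from 1970-01-01 00:00:00 UTC regardless of the server's time zone. For anyone inspecting the values outside MySQL, the same conversion can be sketched in Python. The helper `clock_to_times` and the fixed +1 hour offset (Central European winter time) are assumptions for illustration only. A fixed offset ignores daylight saving changes.

```python
from datetime import datetime, timedelta, timezone


def clock_to_times(clock: int, local_offset_hours: int):
    """Render a Unix-epoch value as ISO strings in UTC and at a fixed local offset."""
    utc_time = datetime.fromtimestamp(clock, tz=timezone.utc)
    local_zone = timezone(timedelta(hours=local_offset_hours))
    return utc_time.isoformat(), utc_time.astimezone(local_zone).isoformat()


# 1136073600 is 2006-01-01 00:00:00 UTC; the +1 hour offset is an assumption.
print(clock_to_times(1136073600, local_offset_hours=1))
# ('2006-01-01T00:00:00+00:00', '2006-01-01T01:00:00+01:00')
```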
                      Alexei Vladishev
                      Creator of Zabbix, Product manager
                      New York | Tokyo | Riga


                      • pdwalker
                        Senior Member
                        • Dec 2005
                        • 166

                        #12
                        Originally posted by volkerjaenisch
                        Second, it is not the fault of the drawing lib due to "too many points", since after the patch there are more points to draw and they were drawn without problems.
                        Yeah, it was just a guess at one of the possibilities, from someone who didn't bother to verify it. (I was leaving that for someone else to do.)

                        - Paul
