serious performance difference between 1.1 and 1.0

  • roosit
    Junior Member
    • Apr 2006
    • 17

    #1

    serious performance difference between 1.1 and 1.0

    Again performance problems with a fresh install of 1.1, 18 hosts, two items per host (5min, 5s sample time) and 1 trigger per host.

    When starting the server the load rises from 0.02 to 1.79 (see chart)

    For details, see the previous post:


    What is wrong with this new development? Does anyone have the same problems?
  • Alexei
    Founder, CEO
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Sep 2004
    • 5654

    #2
    I do not see anything wrong here. Three minutes of CPU data say nothing about the overall performance hit. I believe it could be the impact of the housekeeper. Can I see a comparison chart covering at least an hour?
    Alexei Vladishev
    Creator of Zabbix, Product manager
    New York | Tokyo | Riga

    • roosit
      Junior Member
      • Apr 2006
      • 17

      #3
      See measurement2.png and measurement1.png from the previous post for a 1-hour chart.
      (http://www.zabbix.com/forum/showthre...us+performance)

      I do not think this has anything to do with housekeeping; I have set it to run once a day with HousekeepingFrequency=24.
      Furthermore, the database is only 580kB.


      • Alexei
        Founder, CEO
        Zabbix Certified Trainer
        Zabbix Certified Specialist
        Zabbix Certified Professional
        • Sep 2004
        • 5654

        #4
        Housekeeping starts immediately after the server starts up. If it is set to 24h, the next run will be 24h after startup.

        • roosit
          Junior Member
          • Apr 2006
          • 17

          #5
          I let it monitor for the whole day. After stopping the Zabbix server, I watched the load on the db server, which went down from 1.2 to 0.01.

          Furthermore, I hope you are not suggesting that it is normal behaviour for housekeeping on a 600kB db to use 100% of the processor.


          ===============
          [root@ root]# ssh idns2 "/etc/init.d/zabbix_server stop"

          [root@db1 root]# uptime
          00:37:52 up 208 days, 8:27, 1 user, load average: 1.06, 1.24, 1.22
          [root@db1 root]# uptime
          00:37:54 up 208 days, 8:27, 1 user, load average: 1.06, 1.24, 1.22
          [root@db1 root]# uptime
          00:37:55 up 208 days, 8:27, 1 user, load average: 0.98, 1.22, 1.21
          [root@db1 root]# uptime
          00:37:56 up 208 days, 8:27, 1 user, load average: 0.98, 1.22, 1.21
          [root@db1 root]# uptime
          00:37:57 up 208 days, 8:27, 1 user, load average: 0.98, 1.22, 1.21
          [root@db1 root]# uptime
          00:37:58 up 208 days, 8:27, 1 user, load average: 0.98, 1.22, 1.21
          [root@db1 root]# uptime
          00:37:59 up 208 days, 8:27, 1 user, load average: 0.98, 1.22, 1.21
          [root@db1 root]# uptime
          00:38:00 up 208 days, 8:27, 1 user, load average: 0.98, 1.22, 1.21
          [root@db1 root]# uptime
          00:38:04 up 208 days, 8:27, 1 user, load average: 0.90, 1.20, 1.20
          [root@db1 root]# uptime
          00:38:09 up 208 days, 8:27, 1 user, load average: 0.83, 1.18, 1.20
          [root@db1 root]# uptime
          00:38:13 up 208 days, 8:27, 1 user, load average: 0.76, 1.16, 1.19
          [root@db1 root]# uptime
          00:38:15 up 208 days, 8:27, 1 user, load average: 0.76, 1.16, 1.19
          [root@db1 root]# uptime
          00:38:17 up 208 days, 8:27, 1 user, load average: 0.70, 1.14, 1.18
          [root@db1 root]# uptime
          00:38:24 up 208 days, 8:27, 1 user, load average: 0.64, 1.12, 1.18
          [root@db1 root]#
          [root@db1 root]# uptime
          00:38:26 up 208 days, 8:27, 1 user, load average: 0.59, 1.10, 1.17
          [root@db1 root]# uptime
          00:38:29 up 208 days, 8:27, 1 user, load average: 0.59, 1.10, 1.17
          [root@db1 root]# uptime
          00:38:31 up 208 days, 8:28, 1 user, load average: 0.54, 1.08, 1.16
          [root@db1 root]# uptime
          00:38:34 up 208 days, 8:28, 1 user, load average: 0.54, 1.08, 1.16
          [root@db1 root]# uptime
          00:38:43 up 208 days, 8:28, 1 user, load average: 0.46, 1.04, 1.15
          [root@db1 root]# uptime
          00:38:50 up 208 days, 8:28, 1 user, load average: 0.42, 1.03, 1.14
          [root@db1 root]# uptime
          00:38:59 up 208 days, 8:28, 1 user, load average: 0.36, 0.99, 1.13
          [root@db1 root]# uptime
          00:39:05 up 208 days, 8:28, 1 user, load average: 0.30, 0.96, 1.12
          [root@db1 root]# uptime
          00:39:44 up 208 days, 8:29, 1 user, load average: 0.17, 0.85, 1.07
          [root@db1 root]# uptime
          00:40:11 up 208 days, 8:29, 1 user, load average: 0.10, 0.77, 1.04
          [root@db1 root]# uptime
          00:40:19 up 208 days, 8:29, 1 user, load average: 0.09, 0.76, 1.03
          [root@db1 root]# uptime
          00:40:24 up 208 days, 8:29, 1 user, load average: 0.08, 0.74, 1.03
          [root@db1 root]# uptime
          00:40:25 up 208 days, 8:29, 1 user, load average: 0.08, 0.73, 1.02
          [root@db1 root]# uptime
          00:40:55 up 208 days, 8:30, 1 user, load average: 0.05, 0.67, 0.99
          [root@db1 root]# uptime
          00:42:38 up 208 days, 8:32, 1 user, load average: 0.01, 0.47, 0.88
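
For anyone who wants to compare figures rather than eyeball a paste like this, here is a minimal Python sketch that pulls the timestamp and the three load averages out of `uptime` lines. The regular expression and the `parse_uptime` helper are illustrative assumptions written for the line format shown above; other `uptime` implementations may format their output differently.

```python
import re

# Matches the clock time and the three load averages in one line of `uptime` output.
UPTIME_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}) up .*load average: ([\d.]+), ([\d.]+), ([\d.]+)"
)

def parse_uptime(line):
    """Return (seconds_since_midnight, load1, load5, load15), or None if no match."""
    m = UPTIME_RE.search(line)
    if m is None:
        return None
    h, mi, s = (int(m.group(i)) for i in (1, 2, 3))
    loads = tuple(float(m.group(i)) for i in (4, 5, 6))
    return (h * 3600 + mi * 60 + s, *loads)

# First and last samples from the log above.
first = parse_uptime("00:37:52 up 208 days, 8:27, 1 user, load average: 1.06, 1.24, 1.22")
last = parse_uptime("00:42:38 up 208 days, 8:32, 1 user, load average: 0.01, 0.47, 0.88")

elapsed = last[0] - first[0]   # seconds between the two samples
drop = first[1] - last[1]      # change in the 1-minute load average
print(elapsed, round(drop, 2))
```

The sketch assumes both samples fall on the same day, which holds for this log; a capture that crosses midnight would need a date as well.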

          PS: could you raise the attachment size limit so I don't have to resize/scale the images produced by Zabbix?

          • Alexei
            Founder, CEO
            Zabbix Certified Trainer
            Zabbix Certified Specialist
            Zabbix Certified Professional
            • Sep 2004
            • 5654

            #6
            Originally posted by roosit
            PS: could you raise the attachment size limit so I don't have to resize/scale the images produced by Zabbix?
            Done. Now it is 128KB with maximum dimension of 800x600 pixels.

            • roosit
              Junior Member
              • Apr 2006
              • 17

              #7
              Are you waiting for me to supply more information? Are you able to recreate this problem in a test environment of your own?


              Regards,
              Marc


              • Alexei
                Founder, CEO
                Zabbix Certified Trainer
                Zabbix Certified Specialist
                Zabbix Certified Professional
                • Sep 2004
                • 5654

                #8
                [root@db1 root]# uptime
                00:42:38 up 208 days, 8:32, 1 user, load average: 0.01, 0.47, 0.88
                I do not understand how processor load 0.01 can be treated as "serious performance problem".

                • Nate Bell
                  Senior Member
                  • Feb 2005
                  • 141

                  #9
                  roosit, processor load isn't the same thing as CPU load. I used to monitor processor load, and it would regularly hit 3.0 or 4.0 on machines that get heavy use during the day (or during backups at night). A processor load of 1.0 is not equal to 100% CPU load. I haven't internalized what processor load is actually measuring, but here's a quick definition:
                  In short it is the average sum of the number of processes waiting in the run-queue plus the number currently executing over 1, 5, and 15 minute time periods.
                  So it's looking more at how many processes your system has open and running. As far as I know, this is related to how taxed a CPU is, but it's not the same thing. If you really want to monitor how heavily your CPU(s) are being used, then monitor the CPU idle, system, user, nice, etc. percentages. If your idle stays below some threshold for a while, then you have a trigger that is more meaningful.
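
To make the definition above concrete, here is a rough Python sketch of a load average computed as an exponentially damped moving average of the number of active tasks. Linux computes its 1, 5 and 15 minute figures along these lines (and also counts tasks in uninterruptible sleep), but the function names, the floating-point arithmetic and the exact 5-second sampling step are simplifying assumptions, not the kernel's code.

```python
import math

# Assumed sampling step in seconds; the Linux kernel samples roughly every 5 s.
SAMPLE_INTERVAL = 5.0

def update_load(load, active_tasks, window_seconds):
    """One step of an exponentially damped moving average of the active task count."""
    decay = math.exp(-SAMPLE_INTERVAL / window_seconds)
    return load * decay + active_tasks * (1.0 - decay)

def simulate(initial_load, active_tasks, seconds, window_seconds=60.0):
    """Advance the average for `seconds` with a constant number of active tasks."""
    load = initial_load
    for _ in range(int(seconds / SAMPLE_INTERVAL)):
        load = update_load(load, active_tasks, window_seconds)
    return load

# With nothing runnable, a 1-minute load of 1.06 decays toward zero.
after_one_minute = simulate(1.06, 0, 60)
after_five_minutes = simulate(1.06, 0, 300)

# One task that is always runnable drives the average toward 1.0.
steady = simulate(0.0, 1, 3600)
```

Under this model the figure is a smoothed count of tasks wanting to run, not a CPU percentage, and it takes a few minutes to settle after the workload changes.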

                  My copy of Zabbix 1.1 stable is running just fine with minimal dips in CPU idle. Unless you're experiencing real performance losses on the server, it sounds like you're chasing after a problem that doesn't exist.

                  Nate


                  • just2blue4u
                    Senior Member
                    • Apr 2006
                    • 347

                    #10
                    Try watching the output of "sar" and/or "top"... they show the CPU load in %. I think those values are more reliable.

                    I have: "load average: 0.21, 0.21, 0.17 | Cpu(s): 3.3% us, 0.5% sy, 0.0% ni, 95.5% id, 0.3% wa, 0.2% hi, 0.2% si "
                    while running KDE, Zabbix server and Zabbix client on a 2.6GHz machine with 1024MB RAM...
                    Big ZABBIX is watching you!
                    (... and my 48 hosts, 4513 items, 1280 triggers via zabbix v1.6 on CentOS 5.0)


                    • raminix
                      Member
                      • Jun 2006
                      • 37

                      #11
                      Processor load, load avg, et al

                      Processor load, from what I have determined as far as Zabbix is concerned, is the same thing as load average, and is NOT the same thing as CPU utilization.

                      Load average takes CPU utilization, disk I/O, and memory usage (if I'm not mistaken) into account. It is simply a measure of how fast processes are being executed. If you have a load average lower than 1.0, then processes are executing as soon as they request CPU time. If your load average is right at 1.0, then the pipeline is staying full of processes but none are having to wait for CPU time. When you have a load average over 1.0, then you start having a logjam of sorts: processes are backing up in the queue waiting to execute. Now is when you want to look at CPU utilization. If it's being pegged in either the user or system areas, then the CPU is the limiting factor. (If system, take a look at your kernel.) If your CPU utilization is low, look at memory. Do you have a lot of swap in use? No free memory? If both CPU and memory look good, is your hard disk thrashing? In my experience, most high load averages result from inadequate disk I/O. I had to do some major performance tuning on my db server in order to get Zabbix working properly, as it had major disk I/O bottlenecks.
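
The order of checks described above (load average, then CPU, then memory, then disk) can be sketched as a small Python function. The `triage` name, its parameters and every threshold are hypothetical values chosen for illustration; they do not come from the post or from Zabbix, and the 1.0 cut-off assumes a single-CPU machine.

```python
def triage(load_avg, cpu_busy_pct, swap_used_mb, free_mem_mb, disk_busy_pct,
           cpu_limit=90.0, swap_limit_mb=100.0, free_mem_floor_mb=50.0,
           disk_limit=80.0):
    """Suggest where to look first, checking CPU, then memory, then disk.

    All thresholds are illustrative assumptions, not recommended values.
    """
    if load_avg <= 1.0:
        return "no backlog"   # processes are not queuing for the CPU
    if cpu_busy_pct >= cpu_limit:
        return "cpu"          # user or system time is pegged
    if swap_used_mb >= swap_limit_mb or free_mem_mb <= free_mem_floor_mb:
        return "memory"       # heavy swap use or no free memory
    if disk_busy_pct >= disk_limit:
        return "disk"         # disk I/O is the likely bottleneck
    return "unclear"

# Hypothetical readings: high load, idle CPU, little swap, busy disk.
print(triage(load_avg=4.5, cpu_busy_pct=20.0, swap_used_mb=2.0,
             free_mem_mb=900.0, disk_busy_pct=95.0))
```

The inputs would come from tools such as "top", "sar" or the corresponding Zabbix items; how to collect them is left out of the sketch.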


                      • roosit
                        Junior Member
                        • Apr 2006
                        • 17

                        #12
                        To Alexei
                        ... I do not understand how processor load 0.01 can be treated as "serious performance problem"...

                        You have not read my post! When I shut down the Zabbix server, the load on the db goes from 1.06 to 0.01. That's a 94% drop (in 3 minutes)!!!


                        To Nate Bell
                        .. processor load isn't the same thing as CPU load...
                        I think you mean the "load average"
                        Actually it does not matter what it measures; I only want to emphasize that the db server's load drops by 94% when I shut down the Zabbix server process (which is located on another server).

                        To raminix
                        This is a 2.4GHz, 1GB server with a 36GB SCSI disk and 7MB of swap in use. I don't think people should focus on terms such as load average and processor load.
                        Again, what is important is the 94% difference in load when only the Zabbix 1.1 server process is disabled. And this was never an issue in 1.0.

                        Whatever causes this difference (disk I/O, open files, processes, system calls) is not important.

                        Also, remarks about having enough system resources should not be made. If there is some kind of bug, I think it should be traced, not disregarded because everyone has enough processing capacity on their server (then I would be running Windows).
                        Last edited by roosit; 16-07-2006, 11:03.


                        • raminix
                          Member
                          • Jun 2006
                          • 37

                          #13
                          Honestly, the stats you mention don't sound out of line. Have you looked at the stats on your database server, such as QPS, query cache hit rates, sorts, etc.? In my observation, Zabbix is extremely db heavy due to the manner in which it handles history and queuing of items. I had to tone down the amount of history I was saving and rely more on trends, as well as get creative with the delays on my checks (see the prime number thread on here) to distribute the load on my db server. I was initially seeing a load avg of 12+ and added a second CPU, which got it down to 4-6. Still not great, but I have to make do with what I have. The system is a dual P4 Xeon 3.06GHz with 4GB RAM, and at most it hits 1-2 MB of swap. It's been tuned heavily for InnoDB, as the ONLY thing this db server does is handle Zabbix queries. Also, are you running Zabbix on the same server as the db? I had to split them off because the load was killing graphing. My actual Zabbix server after the split hits .4-.5 at the most in load avg, and runs flawlessly. Even with the 4-6 load on the db server, I rarely have anything get past the 30 second mark in my queue. Have you done any optimizations to the db yet? Below are the stats on what mine is monitoring (for comparison):

                          Values stored: 124357812
                          Trends stored: 1625962
                          Hosts (m/n/t/d): 204(189/3/12/0)
                          Items (m/d/n)[t]: 13235(10254/464/2517)[0]
                          Triggers (e/d)[t/u/f]: 1515(1040/475)[1/519/520]
                          Number of alarms: 12375
                          Number of alerts: 302
                          Last edited by raminix; 17-07-2006, 18:56.


                          • Alexei
                            Founder, CEO
                            Zabbix Certified Trainer
                            Zabbix Certified Specialist
                            Zabbix Certified Professional
                            • Sep 2004
                            • 5654

                            #14
                            Originally posted by roosit
                            To Alexei
                            ... I do not understand how processor load 0.01 can be treated as "serious performance problem"...

                            You have not read my post! When I shut down the Zabbix server, the load on the db goes from 1.06 to 0.01. That's a 94% drop (in 3 minutes)!!!
                            Sorry, I did not notice that you had stopped the server.

                            Anyway, I can hardly believe that monitoring 18 hosts with 2 items each could generate a CPU load >1. I expected something close to 0.01-0.02, with most of the CPU usage coming from MySQL rather than the ZABBIX server. Handling 50 SQL statements per second is nothing on a Xeon.

                            Perhaps your database is corrupted or doesn't have all the required indexes? I just do not see any other explanation.
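
As a rough sanity check on the workload discussed in this thread, the rate of new values can be estimated from the item intervals. The sketch below assumes that each of the 18 hosts has one item polled every 5 minutes and one every 5 seconds, which is one reading of the "(5min, 5s sample time)" description in post #1; the `values_per_second` helper is hypothetical, and the thread does not say how many SQL statements each value triggers.

```python
def values_per_second(hosts, item_intervals_seconds):
    """New values per second if every host has one item per listed interval."""
    per_host = sum(1.0 / interval for interval in item_intervals_seconds)
    return hosts * per_host

# Assumption: 18 hosts, each with one 300 s item and one 5 s item.
rate = values_per_second(18, [300, 5])
print(round(rate, 2))
```

Under that assumption the server collects fewer than four values per second, so the 5-second items account for almost all of the traffic.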