Zabbix Two Hour Database Load

  • rsvancara
    Member
    • Jul 2012
    • 42

    #1

    Zabbix Two Hour Database Load

    We are observing an odd but recurring pattern in Zabbix: "100% busy pollers" every two hours, exactly on the hour. During this period PostgreSQL is very busy, and both I/O wait and CPU load are high. It seems that when the load on the database increases, the pollers cannot keep up, and we end up with gaps in our monitoring data, which is highly undesirable for monitoring our large enterprise network. Additionally, we observe this pattern on three separate instances of Zabbix: one uses MySQL and the other two use PostgreSQL. We observe it even when housekeeping is turned off. Zabbix is the only application that talks to each database, so there is no contention from any other application.

    I am wondering if anyone in the community knows the Zabbix internals well enough to tell me why we are seeing this pattern exactly every two hours.

    We see these spikes starting at 2:00 A.M., 4:00 A.M., 6:00 A.M., etc.



    Thanks,

    Randall
    Last edited by rsvancara; 14-08-2012, 01:21.
  • rsvancara
    Member
    • Jul 2012
    • 42

    #2
    Here is an image of our busy pollers graph.

    See attached pollers graph. Why does this occur every two hours, on the hour exactly?

    We are using Zabbix 2.0.1. Our server has 96 GB of RAM and 12 CPU cores. Our disk array has six 300 GB, 15K RPM drives configured in RAID 1+0.

    • rsvancara
      Member
      • Jul 2012
      • 42

      #3
      Additional information

      Hosts Monitored: 3728
      Items Monitored: 200228+
      Number of Triggers: 17546
      Required Server Performance: 2456


      • richlv
        Senior Member
        Zabbix Certified Trainer
        Zabbix Certified Specialist
        Zabbix Certified Professional
        • Oct 2005
        • 3112

        #4
        Could you please attach both busy graphs (internal and data gathering) and the cache usage graph?
        Zabbix 3.0 Network Monitoring book


        • rsvancara
          Member
          • Jul 2012
          • 42

          #5
          Busy Graphs

          Here you go
          Last edited by rsvancara; 14-08-2012, 19:03.


          • Colttt
            Senior Member
            Zabbix Certified Specialist
            • Mar 2009
            • 878

            #6
            Hmm, maybe you are using the housekeeper?
            Debian-User

            Sorry for my bad english


            • rsvancara
              Member
              • Jul 2012
              • 42

              #7
              Housekeeper

              Yes, the housekeeper runs every hour. I tried setting it to every four hours and still observed the two-hour spikes to 100% busy pollers.

              Could there be an item in a template that runs against the Zabbix server every two hours? I looked and did not find anything, but maybe I missed something.


              • Colttt
                Senior Member
                Zabbix Certified Specialist
                • Mar 2009
                • 878

                #8
                If you use PostgreSQL, it is recommended that you run the housekeeper only once every 24 hours.

                During those busy periods, maybe you can use mytop or ptop (they work like top, for MySQL/PostgreSQL).

                Why don't you use partitioning for the tables?
                Debian-User

                Sorry for my bad english


                • ghoz
                  Senior Member
                  • May 2011
                  • 204

                  #9
                  I think you have a task running on your hosts or network every two hours that makes the pollers take more time than usual to get their values from the agents.

                  The way Zabbix works, each poller process gets its values one by one, sequentially. In case of a network or agent slowdown, the pollers will spend more time waiting for each value and won't be able to keep up.

                  I think 75% busy for pollers and pingers is way too high to cope with even the slightest slowdown.

                  You should increase the number of pollers ...
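
The reasoning above can be sketched as simple arithmetic. This is only a rough model, assuming every polled value costs the same wait time and that pollers do nothing else; the function names and the example numbers (1000 polled values per second, 30 ms per value, 40 pollers) are hypothetical and not taken from this installation.

```python
import math


def poller_busy_fraction(values_per_second, seconds_per_value, pollers):
    """Fraction of total poller time spent fetching values, capped at 1.0."""
    # Poller-seconds of work that arrive every second.
    demand = values_per_second * seconds_per_value
    return min(1.0, demand / pollers)


def pollers_needed(values_per_second, seconds_per_value, target_busy):
    """Smallest poller count that keeps the busy fraction at or below target_busy."""
    return math.ceil(values_per_second * seconds_per_value / target_busy)


if __name__ == "__main__":
    # Hypothetical normal conditions: 30 ms per value.
    print(poller_busy_fraction(1000, 0.03, 40))
    # Hypothetical slowdown: per-value wait doubles, pollers saturate.
    print(poller_busy_fraction(1000, 0.06, 40))
    # Pollers needed to stay at or below 50% busy during the slowdown.
    print(pollers_needed(1000, 0.05, 0.5))
```

In this model a pool that is already 75% busy has very little headroom: any slowdown that raises the per-value wait by more than a third pushes it to 100%.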


                  • rsvancara
                    Member
                    • Jul 2012
                    • 42

                    #10
                    Increase Pollers

                    I can bump up the pollers; that is not a problem. I would not be surprised if it is something with our network, as all the test systems are monitoring hosts on the same network.


                    • rsvancara
                      Member
                      • Jul 2012
                      • 42

                      #11
                      Partition Tables

                      I am considering partitioning the tables. I have a few questions about it though.

                      1. Does it work with Zabbix 2.0.1+?
                      2. Do I still run the housekeeper or do I drop "partitions" older than some interval of time?
                      3. Given the amount of data we collect, a table per day seems reasonable. Is this correct?

                      I have set up a development zabbix instance where I am attempting to partition the tables per several documentation sources that I found:


                      http://www.zabbix.com/wiki/howto/db/postgres/partition

                      Not sure which one is the best one to follow, but I guess I will find out!!
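
For question 3, a back-of-the-envelope estimate of how many rows a daily partition would hold may help. The sketch below assumes the "Required Server Performance" figure of 2456 from post #3 is new values per second, that the rate is steady, and that every value lands in a history table, so it is an upper bound rather than a measurement. The function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def rows_per_partition(nvps, partition_days=1):
    """Upper-bound row count for a partition covering partition_days days."""
    return nvps * SECONDS_PER_DAY * partition_days


if __name__ == "__main__":
    nvps = 2456  # "Required Server Performance" reported in post #3
    print("rows per daily partition:", rows_per_partition(nvps))
    print("rows kept with 7 days of history:", rows_per_partition(nvps, 7))
```

Under these assumptions a daily partition holds at most about 212 million rows, and seven days of history about 1.5 billion.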

                      Thanks,

                      Randall


                      • ghoz
                        Senior Member
                        • May 2011
                        • 204

                        #12
                        I know partitioning won't work as-is in 2.x with MySQL because of the new primary keys. I haven't seen a report for PostgreSQL yet...

                        As for the documentation, the wiki on zabbix.com is older and is being more or less migrated to the .org AFAIK
                        Last edited by ghoz; 17-08-2012, 09:31.


                        • Colttt
                          Senior Member
                          Zabbix Certified Specialist
                          • Mar 2009
                          • 878

                          #13
                          The partition example on http://www.zabbix.org/wiki/Docs/howt...oning_by_range doesn't work!! Please use http://www.zabbix.com/wiki/howto/db/postgres/partition !!

                          In my test environment, Zabbix 2.0.2 works fine with PostgreSQL and partitions. (I installed it with my own install script, which I will publish in a few days.)

                          You must turn off the housekeeper when you use partitions! Maybe take a look at this: http://www.slideshare.net/xsbr/alexe...formancetuning

                          After that, you can delete the oldest table partition manually or via a cron job.

                          Regarding 3): how big is your database, and how long do you keep your data in the database?
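
The cron-job cleanup mentioned above could look roughly like the sketch below. It only prints `DROP TABLE` statements and does not touch a database. The partition naming scheme (`partitions.<table>_pYYYY_MM_DD`), the schema name, and the function names are assumptions made for illustration; the scheme created by the wiki how-to may differ, so adjust it before use.

```python
from datetime import date, timedelta


def partition_name(table, day, schema="partitions"):
    """Build a daily partition name (hypothetical naming scheme)."""
    return f"{schema}.{table}_p{day:%Y_%m_%d}"


def drop_statements(table, today, keep_days, lookback_days=30):
    """Yield DROP TABLE statements for daily partitions older than keep_days.

    Only the lookback_days days just beyond the retention window are
    covered, so a daily cron run stays cheap.
    """
    for age in range(keep_days + 1, keep_days + lookback_days + 1):
        day = today - timedelta(days=age)
        yield f"DROP TABLE IF EXISTS {partition_name(table, day)};"


if __name__ == "__main__":
    # Keep 7 days of history, matching the default retention in this thread.
    for statement in drop_statements("history", date.today(), keep_days=7):
        print(statement)
```

The printed statements could then be reviewed and piped to `psql` from the cron job.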
                          Debian-User

                          Sorry for my bad english


                          • rsvancara
                            Member
                            • Jul 2012
                            • 42

                            #14
                            Data Retention

                            Right now most items are using the default, 7 days history and 365 days for trends.

                            With partitioning, my understanding is that you turn off the housekeeper and simply remove the partitions older than a certain period of time. Is this safe to do? How about trends? Is this what DBSyncers do?

                            Thanks,

                            Randall


                            • richlv
                              Senior Member
                              Zabbix Certified Trainer
                              Zabbix Certified Specialist
                              Zabbix Certified Professional
                              • Oct 2005
                              • 3112

                              #15
                              Don't do partitioning with that NVPS (new values per second); it is really a low one. Better to find out the actual bottleneck.
                              Zabbix 3.0 Network Monitoring book
