Why is StartAgents limited to 16?

  • marcel
    Senior Member
    Zabbix Certified Specialist
    • Oct 2010
    • 112

    #1

    Why is StartAgents limited to 16?

    Is there any reason to limit the number of pre-forked agents to 16 in agentd standalone mode?

    I believe this limit is only checked in zbxconf.c

    I have compiled the agent with a patched version of this file, setting the value to 32, and so far this seems to be working fine.

    Marcel
    Zabbix Certified Specialist for Large Environments since 12/2010
  • untergeek
    Senior Member
    Zabbix Certified Specialist
    • Jun 2009
    • 512

    #2
    Why on earth would you need more agents than that? The agents are usually millisecond responsive. Are you really polling with passive agent requests on the order of 64 requests per second?

    Far better to use Zabbix Agent(active) and reduce the number of StartAgents.


    • marcel
      Senior Member
      Zabbix Certified Specialist
      • Oct 2010
      • 112

      #3
      I am actually polling much more data per second than 64. The issue is that when one of the agents is waiting for an external script to finish, it cannot be used by any other process. This happens with databases a lot: running a SQL query might take much longer than 1 or 2 seconds, and with more than 350 items per second, 16 pollers are far too few to rely on if a few of them get stuck waiting for an external process to finish the check (UserParameter).

      I am actually not very sure how that would work with the active agent...
      Zabbix Certified Specialist for Large Environments since 12/2010


      • untergeek
        Senior Member
        Zabbix Certified Specialist
        • Jun 2009
        • 512

        #4
        That is what active agent was created for. You realize that if you're really running that many requests per second to a single agent that your server is running at least two, probably 3 queries to the zabbix database for EACH one of those requests? The active agent allows the agent to say to the zabbix server, "What should I be checking?" The server sends a list and then the agent manages the rest. Think of it as being more like a remote-controlled trapper—instead of requesting each thing individually, you batch it back to the agent and it sends them to the trapper pool rather than out->fetch->return as it is with the passive agent requests.

        Seriously, if you're taxing a single agent that heavily you should be using the active agent so your zabbix database doesn't get run into the ground.
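To make the batching argument above concrete, here is a rough sketch that only counts connections between server and agent. It is a simplified model, not a description of Zabbix internals: the parameter names mirror the agent settings `RefreshActiveChecks` and `BufferSend`, the values 120 and 5 seconds are assumed here, the workload is hypothetical, and the model ignores both the time needed to run each check and early flushes when the agent's buffer fills up.

```python
def passive_connections_per_second(item_intervals):
    """Passive mode: every check is its own server-to-agent request."""
    return sum(1.0 / interval for interval in item_intervals)


def active_connections_per_second(refresh_active_checks=120.0, buffer_send=5.0):
    """Active mode: one item-list fetch per refresh plus one batched send
    per buffer flush (assumed values, simplified model)."""
    return 1.0 / refresh_active_checks + 1.0 / buffer_send


# Hypothetical workload: 60 items, each collected every 10 seconds.
intervals = [10.0] * 60
passive_rate = passive_connections_per_second(intervals)
active_rate = active_connections_per_second()

print(f"passive: {passive_rate:.2f} connections/s")
print(f"active:  {active_rate:.2f} connections/s")
```

In this toy model the active agent's connection rate does not grow with the number of items, which is the point being made about batching. It says nothing about whether a single active agent can execute that many slow checks in time.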


        • richlv
          Senior Member
          Zabbix Certified Trainer
          Zabbix Certified Specialist, Zabbix Certified Professional
          • Oct 2005
          • 3112

          #5
          actually, active agent would not cope with that anyway. is it really 350 items _per second_ on a single agent ? that sounds a bit insane...
          if so still, try using zabbix_sender, connected to a named pipe or similar approach.

          edit : also, in the second post you talk about pollers instead - so is this about the agent or the server ?
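A minimal sketch of the "zabbix_sender connected to a named pipe" idea mentioned above. It assumes `zabbix_sender` is on the PATH and that the installed version accepts `-i -` to read `<hostname> <key> <value>` lines from standard input; the FIFO path, server name, host name, and item keys are all hypothetical.

```python
import subprocess


def sender_line(host, key, value):
    """Format one record as '<hostname> <key> <value>' for zabbix_sender -i."""
    return f"{host} {key} {value}"


def forward_fifo(fifo_path, server, host):
    """Read 'key value' lines from a FIFO and stream them to zabbix_sender."""
    # Assumption: this zabbix_sender build supports "-i -" (read from stdin).
    proc = subprocess.Popen(
        ["zabbix_sender", "-z", server, "-i", "-"],
        stdin=subprocess.PIPE,
        text=True,
    )
    with open(fifo_path) as fifo:  # blocks until a writer opens the pipe
        for raw in fifo:
            key, _, value = raw.strip().partition(" ")
            if key and value:  # skip blank or malformed lines
                proc.stdin.write(sender_line(host, key, value) + "\n")
    proc.stdin.close()
    return proc.wait()
```

A check script would then write lines such as `ora.sessions 42` into the FIFO, and something like `forward_fifo("/tmp/zbx.fifo", "zabbix.example.com", "dbhost01")` would relay them. Scheduling the checks is left to whatever writes to the pipe.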
          Zabbix 3.0 Network Monitoring book


          • marcel
            Senior Member
            Zabbix Certified Specialist
            • Oct 2010
            • 112

            #6
            No, it's not 350 queries per agent, it's 350 queries in total per second for all agents.

            I might be misusing the name "poller", as the zabbix agent is in some sense also a poller. Anyway, I am only talking about the zabbix agent, not the zabbix server, although all the agents I am talking about are running on the zabbix server machine.



            The real issue is that
            1) it is OK for a SQL query to take up to 15 seconds
            2) there are about 300 SQL queries per second

            With 16 agent processes and 300 queries per second, once queries average longer than 16/300 of a second they occupy the agents for too long to let other queries proceed, and items slowly become "unknown".
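The 16/300 figure can be checked with a small calculation. This sketch treats the 16 agent processes as a single pool, as the post does, and uses the usual steady-state relation "busy workers = arrival rate × average duration"; the function names are made up for illustration.

```python
import math


def max_avg_duration(workers, rate_per_second):
    """Longest average check duration the pool can sustain, in seconds."""
    return workers / rate_per_second


def workers_needed(rate_per_second, avg_duration_seconds):
    """Workers that are busy on average at this rate and duration."""
    return math.ceil(rate_per_second * avg_duration_seconds)


limit = max_avg_duration(16, 300)
print(f"16 workers at 300 checks/s: average must stay under {limit:.3f} s")
print("workers needed at 1 s average: ", workers_needed(300, 1.0))
print("workers needed at 15 s average:", workers_needed(300, 15.0))
```

With a 1 second average the pool would need 300 workers, and if every query took the full 15 seconds it would need 4500, so raising the limit from 16 to 32 only helps while most queries stay fast.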

            Yes, zabbix_sender is one way to go; however, I would be losing many of the zabbix scheduling benefits and parallel threading. It would be much simpler to run 256 zabbix agents and measure their occupancy via `pstree` (which I am actually already doing).
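One possible way to turn the `pstree` idea into a number, sketched under assumptions: the input is the text output of `ps -eo pid,ppid,comm`, every `zabbix_agentd` process whose parent is also `zabbix_agentd` is treated as a worker (in practice some of those children are not listeners), and a worker counts as busy while it has a non-agent child such as a shell running a UserParameter. The sample process table is invented.

```python
def agent_occupancy(ps_output, agent_name="zabbix_agentd"):
    """Return (busy, total) agent workers from 'ps -eo pid,ppid,comm' output."""
    procs = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        pid, ppid, comm = line.split(None, 2)
        procs.append((int(pid), int(ppid), comm.strip()))

    agent_pids = {pid for pid, _, comm in procs if comm == agent_name}
    # Workers: agent processes whose parent is also an agent process.
    workers = {pid for pid, ppid, comm in procs
               if comm == agent_name and ppid in agent_pids}
    # Busy: workers that currently have a non-agent child (e.g. a script).
    busy = {ppid for _, ppid, comm in procs
            if ppid in workers and comm != agent_name}
    return len(busy), len(workers)


sample = """\
  PID  PPID COMMAND
  100     1 zabbix_agentd
  101   100 zabbix_agentd
  102   100 zabbix_agentd
  103   100 zabbix_agentd
  200   102 sh
  201   200 sqlplus
"""
busy, total = agent_occupancy(sample)
print(f"{busy}/{total} agent workers busy")
```

Feeding the ratio back into Zabbix as an item would show how close the agent pool is to saturation.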
            Last edited by marcel; 25-02-2011, 11:07.
            Zabbix Certified Specialist for Large Environments since 12/2010


            • richlv
              Senior Member
              Zabbix Certified Trainer
              Zabbix Certified Specialist, Zabbix Certified Professional
              • Oct 2005
              • 3112

              #7
              ok, so you are using zabbix agents to run lots of sql queries to make it easier to manage than zabbix_senders... i'm not aware of reasons to limit startagents to 16 except maybe "nobody will need that many"
              maybe a zbxnext would be in order.
              Zabbix 3.0 Network Monitoring book


              • untergeek
                Senior Member
                Zabbix Certified Specialist
                • Jun 2009
                • 512

                #8
                This may be rude, but I think that's a rather crazy way to monitor something. That's awfully taxing on a database by itself, but you're talking about having your Zabbix server run 2 or 3 queries for each of these other queries.

                Does your Zabbix server run off of the same Oracle database?

                I'm intensely curious as well. What's so important that you need to run 350 queries per second and monitor the responses every second of the day?

                350*86400=30,240,000 rows inserted per day.

                How does housekeeping even keep up with this? You'd have to run partitioned tables (at least you would in our environment, and we're hooked up to a 128 core Sun server with 64G of RAM hooked up to a SAN).


                • marcel
                  Senior Member
                  Zabbix Certified Specialist
                  • Oct 2010
                  • 112

                  #9
                  It's several hundred Oracle databases running on many different servers.

                  The zabbix database is growing at a pace of about 3 GB per day.

                  It is running on RAID10 SAS 15k disks with MySQL 5.1 on a 16-core 64-bit Intel machine with 16 GB of RAM.
                  Zabbix Certified Specialist for Large Environments since 12/2010


                  • untergeek
                    Senior Member
                    Zabbix Certified Specialist
                    • Jun 2009
                    • 512

                    #10
                    That amounts to 1.1TB per year.
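For reference, the two volume figures quoted in this thread can be reproduced with plain arithmetic. The inputs are the numbers given in the posts (350 values per second, about 3 GB of growth per day); the decimal conversion of 1000 GB per TB is an assumption.

```python
# Rows per day at a sustained 350 values per second.
rows_per_day = 350 * 86_400

# Yearly growth at about 3 GB per day, assuming 1 TB = 1000 GB.
gb_per_year = 3 * 365
tb_per_year = gb_per_year / 1000

print(f"{rows_per_day:,} rows per day")
print(f"{gb_per_year} GB per year, roughly {tb_per_year:.1f} TB")
```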

                    How are you handling housekeeping, if you don't mind my asking? That is a mindboggling amount of data, not to mention the sheer volume in row-count.


                    • JBo
                      Senior Member
                      • Jan 2011
                      • 310

                      #11
                      Hi marcel,

                      If zabbix agent is only used on Zabbix server, why don't you use external checks instead of a UserParameter in agent ?

                      They would be collected directly by zabbix_server process, you would remove one layer of inter process communication and you can define more than 16 poller threads.

                      Regards,
                      JBo


                      • marcel
                        Senior Member
                        Zabbix Certified Specialist
                        • Oct 2010
                        • 112

                        #12
                        Hey JBo,
                        actually that is a valid question, I might have a proper look at external checks for this. Good point... I'm so used to using UserParameters that I have actually forgotten about the external checks functionality altogether.

                        untergeek,
                        no special handling for housekeeping is in place. 1.8.4 works quite well with this amount of data; however, it took some time to fine-tune MySQL, Zabbix and the actual system.

                        I do have an Oracle DBA in my team as a backup solution in case MySQL stops handling the load.
                        Zabbix Certified Specialist for Large Environments since 12/2010


                        • untergeek
                          Senior Member
                          Zabbix Certified Specialist
                          • Jun 2009
                          • 512

                          #13
                          I would understand if you consider the data private since you've spent so much time and effort on the setup, but I think that everyone here could benefit greatly if you would post the tweaks you put in your my.cnf and the number of pollers, etc. in your zabbix_server.conf. If you've tweaked the kernel or any sysctl variables, those would also be of interest.


                          • marcel
                            Senior Member
                            Zabbix Certified Specialist
                            • Oct 2010
                            • 112

                            #14
                            The server is running stock RHEL 5.6, with PHP compiled from source and MySQL from the RHEL repos.

                            Code:
                            [mysqld]
                            datadir=/z001/
                            socket=/var/lib/mysql/mysql.sock
                            user=mysql
                            old_passwords=1
                            innodb_buffer_pool_size=7G
                            innodb_flush_log_at_trx_commit=2
                            skip-bdb
                            log-slow-queries
                            long_query_time=5
                            log-queries-not-using-indexes
                            thread_cache_size=775
                            max_connections=668
                            table_cache=1024
                            query_cache_size=32M
                            query_cache_limit=32M
                            join_buffer_size=3M
                            query_cache_min_res_unit=128
                            log-warnings=1
                            
                            [mysqld_safe]
                            log-error=/var/log/mysqld.log
                            pid-file=/var/run/mysqld/mysqld.pid
                            /dev/sdb1 on /z001 type ext4 (rw) - RAID10 (hwraid on sas 15k disks)

                            -rw-rw---- 1 mysql mysql 90G Mar 1 20:08 ibdata1

                            Code:
                            LogFile=/opt/zabbix/var/zabbix_server.log
                            LogFileSize = 500
                            DebugLevel = 3
                            PidFile=/opt/zabbix/tmp/zabbix_server.pid
                            DBHost=localhost
                            DBName=zabbix
                            DBUser=zabbix
                            DBPassword=screened
                            StartPollers=255
                            StartTrappers=30
                            StartPollersUnreachable=128
                            TrapperTimeout=5
                            StartPingers=32
                            StartHTTPPollers=32
                            StartDBSyncers=64
                            Timeout=15
                            AlertScriptsPath=/opt/zabbix/libexec
                            ExternalScripts=/opt/zabbix/libexec
                            FpingLocation=/usr/sbin/fping
                            TmpDir=/opt/zabbix/tmp
                            MaxHousekeeperDelete=10000
                            CacheSize=64M
                            TrendCacheSize=128M
                            HistoryTextCacheSize=256M

                            Any suggestions are appreciated. I am actually not very sure about the cache settings, since the zabbix internal items are giving me strange results (or they are not correctly documented).
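One sanity check that can be run on the two configs above is whether the number of Zabbix server processes fits under MySQL's `max_connections`. The sketch below assumes, and this is an assumption rather than something stated in the thread, that each listed server process holds one database connection; it ignores the frontend and any other clients of the same MySQL instance.

```python
# Process counts copied from the zabbix_server.conf posted above.
server_processes = {
    "StartPollers": 255,
    "StartTrappers": 30,
    "StartPollersUnreachable": 128,
    "StartPingers": 32,
    "StartHTTPPollers": 32,
    "StartDBSyncers": 64,
}
max_connections = 668  # from the [mysqld] section posted above

# Assumption: one MySQL connection per server process.
configured = sum(server_processes.values())
headroom = max_connections - configured

print(f"{configured} server processes, {headroom} connections of headroom")
```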
                            Last edited by marcel; 01-03-2011, 21:07.
                            Zabbix Certified Specialist for Large Environments since 12/2010


                            • richlv
                              Senior Member
                              Zabbix Certified Trainer
                              Zabbix Certified Specialist, Zabbix Certified Professional
                              • Oct 2005
                              • 3112

                              #15
                              well... starting with zabbix 1.8.5 you shouldn't need any modifications to the source, because max limit for StartAgents has been increased to 100

                              http://www.zabbix.com/documentation/...mits_increased
                              Zabbix 3.0 Network Monitoring book
