busy server agent impacts queue of zabbix server ?

  • windsurf51
    Junior Member
    • Sep 2009
    • 20

    #1

    busy server agent impacts queue of zabbix server ?

    Hi !

    I have a problem with my Zabbix configuration (only 20 servers in passive mode, with 1200 items).

    At 10:00 pm, a server running the Zabbix agent gets very busy (a database operation) and cannot respond to the Zabbix server.

    I think this affects the other agents, because the queue grows and all the other agents appear unreachable (with agent.ping.nodata(300)).

    When the busy server goes back to idle, the queue shrinks and all agents become reachable again.

    I don't understand how the Zabbix queue works. What can I do to make it work better?


    Thx !
  • windsurf51
    Junior Member
    • Sep 2009
    • 20

    #2
    Same problem this weekend. Do you have any ideas?

    Comment

    • MrKen
      Senior Member
      • Oct 2008
      • 652

      #3
      I would try increasing the number of Pollers/Trappers in the zabbix_server.conf

      Another thing to consider is the frequency of Housekeeper. If Housekeeper is configured to run only once or twice a day, it may just happen to coincide with the busy server operation.

      If all else fails - disable that host during that time.

      [Perhaps that busy server could do with more memory.]

      MrKen
      Disclaimer: All of the above is pure speculation.

      Comment

      • windsurf51
        Junior Member
        • Sep 2009
        • 20

        #4
        Thx MrKen

        I'm just trying to understand how the Zabbix queue works across all clients.

        What I can see is that when one client is busy, the queue grows and all checks wait until that client is back to normal load.

        I don't know how the Zabbix server process uses the queue and distributes checks to the pollers (whether it waits for all of one client's checks before continuing with the next client).

        My configuration is unusual because I use a timeout of 200 seconds (some of my checks run for about 90 seconds).

        Maybe this is my problem. What do you think?
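
As a rough illustration of why one unresponsive host with a long timeout could hold up everything else, here is a minimal sketch. It is a simplified worst-case model, not Zabbix's actual scheduling logic: it assumes each poller blocks for the full timeout on every check sent to the busy host, that the 1200 items are spread evenly over the 20 hosts, and it uses the poller counts of 5 and 35 that come up elsewhere in this thread. The function name is made up for the example.

```python
import math


def worst_case_stall_seconds(stuck_checks, timeout_s, pollers):
    """Seconds until `pollers` blocking workers have timed out on all `stuck_checks`."""
    if pollers < 1:
        raise ValueError("need at least one poller")
    # Each round, every poller takes one stuck check and waits out the timeout.
    rounds = math.ceil(stuck_checks / pollers)
    return rounds * timeout_s


# 1200 items / 20 hosts = 60 items on the busy host (assumes an even spread).
items_on_busy_host = 1200 // 20

for pollers in (5, 35):
    stall = worst_case_stall_seconds(items_on_busy_host, timeout_s=200, pollers=pollers)
    print(pollers, "pollers:", stall, "seconds")
```

Under these assumptions the busy host alone could occupy 5 pollers for 2400 seconds, versus 400 seconds with 35 pollers. While the pollers are waiting, checks for every other host pile up in the queue, which would match the symptom described above.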

        Comment

        • MrKen
          Senior Member
          • Oct 2008
          • 652

          #5
          I could be wrong, but as I understand it the queue can grow for many reasons, one of which is that there are not enough trappers/pollers available to deal with the workload.

          Your timeout may be part of the problem, but I don't know.

          If I were you, I would have increased the pollers/trappers yesterday, so that today you could have struck that off the list of potential causes of the problem.
          Last edited by MrKen; 03-11-2009, 07:23. Reason: off has 2 F's
          Disclaimer: All of the above is pure speculation.

          Comment

          • windsurf51
            Junior Member
            • Sep 2009
            • 20

            #6
            Hi

            Thx for your reply

            I'll try to add some pollers to my server.

            Here you can see the Zabbix server queue graph with the faulty client; after 16:30 I disabled this client.

            The difference is pretty big!

            I will investigate the network side, because the client is slow but some checks really take too much time.

            Comment

            • alixen
              Senior Member
              • Apr 2006
              • 474

              #7
              Hi,

              We have had this problem in the past and we have solved it by increasing StartPollers parameter in zabbix_server.conf.

              Although the manual states that you should keep this parameter as low as possible, we have increased it from 5 (default value) to 30.

              Hope this helps
              Alixen
              http://www.alixen.fr/zabbix.html
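
For anyone wanting to check which StartPollers value is currently set before changing it, here is a minimal sketch that reads a key from key=value config text. The sample text is a hypothetical excerpt, not a real zabbix_server.conf, and the helper name is made up for the example; only the parameter name and the values 5 and 30 come from the post above.

```python
def read_conf_value(conf_text, key):
    """Return the first uncommented value for `key` in key=value text, or None."""
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, rest = line.partition("=")
        if sep and name.strip() == key:
            return rest.strip()
    return None


# Hypothetical excerpt; 5 is the default and 30 the new value quoted in the post.
SAMPLE_CONF = """\
# StartPollers=5
StartPollers=30
"""

print(read_conf_value(SAMPLE_CONF, "StartPollers"))
```

The server has to be restarted after the file is edited for a new value to take effect.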

              Comment

              • windsurf51
                Junior Member
                • Sep 2009
                • 20

                #8
                Thx Alixen !!

                Today I have 35 pollers for about 20 Zabbix agents; I think that's enough for 1200 items and 550 triggers.

                We found 2 abnormal processes on the busy server this morning and killed them.

                Then I enabled monitoring again, and I'll check the queue this afternoon.

                Comment

                • MrKen
                  Senior Member
                  • Oct 2008
                  • 652

                  #9
                  Originally posted by alixen

                  solved it by increasing StartPollers parameter in zabbix_server.conf.
                  That's strange. Didn't I say that? [2 times]

                  windsurf51 said: today i have 35 pollers for about 20 zabbix agents think that's enough for 1200 items and 550 triggers.

                  35 pollers should be enough. But don't think of it as 20 hosts, 1200 items and 550 triggers. Think of it in terms of 'Required Server Performance', i.e. how often the items are updating. For example, you have 2 hosts, and each host has only one item. Host #1 updates every second. Host #2 updates only twice daily. Clearly, Host #1 generates far more work for the server than Host #2.

                  MrKen
                  Disclaimer: All of the above is pure speculation.
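
The two-host example above can be put into numbers. This is a minimal sketch that assumes the load figure is simply the sum of 1/interval over all items; the function name is made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def new_values_per_second(update_intervals_s):
    """Sum of 1/interval over all items: the expected rate of incoming values."""
    return sum(1.0 / interval for interval in update_intervals_s)


host1 = [1]                    # one item, updated every second
host2 = [SECONDS_PER_DAY / 2]  # one item, updated twice a day

print("host 1:", new_values_per_second(host1))
print("host 2:", new_values_per_second(host2))
```

Host #1 contributes one value per second, while Host #2 contributes 2/86400 of a value per second, so the same item count can mean very different loads.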

                  Comment

                  • windsurf51
                    Junior Member
                    • Sep 2009
                    • 20

                    #10
                    Originally posted by MrKen
                    That's strange. Didn't I say that? [2 times]

                    MrKen
                    Yes, I think you did.

                    Everything seems to work better, but I still don't know whether changing the number of pollers or killing the processes solved the problem.

                    I know that the number of pollers depends on many parameters (like CPU and memory), but is there a formula to calculate it (from refresh interval, number of items, etc.)?

                    thx
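
Nobody in this thread gives an official formula, but a back-of-the-envelope estimate is possible with Little's law from queueing theory (average number of busy workers = arrival rate × average service time). The sketch below is only that kind of estimate, not something taken from the Zabbix manual. The 60-second update interval, the 0.25-second average check time and the headroom factor of 2 are all assumptions chosen for illustration.

```python
import math


def estimate_pollers(values_per_second, avg_check_seconds, headroom=2.0):
    """Rough poller count: average busy pollers (Little's law) times a safety factor."""
    busy_on_average = values_per_second * avg_check_seconds
    return max(1, math.ceil(busy_on_average * headroom))


# 1200 items, each assumed to update every 60 seconds -> 20 values per second.
rate = 1200 / 60

print(estimate_pollers(rate, avg_check_seconds=0.25))
```

With these assumed numbers the estimate is 10 pollers. The average check time dominates the result: a single host whose checks each take 90 or 200 seconds pushes the average up sharply, which is one way to read the behaviour reported earlier in the thread.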

                    Comment

                    • Kerrygeek
                      Senior Member
                      • Dec 2008
                      • 115

                      #11
                      My GUI has been very slow responding so I tried changing my pollers from 15 to 35 as you guys have been suggesting. It made a BIG difference! When I click one of the links, especially for a screen or graph, it's much snappier. Here's what I've got:

                      Parameter                                                    Value     Details
                      Number of hosts (monitored/not monitored/templates)          186       128 / 13 / 45
                      Number of items (monitored/disabled/not supported)           16587     1636 / 14907 / 44
                      Number of triggers (enabled/disabled)[true/unknown/false]    492       275 / 217 [0 / 90 / 185]
                      Number of users (online)                                     18        2
                      Required server performance, new values per second           28.0187

                      I'm seeing a lot of things in the queue though, I don't know if it was this way before but I know I've seen some e-mail alerts coming through slowly lately. I'm even periodically seeing things show up in the 10 min column but I don't think it's correct because I don't see anything in the 1 min and 5 min columns. Is the queue screen accurate? If so, I'm seeing a lot of entries in the 5, 10, and 30 second columns, what do I need to do about that to speed up the throughput?

                      Thanks,
                      Kerry

                      Comment

                      • windsurf51
                        Junior Member
                        • Sep 2009
                        • 20

                        #12
                        Hi kerry

                        Well, I'm a newbie at Zabbix tuning, but I think 35 pollers is a small number for your configuration. I have 40 pollers for 20 agents (all in passive mode), and it is just the right number.

                        For your queue problem, try looking at the detail view to see which items are waiting.

                        Bye

                        Comment

                        • Kerrygeek
                          Senior Member
                          • Dec 2008
                          • 115

                          #13
                          Thanks for the info, I haven't really played with it much other than to make that change this morning and it already made a BIG difference. The pages load much more quickly now. I'm about to raise the number of pollers to 45, but how do I know when I've reached the optimum number? What should I be looking for? I'm a relative newbie also, I've been running it for about 8 months now and have had a pretty good experience with Zabbix but haven't done much tuning other than just setting it up.

                          Thanks,
                          Kerry

                          Comment

                          • Kerrygeek
                            Senior Member
                            • Dec 2008
                            • 115

                            #14
                            I know it's bad form to reply to your own post, but here goes... I found the reason I had so many things stuck in my queue: one of my guys put an ACL on a 48-port switch, so Zabbix couldn't see the switch anymore. All 48 ports plus several other parameters were unavailable to Zabbix. I disabled that host for now until we fix the ACL, and now there's nothing in my queue. I guess I need to check it periodically, or maybe set up an alert to tell me my queue is getting out of hand.

                            Thanks for your help, maybe this will help somebody else.

                            Kerry
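
The kind of queue alert mentioned above could look roughly like the sketch below. It is only an illustration: the snapshot numbers and the limits are hypothetical, the bucket names mirror the columns of the queue screen mentioned in this thread, and how the counts are fetched is left out. Later Zabbix releases document an internal item, zabbix[queue,<from>,<to>], that can feed a normal trigger instead; whether it is available on the version discussed here is not stated in the thread.

```python
def overloaded_buckets(queue_by_delay, limits):
    """Return the delay buckets whose item count exceeds the configured limit."""
    return [
        bucket
        for bucket, count in queue_by_delay.items()
        if count > limits.get(bucket, float("inf"))  # no limit set -> never alert
    ]


# Hypothetical snapshot of the queue screen (items delayed by at least this long).
snapshot = {
    "5 seconds": 12,
    "10 seconds": 7,
    "30 seconds": 3,
    "1 minute": 0,
    "5 minutes": 0,
    "10 minutes": 49,
}

# Hypothetical limits: short delays are tolerated, long ones are not.
limits = {"1 minute": 10, "5 minutes": 5, "10 minutes": 0}

print(overloaded_buckets(snapshot, limits))
```

Here only the "10 minutes" bucket is reported, which is the pattern an unreachable device such as the switch above would be expected to produce.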

                            Comment

                            • johnw230873
                              Member
                              • Aug 2010
                              • 54

                              #15
                              Very old question I know but it could help with a problem I'm having now.

                              I thought that Zabbix would be smart enough not to affect other hosts if one host became unreachable, but the last reply indicates this is not the case.

                              Can anyone else comment?

                              Comment
