Throughput limit on data sender process

  • dmcken
    Junior Member
    • May 2013
    • 12

    #1

    Throughput limit on data sender process

    Good Day,

    We have 10 proxies pointing to a single central server (v2.4.5). As far as I can tell the data sender process is what sends data from the proxy to the server trapper processes which then send it to the server's database.

    There seems to be some limit past which the data sender can no longer keep up with the vps rate and data collected by that proxy starts to go missing or lag behind on graphs.

    Any suggestions as to how best to deal with such an issue? Most of the proxies are very quiet CPU-wise, so I was hoping to have each one take on more polling and thereby need fewer proxies. At one location we are running 3 proxies just to deal with this issue. Given that our overall vps is 2434, I'm a bit concerned about how well this solution will scale.

    Signed
    David McKen
  • kloczek
    Senior Member
    • Jun 2006
    • 1771

    #2
    Originally posted by dmcken
    Good Day,

    We have 10 proxies pointing to a single central server (v2.4.5). As far as I can tell the data sender process is what sends data from the proxy to the server trapper processes which then send it to the server's database.

    There seems to be some limit past which the data sender can no longer keep up with the vps rate and data collected by that proxy starts to go missing or lag behind on graphs.

    Any suggestions as to how best to deal with such an issue? Most of the proxies are very quiet CPU wise so I was hoping to have each one take on more polling per proxy and thereby need less proxies. At one location we are running 3 proxies just to deal with this issue. Given that our overall vps is 2434, I'm a bit concerned about how well this solution will scale.

    Signed
    David McKen
    Qs:
    1) How many vps do those three proxies handle in total?
    2) Do the lags happen constantly or only from time to time? If only from time to time, the effect may be caused by network issues.
    http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
    https://kloczek.wordpress.com/
    zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
    My zabbix templates https://github.com/kloczek/zabbix-templates


    • dmcken
      Junior Member
      • May 2013
      • 12

      #3
      Good Day kloczek,

      1) I'm currently looking at two of them: one has 169.39 vps, the other 431.86 vps.
      2) It is constant, the only variation is how far behind it is which doesn't vary much.


      • dmcken
        Junior Member
        • May 2013
        • 12

        #4
        Just to add some extra information.

        We have lots of data in the proxy_history table that just keeps growing, which seems to indicate that the pollers are doing their job but the data is just not reaching the central server.


        • dmcken
          Junior Member
          • May 2013
          • 12

          #5
          Looking at the process list I'm seeing:

          1305 ? S 101:07 /usr/sbin/zabbix_proxy: data sender [sent 619032 values in 17635.703273 sec, sending data]

          Should it even be trying to send so many in one batch?

          Looking at the database, proxy_history still has 728923 entries left. Is there any way to activate multiple data senders?
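
As a rough check, the counters in that process title can be turned into a send rate and compared with the 169 to 431 vps the proxies collect. This is a minimal sketch that assumes the title reports the values sent and the time taken for one send cycle; the function name `sender_rate` is made up for illustration.

```python
import re

# Process title copied from the ps output quoted in the post.
TITLE = ("/usr/sbin/zabbix_proxy: data sender "
         "[sent 619032 values in 17635.703273 sec, sending data]")


def sender_rate(title: str) -> float:
    """Return the values-per-second figure implied by a data sender title."""
    match = re.search(r"sent (\d+) values in ([\d.]+) sec", title)
    if match is None:
        raise ValueError("unrecognised process title")
    values = int(match.group(1))
    seconds = float(match.group(2))
    return values / seconds


rate = sender_rate(TITLE)
backlog = 728923  # rows still waiting in proxy_history, from the post

print(f"sender rate: {rate:.1f} values/s")
# Time to drain the backlog at that rate, ignoring newly collected data.
print(f"hours to drain backlog: {backlog / rate / 3600:.1f}")
```

A send rate well below the collection rate means proxy_history can only grow, which matches the backlog described above.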


          • kloczek
            Senior Member
            • Jun 2006
            • 1771

            #6
            Originally posted by dmcken
            Looking at the process list I'm seeing:

            1305 ? S 101:07 /usr/sbin/zabbix_proxy: data sender [sent 619032 values in 17635.703273 sec, sending data]

            Should it be even trying to send so many in one batch?

            Looking at the database proxy_history still has 728923 entries left, is there any way to activate multiple data senders?
            At a few hundred nvps, sending that many values in a single batch suggests that you changed how often data is sent to the server.

            Did you change DataSenderFrequency in the Zabbix proxy settings?
            The default is DataSenderFrequency=1, which means that a proxy configured as an active proxy tries to push data every second.

            The proxy sends data in such big batches when it reconnects to the server. The default ProxyOfflineBuffer is 1h, so all data from the last hour is resynced with the server after pushing data has failed a few times. If you have raised ProxyOfflineBuffer, even more data may be pushed to the server in such scenarios.


            • dmcken
              Junior Member
              • May 2013
              • 12

              #7
              grep -i DataSender /etc/zabbix/zabbix_proxy.conf
              ### Option: DataSenderFrequency
              # DataSenderFrequency=1
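
The grep output shows the option commented out, so the default applies. A small sketch of how the effective values could be read from a config file, assuming plain `key=value` lines and the defaults quoted in this thread (DataSenderFrequency=1 second, ProxyOfflineBuffer=1 hour); the helper `effective_settings` is hypothetical and not part of Zabbix.

```python
# Defaults as stated in this thread; check them against your Zabbix version.
DEFAULTS = {"DataSenderFrequency": "1", "ProxyOfflineBuffer": "1"}


def effective_settings(conf_text: str) -> dict:
    """Parse key=value lines, skip comments, fall back to DEFAULTS."""
    settings = dict(DEFAULTS)
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # blank line, comment, or not a key=value pair
        key, value = line.split("=", 1)
        if key.strip() in settings:
            settings[key.strip()] = value.strip()
    return settings


# The two lines from the grep output above: the option is commented out.
SAMPLE = """\
### Option: DataSenderFrequency
# DataSenderFrequency=1
"""

current = effective_settings(SAMPLE)
print(current)
```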

              I kept raising StartDBSyncers, thinking it was the process doing this syncing. As for DataSenderFrequency, when I first read about it I saw it as a pointless option: why would I want to add a delay to getting data from a proxy to the server?

              In other news, we got the queues down; it turns out there was high load on the server's MySQL instance.


              • ingus.vilnis
                Senior Member
                Zabbix Certified Trainer
                Zabbix Certified Specialist
                Zabbix Certified Professional
                • Mar 2014
                • 908

                #8
                Hi,

                Just to add my two cents here.

                Active proxies have a hard-coded data sender limit of 1000 values per connection. However, with 400+ nvps on a proxy, that limit is not the issue.

                Each DB syncer can process up to ~1000 nvps, so increasing the number of syncers beyond the default 4 can even make things worse unless you run over 4k nvps.
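
The syncer arithmetic can be written out as a quick calculation. This is only a sketch using the rough per-syncer figure (~1000 nvps) and the default of 4 syncers quoted above; real capacity depends on the database.

```python
import math

NVPS_PER_SYNCER = 1000  # rough per-syncer capacity quoted in this post
DEFAULT_SYNCERS = 4     # default number of DB syncers quoted in this post


def syncers_needed(total_nvps: float) -> int:
    """Smallest number of syncers that covers the given rate."""
    return max(1, math.ceil(total_nvps / NVPS_PER_SYNCER))


total_nvps = 2434  # overall rate from the first post
needed = syncers_needed(total_nvps)
print(f"syncers needed: {needed}, default is enough: {needed <= DEFAULT_SYNCERS}")
```

By this estimate the overall rate from the first post fits within the default, so raising StartDBSyncers would not be expected to help.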

                Networks and database performance are usually the most common problems.

                If a data sender on a proxy is busy, check ALL internal process and cache graphs on the Zabbix server. Most likely you will see some issues there. And always log slow queries that take longer than 3000 milliseconds.

                Best Regards,
                Ingus


                • kloczek
                  Senior Member
                  • Jun 2006
                  • 1771

                  #9
                  Originally posted by ingus.vilnis
                  Active proxies have a hard coded data sender limit of 1000 values per each connection. However with 400+ nvps on a proxy that is not the case.
                  This limit can be increased by changing ZBX_MAX_HRECORD in include/proxy.h in the source code. FYI: Alexey told me that the Zabbix dev team is thinking about making this a proxy configuration parameter (they are aware that a fixed value for this #define may be a little problematic).
                  I have been using 5000 as the limit for a few weeks (so far this change has made it possible to fix a few nasty issues).
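
To see what the per-connection limit means for a backlog, here is a small sketch that counts how many connections are needed to drain the 728923 rows reported earlier in the thread, at 1000 and at 5000 values per connection. It assumes every connection carries a full batch and ignores newly collected data.

```python
import math


def connections_to_drain(backlog_rows: int, max_records: int) -> int:
    """Number of full-or-partial batches needed to send the backlog."""
    return math.ceil(backlog_rows / max_records)


BACKLOG = 728923  # rows left in proxy_history, from post #5

for limit in (1000, 5000):
    print(f"limit {limit}: {connections_to_drain(BACKLOG, limit)} connections")
```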

                  Networks and database performance are usually the most common problems.

                  If a data sender on a proxy is busy then check ALL internal process and cache graphs on Zabbix server. Most likely you might see some issues there. And always log slow queries longer than 3000 milliseconds.
                  Howgh
                  Last edited by kloczek; 27-10-2015, 22:19.


                  • ingus.vilnis
                    Senior Member
                    Zabbix Certified Trainer
                    Zabbix Certified Specialist
                    Zabbix Certified Professional
                    • Mar 2014
                    • 908

                    #10
                    Hi,

                    Yes, this ZBX_MAX_HRECORD value can indeed be increased, and doing so is recommended on large setups. However, that is not the case for our friend with a few hundred vps; he is not likely to see any difference from it. Having this setting configurable in the config file would be great, though.

                    But the last point, about network and performance tuning, is overlooked far more often than we would like. People might forget to check it, so a reminder can be useful.

                    Best Regards,
                    Ingus


                    • kloczek
                      Senior Member
                      • Jun 2006
                      • 1771

                      #11
                      Originally posted by ingus.vilnis
                      But the last thing with network ad perf tuning is overlooked waaay to many times than we would like to. People might forget to check it and it can be useful to remind.
                      Any suggestions about perf tuning?


                      • ingus.vilnis
                        Senior Member
                        Zabbix Certified Trainer
                        Zabbix Certified Specialist
                        Zabbix Certified Professional
                        • Mar 2014
                        • 908

                        #12
                        Hi,

                        I wonder what exactly I could tell you here. I believe you have great experience with Zabbix, and I am not likely to tell you anything new that has not already been discussed on this forum hundreds of times.

                        I start with the basics: Zabbix internal graphs. I check that all proxies are monitored correctly, by themselves and never from the Zabbix server.

                        Then the logs. And after that the config files.

                        One catch regarding networks: if DNS is used and you experience lags because of the network, use IP addresses instead of DNS names in the Server parameter on proxies. But that is a specific use case.
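
One way to check whether name resolution is part of the delay is to time it from the proxy host. A minimal sketch using Python's standard `socket.getaddrinfo`; the host `localhost` is only a placeholder for whatever name is in the Server parameter, and 10051 is the usual Zabbix server port.

```python
import socket
import time


def resolve_time(host: str, port: int = 10051) -> float:
    """Return the seconds spent resolving host (raises socket.gaierror on failure)."""
    start = time.perf_counter()
    socket.getaddrinfo(host, port)
    return time.perf_counter() - start


# Placeholder host; replace with the name used in the proxy's Server parameter.
elapsed = resolve_time("localhost")
print(f"resolution took {elapsed * 1000:.2f} ms")
```

If the measured time is consistently large or varies a lot, switching the Server parameter to an IP address removes that lookup from every connection.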

                        Is there anything you are particularly interested in?

                        Best Regards,
                        Ingus


                        • mushero
                          Senior Member
                          • May 2010
                          • 101

                          #13
                          You didn't mention whether your proxies are well connected, i.e. in the same data center or on a good Internet link.

                          Ours are all over the world and poorly connected, and we have had endless problems with proxies failing and getting stuck, needing a restart to send again. This is better in 2.4 but not gone, I think.

                          They don't die and don't disconnect, but they get stuck with a big queue and send very slowly, like 1 NVPS, not the 100-250 we need.

                          It's as if there is no timeout or failure mode at all, and we have to restart; the proxy never seems to give up or reset the connection.

                          But on a LAN there should be no problem. We also built SQL and tools to watch and monitor these queues in the proxy (and report them via the agent) so we can see and alert on them.

                          And we are working on some simple PHP pages on the proxy to view these queues and poller / late items, like you can on the main server.
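
The poster's SQL is not shown, so the following is only a guess at what such a queue check might look like. It assumes the proxy database records the last sent row in an `ids` entry (table_name `proxy_history`, field_name `history_lastid`) and counts the `proxy_history` rows after it. The schema here is a simplified in-memory SQLite stand-in; verify the table and column names against your own proxy database before relying on it.

```python
import sqlite3

# Assumed, simplified schema: only the columns this query needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE proxy_history (id INTEGER PRIMARY KEY, itemid INTEGER, clock INTEGER);
    CREATE TABLE ids (table_name TEXT, field_name TEXT, nextid INTEGER);
""")

# Ten collected values, of which the first four are marked as already sent.
conn.executemany(
    "INSERT INTO proxy_history (id, itemid, clock) VALUES (?, ?, ?)",
    [(i, 1, 1445980000 + i) for i in range(1, 11)],
)
conn.execute("INSERT INTO ids VALUES ('proxy_history', 'history_lastid', 4)")

BACKLOG_SQL = """
    SELECT COUNT(*) FROM proxy_history
    WHERE id > (SELECT nextid FROM ids
                WHERE table_name = 'proxy_history'
                  AND field_name = 'history_lastid')
"""

backlog = conn.execute(BACKLOG_SQL).fetchone()[0]
print(f"values waiting to be sent: {backlog}")
```

A count like this could be exposed through the agent and graphed, so a growing queue triggers an alert before data visibly lags.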

                          Steve


                          • zalex_ua
                            Senior Member
                            Zabbix Certified Trainer
                            Zabbix Certified Specialist
                            Zabbix Certified Professional
                            • Oct 2009
                            • 1286

                            #14
                            See also https://support.zabbix.com/browse/ZBX-5448


                            • db100
                              Member
                              • Feb 2023
                              • 61

                              #15

                              Is it possible to throttle the incoming data on purpose?
                              Assuming the disk is too slow, is it possible to impose a maximum incoming data rate, in particular for Trapper items, so that some backpressure is applied and, for example, peaks of high incoming data rates are smoothed?
