Zabbix data sender processes more than 95% busy

    Hi Zabbix Gurus

    We have lots of proxies in our environment; all of them are OK except one, whose data sender process is almost 100% busy.

    It runs at around 450 NVPS; many of the other proxies handle more and are fine.
    It is configured with DataSenderFrequency=1.
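
    For reference, the matching zabbix_proxy.conf excerpt (mine, not from the thread; the comment paraphrases the documented meaning of the directive):

    # Active proxy only: how often, in seconds, the proxy uploads
    # collected values to the server (range 1-3600, default 1).
    DataSenderFrequency=1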

    Network issues don't seem to be the problem, since other proxies sit on the same network as this one.
    I ran tcpdump and saw no retransmissions, latency, or TCP window related problems.

    After reading similar forum threads, I straced the data sender process and noticed one operation, read(9), being very slow.
    Below is the output of strace -p 13119 -t -T -e read=9:

    08:53:38 read(9, "ZBXD\3\36\0\0\0\26\0\0\0x\234\253V*J-.\310\317+NU\262R*.MN"..., 2048) = 43 <5.565102>
    08:53:44 read(9, "ZBXD\3\36\0\0\0\26\0\0\0x\234\253V*J-.\310\317+NU\262R*.MN"..., 2048) = 43 <18.780230>
    08:54:03 read(9, "ZBXD\3\36\0\0\0\26\0\0\0x\234\253V*J-.\310\317+NU\262R*.MN"..., 2048) = 43 <1.675908>
    08:54:05 read(9, "ZBXD\3\36\0\0\0\26\0\0\0x\234\253V*J-.\310\317+NU\262R*.MN"..., 2048) = 43 <5.629691>
    08:54:10 read(9, "ZBXD\3\36\0\0\0\26\0\0\0x\234\253V*J-.\310\317+NU\262R*.MN"..., 2048) = 43 <31.897951>
    08:54:42 read(9, "\2\0\0\0\1\0\0\0\1\0\0\0\4\0\0\0!\0\0\0\0\0\0 \0", 24) = 24 <0.000049>
    08:54:42 read(9, "\n\34\31X\2serverxxxx"..., 38) = 38 <0.000056>
    08:54:42 read(9, "ZBXD\3\36\0\0\0\26\0\0\0x\234\253V*J-.\310\317+NU\262R*.MN"..., 2048) = 43 <43.634166>

    I can't find anything about this read operation or what this string means: "ZBXD\3\36\0\0\0\26\0\0\0x\234\253V*J-.\310\317+NU\262R*.MN"

    I can see this same string on the other proxies, and there the reads are much faster.
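
    For what it's worth, those bytes look like an ordinary Zabbix protocol header. A minimal Python sketch that decodes it (byte values transcribed from the strace octal escapes above; strace truncates the buffer, so only the header and the first payload bytes are shown):

    import struct

    # Header + start of payload, transcribed from the strace output above.
    buf = (b"ZBXD\x03\x1e\x00\x00\x00\x16\x00\x00\x00"        # 13-byte header
           b"\x78\x9c\xab\x56\x2a\x4a\x2d\x2e\xc8\xcf"        # payload (zlib stream)
           b"\x2b\x4e\x55\xb2\x52\x2a\x2e\x4d\x4e")

    sig = buf[:4]                                 # b"ZBXD" protocol signature
    flags = buf[4]                                # 0x03 = Zabbix protocol | compression
    datalen, origlen = struct.unpack_from("<II", buf, 5)
    print(sig, hex(flags), datalen, origlen)      # b'ZBXD' 0x3 30 22

    If I read it correctly, each of those reads is the server's reply to an uploaded batch: a 13-byte header plus 30 bytes of zlib-compressed data (0x78 0x9c is the zlib magic) that inflate to 22 bytes, exactly the length of {"response":"success"}; 13 + 30 also accounts for the 43 bytes each read() returns. If so, the long read() times are the proxy blocking while the server processes a batch before acknowledging it, which would put the bottleneck on the server side rather than in this proxy's network path.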

    The stats from the data sender show the same slow read(9) behavior (a quick way to parse these lines follows the log):

    13119:20191014:095717.310 __zbx_zbx_setproctitle() title:'data sender [sent 30941 values in 78.294439 sec, idle 0 sec]'
    13119:20191014:095717.310 __zbx_zbx_setproctitle() title:'data sender [sent 30941 values in 78.294439 sec, sending data]'

    13119:20191014:095719.546 __zbx_zbx_setproctitle() title:'data sender [sent 11563 values in 2.235230 sec, idle 1 sec]'
    13119:20191014:095720.546 __zbx_zbx_setproctitle() title:'data sender [sent 11563 values in 2.235230 sec, sending data]'
    13119:20191014:095720.568 __zbx_zbx_setproctitle() title:'data sender [sent 409 values in 0.022718 sec, idle 1 sec]'
    13119:20191014:095721.569 __zbx_zbx_setproctitle() title:'data sender [sent 409 values in 0.022718 sec, sending data]'
    13119:20191014:095721.583 __zbx_zbx_setproctitle() title:'data sender [sent 115 values in 0.014698 sec, idle 1 sec]'
    13119:20191014:095722.584 __zbx_zbx_setproctitle() title:'data sender [sent 115 values in 0.014698 sec, sending data]'
    13119:20191014:095722.606 __zbx_zbx_setproctitle() title:'data sender [sent 254 values in 0.022417 sec, idle 1 sec]'
    13119:20191014:095723.606 __zbx_zbx_setproctitle() title:'data sender [sent 254 values in 0.022417 sec, sending data]'
    13119:20191014:095723.628 __zbx_zbx_setproctitle() title:'data sender [sent 257 values in 0.021877 sec, idle 1 sec]'
    13119:20191014:095724.628 __zbx_zbx_setproctitle() title:'data sender [sent 257 values in 0.021877 sec, sending data]'
    13119:20191014:095724.646 __zbx_zbx_setproctitle() title:'data sender [sent 163 values in 0.018144 sec, idle 1 sec]'
    13119:20191014:095725.646 __zbx_zbx_setproctitle() title:'data sender [sent 163 values in 0.018144 sec, sending data]'
    13119:20191014:095725.673 __zbx_zbx_setproctitle() title:'data sender [sent 311 values in 0.026590 sec, idle 1 sec]'
    13119:20191014:095726.673 __zbx_zbx_setproctitle() title:'data sender [sent 311 values in 0.026590 sec, sending data]'
    13119:20191014:095726.696 __zbx_zbx_setproctitle() title:'data sender [sent 241 values in 0.022858 sec, idle 1 sec]'
    13119:20191014:095727.696 __zbx_zbx_setproctitle() title:'data sender [sent 241 values in 0.022858 sec, sending data]'
    13119:20191014:095727.711 __zbx_zbx_setproctitle() title:'data sender [sent 159 values in 0.015039 sec, idle 1 sec]'
    13119:20191014:095728.711 __zbx_zbx_setproctitle() title:'data sender [sent 159 values in 0.015039 sec, sending data]'
    13119:20191014:095728.728 __zbx_zbx_setproctitle() title:'data sender [sent 179 values in 0.016213 sec, idle 1 sec]'
    13119:20191014:095729.728 __zbx_zbx_setproctitle() title:'data sender [sent 179 values in 0.016213 sec, sending data]'
    13119:20191014:095729.746 __zbx_zbx_setproctitle() title:'data sender [sent 275 values in 0.018542 sec, idle 1 sec]'
    13119:20191014:095730.747 __zbx_zbx_setproctitle() title:'data sender [sent 275 values in 0.018542 sec, sending data]'
    13119:20191014:095730.779 __zbx_zbx_setproctitle() title:'data sender [sent 152 values in 0.032924 sec, idle 1 sec]'
    13119:20191014:095731.780 __zbx_zbx_setproctitle() title:'data sender [sent 152 values in 0.032924 sec, sending data]'
    13119:20191014:095731.797 __zbx_zbx_setproctitle() title:'data sender [sent 201 values in 0.016948 sec, idle 1 sec]'
    13119:20191014:095732.797 __zbx_zbx_setproctitle() title:'data sender [sent 201 values in 0.016948 sec, sending data]'
    13119:20191014:095732.813 __zbx_zbx_setproctitle() title:'data sender [sent 165 values in 0.016138 sec, idle 1 sec]'
    13119:20191014:095733.813 __zbx_zbx_setproctitle() title:'data sender [sent 165 values in 0.016138 sec, sending data]'
    13119:20191014:095733.830 __zbx_zbx_setproctitle() title:'data sender [sent 222 values in 0.016642 sec, idle 1 sec]'
    13119:20191014:095734.830 __zbx_zbx_setproctitle() title:'data sender [sent 222 values in 0.016642 sec, sending data]'
    13119:20191014:095735.477 __zbx_zbx_setproctitle() title:'data sender [sent 419 values in 0.647629 sec, idle 1 sec]'
    13119:20191014:095736.478 __zbx_zbx_setproctitle() title:'data sender [sent 419 values in 0.647629 sec, sending data]'
    13119:20191014:095746.768 __zbx_zbx_setproctitle() title:'data sender [sent 4007 values in 10.290795 sec, idle 1 sec]'
    13119:20191014:095747.769 __zbx_zbx_setproctitle() title:'data sender [sent 4007 values in 10.290795 sec, sending data]'
    13119:20191014:095857.673 __zbx_zbx_setproctitle() title:'data sender [sent 31022 values in 69.904848 sec, idle 0 sec]'
    13119:20191014:095857.674 __zbx_zbx_setproctitle() title:'data sender [sent 31022 values in 69.904848 sec, sending data]'

    13119:20191014:095859.910 __zbx_zbx_setproctitle() title:'data sender [sent 16507 values in 2.236210 sec, idle 1 sec]'
    13119:20191014:095900.910 __zbx_zbx_setproctitle() title:'data sender [sent 16507 values in 2.236210 sec, sending data]'
    13119:20191014:095900.945 __zbx_zbx_setproctitle() title:'data sender [sent 560 values in 0.034576 sec, idle 1 sec]'
    13119:20191014:095901.945 __zbx_zbx_setproctitle() title:'data sender [sent 560 values in 0.034576 sec, sending data]'
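
    A small sketch (mine, not from the thread) that pulls the batch size and send time out of lines like these; each batch is logged twice ("idle" and "sending data"), so it matches only the "idle" lines:

    import re, sys

    pat = re.compile(r"sent (\d+) values in ([\d.]+) sec, idle")
    for line in sys.stdin:
        m = pat.search(line)
        if m:
            values, secs = int(m.group(1)), float(m.group(2))
            print(f"{values:6d} values in {secs:10.3f} s  ->  {values / secs:8.0f} values/s")

    Fed the log above, the small batches come out at several thousand values per second, while the two ~31000-value batches crawl at roughly 400 values/s: the slowdown only bites on the large uploads.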

    We are running Zabbix 4.0.

    Any help would be appreciated.
    Thanks in advance.



    #2
    What are the proxy specs? What is the proxy database? What is iowait on the proxy?



    • RicardoHoffmann commented:
      Running on CentOS Linux 7.5. It's a VMware virtual machine with 6 GB of memory and a 4-core CPU (Intel(R) Xeon(R) CPU X5650 @ 2.67GHz).
      The database is MariaDB 5.5.60 (50 GB allocated, about 800 MB in use).
      Over the last day, iowait was last: 0.41%, min: 0.28%, avg: 1.25%, max: 12.69%.

    #3
    What is the patch level of Zabbix? 4.0.x? Have you run any of the MySQL tuning scripts against the proxy? Is this an "Active" proxy?

    On the Zabbix server, check all the busy% items in the Zabbix Server application, particularly Trapper (e.g. the internal item zabbix[process,trapper,avg,busy]).

    Are there other proxies in the same VMware environment that work OK?

    My busiest comparable proxy runs 900 NVPS with a max iowait of 6%, not that much different.

    What type of items does this proxy gather? Zabbix agent, Zabbix agent (active), web, SNMP, etc.? If it sees a lot of network packets, one test is to set the NIC to promiscuous mode. At least in the past, the virtualization of MAC filtering on a VMware NIC could slow things down when there were large numbers of packets.



    #4
    Originally posted by LenR:
    What is the patch level of Zabbix? 4.0.x? Have you run any of the MySQL tuning scripts against the proxy? Is this an "Active" proxy?

    On the Zabbix server, check all the busy% items in the Zabbix Server application, particularly Trapper.

    Are there other proxies in the same VMware environment that work OK?

    My busiest comparable proxy runs 900 NVPS with a max iowait of 6%, not that much different.

    What type of items does this proxy gather? Zabbix agent, Zabbix agent (active), web, SNMP, etc.? If it sees a lot of network packets, one test is to set the NIC to promiscuous mode. At least in the past, the virtualization of MAC filtering on a VMware NIC could slow things down when there were large numbers of packets.
    Zabbix 4.0.10, no MySQL tuning scripts, Active proxy.
    The Zabbix server is OK; busy trapper is last: 2.56%, min: 1.56%, avg: 4.28%, max: 56% (last 1 day).
    Yes, we have other proxies in the same VM environment working fine, many with more NVPS than this problematic one.
    The vast majority of items are Zabbix agent (active).


    #5
    OK, my trapper is never more than 4% busy. How many trappers are you starting on the main server? We are set to 30; the default is 5.


    #6
    Originally posted by LenR:
    OK, my trapper is never more than 4% busy. How many trappers are you starting on the main server? We are set to 30; the default is 5.
    StartTrappers=100


    #7
    Hi,
    Please, did you solve this? We're facing exactly the same issue.


    • RicardoHoffmann commented:
      Yes! A number of hosts on an HPC cluster were sending data at a 2-second interval, even though the configured item interval was 5 minutes. They reinstalled the agent on all servers of the cluster and that solved the issue. You should search for hosts sending data at a different interval from the one configured on the item; tcpdump may point to something, and Latest data may too.
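
      A hypothetical follow-up (my sketch, not from the thread) for hunting such hosts on the proxy itself: count how many values each item queued in the proxy_history table over the last hour and compare that against the configured item intervals. Connection details are placeholders, and rows already uploaded and housekept will be missing, so treat the counts as indicative:

      import time
      import pymysql  # assuming a MariaDB proxy database, as in this thread

      conn = pymysql.connect(host="localhost", user="zabbix",
                             password="...", database="zabbix_proxy")
      with conn.cursor() as cur:
          cur.execute(
              "SELECT itemid, COUNT(*) AS n FROM proxy_history"
              " WHERE clock > %s GROUP BY itemid ORDER BY n DESC LIMIT 20",
              (int(time.time()) - 3600,))
          for itemid, n in cur.fetchall():
              # a 5m item should queue ~12 values/hour; a 2s flood shows ~1800
              print(itemid, n)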

    • amandahlak commented:
      RicardoHoffmann, thanks! I'll do that!
