Zabbix - Suspicion of a hanging thread

  • pa1975
    Junior Member
    • Dec 2017
    • 14

    #1

    Zabbix - Suspicion of a hanging thread

    There is a problem: periodically, on several network devices, the CPU load rises and stays high.
    In the device's debug output, the SNMP process accounts for most of it.
    A tshark capture shows a large number of requests (about 6,000 in 10 seconds), which do not stop even when the host is disabled or switched to a proxy.
    Has anyone run into this?
    Is it possible to track which of the poller processes is polling this device?
    Zabbix = 6.4
    Pollers = 350
    Number of hosts: 945
    Required server performance, new values per second: 1708.2
    High availability cluster: Disabled
    Zabbix performance counters are in the "green zone"
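
    A rough way to reproduce the "about 6,000 in 10 seconds" count (sketch only; the interface name is a placeholder for our setup, 10.69.0.12 is the device from the trace below):

        # count SNMP packets to/from the device over a 10-second window
        # eth0 and 10.69.0.12 are assumptions; substitute your interface and device IP
        tshark -i eth0 -f "host 10.69.0.12 and udp port 161" -a duration:10 2>/dev/null | wc -l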
  • LenR
    Senior Member
    • Sep 2009
    • 1005

    #2
    I'm not sure I'm following, but when you disable a host, are you running zabbix_proxy -R config_cache_reload (or zabbix_server, whichever applies)? Polling may not stop immediately; I think it takes a configuration cache update.
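
    Roughly like this (a sketch; both binaries accept the same runtime control option, use whichever one actually monitors the host):

        # reload the configuration cache on the server
        zabbix_server -R config_cache_reload

        # or on the proxy, if the host is monitored through one
        zabbix_proxy -R config_cache_reload

    Without an explicit reload, the server only refreshes its cache on the CacheUpdateFrequency interval.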


    • pa1975
      Junior Member
      • Dec 2017
      • 14

      #3
      We ran into the problem again. The server keeps requesting the same data from the host regardless of the response:
      1854 2023-05-22 12:54:30.986887298 172.19.99.114 → 10.69.0.12 getBulkRequest 1.3.6.1.2.1.2.2.1.2
      1855 2023-05-22 12:54:30.987085498 10.69.0.12 → 172.19.99.114 get-response
      1856 2023-05-22 12:54:30.987112240 172.19.99.114 → 10.69.0.12 getBulkRequest 1.3.6.1.2.1.47.1.1.1.1.11
      1857 2023-05-22 12:54:30.993851605 10.69.0.12 → 172.19.99.114 get-response
      1858 2023-05-22 12:54:30.993892761 172.19.99.114 → 10.69.0.12 getBulkRequest 1.3.6.1.2.1.2.2.1.8
      1859 2023-05-22 12:54:30.995221932 10.69.0.12 → 172.19.99.114 get-response
      1860 2023-05-22 12:54:30.995269925 172.19.99.114 → 10.69.0.12 getBulkRequest 1.3.6.1.2.1.2.2.1.2
      1861 2023-05-22 12:54:30.995599137 10.69.0.12 → 172.19.99.114 get-response
      1862 2023-05-22 12:54:30.995627148 172.19.99.114 → 10.69.0.12 getBulkRequest 1.3.6.1.2.1.47.1.1.1.1.11
      1863 2023-05-22 12:54:31.000191557 10.69.0.12 → 172.19.99.114 get-response
      1864 2023-05-22 12:54:31.000295402 172.19.99.114 → 10.69.0.12 getBulkRequest 1.3.6.1.2.1.2.2.1.8
      1865 2023-05-22 12:54:31.002425017 10.69.0.12 → 172.19.99.114 get-response
      1866 2023-05-22 12:54:31.002471545 172.19.99.114 → 10.69.0.12 getBulkRequest 1.3.6.1.2.1.2.2.1.2
      1867 2023-05-22 12:54:31.003502822 10.69.0.12 → 172.19.99.114 get-response
      1868 2023-05-22 12:54:31.003550464 172.19.99.114 → 10.69.0.12 getBulkRequest 1.3.6.1.2.1.47.1.1.1.1.11
      1869 2023-05-22 12:54:31.006745115 10.69.0.12 → 172.19.99.114 get-response
      1870 2023-05-22 12:54:31.006782333 172.19.99.114 → 10.69.0.12 getBulkRequest 1.3.6.1.2.1.2.2.1.8
      1871 2023-05-22 12:54:31.008883783 10.69.0.12 → 172.19.99.114 get-response
      1872 2023-05-22 12:54:31.008932462 172.19.99.114 → 10.69.0.12 getBulkRequest 1.3.6.1.2.1.2.2.1.2
      1873 2023-05-22 12:54:31.010556465 10.69.0.12 → 172.19.99.114 get-response
      1874 2023-05-22 12:54:31.010948334 172.19.99.114 → 10.69.0.12 getBulkRequest 1.3.6.1.2.1.47.1.1.1.1.11
      1875 2023-05-22 12:54:31.013046426 10.69.0.12 → 172.19.99.114 get-response
      1876 2023-05-22 12:54:31.013081584 172.19.99.114 → 10.69.0.12 getBulkRequest 1.3.6.1.2.1.2.2.1.8
      1877 2023-05-22 12:54:31.027514709 10.69.0.12 → 172.19.99.114 get-response
      1878 2023-05-22 12:54:31.027552896 172.19.99.114 → 10.69.0.12 getBulkRequest 1.3.6.1.2.1.2.2.1.2
      1879 2023-05-22 12:54:31.027787449 10.69.0.12 → 172.19.99.114 get-response
      1880 2023-05-22 12:54:31.028001350 10.69.0.12 → 172.19.99.114 get-response
      1881 2023-05-22 12:54:31.028028985 172.19.99.114 → 10.69.0.12 getBulkRequest 1.3.6.1.2.1.2.2.1.8
      1882 2023-05-22 12:54:31.028384109 172.19.99.114 → 10.69.0.12 getBulkRequest 1.3.6.1.2.1.47.1.1.1.1.11
      1883 2023-05-22 12:54:31.034604754 10.69.0.12 → 172.19.99.114 get-response


      In the host settings I switched monitoring over to a proxy; after 5 minutes of waiting, requests from the server had not stopped.
      zabbix_server -R config_cache_reload
      "Runtime control command was successfully forwarded"
      For half an hour I checked every 5 minutes whether the server had stopped polling the host, and ran config_cache_reload again each time. It did not help.
      Is there a way to determine which poller is polling a given host?
      It looks as if that poller has lost contact with the main server process.
      After restarting the service everything is fine for a while, but then one of the network nodes starts having problems again.
      And restarting the monitoring service that often is not a good idea.
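
      One way to narrow down which process is hammering the device (a sketch; it assumes shell access to the Zabbix server, root for ss, the device IP from the capture above, and a log path that varies by distribution):

          # pollers put their number in the process title, so ps shows them
          ps -ef | grep '[z]abbix_server: poller'

          # take a server-side UDP source port from the tshark capture (54321 is a placeholder)
          # and find which PID owns that socket
          ss -uapn | grep ':54321'

          # or raise log verbosity for pollers only and watch the server log
          # ('problem-device' is a placeholder for the host's name in Zabbix)
          zabbix_server -R log_level_increase=poller
          tail -f /var/log/zabbix/zabbix_server.log | grep 'problem-device'
          # remember to lower it again afterwards:
          zabbix_server -R log_level_decrease=poller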


      • cyber
        Senior Member
        Zabbix Certified Specialist, Zabbix Certified Professional
        • Dec 2006
        • 4807

        #4
        The first field in each log line is the PID of the process that generated the line...
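
        For example (a sketch; the log path, host name and PID are placeholders, and item-level lines only show up with a raised debug level):

            # each zabbix_server.log line starts with the PID of the worker that wrote it
            grep 'problem-device' /var/log/zabbix/zabbix_server.log | head
            # e.g.  350123:20230522:125430.986 ... host "problem-device" ...

            # the process title then tells you which poller that PID belongs to
            ps -p 350123 -o pid,cmd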
