Pollers hanging up

  • leos
    Junior Member
    • Nov 2021
    • 5

    #1

    Pollers hanging up

    Hello everyone.
    I have a little issue. The data pollers are getting stuck.
    You can see my attached screenshots. More and more pollers get stuck until no item can get any new data, and I have to restart zabbix-server to recover them.
    A poller gets stuck with, for example, "[got 16 values in 5.016000 sec, getting values]", and the status never changes.
    I think some item may be causing the issue, but I don't know which one.

    There are no locks on the MySQL DB. CPU and memory usage are very low. I enabled 100 pollers (when 20 of them were enough).
    I tried to increase the debug level, but nothing out of the ordinary seems to be happening.
    Zabbix version is 5.4.6, running Ubuntu 20.04.

    Thanks in advance for any help you can give me.
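
One way to narrow down which pollers are stuck is to compare their process titles over time. The following is only a rough sketch, assuming the titles look like the one quoted above (`zabbix_server: poller #N [... getting values]`) and that `ps -eo pid,args` is available on the server. The function names are made up for this illustration.

```python
import re
import subprocess

# Matches Zabbix process titles such as
# "zabbix_server: poller #3 [got 16 values in 5.016000 sec, getting values]".
# The exact title layout is an assumption based on the example in this thread.
TITLE_RE = re.compile(r"zabbix_server: (?P<name>[a-z ]*poller #\d+) \[(?P<status>.*)\]")


def snapshot():
    """Return the raw output of `ps -eo pid,args` as text."""
    return subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    ).stdout


def parse_titles(ps_output):
    """Map PID -> (process name, status text) for every poller found."""
    titles = {}
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # header line or something unexpected
        match = TITLE_RE.search(parts[1])
        if match:
            titles[int(parts[0])] = (match.group("name"), match.group("status"))
    return titles


def find_stuck(before, after):
    """Return pollers whose title is unchanged and still says 'getting values'."""
    stuck = []
    for pid, title in sorted(after.items()):
        if before.get(pid) == title and title[1].endswith("getting values"):
            stuck.append((pid, title[0], title[1]))
    return stuck
```

Taking `parse_titles(snapshot())` twice, a few minutes apart, and passing both results to `find_stuck` gives a list of suspect PIDs. A busy but healthy poller could in principle show the same title twice, so treat the result as a hint rather than proof.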
  • leos
    Junior Member
    • Nov 2021
    • 5

    #2
    It's been some time since I asked about this.
    It happens less frequently, but it is still happening.
    The only difference from before is that with Zabbix 6.0.4 there is a new poller for ODBC, and that is what's getting stuck. I can't find which DB query is causing the issue, or why. We run a lot of queries against several different DB servers and instances.

    If anyone knows how to debug the poller when I see it's stuck, please let me know.
    Thanks!!

    • Markku
      Senior Member
      Zabbix Certified Specialist, Zabbix Certified Professional, Zabbix Certified Expert
      • Sep 2018
      • 1781

      #3
      Just an idea: If the problem can be identified by the duration of the TCP sessions to the monitored databases, you could try capturing all the database connections on Zabbix server using tcpdump, and then using Wireshark statistics (or other similar tools) to find out the database connections that are problematic.

      Disclaimer: It's up to you to find out if you can do that in accordance with your security and other relevant operational policies.

      Markku
      Last edited by Markku; 12-05-2022, 20:58. Reason: Length -> duration
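
As a rough illustration of the idea in #3, the sketch below ranks TCP sessions by duration. It assumes the capture has already been exported to one tab-separated row per packet, for example with `tshark -r capture.pcap -T fields -e frame.time_epoch -e ip.src -e tcp.srcport -e ip.dst -e tcp.dstport` (the file name is a placeholder). The function names are hypothetical, and Wireshark's own conversation statistics may well be the easier route.

```python
def parse_tshark_fields(text):
    """Split tab-separated tshark output into 5-field rows, skipping incomplete lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) == 5 and all(fields):
            rows.append(tuple(fields))
    return rows


def session_stats(rows):
    """Summarise TCP sessions from per-packet rows.

    rows: iterable of (epoch_seconds, src_ip, src_port, dst_ip, dst_port).
    Returns a list of (duration, idle, endpoints) sorted by longest duration first,
    where idle is the time between a session's last packet and the end of the capture.
    """
    spans = {}
    for ts, src, sport, dst, dport in rows:
        ts = float(ts)
        # Sort the two endpoints so both directions map to the same session key.
        key = tuple(sorted([(src, int(sport)), (dst, int(dport))]))
        start, end = spans.get(key, (ts, ts))
        spans[key] = (min(start, ts), max(end, ts))
    if not spans:
        return []
    capture_end = max(end for _, end in spans.values())
    stats = [(end - start, capture_end - end, key) for key, (start, end) in spans.items()]
    return sorted(stats, reverse=True)
```

A hung query may show up either as a very long session or as one that went silent long before the capture ended, which is why the idle time is reported alongside the duration.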

      • tim.mooney
        Senior Member
        • Dec 2012
        • 1427

        #4
        Originally posted by leos
        I enabled 100 pollers (when 20 of them were enough).
        Why? If 20 were enough (possibly more than enough) why increase pollers to 100?

        Keep in mind that the more pollers you run, the more chance there is for contention for certain resources, including database resources.

        With some types of software, you can set certain configuration parameters to values that are "larger than I will ever need" and not have any issues, but Zabbix isn't one of them.

        In general, you only want to run as many of each subsystem as your environment requires. Since you haven't told us a lot about your environment (for example, what your "new values per second (NVPS)" is, how many hosts you have, how many items you have, etc.), it's impossible to know what the right values are for your environment, but I suspect that 100 pollers (or even 20 pollers) is more than you need (and may be part of the problem).

        Also, when your environment is large enough to require you to increase the counts for multiple subsystems, database tuning typically becomes useful as well.

        If you provide more information about your environment, it may shed some light on the issue.

        • cyber
          Senior Member
          Zabbix Certified Specialist, Zabbix Certified Professional
          • Dec 2006
          • 4807

          #5
          How is the data looking? With gaps? Probably not getting all the data all the time...?
          Even though you said you did it already, try again: increase the logging level of one or two pollers and let them run for a while, then look in the logs to see what they are up to: which hosts and which items they have been dealing with, etc. You should be able to track down the things that either lock your pollers or cause timeouts.
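
Once a poller is stuck, the most useful log lines are the last ones it wrote. The sketch below pulls them out by PID, assuming the usual Zabbix log line prefix `PID:YYYYMMDD:HHMMSS.mmm`. The function name is made up for this example. The log level of a single process can be raised at runtime with something like `zabbix_server -R log_level_increase=poller,3`; whether `odbc poller` is accepted as a target in the same way is worth checking against the documentation for your version.

```python
import re
from collections import deque

# Zabbix log lines start with "PID:YYYYMMDD:HHMMSS.mmm " (PID may be space-padded).
LOG_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}\.\d{3})\s(.*)$")


def last_lines_by_pid(lines, pids, keep=5):
    """Return the last `keep` log lines written by each PID in `pids`."""
    wanted = set(pids)
    tail = {pid: deque(maxlen=keep) for pid in wanted}
    for line in lines:
        match = LOG_RE.match(line)
        if match and int(match.group(1)) in wanted:
            tail[int(match.group(1))].append(line.rstrip("\n"))
    return {pid: list(buf) for pid, buf in tail.items()}
```

For example, with the log opened as `open("/var/log/zabbix/zabbix_server.log")` (a typical package path, adjust to your setup) and the PID of a stuck poller taken from `ps`, the returned lines show what that process was doing just before it went quiet.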

          • leos
            Junior Member
            • Nov 2021
            • 5

            #6
            Originally posted by cyber
            How is the data looking? With gaps? Probably not getting all the data all the time...?
            Even though you said you did it already, try again: increase the logging level of one or two pollers and let them run for a while, then look in the logs to see what they are up to: which hosts and which items they have been dealing with, etc. You should be able to track down the things that either lock your pollers or cause timeouts.
            A poller just gets stuck. I don't see any noticeable gap. I checked hours later, and the poller still showed the same info, for example: "odbc poller #3 [got 3 values in 0.251546 sec, getting values]"
            One by one, given enough time, I run out of "good" pollers, and that is when the bad things happen, as I don't get any new values at all.
            If I could debug it and see the last value/query it was trying to get, I could fix it.

            I tried with debug level, but I found nothing.
