More than 100 items having missing data for more than 10 minutes on a single host

  • EHRETic
    Member
    • Jan 2021
    • 45

    #1

    More than 100 items having missing data for more than 10 minutes on a single host

    Hello there,

    I need a little help troubleshooting a warning that I'm getting because of one single host.
    The queue looks like this (I have more than 100 items) and no information gets past the 15-minute delay:

    [Screenshot: the queue view showing the stuck items]
    Last lines of host agent log:
    Code:
    2024/03/10 20:35:49.953313 [101] active check configuration update from [192.168.X.X:10051] is working again
    2024/03/10 20:35:49.955537 [101] history upload to [192.168.X.X:10051] [SRV--XXX] is working again
    2024/03/10 20:36:41.312534 [101] sending of heartbeat message to [192.168.X.X:10051] is working again
    The "working again" is probably due to Zabbix server restart or the host restart when I tried fixing it but otherwise, I didn't spot any noticeable error.
    I've tried reinstalling the agent on host (with reboot before reinstall) and also disabling the agent and re-enabling it.
    This didn't fix the issue so where should I continue troubleshooting?

    Thanks in advance for your help! ;-)
  • The_Shadow
    Junior Member
    • Mar 2024
    • 6

    #2
    Hi,
    Same issue with the Zabbix Docker server (v6.4.12). After a reboot of the Docker container, everything works again. I'm monitoring whether the issue comes back.



    • cfrancis commented:
      I also solved the problem by restarting the server.
  • TheOddPerson
    Junior Member
    • Feb 2024
    • 12

    #3
    <Post Redacted - Please see next post>
    Last edited by TheOddPerson; 12-03-2024, 18:29.


    • TheOddPerson
      Junior Member
      • Feb 2024
      • 12

      #4
      I see the error in the log relating to Active checks.

      Is this host running a Zabbix Agent in Active mode?
      What is the timeout for the type of check that is clogging up the system?
      How many pollers/trappers do you have running?
      Are you receiving any warnings on your zabbix server?
      How many hosts do you have configured for that same type of poller?

      check /etc/zabbix/zabbix_server.conf

      For Zabbix Agent items, the defaults are:
      Timeout=3
      StartPollers=5

      For Active Agents it is this:
      StartTrappers=5


      The default timeout is reasonable for most installations.
      You may want to consider increasing the pollers if the timeout is default.
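
      If you want to double-check what the server is actually running with, something along these lines should do it (a quick sketch, assuming the default config path; lines that are still commented out simply mean the default applies):
      Code:
      grep -E '^(Timeout|StartPollers|StartTrappers)=' /etc/zabbix/zabbix_server.conf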
      Last edited by TheOddPerson; 12-03-2024, 18:29.


      • EHRETic
        Member
        • Jan 2021
        • 45

        #5
        Hi TheOddPerson,

        Thanks for your reply, I'll try to reply as concisely as possible!

        Originally posted by TheOddPerson
        I see the error in the log relating to Active checks.
        Is this host running a Zabbix Agent in Active mode?
        Yes, all of my hosts are active (except some SNMP ones) - today I have 43 active agents (a mix of Linux & Windows VMs, including the Zabbix server) + 9 SNMP hosts; one of the SNMP hosts is offline on purpose most of the time

        Originally posted by TheOddPerson
        What is the timeout for the type of check that is clogging up the system?
        I didn't change it, so it's the default

        Originally posted by TheOddPerson
        How many pollers/trappers do you have running?
        Also default

        Originally posted by TheOddPerson
        Are you receiving any warnings on your zabbix server?
        Not that I'm aware of (except the active warning "More than....")

        Originally posted by TheOddPerson
        How many hosts do you have configured for that same type of poller?
        I'm not sure about this, so a dumb question: what is the "same type of poller"?

        Yesterday I gave my server more resources (it's a single VM that now has 4 vCPUs and 4 GB of RAM), but it didn't help.
        Now that I roughly understand the concept of pollers, I understand why it didn't change anything... so increasing the configuration values should not be an issue (nor would giving it even more resources).

        Should I go to 10 for StartPollers & StartTrappers?

        Comment

        • TheOddPerson
          Junior Member
          • Feb 2024
          • 12

          #6
          Giving the server more resources won't help unless you're seeing Zabbix complain about high CPU or memory utilization.
          I would definitely increase the number of trappers. This is the number of items that Zabbix can receive simultaneously. I would start with 10 and see if that makes any difference.
          The default configuration is aimed at a fairly small installation, so expect to need to increase some values. Zabbix should trigger problems on itself if it sees poller / memory / cpu / cache usage getting too high, and that can guide you on which configuration values need to be changed.
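
          If you want numbers rather than guesswork, the server's internal self-monitoring items report how busy each process type is. A rough sketch of item keys you could add to the Zabbix server host (assuming your version supports the standard internal checks):
          Code:
          zabbix[process,trapper,avg,busy]
          zabbix[process,poller,avg,busy]
          zabbix[queue,10m]
          Anything sitting near 100% busy is the process type to scale up.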


          • EHRETic
            Member
            • Jan 2021
            • 45

            #7
            Edit: I tried the following, but the queue reappeared after a few minutes... :-/

            Code:
            ############ ADVANCED PARAMETERS ################
            
            ### Option: StartPollers
            #       Number of pre-forked instances of pollers.
            #
            # Mandatory: no
            # Range: 0-1000
            # Default:
            # StartPollers=5
            StartPollers=10
            
            ### Option: StartIPMIPollers
            #       Number of pre-forked instances of IPMI pollers.
            #               The IPMI manager process is automatically started when at least one IPMI poller is started.
            #
            # Mandatory: no
            # Range: 0-1000
            # Default:
            # StartIPMIPollers=0
            
            ### Option: StartPreprocessors
            #       Number of pre-forked instances of preprocessing workers.
            #               The preprocessing manager process is automatically started when preprocessor worker is started.
            #
            # Mandatory: no
            # Range: 1-1000
            # Default:
            # StartPreprocessors=3
            StartPreprocessors=5
            
            ### Option: StartPollersUnreachable
            #       Number of pre-forked instances of pollers for unreachable hosts (including IPMI and Java).
            #       At least one poller for unreachable hosts must be running if regular, IPMI or Java pollers
            #       are started.
            #
            # Mandatory: no
            # Range: 0-1000
            # Default:
            # StartPollersUnreachable=1
            StartPollersUnreachable=5
            
            ### Option: StartTrappers
            #       Number of pre-forked instances of trappers.
            #       Trappers accept incoming connections from Zabbix sender, active agents and active proxies.
            #       At least one trapper process must be running to display server availability and view queue
            #       in the frontend.
            #
            # Mandatory: no
            # Range: 0-1000
            # Default:
            # StartTrappers=5
            StartTrappers=10
            
            ### Option: StartPingers
            #       Number of pre-forked instances of ICMP pingers.
            #
            # Mandatory: no
            # Range: 0-1000
            # Default:
            # StartPingers=1
            StartPingers=5
            
            ### Option: StartDiscoverers
            #       Number of pre-forked instances of discoverers.
            #
            # Mandatory: no
            # Range: 0-250
            # Default:
            # StartDiscoverers=1
            
            ### Option: StartHTTPPollers
            #       Number of pre-forked instances of HTTP pollers.
            #
            # Mandatory: no
            # Range: 0-1000
            # Default:
            # StartHTTPPollers=1
            StartHTTPPollers=10

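            In case it helps anyone else reading: the new values are only picked up after restarting the server process, e.g.:
            Code:
            systemctl restart zabbix-server
            systemctl status zabbix-server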

            • EHRETic
              Member
              • Jan 2021
              • 45

              #8
              Hi,

              I just doubled all the values from the post above (so 20 for both pollers and trappers).
              I also increased all the cache sizes.

              Still the same result, any clue?
              (It doesn't really change the resource consumption.)


              • TheOddPerson
                Junior Member
                • Feb 2024
                • 12

                #9
                Make sure you're editing the correct config file:
                systemctl status zabbix-server
                should tell you which config file the server was started with.
                You should also see child processes for each of the trappers.

                No Zabbix problems on the Zabbix server itself?
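
                Something along these lines shows both at once (a rough sketch):
                Code:
                systemctl status zabbix-server            # the command line in the output shows the config file passed with -c
                ps -ef | grep '[z]abbix_server: trapper'  # one line per running trapper process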


                • Zuzuka
                  Member
                  • Aug 2011
                  • 39

                  #10
                  Try running "ps -ax | grep zabbix" on the Zabbix server and look at how loaded the zabbix processes are (how busy they are):
                  [Screenshot: ps output listing the zabbix server processes and their status]
                  If all the syncers/discoverers/pollers/trappers/etc. are busy, then you need to add more. Don't forget that adding more processes to Zabbix may require adding more CPU cores.
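
                  If you just want the busy ones, you can filter out the idle processes, roughly like this (the exact status text can vary between versions):
                  Code:
                  ps -ax | grep '[z]abbix_server:' | grep -v idle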


                  • Zuzuka
                    Member
                    • Aug 2011
                    • 39

                    #11
                    Also, on the agent side, look at the "StartAgents" parameter in the "zabbix_agentd.conf" file. You may need to increase this value if you have many items to monitor on a single host. Test with different values - I'm using StartAgents=10.
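
                    For reference, in my zabbix_agentd.conf it looks roughly like this (if I remember right the default is 3, and StartAgents controls how many processes accept incoming passive-check connections):
                    Code:
                    ### Option: StartAgents
                    # Default:
                    # StartAgents=3
                    StartAgents=10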


                    • EHRETic
                      Member
                      • Jan 2021
                      • 45

                      #12
                      Originally posted by Zuzuka
                      Try running "ps -ax | grep zabbix" on the Zabbix server and look at how loaded the zabbix processes are (how busy they are):
                      If all the syncers/discoverers/pollers/trappers/etc. are busy, then you need to add more. Don't forget that adding more processes to Zabbix may require adding more CPU cores.
                      Well, they seem pretty much OK to me, no?
                      (The rest is also OK; it's a nice homelab, but it remains a lab with less traffic than a normal environment.)

                      Code:
                         1771 ?        S      1:47 /usr/sbin/zabbix_server: poller #1 [got 0 values in 0.000047 sec, idle 1 sec]
                         1772 ?        S      1:42 /usr/sbin/zabbix_server: poller #2 [got 0 values in 0.000033 sec, getting values]
                         1773 ?        S      1:40 /usr/sbin/zabbix_server: poller #3 [got 7 values in 3.956291 sec, getting values]
                         1775 ?        S      1:43 /usr/sbin/zabbix_server: poller #4 [got 0 values in 0.000041 sec, idle 1 sec]
                         1776 ?        S      1:44 /usr/sbin/zabbix_server: poller #5 [got 0 values in 0.000059 sec, idle 1 sec]
                         1777 ?        S      1:46 /usr/sbin/zabbix_server: poller #6 [got 0 values in 0.000030 sec, idle 1 sec]
                         1778 ?        R      1:40 /usr/sbin/zabbix_server: poller #7 [got 5 values in 0.064381 sec, getting values]
                         1779 ?        S      1:42 /usr/sbin/zabbix_server: poller #8 [got 0 values in 0.000058 sec, idle 1 sec]
                         1781 ?        S      1:44 /usr/sbin/zabbix_server: poller #9 [got 0 values in 0.000046 sec, idle 1 sec]
                         1782 ?        S      1:42 /usr/sbin/zabbix_server: poller #10 [got 0 values in 0.000024 sec, idle 1 sec]
                         1783 ?        S      1:39 /usr/sbin/zabbix_server: poller #11 [got 0 values in 0.000045 sec, getting values]
                         1785 ?        S      1:41 /usr/sbin/zabbix_server: poller #12 [got 1 values in 0.052022 sec, idle 1 sec]
                         1786 ?        S      1:47 /usr/sbin/zabbix_server: poller #13 [got 0 values in 0.000027 sec, idle 1 sec]
                         1787 ?        S      1:43 /usr/sbin/zabbix_server: poller #14 [got 0 values in 0.000045 sec, idle 1 sec]
                         1788 ?        S      1:46 /usr/sbin/zabbix_server: poller #15 [got 4 values in 1.125960 sec, idle 1 sec]
                         1789 ?        S      1:42 /usr/sbin/zabbix_server: poller #16 [got 0 values in 0.000021 sec, idle 1 sec]
                         1790 ?        S      1:40 /usr/sbin/zabbix_server: poller #17 [got 0 values in 0.000031 sec, getting values]
                         1791 ?        S      1:47 /usr/sbin/zabbix_server: poller #18 [got 0 values in 0.000025 sec, getting values]
                         1792 ?        S      1:36 /usr/sbin/zabbix_server: poller #19 [got 0 values in 0.000046 sec, idle 1 sec]
                         1793 ?        S      1:39 /usr/sbin/zabbix_server: poller #20 [got 0 values in 0.000046 sec, getting values]


                      • EHRETic
                        Member
                        • Jan 2021
                        • 45

                        #13
                        Originally posted by Zuzuka
                        Also, on the agent side, look at the "StartAgents" parameter in the "zabbix_agentd.conf" file. You may need to increase this value if you have many items to monitor on a single host. Test with different values - I'm using StartAgents=10.
                        I could not find this parameter in my agent's configuration file, probably because I use Agent 2 with active checks (https://www.zabbix.com/forum/zabbix-...ter-in-agent-2)
                        Any clue about what I can do?


                        • TheOddPerson
                          Junior Member
                          • Feb 2024
                          • 12

                          #14
                          It's become apparent you're using active checks, in which case you want to check your TRAPPERS.
                          Run
                          ps -ax | grep trapper
                          to see how your trappers are doing.
                          You may want to increase the number of trappers.

                          Is the queue being consumed by 1 client? How many clients are configured as active agents?
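
                          Something like this gives a quick feel for it (a rough sketch; the status text in the brackets can differ between versions):
                          Code:
                          ps -ax | grep '[t]rapper'
                          # e.g. /usr/sbin/zabbix_server: trapper #1 [processed data in 0.000123 sec, waiting for connection]
                          If they all look permanently busy rather than waiting for a connection, the trappers are your bottleneck.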

