Active check failed?
  • killmasta93
    Member
    • Oct 2019
    • 45

    #1



    Hi

    I currently have a working Zabbix setup that monitors servers externally, but today, all of a sudden, I'm getting many Zabbix alerts saying the agent is not responding. I know it's not a firewall issue because I can telnet to the server. I'm thinking it's more of a Zabbix server issue, even though I haven't changed anything.

    This is from the agent:


    active check configuration update from [monitor.mydomain.com:10051] started to fail (cannot connect to [[monitor.mydomain.com]:10051]: [4] Interrupted system call)



    And on the server side I get lots of this:


    1325:20200726:230918.638 temporarily disabling Zabbix agent checks on host "prometheusideas": host unavailable




    I'm currently running the Zabbix server with 6 GB of RAM and 4 CPUs.

    I'm not sure what else I should be looking at; I've never run into this kind of situation before.




    Thank you
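A telnet session only proves that the TCP handshake to port 10051 succeeds. To go one step further, the request an active agent sends when it fetches its configuration can be reproduced by hand. The Python sketch below is a minimal version of that, assuming the Zabbix protocol framing ("ZBXD", a 0x01 flags byte, then the payload length) and a bare `{"request": "active checks", "host": ...}` JSON body; the function names are hypothetical, and a real agent sends more fields than this.

```python
import json
import socket
import struct

HEADER_MAGIC = b"ZBXD"
FLAG_PROTOCOL = 0x01     # plain Zabbix protocol
FLAG_COMPRESSED = 0x02   # zlib-compressed payload (not handled in this sketch)


def build_active_checks_request(hostname: str) -> bytes:
    """Frame a minimal 'active checks' request: magic, flags, 8-byte LE length, JSON."""
    payload = json.dumps({"request": "active checks", "host": hostname}).encode("utf-8")
    return HEADER_MAGIC + bytes([FLAG_PROTOCOL]) + struct.pack("<Q", len(payload)) + payload


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes the connection early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before the full reply arrived")
        buf += chunk
    return buf


def request_active_checks(server: str, hostname: str, port: int = 10051,
                          timeout: float = 3.0) -> dict:
    """Send the request to a server/proxy trapper port and decode the JSON reply."""
    # The timeout set here also applies to the sendall/recv calls below.
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(build_active_checks_request(hostname))
        header = _recv_exact(sock, 13)
        if header[:4] != HEADER_MAGIC:
            raise ValueError("reply does not start with ZBXD")
        if header[4] & FLAG_COMPRESSED:
            raise NotImplementedError("compressed replies are not handled here")
        # 4-byte data length followed by 4 reserved bytes (little-endian).
        length, _reserved = struct.unpack("<II", header[5:13])
        return json.loads(_recv_exact(sock, length).decode("utf-8"))
```

Run from the agent host, e.g. `request_active_checks("monitor.mydomain.com", "prometheusideas")`: a `socket.timeout` instead of a JSON reply points at the same problem as the agent's "Interrupted system call" message.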






  • ripperSK
    Member
    • Jul 2019
    • 42

    #2
    A quick Google search for your issue suggests that most of the time it's related to network performance issues or to Zabbix server performance (trapper count, timeout settings).

    Being able to telnet to the Zabbix server means it's not a firewall rule issue, but in a network context "Interrupted system call" means a timeout, i.e. network traffic is being dropped.

    Try capturing the traffic with tcpdump and see if there's any indication of poor network performance, packet drops, or bandwidth saturation / network flooding.

    Second, look at Zabbix server/proxy performance and the utilization of the Zabbix trappers. Either the sending or the receiving side can be overloaded enough for the interrupted system call to happen.

    Lastly, try setting up clean server-agent communication within a single IP subnet and see if the problem persists in that setup.
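A single successful telnet does not rule out intermittent drops. As a rough complement to tcpdump, one could open many connections in a row and count failures and slow handshakes. This is a minimal Python sketch under that assumption; `probe_tcp` and `summarize` are hypothetical helper names, and the probe only measures TCP connect behaviour, not Zabbix's own traffic.

```python
import socket
import time


def probe_tcp(host: str, port: int, attempts: int = 20, timeout: float = 3.0,
              interval: float = 0.0):
    """Open `attempts` TCP connections and record (ok, seconds) for each one."""
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results.append((True, time.monotonic() - start))
        except OSError:
            # socket.timeout is a subclass of OSError, so timeouts land here too.
            results.append((False, time.monotonic() - start))
        if interval:
            time.sleep(interval)
    return results


def summarize(results):
    """Failure count, loss percentage and slowest successful connect."""
    ok_times = [t for ok, t in results if ok]
    failures = len(results) - len(ok_times)
    return {
        "attempts": len(results),
        "failures": failures,
        "loss_pct": 100.0 * failures / len(results) if results else 0.0,
        "max_connect_s": max(ok_times) if ok_times else None,
    }
```

For example, `summarize(probe_tcp("monitor.mydomain.com", 10051, attempts=50, interval=1.0))` run from the agent host; a non-zero loss percentage while telnet "works" would support the packet-drop theory.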


    • killmasta93
      Member
      • Oct 2019
      • 45

      #3
      Thanks for the reply. As for the Google search, I did that many times before posting. I then found this:
      Code:
      Utilization of unreachable poller data collector processes, in %: 100
      But when I try changing StartPollers in the server config from the default to StartPollers=80 and restart, the Zabbix server won't start.

      So I'm not sure what else to do.

      Thank you
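Note that the value quoted above is about *unreachable* pollers, which zabbix_server.conf sizes with their own parameter, StartPollersUnreachable (documented default 1), separately from StartPollers (documented default 5). The Python sketch below reads a config file's text and reports the effective value of both; it is a simplified `Key=Value` parser written for illustration, not Zabbix's own parser.

```python
# Documented defaults from zabbix_server.conf (Zabbix 4.0 / 5.0).
POLLER_DEFAULTS = {"StartPollers": 5, "StartPollersUnreachable": 1}


def parse_conf(text: str) -> dict:
    """Parse 'Key=Value' lines, ignoring blank lines and '#' comments.

    Simplification: if a key appears twice, the last value wins here;
    Zabbix's own parser may handle duplicates differently.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def effective_pollers(text: str) -> dict:
    """Report StartPollers / StartPollersUnreachable, falling back to the defaults."""
    conf = parse_conf(text)
    return {key: int(conf.get(key, default)) for key, default in POLLER_DEFAULTS.items()}
```

Usage would be something like `effective_pollers(open("/etc/zabbix/zabbix_server.conf").read())`; that path is the usual package location and is an assumption here.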


      • ripperSK
        Member
        • Jul 2019
        • 42

        #4
        Increase the logging verbosity of the server and restart it again. The server log should indicate what the problem is; you may be hitting a memory limit, for example.

        Refer to the following documentation for the optimal poller count: https://www.zabbix.com/documentation...ormance_tuning

        https://blog.zabbix.com/monitoring-h...esses-are/457/


        • killmasta93
          Member
          • Oct 2019
          • 45

          #5
          Thanks for the reply. I was reading a bit and found this:

          https://bobcares.com/blog/zabbix-bus...#comment-75253

          But I'm lost on the "Keep lost resources period=0" part: how would I know which template to change?

          I'm going to increase the log level, post back the logs, and see what's going on.

          Thank you


          • ripperSK
            Member
            • Jul 2019
            • 42

            #6
            Log in to the Zabbix web interface as Super Admin and select Administration -> Queue -> (drop-down menu) Details.

            There you should see all the delayed items that the Zabbix server expects to receive, with their delay times and the hosts they belong to. If these items no longer exist (for example, a disk partition that is no longer available on a host), you can delete them. You'll find them on the specific host as items from auto discovery that are no longer available.

            If you feel you need to set the keep lost resources period to 0, you will find it in the discovery rule of the template you use for your hosts (see attachment).
            Attached Files


            • killmasta93
              Member
              • Oct 2019
              • 45

              #7
              Thanks for the reply. This is the queue:


              I checked the logs and found lots of this:

              Code:
              2399:20200728:152039.008 Zabbix agent item "system.cpu.util[,iowait]" on host "prometheus4hagroup" failed: another network error, wait for 15 seconds
              2403:20200728:152039.243 Zabbix agent item "vfs.file.contents["/sys/class/net/enp1s0/operstate"]" on host "prometheus5mery" failed: another network error, wait for 15 seconds
              2401:20200728:152039.254 Zabbix agent item "vfs.file.contents["/sys/class/net/tap107i0/operstate"]" on host "prometheussp" failed: another network error, wait for 15 seconds
              2410:20200728:152041.738 Zabbix agent item "vfs.file.contents["/sys/class/net/vmbr1/operstate"]" on host "prometheus5mery" failed: another network error, wait for 15 seconds
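With many lines like these, it helps to see which hosts and item keys the failures concentrate on. A minimal Python sketch, assuming the line format shown above (PID, date, time, then the message); `LINE_RE`, `network_errors_by_host`, and `failing_items` are hypothetical names.

```python
import re
from collections import Counter, defaultdict

# Assumed line layout: "<pid>:<YYYYMMDD>:<HHMMSS.mmm> <message>"
LINE_RE = re.compile(
    r'^(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}\.\d{3}) '
    r'Zabbix agent item "(?P<item>.*)" on host "(?P<host>[^"]+)" '
    r'failed: another network error'
)


def network_errors_by_host(lines) -> Counter:
    """Count 'another network error' failures per host."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            counts[m.group("host")] += 1
    return counts


def failing_items(lines) -> dict:
    """Map each host to the set of item keys that failed with a network error."""
    items = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            # The greedy item group still stops at the last '" on host "' in the line,
            # so item keys containing quotes are captured whole.
            items[m.group("host")].add(m.group("item"))
    return dict(items)
```

For a whole log file, something like `network_errors_by_host(open("/var/log/zabbix/zabbix_server.log")).most_common(10)` would do; the log path is an assumption.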


              • ripperSK
                Member
                • Jul 2019
                • 42

                #8
                Looking at your queue, your server has more than 1200 items that have been delayed for more than 1 minute. It's not critical, but it's also not optimal. It also seems that you are using passive Zabbix agents; they perform worse than active agents, so consider switching the setup. For now, all you can do is increase the number of pollers so the Zabbix server can catch up on the queued items.

                The items mentioned in the server log are probably missing from the host. Try connecting to prometheussp and running:

                Code:
                cat /sys/class/net/tap107i0/operstate
                Most likely the 'tap' interface no longer exists. If that's the case, you can set the keep lost resources period to 0 (as discussed previously) to fix this problem.
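The same check can be scripted for several interfaces at once, for example all interface names that appear in operstate item keys for a host. A minimal Python sketch, meant to run on the monitored host; the helper names are hypothetical, and the `sys_net` parameter exists only so the logic can be tried against a test directory.

```python
from pathlib import Path

SYS_NET = Path("/sys/class/net")


def interface_operstate(iface: str, sys_net: Path = SYS_NET):
    """Return the interface's operstate ('up', 'down', ...) or None if it is gone.

    Equivalent to `cat /sys/class/net/<iface>/operstate`.
    """
    try:
        return (sys_net / iface / "operstate").read_text().strip()
    except FileNotFoundError:
        return None


def stale_interfaces(ifaces, sys_net: Path = SYS_NET):
    """Return the interfaces from `ifaces` that no longer exist on this host."""
    return [name for name in ifaces if interface_operstate(name, sys_net) is None]
```

For example, `stale_interfaces(["enp1s0", "vmbr1", "tap107i0"])`; any name it returns is a candidate for the stale-item cleanup described above.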

                Looking at this forum, someone had a problem similar to yours: https://www.zabbix.com/forum/zabbix-...-seconds/page2

                It was fixed by tuning the Zabbix server for better performance.


                • killmasta93
                  Member
                  • Oct 2019
                  • 45

                  #9
                  Thanks for the reply. I tried everything with no luck, so I looked into a proxy setup. I configured a proxy on the client side (which has pfSense), and it connected successfully to the Zabbix server. My question now is about the client's agents: I point them to the Zabbix proxy, and then on the Zabbix server, when I configure the agent, do I put the WAN IP of the client's Zabbix proxy? And which port?


                  • ripperSK
                    Member
                    • Jul 2019
                    • 42

                    #10
                    I recommend that you read up on the Zabbix proxy concept: https://www.zabbix.com/documentation...concepts/proxy

                    I'm attaching a picture from a Google search for "zabbix proxy".

                    The concept is that the client (zabbix_agent) connects only to the proxy, and the proxy connects to the Zabbix server. When configuring the host in the Zabbix web interface, you set it to be monitored by proxy (and select the appropriate proxy).
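On the agent side, "connects only to the proxy" comes down to which address the agent's Server= and ServerActive= parameters point at. The helper below is a hypothetical Python sketch that renders those zabbix_agentd.conf lines; the proxy address is a placeholder, and Hostname= has to match the host name configured in the frontend for active checks to be matched to the host.

```python
def agent_conf_for_proxy(hostname: str, proxy_address: str, active: bool = True) -> str:
    """Return zabbix_agentd.conf lines for an agent that talks only to a proxy.

    Hypothetical helper: it only builds text and does not touch any file.
    - Hostname: must match the host name configured in the frontend (active checks).
    - Server: who may poll this agent for passive checks (here: only the proxy).
    - ServerActive: where an active agent fetches its item list and sends data.
    """
    lines = [f"Hostname={hostname}", f"Server={proxy_address}"]
    if active:
        lines.append(f"ServerActive={proxy_address}")
    return "\n".join(lines) + "\n"
```

In this layout the agent never needs to reach the central server; only the proxy does.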
                    Attached Files


                    • killmasta93
                      Member
                      • Oct 2019
                      • 45

                      #11
                      Thanks, I eventually got the proxy working; I had to install a Zabbix proxy at each site. The only odd issue: when an agent went offline, it only alerted me after 30 minutes, so I went to the template and changed the macro from 30 min to 5 min. But what's odd is that the agent doesn't appear as offline; instead it shows up in the "unknown" section. Even more odd, if the Zabbix proxy at the remote site goes offline, it doesn't alert me at all; it just shows unknown. The templates I use for the agents on Windows and Linux are called OS Windows active agent and OS Linux active agent.


                      • killmasta93
                        Member
                        • Oct 2019
                        • 45

                        #12
                        Thanks for the reply. Correct, they are active agents that go through the proxy. The only setting I changed on the proxy is the config frequency, which I set to 100.


                        • cyber
                          Senior Member
                          Zabbix Certified Specialist, Zabbix Certified Professional
                          • Dec 2006
                          • 4841

                          #13
                          Originally posted by killmasta93
                          The template i have for the agents on windows and linux called OS windows active agent and OS linux active agent.
                          Are the agents also active? Previous posts point to the use of passive agents, in which case your active agent templates won't do much good.


                          • ripperSK
                            Member
                            • Jul 2019
                            • 42

                            #14
                            killmasta93, to really help you with these "weird" things you describe, we would need you to collect and post the logs from the agent, the proxy, and the Zabbix server when these things happen.

                            Be assured that server/proxy/agent communication functions properly in all supported Zabbix releases; at the time of writing, those are LTS 4.0 and 5.0 and non-LTS 5.4.

                            Make sure your Zabbix server, proxy, and agents all run the same major version, and ideally the same minor version too.

                            Also, try not to mix a passive proxy with active agents, or vice versa.

                            Lastly, be aware that your Zabbix proxy host also needs to be monitored via a Zabbix agent.

                            Here's a quick explanation of passive vs. active:

                            A passive agent or proxy just sits there with a listening port open, doing nothing until the server tells it to do something.

                            An active agent does all the work: it does not need a listening port open, but instead initiates the connection to the server, asks for all the items it needs to monitor along with their monitoring frequency, and afterwards collects the data and sends it to the server. An active architecture requires fewer firewall rule adjustments than a passive one.

                            Passive is server-heavy and client-light.
                            Active is server-light and client-heavy.

                            All that said, active agents still have a couple of drawbacks: they cannot accept "Check now" or ad-hoc commands issued from the web interface.
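The passive/active distinction translates directly into which side opens the TCP connection, and therefore which firewall rules are needed. A small Python sketch of that mapping, assuming default ports (agent ListenPort 10050; server and proxy listening on 10051); `Flow` and `required_flows` are hypothetical names.

```python
from typing import List, NamedTuple

AGENT_PORT = 10050    # default agent ListenPort
TRAPPER_PORT = 10051  # default server/proxy ListenPort


class Flow(NamedTuple):
    src: str
    dst: str
    port: int


def required_flows(agent_mode: str, proxy_mode: str = "") -> List[Flow]:
    """TCP connections (who connects to whom, on which port) for one agent.

    agent_mode: "active" or "passive".
    proxy_mode: "active", "passive", or "" when the agent reports to the server directly.
    """
    collector = "proxy" if proxy_mode else "server"
    if agent_mode == "passive":
        flows = [Flow(collector, "agent", AGENT_PORT)]     # collector polls the agent
    elif agent_mode == "active":
        flows = [Flow("agent", collector, TRAPPER_PORT)]   # agent connects out
    else:
        raise ValueError(f"unknown agent mode: {agent_mode!r}")

    if proxy_mode == "active":
        flows.append(Flow("proxy", "server", TRAPPER_PORT))  # proxy pushes to server
    elif proxy_mode == "passive":
        flows.append(Flow("server", "proxy", TRAPPER_PORT))  # server polls the proxy
    elif proxy_mode:
        raise ValueError(f"unknown proxy mode: {proxy_mode!r}")
    return flows
```

For instance, `required_flows("active", "active")` yields only outbound connections from the remote site (agent to proxy on 10051, proxy to server on 10051), which is why the active setup needs fewer firewall changes.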


                            • Ngk
                              Junior Member
                              • Aug 2022
                              • 10

                              #15
                              What ports need to be opened for the agent and server to communicate?



                              • Atsushi
                                In order to use active checks, Zabbix agent must be able to access Zabbix server through port number 10051.