Agent-check net.tcp.service not returning data on failure

  • stelben
    Junior Member
    • Aug 2014
    • 16

    #1

    Agent-check net.tcp.service not returning data on failure

    Hi!

    I have a Zabbix server (2.2.2) running on a Debian box.
    It works fine, overall.

    I just added an item (net.tcp.service) to a monitored Debian server with the agent (2.2.5) installed onto it.
    I receive data from it so it looks all fine.

    I use it to monitor a tunnel connection that only that particular server (the one with the agent) can reach.
    The problem is that when the tunnel goes down, I expect to receive a "0" as the item value instead of the "1" I get when the tunnel is up.

    Instead, I don't get any value.
    No value is inserted in the latest data/values.

    I used the nodata-trigger to detect an item check failure but that's most likely not the way it should be done.

    I've seen that the agent.ping behaves like that according to the documentation.
    According to the documentation the net.tcp.service should return a "0" if the check fails.

    I have tried the net.tcp.port as well, same behaviour.

    Has anyone else managed to get a "0" when the item check fails?


    Thanks in advance
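
For reference, the behaviour expected above (a "1" when the port answers, a "0" when it does not) can be sketched in a few lines of Python. This is only an illustration of the idea, not the agent's actual implementation; the function name `tcp_port_check` and its default timeout are assumptions made for the example.

```python
import socket


def tcp_port_check(host: str, port: int, timeout: float = 3.0) -> int:
    """Return 1 if a TCP connection succeeds within `timeout` seconds, else 0.

    Hypothetical helper: it mimics what a net.tcp.port style check is
    expected to report, it is not Zabbix code.
    """
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return 1
    except OSError:
        return 0
```

Note that a failing check may take up to the full `timeout` before it returns 0, which matters if the caller is waiting with a timeout of its own.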
  • stelben
    Junior Member
    • Aug 2014
    • 16

    #2
    Monitor connection from agent to X

    I don't know if I'm trying to solve the monitoring task in the wrong way, but using an agent seemed the easiest way to me.
    Please enlighten me if you think it should be done in another way.

    • stelben
      Junior Member
      • Aug 2014
      • 16

      #3
      zabbix_get seems to work though

      More info:

      When I run this from the Zabbix server:
      zabbix_get -s 10.11.12.13 -k "net.tcp.port[172.16.4.1,22]"
      I get:
      0

      That is expected!
      But why does the item not see the "0"?
      It believes that no data is received. That is also the case when I look in the "latest data"-section in the UI on the Zabbix server.

      Is this "behaviour by design" (as Microsoft calls it...) or is something wrong?

      • filipp.sudanov
        Senior Member
        Zabbix Certified Specialist
        • May 2014
        • 137

        #4
        From zabbix server it should work exactly the same as when you issue
        Code:
        zabbix_get -s 10.11.12.13 -k "net.tcp.port[172.16.4.1,22]"
        Try setting the log level to 4 on the agent and see if it logs anything useful.
        Capture the traffic with tcpdump / Wireshark and compare the communication to that agent from zabbix_server and from zabbix_get to see if there are any differences. (That applies when the agent is passive; an active agent uses a different protocol.) The protocols are described on zabbix.org.

        Is your agent active or passive?
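
When comparing captures, it helps to know what a reply should look like on the wire. The sketch below assumes the framing described in the Zabbix protocol documentation for this era: a passive agent answers with the bytes `ZBXD\x01`, an 8-byte little-endian data length, and then the value itself. The helper names `frame` and `unframe` are hypothetical, and the layout should be checked against the documentation for your version before relying on it.

```python
import struct

# Assumed header: protocol signature "ZBXD" followed by a flags byte of 0x01.
ZBX_HEADER = b"ZBXD\x01"


def frame(data: bytes) -> bytes:
    """Wrap a value the way a passive agent reply is assumed to be framed."""
    return ZBX_HEADER + struct.pack("<Q", len(data)) + data


def unframe(packet: bytes) -> bytes:
    """Extract the value from a framed reply, validating header and length."""
    if packet[:5] != ZBX_HEADER:
        raise ValueError("missing ZBXD header")
    (length,) = struct.unpack("<Q", packet[5:13])
    data = packet[13:13 + length]
    if len(data) != length:
        raise ValueError("truncated payload")
    return data
```

With this layout, a reply carrying "0" would be 14 bytes long, so a capture in which the server closes the connection before those bytes arrive points at a timing problem rather than a malformed reply.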

        • stelben
          Junior Member
          • Aug 2014
          • 16

          #5
          Thanks for your reply!

          It's a passive agent, and the item is a passive check requested by the Zabbix server.

          I set the log level to 4 and got:

          29222:20140813:062813.274 collector [idle 1 sec]
          29223:20140813:062814.103 TCP expect network error: cannot connect to [[172.16.4.1]:22]: [4] Interrupte$
          29223:20140813:062814.103 Sending back [0]
          29223:20140813:062814.103 listener #1 [waiting for connection]
          29222:20140813:062814.274 collector [processing data]
          29222:20140813:062814.274 In update_cpustats()
          29222:20140813:062814.275 End of update_cpustats()
          29222:20140813:062814.275 collector [idle 1 sec]


          According to the log, the agent is "Sending back [0]", which is also what I got from zabbix_get.
          The request above was initiated by the Zabbix server.
          I get the same log entry if I request it manually with zabbix_get.

          I took a brief look at the traffic with tcpdump but didn't find anything malformed or missing on the way.

          • filipp.sudanov
            Senior Member
            Zabbix Certified Specialist
            • May 2014
            • 137

            #6
            What's in the zabbix server log? Any failed sql queries? What's in the log for debug level 4?
            What db are you using?

            • stelben
              Junior Member
              • Aug 2014
              • 16

              #7
              I'm using MySQL.
              Log excerpt from the Zabbix server at log level 4:

              9629:20140813:075146.414 resuming Zabbix agent checks on host "10.11.35.14": connection restored
              9627:20140813:075155.292 Zabbix agent item "net.tcp.port[128.2.100.2,134]" on host "10.11.35.14" failed: first network error, wait for 15 seconds
              9629:20140813:075213.417 Zabbix agent item "net.tcp.port[128.2.100.2,134]" on host "10.11.35.14" failed: another network error, wait for 15 seconds
              9629:20140813:075231.419 Zabbix agent item "net.tcp.port[128.2.100.2,134]" on host "10.11.35.14" failed: another network error, wait for 15 seconds
              9629:20140813:075246.438 resuming Zabbix agent checks on host "10.11.35.14": connection restored
              9626:20140813:075255.818 Zabbix agent item "net.tcp.port[128.2.100.2,134]" on host "10.11.35.14" failed: first network error, wait for 15 seconds
              9629:20140813:075313.441 Zabbix agent item "net.tcp.port[128.2.100.2,134]" on host "10.11.35.14" failed: another network error, wait for 15 seconds
              9629:20140813:075331.443 Zabbix agent item "net.tcp.port[128.2.100.2,134]" on host "10.11.35.14" failed: another network error, wait for 15 seconds
              9629:20140813:075346.462 resuming Zabbix agent checks on host "10.11.35.14": connection restored

              • filipp.sudanov
                Senior Member
                Zabbix Certified Specialist
                • May 2014
                • 137

                #8
                Can you show the log from exactly when it's querying the agent for that item? Are you sure it's level 4? The level is only applied when you restart the server.

                • stelben
                  Junior Member
                  • Aug 2014
                  • 16

                  #9
                  You were too quick to reply:-)

                  I noticed when I clicked [Submit Reply] that I forgot to restart the service.
                  I'm parsing the logs now...

                  • stelben
                    Junior Member
                    • Aug 2014
                    • 16

                    #10
                    Finally found some errors...
                    I'm changing the port number in order to simulate a DOWN state.
                    As you can see, it works with port 135 but not with 134.
                    One additional piece of information: agent.ping seems to fail at the same time as I change from 135 to 134... Strange.

                    Logs:

                    Item succeeded:

                    23624:20140813:094351.854 In DCconfig_get_poller_nextcheck() poller_type:0
                    23624:20140813:094351.854 End of DCconfig_get_poller_nextcheck():1407915832
                    23624:20140813:094351.854 poller #1 [got 0 values in 0.000069 sec, idle 1 sec]
                    23659:20140813:094352.274 self-monitoring [processing data]
                    23659:20140813:094352.274 In collect_selfmon_stats()
                    23659:20140813:094352.274 End of collect_selfmon_stats()
                    23659:20140813:094352.274 self-monitoring [processed data in 0.000040 sec, idle 1 sec]
                    23628:20140813:094352.423 poller #5 [got 1 values in 0.017522 sec, getting values]
                    23628:20140813:094352.423 In get_values()
                    23628:20140813:094352.423 In DCconfig_get_poller_items() poller_type:0
                    23628:20140813:094352.423 End of DCconfig_get_poller_items():1
                    23628:20140813:094352.423 In substitute_key_macros() data:'net.tcp.port[128.2.100.2,135]'
                    23628:20140813:094352.423 End of substitute_key_macros():SUCCEED data:'net.tcp.port[128.2.100.2,135]'
                    23628:20140813:094352.423 In substitute_simple_macros() data:'10050'
                    23628:20140813:094352.423 In get_value() key:'net.tcp.port[128.2.100.2,135]'
                    23628:20140813:094352.423 In get_value_agent() host:'10.11.35.14' addr:'10.11.35.14' key:'net.tcp.port[128.2.100.2,135]'
                    23628:20140813:094352.423 Sending [net.tcp.port[128.2.100.2,135]]
                    23628:20140813:094352.441 get value from agent result: '1'
                    23628:20140813:094352.441 End of get_value():SUCCEED
                    23628:20140813:094352.441 In activate_host() hostid:10177 itemid:23782 type:0
                    23628:20140813:094352.441 End of activate_host()
                    23628:20140813:094352.441 End of get_values():1



                    Item failed:

                    23647:20140813:095230.275 End of process_escalations()
                    23647:20140813:095230.275 escalator [processed 0 escalations in 0.000114 sec, idle 3 sec]
                    23659:20140813:095230.365 self-monitoring [processing data]
                    23659:20140813:095230.365 In collect_selfmon_stats()
                    23659:20140813:095230.365 End of collect_selfmon_stats()
                    23659:20140813:095230.365 self-monitoring [processed data in 0.000056 sec, idle 1 sec]
                    23659:20140813:095231.365 self-monitoring [processing data]
                    23659:20140813:095231.365 In collect_selfmon_stats()
                    23659:20140813:095231.365 End of collect_selfmon_stats()
                    23659:20140813:095231.365 self-monitoring [processed data in 0.000041 sec, idle 1 sec]
                    23629:20140813:095231.461 Item [10.11.35.14:net.tcp.port[128.2.100.2,134]] error: Get value from agent failed: ZBX_TCP_READ() failed: [4] Interrupted system call
                    23629:20140813:095231.461 End of get_value():NETWORK_ERROR
                    23629:20140813:095231.461 In deactivate_host() hostid:10177 itemid:23782 type:0
                    23629:20140813:095231.461 query [txnlev:1] [begin;]
                    23629:20140813:095231.461 query [txnlev:1] [update hosts set disable_until=1407916366,error='Get value from agent failed: ZBX_TCP_READ() failed: [4] Interrupted system call' wh$
                    23629:20140813:095231.461 query [txnlev:1] [commit;]
                    23629:20140813:095231.463 Zabbix agent item "net.tcp.port[128.2.100.2,134]" on host "10.11.35.14" failed: another network error, wait for 15 seconds
                    23629:20140813:095231.463 deactivate_host() errors_from:1407916315 available:1
                    23629:20140813:095231.463 End of deactivate_host()
                    23629:20140813:095231.463 End of get_values():1
                    23629:20140813:095231.463 In DCconfig_get_poller_nextcheck() poller_type:1
                    23629:20140813:095231.463 End of DCconfig_get_poller_nextcheck():1407916348
                    23629:20140813:095231.463 unreachable poller #1 [got 1 values in 3.002560 sec, getting values]
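
Picking the relevant lines out of a level 4 log by hand is tedious. A minimal sketch of a filter follows, assuming the `PID:YYYYMMDD:HHMMSS.mmm message` line layout seen in the excerpts above. The names `parse_line` and `lines_for_key` are made up for this example; the idea is to keep every line written by a process that mentioned the item key, so that the surrounding `get_value()` and `deactivate_host()` lines stay in view.

```python
import re
from datetime import datetime, timedelta

# PID:YYYYMMDD:HHMMSS.mmm message  (layout inferred from the excerpts above)
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.(\d{3}) (.*)$")


def parse_line(line):
    """Return (pid, timestamp, message), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    pid, date, clock, millis, message = m.groups()
    ts = datetime.strptime(date + clock, "%Y%m%d%H%M%S")
    ts += timedelta(milliseconds=int(millis))
    return int(pid), ts, message


def lines_for_key(lines, key):
    """Return all parsed lines from processes that mentioned `key`."""
    parsed = [p for p in map(parse_line, lines) if p is not None]
    pids = {pid for pid, _, message in parsed if key in message}
    return [p for p in parsed if p[0] in pids]
```

For example, `lines_for_key(open("zabbix_server.log"), "net.tcp.port[128.2.100.2,134]")` would keep the poller lines and drop the self-monitoring and escalator noise (the log file name here is only a placeholder).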

                    • stelben
                      Junior Member
                      • Aug 2014
                      • 16

                      #11
                      I found the problem.

                      It seems that the default timeout setting in the agent is 3 seconds, and during that time the server's request for the item times out.

                      I lowered the value to 1 (see the agent config below) and got a "0" back.

                      I have no clue why this messes with the agent.ping...

                      ### Option: Timeout
                      # Spend no more than Timeout seconds on processing
                      #
                      # Mandatory: no
                      # Range: 1-30
                      # Default:
                      Timeout=1
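
The timing problem can be reproduced without Zabbix at all. The sketch below is a toy model, not Zabbix code: `slow_agent` stands in for an agent that needs `delay` seconds before it can answer "0" (for instance because its own connection attempt has to time out first), and `poll` stands in for a poller that waits at most `timeout` seconds. Both names and the shortened delays are assumptions for the example. The roughly 3 second wait on the poller side matches the "got 1 values in 3.002560 sec" line in the server log above.

```python
import socket
import threading
import time
from typing import Optional


def slow_agent(delay: float) -> int:
    """Start a one-shot fake agent that answers b"0" after `delay` seconds.

    Returns the local TCP port it listens on.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)

    def serve() -> None:
        conn, _ = srv.accept()
        with conn:
            time.sleep(delay)  # time the "check" takes before it can fail
            try:
                conn.sendall(b"0")
            except OSError:
                pass  # the poller has already given up and closed
        srv.close()

    threading.Thread(target=serve, daemon=True).start()
    return srv.getsockname()[1]


def poll(port: int, timeout: float) -> Optional[str]:
    """Return the agent's reply, or None if it did not arrive in time."""
    with socket.create_connection(("127.0.0.1", port), timeout=timeout) as s:
        try:
            return s.recv(16).decode()
        except socket.timeout:
            return None
```

If the agent needs as long as the poller is willing to wait, the "0" is sent but never read, and the poller records a network error instead of a value. Once the agent gives up sooner than the poller does, the "0" arrives in time.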
