Windows Client Stopped Reporting Data

  • Trips2007
    Junior Member
    • Sep 2012
    • 17

    #1

    Windows Client Stopped Reporting Data

    I have a Zabbix server at (let's say) monitor.mydomain.com. There are Zabbix clients on Linux servers happily reporting in. I also have a Zabbix client on a Windows server. Let's call that server www.mydomain.com. It reported in for several weeks, but has now stopped. I'm sure something in the environment has changed, but I cannot figure out what.

    I tried restarting the client on the Windows machine with

    DebugLevel=4

    It still does not report any new data to the Zabbix server, but I can see a more detailed log. Do these log entries point to a cause? Where should I look to troubleshoot this problem?

    NOTE: The forum software thinks logged references to my server URLs are "links" and complains about them, so I had to edit the server URLs to read mydomaincom instead of mydomain.com to avoid triggering that error. In the original log, they were mydomain.com.

    Code:
    2632:20130126:224902.736 Starting Zabbix Agent [www.mydomaincom]. Zabbix 2.0.0 (revision 27673).
      2632:20130126:224902.742 In init_perf_collector()
      2632:20130126:224902.743 End of init_perf_collector()
      3596:20130126:224902.747 agent #0 started [collector]
      3596:20130126:224902.747 In init_cpu_collector()
      3596:20130126:224902.747 In get_counter_name() pdhIndex:238
      4012:20130126:224902.751 agent #1 started[listener]
      4004:20130126:224902.751 agent #2 started[listener]
      3372:20130126:224902.751 agent #3 started[listener]
      1004:20130126:224902.752 agent #4 started [active checks]
      1004:20130126:224902.752 In init_active_metrics()
      1004:20130126:224902.752 Buffer: first allocation for 100 elements
      1004:20130126:224902.752 In send_buffer() host:'monitor.mydomaincom' port:10051 values:0/100
      1004:20130126:224902.753 End of send_buffer():SUCCEED
      1004:20130126:224902.753 refresh_active_checks('monitor.mydomaincom',10051)
      1004:20130126:224902.760 Sending [{
    	"request":"active checks",
    	"host":"www.mydomaincom"}]
      1004:20130126:224902.760 Before read
      1004:20130126:224902.764 Got [{
    	"response":"success",
    	"data":[]}]
      1004:20130126:224902.764 In parse_list_of_checks()
      1004:20130126:224902.764 In disable_all_metrics()
      1004:20130126:224902.764 In process_active_checks('monitor.mydomaincom',10051)
      1004:20130126:224902.764 End of process_active_checks()
      1004:20130126:224902.764 In get_min_nextcheck()
      1004:20130126:224902.765 Sleeping for 1 second(s)
      3596:20130126:224903.214 End of get_counter_name():SUCCEED
      3596:20130126:224903.215 In get_counter_name() pdhIndex:6
      3596:20130126:224903.215 End of get_counter_name():SUCCEED
      3596:20130126:224903.215 In add_perf_counter() counter:'\Processor(_Total)\% Processor Time' interval:900
      3596:20130126:224903.227 add_perf_counter(): PerfCounter '\Processor(_Total)\% Processor Time' successfully added
      3596:20130126:224903.227 In add_perf_counter() counter:'\Processor(0)\% Processor Time' interval:900
      3596:20130126:224903.227 add_perf_counter(): PerfCounter '\Processor(0)\% Processor Time' successfully added
      3596:20130126:224903.227 In get_counter_name() pdhIndex:2
      3596:20130126:224903.228 End of get_counter_name():SUCCEED
      3596:20130126:224903.228 In get_counter_name() pdhIndex:44
      3596:20130126:224903.228 End of get_counter_name():SUCCEED
      3596:20130126:224903.228 In add_perf_counter() counter:'\System\Processor Queue Length' interval:900
      3596:20130126:224903.229 add_perf_counter(): PerfCounter '\System\Processor Queue Length' successfully added
      3596:20130126:224903.229 End of init_cpu_collector():SUCCEED
      3596:20130126:224903.229 In collector_diskdevice_add() devname:''
      3596:20130126:224903.230 End of collector_diskdevice_add():00000000005FDDB0
      3596:20130126:224903.230 In collect_perfstat()
      1004:20130126:224903.769 In send_buffer() host:'monitor.mydomaincom' port:10051 values:0/100
      1004:20130126:224903.770 End of send_buffer():SUCCEED
    
      ...
    
      3596:20130126:231501.420 In collect_perfstat()
      1004:20130126:231501.873 In send_buffer() host:'monitor.mydomaincom' port:10051 values:0/100
      1004:20130126:231501.873 End of send_buffer():SUCCEED
      1004:20130126:231501.874 Sleeping for 1 second(s)
      3596:20130126:231502.421 In collect_perfstat()
      1004:20130126:231502.874 In send_buffer() host:'monitor.mydomaincom' port:10051 values:0/100
      1004:20130126:231502.874 End of send_buffer():SUCCEED
      1004:20130126:231502.874 refresh_active_checks('monitor.mydomaincom',10051)
      1004:20130126:231502.877 Sending [{
    	"request":"active checks",
    	"host":"www.mydomaincom"}]
      1004:20130126:231502.877 Before read
      1004:20130126:231502.881 Got [{
    	"response":"success",
    	"data":[]}]
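    Every prefixed line in the agent log above follows the layout `PID:YYYYMMDD:HHMMSS.mmm message`, while the JSON bodies of `Sending [...]` and `Got [...]` continue on unprefixed lines. To pull timestamps out of a long DebugLevel=4 log (for example, to see when the agent refreshed its list of active checks), here is a minimal Python sketch, assuming that layout holds for every prefixed line. `LogEntry`, `parse_line`, and `active_check_refreshes` are hypothetical names, not part of Zabbix.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional

# Assumed layout of a prefixed Zabbix log line: "  PID:YYYYMMDD:HHMMSS.mmm message"
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)$")


class LogEntry(NamedTuple):
    pid: int
    when: datetime
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one prefixed log line; return None for unprefixed continuation lines."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    pid, day, clock, message = m.groups()
    # "%f" right-pads the millisecond field, so ".736" becomes 736000 microseconds.
    when = datetime.strptime(day + clock, "%Y%m%d%H%M%S.%f")
    return LogEntry(int(pid), when, message)


def active_check_refreshes(lines: Iterable[str]) -> List[datetime]:
    """Timestamps of the lines where the agent asked the server for its active checks."""
    entries = (parse_line(line) for line in lines)
    return [
        e.when
        for e in entries
        if e is not None and e.message.startswith("refresh_active_checks(")
    ]
```

    Feeding it the whole file (e.g. `active_check_refreshes(open("zabbix_agentd.log"))`, where the path is whatever LogFile points to) gives a list of refresh times to compare against the "last value" times shown in the frontend.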
  • tchjts1
    Senior Member
    • May 2008
    • 1605

    #2
    Have you recently upgraded the Zabbix agent on your hosts to 2.0?

    Also, take a look at your Zabbix server: find zabbix_server.log and see what it says about that host.


    • herta
      Senior Member
      • Sep 2011
      • 101

      #3
      firewall?

      Is there a firewall between your Zabbix server and your Windows system (including the Windows firewall)?


      • Trips2007
        Junior Member
        • Sep 2012
        • 17

        #4
        Both servers are running in AWS in security groups that allow ports 10050 and 10051. The Windows server has a firewall exception for both ports as well, for any IP address (will lock that down once this is working).

        Here's what zabbix_server.log shows after a fresh restart (the Linux-based clients are reporting data after the restart):

        Zabbix ~$ more /tmp/zabbix_server.log
        19880:20130129:170934.635 Starting Zabbix Server. Zabbix 2.0.2 (revision 29214).
        19880:20130129:170934.635 ****** Enabled features ******
        19880:20130129:170934.635 SNMP monitoring: YES
        19880:20130129:170934.636 IPMI monitoring: NO
        19880:20130129:170934.636 WEB monitoring: YES
        19880:20130129:170934.636 Jabber notifications: NO
        19880:20130129:170934.636 Ez Texting notifications: YES
        19880:20130129:170934.636 ODBC: NO
        19880:20130129:170934.636 SSH2 support: NO
        19880:20130129:170934.636 IPv6 support: YES
        19880:20130129:170934.636 ******************************
        19882:20130129:170934.777 server #1 started [configuration syncer #1]
        19884:20130129:170934.782 server #3 started [poller #1]
        19883:20130129:170934.783 server #2 started [db watchdog #1]
        19885:20130129:170934.789 server #4 started [poller #2]
        19886:20130129:170934.793 server #5 started [poller #3]
        19887:20130129:170934.797 server #6 started [poller #4]
        19890:20130129:170934.803 server #9 started [trapper #1]
        19889:20130129:170934.805 server #8 started [unreachable poller #1]
        19891:20130129:170934.807 server #10 started [trapper #2]
        19888:20130129:170934.809 server #7 started [poller #5]
        19895:20130129:170934.812 server #13 started [trapper #5]
        19894:20130129:170934.813 server #12 started [trapper #4]
        19896:20130129:170934.813 server #14 started [icmp pinger #1]
        19893:20130129:170934.814 server #11 started [trapper #3]
        19903:20130129:170934.820 server #17 started [timer #1]
        19902:20130129:170934.821 server #16 started [housekeeper #1]
        19902:20130129:170934.821 executing housekeeper
        19904:20130129:170934.821 server #18 started [http poller #1]
        19901:20130129:170934.822 server #15 started [alerter #1]
        19905:20130129:170934.826 server #19 started [discoverer #1]
        19906:20130129:170934.834 server #20 started [history syncer #1]
        19907:20130129:170934.834 server #21 started [history syncer #2]
        19916:20130129:170934.847 server #23 started [history syncer #4]
        19918:20130129:170934.849 server #25 started [proxy poller #1]
        19917:20130129:170934.849 server #24 started [escalator #1]
        19880:20130129:170934.853 server #0 started [main process]
        19914:20130129:170934.854 server #22 started [history syncer #3]
        19922:20130129:170934.854 server #26 started [self-monitoring #1]
        19902:20130129:170935.908 housekeeper deleted: 1511 records from history and trends, 0 records of deleted items, 0 events, 0 alerts, 0 sessions


        • tchjts1
          Senior Member
          • May 2008
          • 1605

          #5
          As per my previous post, did you recently upgrade your monitored Windows host to 2.0.x?

          If so, did you also upgrade your zabbix_agentd.conf file to include the ServerActive= parameter?
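          If you want to check a zabbix_agentd.conf for this quickly, here is a minimal Python sketch. It is a deliberately simplified reader (KEY=VALUE lines, `#` comments, no Include= handling, last duplicate wins), so treat it as an assumption-laden helper, not the agent's own parser. The Hostname check is an extra: the name the agent sends in its active-checks request (the `"host"` field in your log) has to match the host name configured on the server. `parse_agent_conf` and `active_check_problems` are hypothetical names.

```python
from typing import Dict, List


def parse_agent_conf(text: str) -> Dict[str, str]:
    """Simplified zabbix_agentd.conf reader: KEY=VALUE lines, '#' comments skipped."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # partition() keeps any '=' inside the value intact.
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def active_check_problems(text: str) -> List[str]:
    """List obvious gaps in the settings an agent needs for active checks."""
    settings = parse_agent_conf(text)
    problems: List[str] = []
    if not settings.get("ServerActive"):
        problems.append("ServerActive is empty or commented out")
    if not settings.get("Hostname"):
        problems.append("Hostname is not set explicitly")
    return problems
```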

          Not sure what else it could be since your logs are not showing any errors.
          Where are you looking for recent data? Is it on screens/graphs, or are you looking under "Latest Data"?


          • Trips2007
            Junior Member
            • Sep 2012
            • 17

            #6
            Originally posted by tchjts1
            As per my previous post, did you recently upgrade your monitored Windows host to 2.0.x?
            No, this was a fresh install of 2.0.2. The zabbix_agentd.conf file does have the ServerActive= parameter

            Originally posted by tchjts1
            Not sure what else it could be since your logs are not showing any errors.
            Where are you looking for recent data? Is it on screens/graphs, or are you looking under "Latest Data"?
            I'm looking on the graphs of CPU data. If I look on Latest Data for that server, I see the latest data was sent last month. I did not touch the Zabbix configuration during that time (though I'm sure something in the environment must have changed).


            • herta
              Senior Member
              • Sep 2011
              • 101

              #7
              connection issue?

              The logs look normal. Is there anything abnormal in the Windows event log?

              Can you telnet
              - from the windows server to port 10051 on the zabbix server
              - from the zabbix server to port 10050 on the windows server
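              If no telnet client is installed, the same test can be scripted. A minimal Python sketch that only checks whether a TCP connection completes within a timeout; `port_open` is a hypothetical helper, and the host names below are the placeholders used in this thread.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

              Run `port_open("monitor.mydomain.com", 10051)` on the Windows server and `port_open("www.mydomain.com", 10050)` on the Zabbix server; a False in either direction matches a failed telnet.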

              If you nslookup
              - the zabbix server from the windows server
              - the windows server from the zabbix server
              do you get the results you expect?
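              A scripted counterpart to the nslookup check, as a minimal Python sketch. One difference worth knowing: nslookup queries DNS servers directly and ignores the hosts file, while `socket.getaddrinfo` goes through the system resolver, which is closer to what ordinary applications see. `resolve` is a hypothetical helper name.

```python
import socket
from typing import List


def resolve(name: str) -> List[str]:
    """Return the sorted, de-duplicated addresses the system resolver gives for name.

    An empty list means the lookup failed.
    """
    try:
        infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

              Compare `resolve("monitor.mydomain.com")` on the Windows server (and `resolve("www.mydomain.com")` on the Zabbix server) with the addresses you expect, e.g. the instances' current private IPs.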

              Are the two servers located in the same subnet? If so, can you check the ARP tables to see whether the IP addresses are still unique?


              • tchjts1
                Senior Member
                • May 2008
                • 1605

                #8
                Originally posted by Trips2007
                No, this was a fresh install of 2.0.2. The zabbix_agentd.conf file does have the ServerActive= parameter
                Hmmm. According to your zabbix_agentd.log, the agent on that host is not 2.0.2. But as long as you have the ServerActive=xxxxxxxxx parameter set, it's all good in that regard.

                Code:
                2632:20130126:224902.736 Starting Zabbix Agent [www.mydomaincom]. Zabbix 2.0.0
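                The version is in the startup banner that both the agent and the server write to their logs. A minimal Python sketch for pulling it out, so agent and server versions can be compared side by side; the regex is an assumption based on the two banners quoted in this thread, and `banner_version` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Based on banners like:
#   "Starting Zabbix Agent [host]. Zabbix 2.0.0 (revision 27673)."
#   "Starting Zabbix Server. Zabbix 2.0.2 (revision 29214)."
BANNER_RE = re.compile(r"Starting Zabbix (Agent|Server)\b.*?Zabbix (\d+)\.(\d+)\.(\d+)")


def banner_version(line: str) -> Optional[Tuple[str, Tuple[int, int, int]]]:
    """Return ("Agent" or "Server", (major, minor, patch)), or None if no banner is found."""
    m = BANNER_RE.search(line)
    if m is None:
        return None
    major, minor, patch = (int(g) for g in m.groups()[1:])
    return m.group(1), (major, minor, patch)
```

                For the two logs in this thread it yields an agent at (2, 0, 0) and a server at (2, 0, 2), i.e. the same 2.0 series.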


                • Trips2007
                  Junior Member
                  • Sep 2012
                  • 17

                  #9
                  Originally posted by herta
                  Can you telnet
                  - from the windows server to port 10051 on the zabbix server
                  That was it! I did not realize that 10051 was an outbound connection from the client.

                  Thanks for your help!
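                  For anyone else who hits this: with active checks the agent opens the connection, to port 10051 on the server, and asks for its item list; only passive checks go the other way (server to agent port 10050). So the Windows host needs outbound 10051 open, not just inbound 10050. Below is a minimal Python sketch of that outbound request, assuming the Zabbix 2.0 wire format: a `ZBXD\x01` header, an 8-byte little-endian payload length, then JSON such as the `{"request":"active checks","host":...}` body visible in the agent log. The function names are hypothetical, and this is a diagnostic sketch, not a replacement for the agent.

```python
import json
import socket
import struct
from typing import Any, Dict

HEADER = b"ZBXD\x01"  # protocol signature plus version byte (assumed 2.0 format)


def frame(payload: Dict[str, Any]) -> bytes:
    """Wrap a JSON payload: header, 8-byte little-endian length, body."""
    body = json.dumps(payload).encode("utf-8")
    return HEADER + struct.pack("<Q", len(body)) + body


def unframe(data: bytes) -> Dict[str, Any]:
    """Inverse of frame(); raises ValueError on a bad header or short body."""
    if len(data) < 13 or not data.startswith(HEADER):
        raise ValueError("missing or short ZBXD header")
    (length,) = struct.unpack("<Q", data[5:13])
    body = data[13:13 + length]
    if len(body) != length:
        raise ValueError("truncated body")
    return json.loads(body.decode("utf-8"))


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes first."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full reply")
        buf += chunk
    return buf


def request_active_checks(server: str, host: str, port: int = 10051,
                          timeout: float = 5.0) -> Dict[str, Any]:
    """Send the same 'active checks' request the agent log shows and return the reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(frame({"request": "active checks", "host": host}))
        header = _recv_exact(sock, 13)
        if not header.startswith(HEADER):
            raise ValueError("reply does not start with ZBXD header")
        (length,) = struct.unpack("<Q", header[5:13])
        return unframe(header + _recv_exact(sock, length))
```

                  Run from the Windows server as `request_active_checks("monitor.mydomain.com", "www.mydomain.com")`; a timeout or refused connection points at the network path to 10051 rather than at the agent configuration.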

