Problem with Zabbix agent for Windows

  • Andreas Bollhalder
    Senior Member
    Zabbix Certified Specialist
    • Apr 2007
    • 144

    #16
    I'm still observing the problem on one of my machines (W2K SP1). Maybe the following is related.

    When the Zabbix agent stops working, netstat shows 190 connections in the "CLOSE_WAIT" state:
    Code:
      TCP    chzhwi50:10050         zabbix.foo.bar:32859  CLOSE_WAIT
      TCP    chzhwi50:10050         zabbix.foo.bar:33142  CLOSE_WAIT
      TCP    chzhwi50:10050         zabbix.foo.bar:33250  CLOSE_WAIT
    ...
    Is this normal behavior? I couldn't find anything unusual in the logs, but I can provide them on request.
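In TCP terms, CLOSE_WAIT means the remote side (here the Zabbix server) has already closed its end of the connection while the local process (the agent) has not yet closed its socket, so a count that keeps growing suggests the agent is no longer finishing its connections. The following is a minimal sketch of a helper that counts such sockets on the agent port from `netstat` output in the layout quoted above, so the number can be tracked over time. `count_close_wait` is a hypothetical name, not part of Zabbix.

```python
def count_close_wait(netstat_output: str, port: int = 10050) -> int:
    """Count TCP sockets in CLOSE_WAIT whose local address ends in :<port>.

    Assumes Windows `netstat -a`/`netstat -an` style lines with four
    whitespace-separated columns: Proto, Local Address, Foreign Address, State.
    """
    count = 0
    suffix = f":{port}"
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines and UDP lines (which have no State column).
        if len(fields) != 4 or fields[0].upper() != "TCP":
            continue
        local, state = fields[1], fields[3]
        if local.endswith(suffix) and state == "CLOSE_WAIT":
            count += 1
    return count


sample = """
  TCP    chzhwi50:10050         zabbix.foo.bar:32859  CLOSE_WAIT
  TCP    chzhwi50:10050         zabbix.foo.bar:33142  CLOSE_WAIT
  TCP    chzhwi50:10050         zabbix.foo.bar:33250  CLOSE_WAIT
"""
print(count_close_wait(sample))  # 3
```

On the agent host itself, the text could come from `subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout`; with `-n` the addresses are numeric, which the helper handles in the same way.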

    Andreas
    Zabbix statistics
    Total hosts: 380 - Total items: 12190 - Total triggers: 4530 - Required server performance: 224.2


    • NB-beheer
      Junior Member
      • May 2007
      • 11

      #17
      Same problem on two Windows 2003 servers running the 1.4.1 client. It suddenly stops working; after restarting the client, it works again.

      Zabbix GUI Configuration, Hosts:
      ZBX_TCP_READ() failed [Interrupted system call]

      zabbix_server.log:
      Get value from agent failed. Error: ZBX_TCP_READ() failed [Connection reset by peer]

      zabbix_agentd.log:
      3676:20070716:093952 Call to PdhCollectQueryData() failed: No data to return.
      3676:20070716:093953 Call to PdhCollectQueryData() failed: No data to return.
      3676:20070716:093954 Call to PdhCollectQueryData() failed: No data to return.
      3676:20070716:093955 Call to PdhCollectQueryData() failed: No data to return.

      After restarting the client:
      132:20070716:100054 Processing request.
      5132:20070716:100054 In check_security()
      5132:20070716:100054 Requested [system.cpu.util[,system,avg5]]
      5132:20070716:100054 Sending back [20.293333]
      3488:20070716:100054 Processing request.
      3488:20070716:100054 In check_security()
      3488:20070716:100054 Requested [vfs.fs.size[d:,free]]
      3488:20070716:100054 Sending back [35564908544]
      4668:20070716:100054 Call to PdhCollectQueryData() failed: No data to return.

      4668:20070716:100055 Call to PdhCollectQueryData() failed: No data to return.

      5836:20070716:100056 Processing request.
      5836:20070716:100056 In check_security()
      5836:20070716:100056 Requested [vm.memory.size[free]]
      5836:20070716:100056 Sending back [1310228480]
      4668:20070716:100056 Call to PdhCollectQueryData() failed: No data to return.

      4668:20070716:100057 Call to PdhCollectQueryData() failed: No data to return.

      5132:20070716:100058 Processing request.
      5132:20070716:100058 In check_security()
      5132:20070716:100058 Requested [system.cpu.util[,system,avg1]]
      5132:20070716:100058 Sending back [21.133333]
      4668:20070716:100058 Call to PdhCollectQueryData() failed: No data to return.


      • davide
        Junior Member
        • May 2006
        • 6

        #18
        Me too: I have the problem "PdhCollectQueryData() failed: No data to return."

        My system is Windows 2003 SP2.

        Can anyone help me?


        • chispyder
          Junior Member
          • Jun 2007
          • 9

          #19
          Has any progress been made on this issue? Anything new to report?

          I have a Win2K server that is having the same issue. Two other (identical) Win2K servers work fine.

          The OS on the affected server is:
          Windows RDC12 5.0.2195 Windows 2000 Service Pack 4 Intel IA-32

          The server does have the PDH.dll file supplied with the Win2K distribution.

          Jeff


          • chispyder
            Junior Member
            • Jun 2007
            • 9

            #20
            Oops, sorry - forgot to mention...

            The Zabbix server is running version 1.4.1. The Windows 2K machine in question is running zabbix_agentd.exe version 1.4.1.

            Jeff


            • NOB
              Senior Member
              Zabbix Certified Specialist
              • Mar 2007
              • 469

              #21
              Hi,

              the performance counters and the CPU collectors (more or
              less the same) are initialised in src/zabbix_agent/stats.c.
              However, the return code of the initialisation is not checked.
              I suggest checking the error code and, if it is non-zero,
              exiting the zabbix_agentd collector. This ensures
              that a meaningful error message is written to the log file.
              I think the agent will terminate anyway, because one of
              its subprocesses exited.
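The suggested change can be modelled as follows. This is a Python sketch of the control flow only, not the agent's C code: `init_perf_counters` and `run_collector` are hypothetical names, and the initialisation is stubbed to return 0 on success and a non-zero error code on failure, following the usual C convention. The point is that the failure is written to the log before the collector stops.

```python
import logging
import sys
from typing import Callable

log = logging.getLogger("zabbix_agentd.collector")


def init_perf_counters() -> int:
    """Hypothetical stand-in for the initialisation done in stats.c.

    Returns 0 on success, a non-zero error code on failure.
    """
    return 0


def run_collector(init: Callable[[], int] = init_perf_counters) -> None:
    rc = init()
    if rc != 0:
        # Log a clear message before the collector goes away, so the cause
        # shows up in the agent log instead of a silent failure later on.
        log.critical("collector: performance counter initialisation failed (error %d)", rc)
        sys.exit(rc)
    log.info("collector: performance counters initialised")
    # The periodic collection loop would follow here.
```

`sys.exit` raises `SystemExit`; if `run_collector` runs in a worker thread, only that thread ends, which loosely mirrors exiting just the collector.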

              I'll check why the performance counters cannot be accessed
              and report any findings.

              I will compile a new 32-bit zabbix_agentd.exe; I have no 64-bit
              compiler available here. Is there an update to MS Visual Studio 6
              for creating 64-bit applications, or how else do I compile a 64-bit agentd?

              Regards,

              Norbert.


              • NOB
                Senior Member
                Zabbix Certified Specialist
                • Mar 2007
                • 469

                #22
                Hi,

                it's even worse with the 1.4 series.
                On all 4 systems I tried with 1.3.8 the
                performance counters - modulo the [] bug - were successfully read.
                But the new agent (1.4, 1.4.1 and pre-1.4.2) can no longer access
                the Performance Counters.

                I didn't change anything on the servers; on one of them I am
                absolutely sure of it, because I am its only user/administrator.

                As soon as I have more time, I'll try to figure out what changed
                in this area of the code and what the problem could be.
                It happens on the latest Win2k3 R2 as well as on W2K SP4!

                Regards,

                Norbert.


                • Alexei
                  Founder, CEO
                  Zabbix Certified Trainer
                  Zabbix Certified Specialist
                  Zabbix Certified Professional
                  • Sep 2004
                  • 5654

                  #23
                  Please give me several performance counters which do not work under 1.4.x.
                  Alexei Vladishev
                  Creator of Zabbix, Product manager
                  New York | Tokyo | Riga


                  • chispyder
                    Junior Member
                    • Jun 2007
                    • 9

                    #24
                    Well, here are a couple that I monitor(ed)...

                    Alias=Apache.mem.working:perf_counter[\Process(Apache)\Working Set]
                    Alias=Apache1.proc.time:perf_counter[\Process(Apache#1)\% User Time]

                    As far as process memory utilization goes, I monitor 7 processes, all specific to my applications. I also monitor Apache and Java processes.

                    The processor monitoring ("% User Time") does not work because of the "%" sign. Although I read the other post, I cannot figure out how to apply the "fix" ("[&'%").

                    All agents are 1.4.1 with the server also at 1.4.1.


                    • NOB
                      Senior Member
                      Zabbix Certified Specialist
                      • Mar 2007
                      • 469

                      #25
                      Originally posted by Alexei
                      Please give me several performance counters which do not work under 1.4.x.
                      Every single one.
                      So, say, the CPU Total, etc.
                      For every performance counter I request, I get the
                      error message already quoted in so many posts in this thread:

                      Code:
                      1184:20070723:143248 Call to PdhCollectQueryData() failed: No data to return.
                      
                      1184:20070723:143249 Call to PdhCollectQueryData() failed: No data to return.
                      
                      1184:20070723:143250 Call to PdhCollectQueryData() failed: No data to return.
                      I think this is a general problem, but I cannot investigate it
                      further right now.
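Every line in these logs follows the layout `<pid>:<YYYYMMDD>:<HHMMSS> <message>`, which makes it easy to check which process is emitting the PDH error. In NOB's 1.4.1 log later in this thread, the PID that logs the error (49060) is the same one that logged "collector started". The following is a minimal sketch, assuming that layout (inferred from the excerpts here, not from a format specification); `pdh_failures_by_pid` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed agentd log line layout, as seen in this thread:
# <pid>:<YYYYMMDD>:<HHMMSS> <message>
LOG_LINE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}) (.*)$")


def pdh_failures_by_pid(log_text: str) -> Counter:
    """Count 'PdhCollectQueryData() failed' messages per process/thread id."""
    counts = Counter()
    for line in log_text.splitlines():
        m = LOG_LINE.match(line)
        if m and "PdhCollectQueryData() failed" in m.group(4):
            counts[int(m.group(1))] += 1
    return counts


sample = """\
1184:20070723:143248 Call to PdhCollectQueryData() failed: No data to return.

1184:20070723:143249 Call to PdhCollectQueryData() failed: No data to return.
5836:20070716:100056 Processing request.
"""
print(pdh_failures_by_pid(sample))  # Counter({1184: 2})
```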

                      Norbert.


                      • NOB
                        Senior Member
                        Zabbix Certified Specialist
                        • Mar 2007
                        • 469

                        #26
                        Hi,

                        here are my tests / checks:

                        First conclusion: the 1.4 agentd.exe (64-bit) seems to work fine modulo the [] bug, while the 1.4.1 agentd.exe (64-bit) does not.

                        Here are the log files of three sessions: the first is my 1.4 agent with
                        the [] bug fixed, the second is the original 1.4 agent including the [] bug,
                        and the third is the latest 1.4.1 agent:

                        Code:
                        49680:20070724:080611 zabbix_agentd started. ZABBIX 1.4.
                        49688:20070724:080611 zabbix_agentd collector started
                        49692:20070724:080611 zabbix_agentd listener started
                        49696:20070724:080611 zabbix_agentd listener started
                        49700:20070724:080611 zabbix_agentd listener started
                        49704:20070724:080611 zabbix_agentd active check started [10.0.0.1:10051]
                        50144:20070724:081053 zabbix_agentd started. ZABBIX 1.4.
                        50148:20070724:081053 zabbix_agentd collector started
                        50152:20070724:081053 zabbix_agentd listener started
                        50156:20070724:081053 zabbix_agentd listener started
                        50160:20070724:081053 zabbix_agentd listener started
                        50164:20070724:081053 zabbix_agentd active check started [10.0.0.1:10051]
                        48728:20070724:081253 zabbix_agentd started. ZABBIX 1.4.1.
                        49060:20070724:081253 zabbix_agentd collector started
                        48672:20070724:081253 zabbix_agentd listener started
                        48640:20070724:081253 zabbix_agentd listener started
                        48632:20070724:081253 zabbix_agentd listener started
                        48184:20070724:081253 zabbix_agentd active check started [10.0.0.1:10051]
                        49060:20070724:081253 Call to PdhCollectQueryData() failed: No data to return.
                        
                        49060:20070724:081254 Call to PdhCollectQueryData() failed: No data to return.
                        The corresponding requests for the first session:

                        Code:
                        s3% zabbix_get -s i90 -k 'agent.version'
                        1.4
                        s3% zabbix_get -s i90 -k 'perf_counter[\Network Interface(Intel[R] Pro_1000 MT Network Connection)\Packets Received Errors]'     
                        0.000000
                        The requests for the second session (original 1.4 64-bit Agent):
                        Code:
                        s3% zabbix_get -s i90 -k 'agent.version'
                        1.4
                        s3% zabbix_get -s i90 -k 'perf_counter[\Network Interface(Intel[R] Pro_1000 MT Network Connection)\Packets Received Errors]'
                        ZBX_NOTSUPPORTED
                        s3% zabbix_get -s i90 -k 'system.uname' 
                        Windows I90 5.2.3790 Windows Server 2003 Service Pack 2 AMD-64
                        s3% zabbix_get -s i90 -k 'perf_counter[\Processor(_Total)\% Processor Time]'
                        0.003840
                        s3% zabbix_get -s i90 -k 'perf_counter[\Processor(_Total)\% Processor Time]'
                        0.003840
                        Now the requests for the last session (original 1.4.1 64-bit agent):

                        Code:
                        s3% zabbix_get -s i90 -k 'agent.version'
                        zabbix_get [12618]: Timeout while executing operation.
                        s3% zabbix_get -s i90 -k 'perf_counter[\Processor(_Total)\% Processor Time]'
                        zabbix_get [12763]: Timeout while executing operation.
                        I haven't had time to look for the differences in the source code.
                        I guess Alexei will find the cause before I even have time to start.

                        Added later:

                        On one system, even the 1.4 64-bit agentd.exe crashes if active checks
                        are enabled in the conf file.

                        The logfile contains:
                        Code:
                         15344:20070724:084603 zabbix_agentd started. ZABBIX 1.4.
                         13748:20070724:084603 zabbix_agentd collector started
                         15264:20070724:084603 zabbix_agentd listener started
                         13920:20070724:084603 zabbix_agentd listener started
                         11336:20070724:084603 zabbix_agentd listener started
                         14496:20070724:084603 zabbix_agentd active check started [10.0.0.1:10051]
                         14496:20070724:084603 In init_active_metrics()
                         14496:20070724:084603 In refresh_metrics('10.0.0.1',10051)
                         14496:20070724:084603 get_active_checks('10.0.0.1',10051)
                         14496:20070724:084603 Sending [ZBX_GET_ACTIVE_CHECKS
                        i91
                        ]
                         14496:20070724:084603 Before read
                         14496:20070724:084603 In parse_list_of_checks('eventlog[application]:30:7330
                        perf_counter[\Processor(_Total)\% Processor Time]:30:0
                        proc.num[lsass.exe]:30:0
                        service_state[McAfee Framework Service]:30:0
                        service_state[McAfee McShield]:30:0
                        service_state[McAfee Task Manager]:30:0
                        service_state[Server]:30:0
                        service_state[Workstation]:30:0
                        system.cpu.load[,avg15]:20:0
                        system.cpu.load[,avg1]:5:0
                        system.cpu.load[,avg5]:10:0
                        system.swap.size[,free]:30:0
                        system.uptime:300:0
                        vfs.fs.size[c:,pfree]:30:0
                        vm.memory.size[free]:30:0
                        ZBX_EOF
                        ')
                         14496:20070724:084603 In disable_all_metrics()
                         14496:20070724:084603 Parsed [eventlog[application]:30:7330]
                         14496:20070724:084603 In add_check('eventlog[application]', 30, 7330)
                         14496:20070724:084603 Parsed [perf_counter[\Processor(_Total)\% Processor Time]:30:0]
                         14496:20070724:084603 In add_check('perf_counter[\Processor(_Total)\% Processor Time]', 30, 0)
                         14496:20070724:084603 Parsed [proc.num[lsass.exe]:30:0]
                         14496:20070724:084603 In add_check('proc.num[lsass.exe]', 30, 0)
                         14496:20070724:084603 Parsed [service_state[McAfee Framework Service]:30:0]
                         14496:20070724:084603 In add_check('service_state[McAfee Framework Service]', 30, 0)
                         14496:20070724:084603 Parsed [service_state[McAfee McShield]:30:0]
                         14496:20070724:084603 In add_check('service_state[McAfee McShield]', 30, 0)
                         14496:20070724:084603 Parsed [service_state[McAfee Task Manager]:30:0]
                         14496:20070724:084603 In add_check('service_state[McAfee Task Manager]', 30, 0)
                         14496:20070724:084603 Parsed [service_state[Server]:30:0]
                         14496:20070724:084603 In add_check('service_state[Server]', 30, 0)
                         14496:20070724:084603 Parsed [service_state[Workstation]:30:0]
                         14496:20070724:084603 In add_check('service_state[Workstation]', 30, 0)
                         14496:20070724:084603 Parsed [system.cpu.load[,avg15]:20:0]
                         14496:20070724:084603 In add_check('system.cpu.load[,avg15]', 20, 0)
                         14496:20070724:084603 Parsed [system.cpu.load[,avg1]:5:0]
                         14496:20070724:084603 In add_check('system.cpu.load[,avg1]', 5, 0)
                         14496:20070724:084603 Parsed [system.cpu.load[,avg5]:10:0]
                         14496:20070724:084603 In add_check('system.cpu.load[,avg5]', 10, 0)
                         14496:20070724:084603 Parsed [system.swap.size[,free]:30:0]
                         14496:20070724:084603 In add_check('system.swap.size[,free]', 30, 0)
                         14496:20070724:084603 Parsed [system.uptime:300:0]
                         14496:20070724:084603 In add_check('system.uptime', 300, 0)
                         14496:20070724:084603 Parsed [vfs.fs.size[c:,pfree]:30:0]
                         14496:20070724:084603 In add_check('vfs.fs.size[c:,pfree]', 30, 0)
                         14496:20070724:084603 Parsed [vm.memory.size[free]:30:0]
                         14496:20070724:084603 In add_check('vm.memory.size[free]', 30, 0)
                         14496:20070724:084603 Parsed [ZBX_EOF]
                         14496:20070724:084603 In process_active_checks('10.0.0.1',10051)
                        and then the agent crashed. The agents on the other two systems
                        are doing fine with these checks.
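The debug log shows the format of the active-checks list that the server returns: one check per line as `key:delay:third_field`, terminated by `ZBX_EOF`, with each line becoming one `add_check(key, delay, third_field)` call. The third field is non-zero only for the `eventlog` item, so it appears to be a log position; it is called `lastlogsize` below, which is an assumed name. Because keys such as `vfs.fs.size[c:,pfree]` contain colons themselves, a parser has to split from the right. A hypothetical sketch:

```python
from typing import List, Tuple


def parse_active_checks(payload: str) -> List[Tuple[str, int, int]]:
    """Parse 'key:delay:lastlogsize' lines up to the ZBX_EOF terminator.

    The key may itself contain ':' (e.g. 'vfs.fs.size[c:,pfree]'), so each
    line is split from the right and only the last two fields are numeric.
    """
    checks = []
    for line in payload.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "ZBX_EOF":
            break
        key, delay, lastlogsize = line.rsplit(":", 2)
        checks.append((key, int(delay), int(lastlogsize)))
    return checks


sample = r"""eventlog[application]:30:7330
perf_counter[\Processor(_Total)\% Processor Time]:30:0
vfs.fs.size[c:,pfree]:30:0
ZBX_EOF
"""
for check in parse_active_checks(sample):
    print(check)
```

The whole list parses cleanly, and the log likewise shows every check being added before `In process_active_checks()` is reached; one plausible reading is that the crash happens while the checks are executed rather than while the list is parsed.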

                        Regards,

                        Norbert.
                        Last edited by NOB; 24-07-2007, 09:04. Reason: Added more debugging info (active vs. passive checks)


                        • rolandsym
                          Member
                          • Jul 2007
                          • 76

                          #27
                          Also have this issue

                          Hi,
                          I'm also having the same problem with the win32 client on both a Windows XP SP2 machine and a Windows 2003 server. It involves active checks: the agent worked fine without active checks, although the PDH error still appeared in the log. Also, after about an hour the service seems to recover for a short while.
                          Zabbix looks great, but not being able to monitor Windows machines reliably is a big deal for me. I'm using the 1.4.1 client and server on Ubuntu.
                          Is there any information I can provide to help?


                          • oliverm
                            Senior Member
                            • May 2006
                            • 155

                            #28
                            Is there any news on this? We have just tried rolling out the 1.4 agent to several SBS2003 boxes and all the checks on all the machines are failing and the log just shows

                            Call to PdhCollectQueryData() failed: No data to return.


                            Help!!! It's very embarrassing.

                            Olly


                            • Alexei
                              Founder, CEO
                              Zabbix Certified Trainer
                              Zabbix Certified Specialist
                              Zabbix Certified Professional
                              • Sep 2004
                              • 5654

                              #29
                              Originally posted by oliverm
                              Is there any news on this? We have just tried rolling out the 1.4 agent to several SBS2003 boxes and all the checks on all the machines are failing and the log just shows

                              Call to PdhCollectQueryData() failed: No data to return.


                              Help!!! It's very embarrassing.

                              Olly
                              Yes, the problem has been fixed today. As a workaround, you may add any performance counter to zabbix_agentd.conf to get rid of the messages.

                              Note that the issue DOES NOT affect functionality of ZABBIX agent!


                              • oliverm
                                Senior Member
                                • May 2006
                                • 155

                                #30
                                If it doesn't affect the agent, then perhaps we are suffering from two different problems, as each agent with that problem also shows as not available on the Zabbix server page. Agents without that problem show as available.

                                Could this be related at all? We can telnet to port 10050 on each of the agents from the Zabbix box, so communication isn't the issue.
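The manual telnet test can be scripted so that it is easy to repeat across many agents. Note what it does and does not show: a successful connect only proves that something accepts TCP connections on port 10050; it does not prove that the agent will answer a request (compare the zabbix_get timeouts in NOB's 1.4.1 session earlier in the thread). A minimal sketch, with `port_open` as a hypothetical helper name:

```python
import socket


def port_open(host: str, port: int = 10050, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    This is what a manual 'telnet host 10050' check proves: the TCP
    handshake succeeds. It says nothing about whether the agent replies.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False
```

For example, `port_open("agent01.example.com")` (a placeholder host name) returns `False` on a refused connection, a timeout or a failed name lookup.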

                                Olly

                                Comment

                                Working...