Monitoring services Windows cluster

  • Kapersky91
    Junior Member
    • Apr 2023
    • 4

    #1

    Hello,

    I am trying to monitor Windows services (switching after 6h) in a cluster (a Failover Cluster Instance). I have:
    xx-01-tst-cs xxx.xxx.xxx.3 cluster and two active/passive nodes
    xx-01-tst-vm xxx.xxx.xxx.1
    xx-02-tst-vm xxx.xxx.xxx.2

    In the node configuration I set
    Hostname: xx-01,02-tst-vm
    ListenIP: xxx.xxx.xxx.1,2
    ListenPort: 10050
    LogFile: C:\Zabbix\Node\zabbix_agentd.conf

    On these VMs I added a second Zabbix agent in another directory C:\Zabbix\Cluster and started.
    Hostname: xx-01-tst-cs
    ListenIP: xxx.xxx.xxx.3
    ListenPort: 10050
    LogFile: C:\Zabbix\Cluster\zabbix_agentd.conf

    OS: Microsoft Windows Server 2019
    Zabbix 5.0.3 agent

    The cluster agent is started on the active node and stopped on the passive node. It fails over together with the services.

    On the Zabbix servers, the cluster and the nodes are discovered, and the services are discovered too, but the service items are flapping: values are retrieved from both nodes.

    I noticed that the services do not flap when the agent on xx-02-tst-vm is stopped.

    Does anyone have an idea how I can do this?

  • cyber
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Dec 2006
    • 4806

    #2
    You discover services only through that 3rd, cluster agent?
    Flapping occurs no matter on which side the cluster agent is?
    I doubt you log into your config file.. But is there anything in logfiles?

    • Kapersky91
      Junior Member
      • Apr 2023
      • 4

      #3
      Earlier I discovered services on every agent; now I discover them only on the cluster agent. Yes, the services flap no matter which side the cluster agent is on, but only while the node 1 or node 2 agent is running. Of course I log into the config file. Maybe there is another way to monitor services in a Windows cluster?


      13916:20230420:151725.827 Zabbix Agent stopped. Zabbix 5.0.3 (revision 146855bff3).
      1804:20230420:151834.631 Starting Zabbix Agent [xxxx-02-tst-vm]. Zabbix 5.0.3 (revision 146855bff3).
      1804:20230420:151834.631 **** Enabled features ****
      1804:20230420:151834.632 IPv6 support: YES
      1804:20230420:151834.633 TLS support: YES
      1804:20230420:151834.634 **************************
      1804:20230420:151834.634 using configuration file: C:\Zabbix\Node\zabbix_agentd.win.conf
      1804:20230420:151835.416 agent #0 started [main process]
      6448:20230420:151835.418 agent #1 started [collector]
      13844:20230420:151835.419 agent #2 started[listener #1]
      4564:20230420:151835.419 agent #3 started[listener #2]
      5948:20230420:151835.420 agent #4 started[listener #3]

      Cluster log with DebugLevel=5

      3624:20230421:125929.410 Starting Zabbix Agent [xxxxx-01-tst-cs]. Zabbix 5.0.3 (revision 146855bff3).
      3624:20230421:125929.410 **** Enabled features ****
      3624:20230421:125929.411 IPv6 support: YES
      3624:20230421:125929.412 TLS support: YES
      3624:20230421:125929.412 **************************
      3624:20230421:125929.413 using configuration file: c:\zabbix\Cluster\zabbix_agentd.conf
      3624:20230421:125930.079 agent #0 started [main process]
      13076:20230421:125930.080 agent #1 started [collector]
      11780:20230421:125930.081 agent #2 started[listener #1]
      8712:20230421:125930.082 agent #3 started[listener #2]
      3316:20230421:125930.082 agent #4 started[listener #3]
      12640:20230421:125930.083 agent #5 started [active checks #1]
      10280:20230421:130911.197 In init_active_metrics()
      12248:20230421:130911.197 In get_counter_name() pdhIndex:6
      10280:20230421:130911.198 buffer: first allocation for 100 elements
      12248:20230421:130911.198 End of get_counter_name():SUCCEED
      10280:20230421:130911.199 End of init_active_metrics()
      12248:20230421:130911.199 In add_perf_counter() counter:'\Processor(_Total)\% Processor Time' interval:900
      10280:20230421:130911.200 In send_buffer() host:'proxy' port:10051 entries:0/100
      12248:20230421:130911.200 add_perf_counter(): PerfCounter '\Processor(_Total)\% Processor Time' successfully added
      10280:20230421:130911.201 End of send_buffer():SUCCEED
      12248:20230421:130911.202 End of add_perf_counter(): SUCCEED
      10280:20230421:130911.202 In refresh_active_checks() host:'proxy' port:10051
      12248:20230421:130911.203 In add_perf_counter() counter:'\Processor(0)\% Processor Time' interval:900
      12248:20230421:130911.204 add_perf_counter(): PerfCounter '\Processor(0)\% Processor Time' successfully added
      12248:20230421:130911.204 End of add_perf_counter(): SUCCEED
      12248:20230421:130911.217 End of add_perf_counter(): SUCCEED
      12248:20230421:130911.218 In get_counter_name() pdhIndex:2
      12248:20230421:130911.219 End of get_counter_name():SUCCEED
      12248:20230421:130911.219 In get_counter_name() pdhIndex:44
      12248:20230421:130911.220 End of get_counter_name():SUCCEED
      12248:20230421:130911.220 In add_perf_counter() counter:'\System\Processor Queue Length' interval:900
      12248:20230421:130911.223 add_perf_counter(): PerfCounter '\System\Processor Queue Length' successfully added
      12248:20230421:130911.223 End of add_perf_counter(): SUCCEED
      12248:20230421:130911.224 End of init_cpu_collector():SUCCEED
      Last edited by Kapersky91; 21-04-2023, 13:38.

      • cyber
        Senior Member
        Zabbix Certified Specialist, Zabbix Certified Professional
        • Dec 2006
        • 4806

        #4
        I have never had to do this for Windows clusters, but moving an agent instance around with the cluster resources seems an absolutely reasonable approach.
        Just throwing random thoughts around here: what happens if you start the cluster agent on a different port, 11050 for example?

        • Kapersky91
          Junior Member
          • Apr 2023
          • 4

          #5
          I set port 11050 on the node 1 agent and there was no flapping. Next I set port 11050 on node 2, where the cluster agent is running, and the cluster stopped collecting data.

          • cyber
            Senior Member
            Zabbix Certified Specialist, Zabbix Certified Professional
            • Dec 2006
            • 4806

            #6
            My Windows-fu is not strong enough without a way to test this hands-on. Maybe someone else with more Windows experience has something useful to say.

            • bamorim
              Junior Member
              • Mar 2024
              • 1

              #7
              I believe these steps can help:

              Use the key "wmi.getall[root\MSCLUSTER, select Id from MSCluster_Resource]" to discover the resources.

              Add preprocessing to filter the resource IDs.

              Use the key wmi.getall[ROOT\MSCLUSTER, Select State From MSCluster_Resource Where ID='{#RSC.ID}'] to monitor the state.

              Install the Zabbix agent on both servers, but not as a cluster resource.

              The cluster has its own IP and hostname.
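The discovery and preprocessing steps above can be sketched roughly as follows. This is only an illustration of the data transformation in Python; in Zabbix itself it would normally be done with item preprocessing. It assumes that the discovery key returns a JSON array of objects with an `Id` field. The sample IDs and the helper names `resources_to_lld` and `state_item_key` are made up for the example.

```python
import json


def resources_to_lld(wmi_json: str) -> list[dict[str, str]]:
    """Turn wmi.getall-style JSON (assumed: a list of objects) into LLD rows."""
    rows = json.loads(wmi_json)
    # Keep only entries that actually carry an Id and map it to the LLD macro.
    return [{"{#RSC.ID}": row["Id"]} for row in rows if row.get("Id")]


def state_item_key(resource_id: str) -> str:
    """Build the per-resource state key quoted in the post above."""
    query = f"Select State From MSCluster_Resource Where ID='{resource_id}'"
    return f"wmi.getall[ROOT\\MSCLUSTER, {query}]"


# Hypothetical sample of what the discovery item might return.
sample = json.dumps([
    {"Id": "11111111-aaaa-bbbb-cccc-000000000001"},
    {"Id": "11111111-aaaa-bbbb-cccc-000000000002"},
    {"Id": ""},  # an entry without a usable Id is dropped
])

lld = resources_to_lld(sample)
for row in lld:
    print(state_item_key(row["{#RSC.ID}"]))
```

Because the state is read through WMI from whichever node answers, this approach does not need a separate agent that moves with the cluster resources.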

              • Brain2000
                Junior Member
                • Aug 2024
                • 16

                #8
                I'm doing this now. I run the Zabbix agent on both nodes. I then create a host for each set of services that can be passed around.
                For example, I have a SQL cluster with 4 instances, giving me six Zabbix hosts.
                One for each node, and four for the SQL instances. I have the nodes handle the C: drive, and each instance is assigned to monitor its own physical/logical drive.

                We stamp out all of our SQL clusters the same way, so I was able to create templates for all of this, and import just enough from the MSSQL for Zabbix Template as well as the Windows Server Zabbix Template, and use macros to define which drives are included or excluded for each instance.
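The host layout described above (one Zabbix host per node plus one per SQL instance) could be modelled like this. It is a minimal sketch, not the poster's actual templates: the host names, the drive letters and the helper `cluster_hosts` are all hypothetical.

```python
def cluster_hosts(nodes, instances):
    """Return one host entry per node and one per clustered instance.

    nodes:     list of node host names; each node monitors its own C: drive.
    instances: mapping of instance host name -> drives that move with it.
    """
    hosts = [{"host": n, "role": "node", "drives": ["C:"]} for n in nodes]
    for name, drives in instances.items():
        hosts.append({"host": name, "role": "instance", "drives": list(drives)})
    return hosts


# Hypothetical names and drive letters for a two-node, four-instance cluster.
NODES = ["sql-node1", "sql-node2"]
SQL_INSTANCES = {
    "sql-inst1": ["E:"],
    "sql-inst2": ["F:"],
    "sql-inst3": ["G:"],
    "sql-inst4": ["H:"],
}

hosts = cluster_hosts(NODES, SQL_INSTANCES)
for h in hosts:
    print(h["host"], h["role"], ",".join(h["drives"]))
```

The per-instance drive lists play the role of the include/exclude macros mentioned in the post: each drive is owned by exactly one host, so it is never reported twice after a failover.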

                • nick_kok
                  Junior Member
                  • Mar 2024
                  • 1

                  #9
                  Hello!

                  How did you manage to start more than one service for the MSSQL instances on each node?
                  The way I have it set up, I can only start the node service and one of the instance services.
                  When I try to start the second one, it fails without even writing logs.


                  Let's say my cluster consists of:

                  Node1 -> 10.10.10.11
                  Node2 -> 10.10.10.12

                  Instance1 -> 10.10.10.21
                  Instance2 -> 10.10.10.22
                  Instance3 -> 10.10.10.23
                  Instance4 -> 10.10.10.24

                  Instances 1+3 are "native" to node1
                  Instances 2+4 are "native" to node2

                  I have one conf file for each instance on both nodes.
                  I have set the listen IP in each of those conf files to the respective instance IP.
                  I have set up separate mssql conf files for each instance with the following:

                  Plugins.MSSQL.System.Path=C:\Program Files\Zabbix Agent 2\zabbix-agent2-plugin-mssql.exe
                  Plugins.MSSQL.Default.Uri=sqlserver://instance1:1433

                  I am at my wit's end with this and there is no conclusive information or instructions out there
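One way to rule out simple collisions between the per-instance conf files described above is to generate them from a single table and check which values repeat. The sketch below is not a diagnosis of the startup failure. It uses the instance IPs from the post; the `LogFile` path layout and the helpers `render_conf` and `duplicated_values` are assumptions made for the example.

```python
from collections import Counter

# Instance name -> instance IP, as listed in the post above.
INSTANCE_IPS = {
    "instance1": "10.10.10.21",
    "instance2": "10.10.10.22",
    "instance3": "10.10.10.23",
    "instance4": "10.10.10.24",
}


def render_conf(name: str, ip: str, port: int = 10050) -> str:
    """Render a minimal per-instance agent config (log path is hypothetical)."""
    return "\n".join([
        f"Hostname={name}",
        f"ListenIP={ip}",
        f"ListenPort={port}",
        f"LogFile=C:\\Zabbix\\{name}\\zabbix_agent2.log",
        f"Plugins.MSSQL.Default.Uri=sqlserver://{name}:1433",
    ]) + "\n"


def duplicated_values(confs: dict[str, str], key: str) -> list[str]:
    """Return the values of `key` that appear in more than one config."""
    values = []
    for text in confs.values():
        for line in text.splitlines():
            k, _, v = line.partition("=")
            if k == key:
                values.append(v)
    return sorted(v for v, n in Counter(values).items() if n > 1)


confs = {name: render_conf(name, ip) for name, ip in INSTANCE_IPS.items()}
for key in ("Hostname", "ListenIP", "LogFile"):
    print(key, "duplicates:", duplicated_values(confs, key))
```

Sharing `ListenPort` is fine as long as every instance binds its own `ListenIP`; a shared `Hostname` or `LogFile` would be worth fixing before looking further.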
