Monitoring Clustered Items

  • gregmurphy
    Junior Member
    • Jan 2014
    • 3

    #1

    I'm using Zabbix 2.2.1 on Linux (Ubuntu 13.04) to monitor my environments, and am in the process of deploying some Corosync/Pacemaker failover clusters. I'm struggling to find a way to effectively monitor these clusters in a templated way.

    For example - each cluster will have DRBD/MySQL/Apache installed, but MySQL and Apache will only ever be running on one of the nodes at a time due to it being a failover cluster.

    As a result, if I take the traditional approach, I'll always at least have a ton of unsupported items on one of the hosts at any time (and a large number of emails when a failover occurs and the items go supported/unsupported).

    Triggers are also a challenge, as I could write custom ones for each cluster - taking the example from https://www.zabbix.com/documentation...ers/expression I could write
    {node1:proc.num[apache2,,,].last(0)}<1&{node2:proc.num[apache2,,,].last(0)}<1
    But I don't know how (if?) I can do this in a template. Ideally I want the trigger to be running against a pseudo-host (the cluster) rather than on each actual host, so it would only check/alert once for the cluster rather than alerting from both hosts. I'm also not sure how I'd pass the two actual hostnames to the template if I implemented the example above.
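For illustration, here is a minimal Python sketch of how the per-cluster expression could be generated from a list of node names instead of being written by hand for each cluster. The function name `cluster_down_expression` is hypothetical and not part of Zabbix; the sketch assumes the Zabbix 2.2 expression syntax, where `&` is the logical AND, and it only builds the string. Creating the trigger from it would still have to be done by hand or through the API.

```python
def cluster_down_expression(nodes, process):
    """Build a Zabbix 2.2-style trigger expression that is true only
    when `process` is running on none of the given nodes.

    `nodes` are Zabbix host names; `process` is the process name passed
    to the proc.num item key. Both are supplied by the caller.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    terms = [
        "{%s:proc.num[%s,,,].last(0)}<1" % (node, process)
        for node in nodes
    ]
    # Zabbix 2.2 uses '&' for logical AND in trigger expressions.
    return "&".join(terms)


print(cluster_down_expression(["node1", "node2"], "apache2"))
```

Because the trigger has to belong to some host, the generated expression would typically be attached to one of the nodes or to a dummy host, which is the limitation the questions below are about.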

    I've seen some examples of people monitoring clusters by creating another host against the cluster's VIP, but unfortunately this isn't an option for me as I'm running in the public cloud (MS Azure) and so can't have any VIPs running on the hosts themselves.

    So, my questions are:
    1. Are there any plans to introduce the concept of "pseudo-hosts" such as clusters to Zabbix that could exist across multiple real hosts?
    2. Is there any way I can easily get my items to "follow" the service as it fails over from one node to another?
    3. Is it possible to create triggers that monitor the status of two hosts in a template?

    Thanks in advance

    Greg Murphy
  • steveboyson
    Senior Member
    • Jul 2013
    • 582

    #2
    We have a comparable situation (LVM volume groups replicated via DRBD, with either attached iSCSI daemons or "switching" NFS/Samba resources used as datastores for VMware), and we solved it this way:

    - created a new, third IP address for our cluster resources
    - created a template "cluster" which checks disk usage etc. on our cluster/shared resources
    - assigned that template to our "all-time-on" cluster IP

    Then:
    - created a template "cluster node"
    - defined checks in the template for heartbeat status, cluster status and all other needed metrics (e.g. when a cluster switch is performed, one node goes to "Secondary" while the other goes to "Primary"), such as cluster counters (read, write, ...), disk state and whatever else is needed
    - assigned that template to the cluster nodes
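As a rough sketch of what a "cluster node" check could look like, the Python below pulls the connection state, local role and local disk state out of one DRBD status line. It assumes the DRBD 8.x `/proc/drbd` line format shown in `SAMPLE`; the function name `parse_drbd_line` and the sample line are illustrative and not taken from the setup described above.

```python
import re

# Assumed DRBD 8.x /proc/drbd resource line (illustrative sample).
SAMPLE = " 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----"

# Matches the cs: (connection), ro: (roles) and ds: (disk states) fields.
_FIELD = re.compile(r"\b(cs|ro|ds):(\S+)")


def parse_drbd_line(line):
    """Return connection state, local role and local disk state.

    For ro: and ds: the value before the slash is the local node.
    Missing fields are reported as "Unknown".
    """
    fields = dict(_FIELD.findall(line))
    return {
        "connection": fields.get("cs", "Unknown"),
        "role": fields.get("ro", "Unknown").split("/")[0],
        "disk": fields.get("ds", "Unknown").split("/")[0],
    }


print(parse_drbd_line(SAMPLE))
```

A script like this could be exposed to the agent as a custom item (for example through a `UserParameter` entry), so that a role change from "Primary" to "Secondary" shows up as a value change on the node rather than as an unsupported item.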

    • gregmurphy
      Junior Member
      • Jan 2014
      • 3

      #3
      Thanks for the suggestion. This is the way I ordinarily would have approached the problem, but as I mentioned in my post, my environment is deployed in MS Azure, which doesn't allow more than one IP on each host. (Our HTTP connection failover is managed outside of the VMs by the Azure load balancer.)

      As I'm typing this reply, though, it's made me think of a way I might be able to implement that approach.

      I could maybe configure a Zabbix agent endpoint on this load balancer and make sure this endpoint fails over along with the HTTP one. It's a bit ugly, but might work, so I'll give it a try.

      It would still be great if Zabbix had the concept of a cluster to avoid workarounds like this!

      • steveboyson
        Senior Member
        • Jul 2013
        • 582

        #4
        Glad you've found a way. But on the other hand: what is the really relevant part of the whole story?

        We decided that it is a running service as seen by our users, so we keep an eye on it "from the outside", which means our checks behave as if they lived outside of the cluster, not knowing anything about switched resources at all.

        Of course we want to know when a cluster node switches. That is why we placed additional items on the cluster nodes to get their metrics as well.
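To make the "from the outside" idea concrete, here is a minimal Python sketch of a check that only looks at whether the service answers, without knowing which node is serving it. The function name `http_service_up` is hypothetical, and the URL it is called with would be whatever address the users of the service actually hit.

```python
import http.client
import urllib.request


def http_service_up(url, timeout=5.0):
    """Return 1 if `url` answers with a 2xx/3xx status, otherwise 0.

    The check knows nothing about cluster nodes; it sees the service
    the same way a user would.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 1 if 200 <= resp.status < 400 else 0
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx) and socket timeouts are OSError
        # subclasses; ValueError covers malformed URLs; HTTPException
        # covers broken HTTP responses.
        return 0
```

Calling `http_service_up()` with the public URL of the service gives a single 1/0 value for the whole cluster, while the per-node items described above report which node is currently active.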

        • gregmurphy
          Junior Member
          • Jan 2014
          • 3

          #5
          I quite agree about monitoring the user experience. I think Zabbix is generally an excellent piece of software, but where it falls down (in my opinion) is that it is still too focussed on the server rather than the service.

          The workaround of creating a dummy host against a virtual IP works to an extent, but doesn't cover all use cases - for example, true "outside-in" web monitoring of a load-balanced web site couldn't be done against the public IP address of the site without a Zabbix agent being available on that IP - not something you'd want to expose on a public IP address.
