Templates and Dependencies in a large and complex setup

  • mpotter-xiss
    Junior Member
    • Jul 2008
    • 10

    #1

    Templates and Dependencies in a large and complex setup

    This is likely to be a long post and for that I apologize. In order to ask the questions I need to ask I need to describe the environment we will be monitoring. I posted this to the mailing list first and got very little response.

    Environment:
    3 core switches
    < 5000 compute nodes divided into ~70 node clusters
    1 switch per standard rack
    Infrastructure to support these clusters

    At as simple a level as I can describe while still painting the picture, each cluster has two racks of servers in a compute node role. Each rack holds the compute nodes and a switch. At present these are the elements I am attempting to set up for monitoring. There will be more later, but the questions I need to ask here will give me the necessary information to move forward.

    We need to make each node dependent on its own switch, and likewise each switch dependent on the core switch to which it is attached. With the volume of monitored items, manual configuration of the triggers and dependencies is out of the question.

    My first question, and I admittedly come from a Nagios background, is whether or not the monitored services are automatically dependent on the host being up. In other words, if I define a host-alive check and that check fails, will the services automatically go into "disabled" mode since the host itself is down?

    The scenario I see for setting up templates with dependencies is hard to describe but I will do my best to try and make it clear. Keep in mind we are trying to automate as much of this as possible through the template system.
    1. Assign each core switch a host-alive check and trigger
    2. Create one template per core to assign to the switches that contains only the host-alive check for the switch to which it will be assigned. In this template the host-alive check's trigger will have a dependency on the appropriate host-alive check on the core that serves the switch.
      a. Create various templates for the different switches in use to monitor traffic, ports, etc.
    3. Create one template per switch to be assigned to the nodes that are served by the switch. This template contains only the host-alive check for the nodes, and the trigger for the host-alive check will be dependent on the trigger for the switch's host-alive check.
      a. Create as many templates as necessary for the nodes based on standard services, hardware, etc.
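
    A minimal sketch of how the per-core and per-switch host-alive templates from steps 2 and 3 could be planned from a topology description. It only builds a plan in memory and does not call any Zabbix API; the topology layout, the host names, the template naming scheme, and the function name `plan_alive_templates` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AliveTemplate:
    name: str          # template to create (hypothetical naming scheme)
    depends_on: str    # host whose host-alive trigger is the dependency
    assign_to: tuple   # hosts that should receive this template


def plan_alive_templates(topology):
    """Turn {core: {switch: [node, ...]}} into a list of AliveTemplate entries."""
    plan = []
    for core, switches in sorted(topology.items()):
        # Step 2: one template per core, assigned to the switches behind it.
        plan.append(
            AliveTemplate(f"alive-behind-{core}", core, tuple(sorted(switches)))
        )
        for switch, nodes in sorted(switches.items()):
            # Step 3: one template per switch, assigned to the nodes behind it.
            plan.append(
                AliveTemplate(f"alive-behind-{switch}", switch, tuple(sorted(nodes)))
            )
    return plan


# Hypothetical example topology: one core, two rack switches, three nodes.
topology = {
    "core1": {
        "rack01-sw": ["node001", "node002"],
        "rack02-sw": ["node003"],
    },
}

for t in plan_alive_templates(topology):
    print(f"{t.name}: trigger depends on {t.depends_on}; assign to {', '.join(t.assign_to)}")
```

    A plan like this could then be reviewed or fed into whatever import mechanism is available, so that no dependency is typed in by hand.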


    My biggest question is: In points 2.a and 3.a the monitored services won't have any dependencies defined as we would like to have a single template or single set of templates that can be assigned to any given host based on its role. Will the services being monitored outside of the templates described in points 2 and 3 be automatically dependent on a host alive check?

    This is the simplest way I can see to set up automatic dependency creation at the sheer scale of what we are monitoring. This is a necessary piece of the puzzle, and if the above method will not work then we could use some guidance on how to automate as much of the dependency creation as possible. We would prefer to use the templates to eliminate as much human error as possible. Even using mass updates could introduce errors simply based on the scale of this project. Using templates, we can clone them and reduce the human error factor by a very large amount.

    I have some much less desirable alternatives for consideration but would prefer to exhaust all possible methods of automating this before testing them. Any and all help will be greatly appreciated.
  • mpotter-xiss
    Junior Member
    • Jul 2008
    • 10

    #2
    Quick bump. I am not getting any answers on the mailing list or here as of yet. I would really appreciate any and all help with this piece of the installation.


    • makini
      Member
      • Jul 2006
      • 59

      #3
      Well, that was long...

      First question:
      Yes and no - the monitored services on a host are dependent upon the host being alive. Same goes for other hosts dependent upon it.
      It's quite simple really. If you monitor a host, switch, etc., whether created via a template or manually, and the host stops returning data (timeouts, software failure, network latency, any reason really), all the "Items" that collect data for that host go into an 'error' state and are not considered active from Zabbix's point of view. Any items or triggers dependent upon those go into an "unknown" state until the host starts returning data again (there are various options in zabbix_server.conf to control timeouts and retry parameters for this). Those items will not generate an alert or fire the triggers assigned to them. You can work around this by using itemname.nodata(sec) in triggers, which alerts you that the host's item in question has stopped returning data and should be investigated.
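
      A small sketch of generating such nodata triggers in bulk rather than typing them by hand. It assumes the classic `{host:key.function(param)}=value` trigger expression layout; the host names, the `icmpping` item key, the 300-second window, and the helper name `nodata_expression` are illustrative assumptions, so check the expression syntax against your Zabbix version before using it.

```python
def nodata_expression(host, item_key, seconds):
    """Build a trigger expression that is true when the item has
    returned no data for `seconds` seconds (assumed classic syntax)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return f"{{{host}:{item_key}.nodata({seconds})}}=1"


# Hypothetical hosts; in practice the list would come from an inventory.
for host in ["node001", "node002"]:
    print(nodata_expression(host, "icmpping", 300))
```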

      Second q.:
      I think the second question is partially repeating the first...
      I would only add that even if you manually create "dependencies" on triggers outside a template, the host's 'alive' check might not be error-proof, for the same reasons as above: if the host stops returning data, your 'alive' check might not return data either. However, all in all, if set up correctly, the trigger dependencies you require are possible to implement.

      Hope this helps...


      • xs-
        Senior Member
        Zabbix Certified Specialist
        • Dec 2007
        • 393

        #4
        There's a mailing list? where?

        On dependencies: Zabbix does not support 'host dependency', only trigger dependencies. Yes, in large setups where you rely heavily on templates, it's not usable. What you can do in your situation is create several 'base templates' which are linked to 'cluster templates', where you add the dependency for the layer down/up (whichever method you prefer). If your cluster/processing nodes float somewhat freely between clusters, you can also use group linkage for the correct template and these dependency assignments.
        I hope host dependency will be implemented at some point tho.
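
        One possible reading of the base/cluster template layering, sketched as plain data. This models the linkage only and calls no Zabbix API; the template names, the check names, and the helpers `cluster_template` and `effective_checks` are hypothetical.

```python
# Role-based base templates: reusable everywhere, no dependencies inside.
BASE_TEMPLATES = {
    "base-linux": ["cpu load", "disk space"],
    "base-compute": ["job scheduler"],
}


def cluster_template(cluster, switch, bases):
    """A per-cluster template: links the role-based base templates and
    carries the single dependency on the upstream switch's alive trigger."""
    return {
        "name": f"cluster-{cluster}",
        "linked": list(bases),
        "alive_depends_on": switch,
    }


def effective_checks(template):
    """Flatten the checks a host inherits through the linked base templates."""
    checks = ["host alive"]
    for base in template["linked"]:
        checks.extend(BASE_TEMPLATES[base])
    return checks


tmpl = cluster_template("c01", "rack01-sw", ["base-linux", "base-compute"])
print(tmpl["name"], "->", effective_checks(tmpl), "depends on", tmpl["alive_depends_on"])
```

        The point of the layering is that only the thin cluster template differs per cluster, while the base templates stay identical across all hosts with the same role.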


        • makini
          Member
          • Jul 2006
          • 59

          #5
          Yepp...

          Originally posted by xs-
          There's a mailing list? where?
          Yepp, with archives even! Here: http://sourceforge.net/mail/?group_id=23494
          and an IRC channel (not sure how active though, never been there), all from here: http://www.zabbix.com/support_free.php
