DRAFT - Remotely Monitoring a Location reliably

  • Wolfgang
    Senior Member
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    • Apr 2005
    • 116

    #1

    DRAFT - Remotely Monitoring a Location reliably

    Hello,

    Based on another thread, I was thinking about an extension to Zabbix that would allow us to

    "Monitor a remote site reliably"


    Idea
    Basically, we want to monitor a couple of sites connected via WAN links, VPN links, or just the Internet (using an SSH tunnel, Stunnel, etc.).

    Usually we do not need access to all collected data 24/7, _but_ we do need to be informed if something goes wrong at one of these sites.

    Let's assume that each site runs its own Zabbix server with its own database and its own items and triggers. This keeps the monitoring traffic within the site.

    Now we need a piece of *middleware* that provides the following:
    -A watchdog that checks that the Zabbix server on site is functional
    -A watchdog that checks that the WAN/VPN/Internet link is functional
    -A mechanism that forwards an event when a trigger fires at that site.

    In some cases it might be beneficial if a standard protocol like HTTP could be used to transfer this data, to avoid firewall issues on site.
    It would also be great if Zabbix did not require patches.


    Terminology
    Central Server = Server that gets data (Watchdog and Events) from one or more Satellite Servers.
    Satellite Server = Server at the site in question that passes Events to the Central Server if a trigger fires.
    Watchdog = Mechanism to ensure that the Satellite Server has a connection to the Central Server.
    Event = Data that is sent from the Satellite Server to the Central Server because a trigger fired.


    Draft
    As outlined before, the Zabbix Satellite Server would be configured as usual.
    The only exceptions would be to:
    -add a user item that validates that the connection to the Central Server is online (Watchdog).
    This custom item can be a Perl script, a shell script, or a simple wget that calls a specific web page and passes ServerID+Password+DateTime+Status etc.
    This web page would be a PHP or Perl script that passes the received data to a database (like MySQL or PostgreSQL). The easiest option would be a table within the Zabbix database.
    Also storing the local DateTime of the Central Server would allow this to work even if the clocks of the Central Server and the Satellite Server run out of sync.

    -add a custom media that passes data to the Central Server when a trigger fires on the Satellite Server (Event).
    If a trigger fires on the Satellite Server, then in addition to the standard alerting procedure at that site, the resulting data would also be passed via a custom script (defined as an additional media) to the Central Server.
    The mechanism could be very similar to the way the watchdog is implemented.
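
    A minimal sketch of what the satellite-side script might look like, written in Python although the draft leaves the language open (Perl, shell, or wget). The endpoint URL, the field names, and the "kind" field that separates watchdog messages from events are assumptions made for illustration; none of them are part of Zabbix.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint on the Central Server; not a Zabbix URL.
CENTRAL_URL = "https://central.example.com/satellite.php"


def build_payload(server_id, password, kind, status, now=None):
    """Encode ServerID+Password+DateTime+Status as a form-encoded string.

    kind is "watchdog" for the heartbeat item or "event" for the
    custom media script (both names are assumptions of this sketch).
    """
    now = now or datetime.now(timezone.utc)
    return urlencode({
        "server_id": server_id,
        "password": password,
        "kind": kind,
        "status": status,
        "sent_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    })


def send(payload, timeout=10):
    """POST the payload; return True only if the Central Server answers 200."""
    try:
        with urlopen(CENTRAL_URL, data=payload.encode("ascii"),
                     timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return False
```

    The watchdog item would call send(build_payload(..., "watchdog", "OK")) and report the boolean result, while the media script would use the same two functions with kind "event" and the trigger state as status.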

    How does the Central Server get the data provided by the Satellite Server from the database?

    The Central Server would define two custom items for each Satellite Server.
    -One to monitor the Watchdog to ensure that the Satellite Server _could_ send data if needed.
    If there is no watchdog record in the table within a given time, the Central Server would know that the connection is down.
    -One to get Events out of the database in case the Satellite Server has fired a trigger. If there is no data but the watchdog is OK, then everything is fine.
    Otherwise the data would contain the Event with the corresponding state (Trigger On/Off). The state is needed to get notified when something that was broken works again.
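
    The two custom items on the Central Server could be sketched roughly as follows. An in-memory SQLite table stands in for the MySQL/PostgreSQL table mentioned above; the table name, the column names, and the 300-second timeout are assumptions. The freshness check deliberately uses the Central Server's own receive time, so clock drift on the satellite does not matter.

```python
import sqlite3

WATCHDOG_TIMEOUT = 300  # seconds without a heartbeat before the link counts as down (assumed value)


def open_store():
    """Create the (hypothetical) table the receiving web script writes to."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE satellite_messages ("
        " server_id TEXT, kind TEXT, status TEXT,"
        " sent_at INTEGER, received_at INTEGER)"
    )
    return db


def record(db, server_id, kind, status, sent_at, received_at):
    """Store one message; received_at is the Central Server's local time."""
    db.execute("INSERT INTO satellite_messages VALUES (?, ?, ?, ?, ?)",
               (server_id, kind, status, sent_at, received_at))


def watchdog_ok(db, server_id, now):
    """Item 1: return 1 if a heartbeat arrived within the timeout, else 0."""
    last = db.execute(
        "SELECT MAX(received_at) FROM satellite_messages"
        " WHERE server_id = ? AND kind = 'watchdog'", (server_id,)
    ).fetchone()[0]
    return int(last is not None and now - last <= WATCHDOG_TIMEOUT)


def latest_event(db, server_id):
    """Item 2: return the newest event status, or None if there is no event."""
    row = db.execute(
        "SELECT status FROM satellite_messages"
        " WHERE server_id = ? AND kind = 'event'"
        " ORDER BY received_at DESC LIMIT 1", (server_id,)
    ).fetchone()
    return row[0] if row else None
```

    A result of watchdog_ok(...) == 1 together with latest_event(...) returning None corresponds to the "everything is fine" case described above.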

    Open questions in this draft
    Q: Where to define the IDs for the Satellite Servers?
    A: Without modifications to Zabbix, one option would be to add users on the Central Server and use those user/password combinations.

    Q: How to pass data / watchdogs from the Satellite Server?
    A: A simple way would be to use HTTP/HTTPS with GET/POST via a simple script.

    Q: How to encrypt data/passwords being passed?
    A: The easiest way would be to use HTTPS, or to encrypt the data within the scripts that put/get it.

    Note
    Of course, if Zabbix provides this kind of functionality in the future, all of this would become obsolete ;-)

    Any comments?
    Last edited by Wolfgang; 06-03-2006, 23:31.
    http://www.intellitrend.de
    Specialised in monitoring large environments and Zabbix API programming.
  • crs9
    Member
    • Feb 2006
    • 35

    #2
    About your Questions

    Q: Where to define the IDs for the Satellite Servers?
    What about adding a table specifically for satellite-specific info and config?

    Q: How to pass data / watchdogs from the Satellite Server?
    I think your idea of using a script or similar would be best.

    Q: How to encrypt data/passwords being passed?
    I guess for a version 1 setup, make this the responsibility of the overall system design rather than a requirement of the product at this time?

    My 2 cents about your statement:
    "Usually we do not need to get access to all collected data 24/7, _but_ need to get informed if something goes wrong in one of these sites"

    What I was thinking of was more of a master/drone setup, where the drones (satellite servers) report all info back to the master database. This includes all collected data. That way, you can have reliable end-user SLA data all located in a central repository for analysis. Sure, you won't see the data in real time at the central location, but the delay can be adjusted via your update interval.

    Does anyone else see a need for this type of setup, or have comments?

    • Alexei
      Founder, CEO
      Zabbix Certified Trainer
      Zabbix Certified Specialist
      Zabbix Certified Professional
      • Sep 2004
      • 5654

      #3
      Thanks for your ideas. Distributed monitoring will be seriously considered for a later version of ZABBIX, but not for 1.1.
      Alexei Vladishev
      Creator of Zabbix, Product manager
      New York | Tokyo | Riga

      • Wolfgang
        Senior Member
        Zabbix Certified Trainer
        Zabbix Certified Specialist
        • Apr 2005
        • 116

        #4
        I found a single but important showstopper in the draft.
        -add a custom media that passes data to the Central Server in case a trigger fires on the Satellite Server. (Event)
        If a trigger gets fired on the Satellite Server, in addition to the standard alerting procedure on that site, the resulting data would also be passed via a custom script (defined as an additional media) to the Central Server.
        Well, this will not work: if trigger "a" is in ERROR and trigger "b" is in ERROR, and then trigger "a" changes back to OK, the final result on the Central Server would be OK for that site, which is wrong.
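
        The problem can be reproduced with a small Python sketch. The function names and the (trigger, state) event format are made up for illustration. The first function mimics a Central Server that only keeps the last forwarded event; the second remembers which triggers are still failing.

```python
def naive_site_state(events):
    """Flawed approach: the most recent event decides the state of the site."""
    state = "OK"
    for _trigger, value in events:
        state = value
    return state


def tracked_site_state(events):
    """Keep the set of triggers still in ERROR; the site is OK only if it is empty."""
    failing = set()
    for trigger, value in events:
        if value == "ERROR":
            failing.add(trigger)
        else:
            failing.discard(trigger)
    return "ERROR" if failing else "OK"


# The sequence from the post: "a" fails, "b" fails, "a" recovers.
events = [("a", "ERROR"), ("b", "ERROR"), ("a", "OK")]
```

        With this sequence the naive version reports OK even though trigger "b" is still in ERROR, while the tracked version keeps reporting ERROR until "b" recovers as well.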

        So what would be needed is an item that queries the status of a host as the result of all triggers assigned to that host
        and
        an item for the site that queries the statuses of all hosts.

        But as far as I know, there is no single item that could be used in a trigger to get information about the status of a host.
        In other words: I am not aware of any trigger expression (check_status[host] is just an example) like:

        {zabbix.sf.net:check_status[host].last(0)}=error
        {zabbix.sf.net:check_status[host].last(0)}=ok

        where "error" indicates that at least one trigger reports a failure for that host and "ok" indicates that none of the triggers reports an issue for that host.

        ...and likewise a query on the consolidated status of all hosts monitored by one Zabbix server (check_status[site] is again just an example):

        {zabbix.sf.net:check_status[site].last(0)}=error
        {zabbix.sf.net:check_status[site].last(0)}=ok

        where "error" indicates that at least one trigger reports a failure on any host of that site and "ok" indicates that none of the triggers on any of the hosts of that site reports an issue.
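
        The intended semantics of the two example items can be written down as a short Python sketch. Note that check_status does not exist in Zabbix; the functions below only illustrate the roll-up from triggers to host to site, and the host and trigger names are invented.

```python
def host_status(trigger_states):
    """Return "error" if at least one trigger of the host reports a failure."""
    return "error" if "error" in trigger_states.values() else "ok"


def site_status(hosts):
    """Return "error" if at least one host of the site is in "error"."""
    return "error" if any(host_status(t) == "error" for t in hosts.values()) else "ok"


# Hypothetical snapshot of one site: host -> {trigger: state}
site = {
    "web01": {"cpu_load": "ok", "disk_free": "error"},
    "db01": {"cpu_load": "ok", "disk_free": "ok"},
}
```

        Here host_status(site["web01"]) is "error", host_status(site["db01"]) is "ok", and site_status(site) is "error" because one host is failing.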

        Remark: Besides this approach to monitoring a remote site, I think this sort of trigger would be a good extension for Zabbix in general.


        @crs9
        Thank you for your comments.
        What I was thinking of was more of a master/drone setup, where the drones (satellite servers) report all info back to the master database. This includes all collected data. That way, you can have reliable end-user SLA data all located in a central repository for analysis. Sure, you won't see the data in real time at the central location, but the delay can be adjusted via your update interval.
        I see where you are coming from. The original design focused on a simple implementation without modifications to Zabbix. Synchronizing the database tables would require one of the following:
        -either unique IDs for items, hosts, triggers etc. on each Zabbix server,
        or
        -a modification of how Zabbix creates IDs, based on a given "siteid", i.e. itemid = siteid-itemsubid, hostid = siteid-hostsubid etc.,
        or
        -an extension of the database schema for the tables in question with some sort of unique "siteid", which in my opinion would be the better option.
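
        The last two options can be compared in a small Python sketch. Both are illustrations only: the range of one billion local IDs per site and the names encode_id, decode_id, and SiteKey are assumptions, not anything Zabbix defines.

```python
from typing import NamedTuple

ID_RANGE = 10**9  # assumed room for local ids per site


def encode_id(site_id, local_id):
    """Option 2: fold the siteid into the id itself (siteid-itemsubid)."""
    if not 0 <= local_id < ID_RANGE:
        raise ValueError("local id out of range")
    return site_id * ID_RANGE + local_id


def decode_id(global_id):
    """Split a combined id back into (site_id, local_id)."""
    return divmod(global_id, ID_RANGE)


class SiteKey(NamedTuple):
    """Option 3: keep the local id untouched and add a separate siteid column."""
    site_id: int
    local_id: int
```

        Item 42 on site 1 and item 42 on site 2 then become 1000000042 and 2000000042 with the first approach, or SiteKey(1, 42) and SiteKey(2, 42) with the second. The second keeps existing IDs unchanged, which fits the preference for a separate "siteid" column.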

        Now, going further down this route, the question is whether the initial design would become obsolete.

        Well, I do not think so - at least not all of it.
        The way the status of a host or an entire site is determined could be used for a condensed view.
        Example: Let's say we have 10 sites. Each site has 20 hosts. Each host has 15 items. Each item has a trigger assigned.

        There can be an overview on:
        -all-sites-level that would show 10 entries.
        -all-hosts-per-selected-site-level that would show 20 entries.
        -all-triggers-per-selected-host-level that would show 15 entries.

        So still reasonable.
        Also, except for the "all-sites-level", the other views would also be useful for a single-instance Zabbix installation.
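
        To put numbers on the example, here is the arithmetic as a tiny Python sketch (the constant and function names are only for illustration):

```python
SITES, HOSTS_PER_SITE, TRIGGERS_PER_HOST = 10, 20, 15


def rows_per_view():
    """Number of entries shown at each level of the drill-down."""
    return {
        "all sites": SITES,
        "hosts of one site": HOSTS_PER_SITE,
        "triggers of one host": TRIGGERS_PER_HOST,
    }


# Total number of triggers across the whole installation.
total_triggers = SITES * HOSTS_PER_SITE * TRIGGERS_PER_HOST
```

        The installation holds 3000 triggers in total, yet no single view lists more than 20 entries.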

        Of course, the details of how data and watchdog information are passed between the satellite and central servers could be integrated more tightly, because modifications to Zabbix would have to be made anyway when going this route.

        Q: Where to define the IDs for the Satellite Servers?
        What about adding a table specifically for satellite-specific info and config?
        Well, the basic idea of the design was to make as few changes as possible to the Zabbix core. Adding a table would require at least a modification to the frontend.
        However, if going the "database" route, then an additional table would make a lot of sense.


        @Alexei
        Thank you for your update. I certainly understand that distributed monitoring will not be part of version 1.1. I hope you do not mind the brainstorming.
        Last edited by Wolfgang; 10-03-2006, 22:12.
        http://www.intellitrend.de
        Specialised in monitoring large environments and Zabbix API programming.

        • crs9
          Member
          • Feb 2006
          • 35

          #5
          Totally understand that something like this couldn't make it into v1.1. Regardless of the route taken, I think centralized monitoring would easily set Zabbix apart from all other NMS currently available, open and closed source.
