Spurious alerts from Trigger prototypes

  • rajlistuser
    Junior Member
    • Mar 2018
    • 11

    #1

    Spurious alerts from Trigger prototypes

    I am using Zabbix to monitor Docker containers on Docker hosts. A script on the Docker host queries Docker and returns values. The Zabbix agent calls it using the following configuration:

    Code:
    UserParameter=docker.containers.discovery,/etc/zabbix/scripts/docker.sh discovery
    UserParameter=docker.containers.count,/etc/zabbix/scripts/docker.sh count
    UserParameter=docker.containers.discovery.all,/etc/zabbix/scripts/docker.sh discovery_all
    UserParameter=docker.containers.count.all,/etc/zabbix/scripts/docker.sh count_all
    
    # First parameter: container id
    # Second parameter: one of netin, netout, cpu, disk, memory, uptime, up or status
    UserParameter=docker.containers[*],/etc/zabbix/scripts/docker.sh "$1" "$2"
    UserParameter=docker.status[*],/etc/zabbix/scripts/docker.sh "$1" "$2"
    
    #######################################################################
    # Compatibility with www.monitoringartist.com docker templates
    
    UserParameter=docker.discovery,/etc/zabbix/scripts/docker.sh discovery
    UserParameter=docker.up[*],/etc/zabbix/scripts/docker.sh "$1" up
    
    # Ignore the second argument for docker.cpu (system vs user)
    UserParameter=docker.cpu[*],/etc/zabbix/scripts/docker.sh "$1" cpu
    
    # Ignore the second argument for docker.mem (total_cache vs total_rss vs total_swap)
    UserParameter=docker.mem[*],/etc/zabbix/scripts/docker.sh "$1" memory
    The scripts were taken from https://github.com/digiapulssi/zabbi...toring-scripts

    I added a template with discovery rules to discover new containers, and a trigger prototype to alert when a container goes down. I have created a media type of type Script and use it in an action to send alerts to an external web server.

    My problem is that when one container goes down I get an alert for all containers which are in discovered hosts.

    The image shows the status before eager_blackwell goes down. A status of 1 means the container is up; 0 means it is down.
    Docker status when eager_blackwell was UP


    When container eager_blackwell goes down, this is the alert I expect (this JSON is given as the Default Message in Action -> Operation):

    Code:
    {
        "eventId": "5981",
        "eventTime": "12:44:48",
        "itemValue": "0",
        "hostName": "eager_blackwell",
        "triggerSeverity": "High",
        "eventDate": "2018.04.20",
        "triggerId": "17315",
        "itemKey": "docker.containers[eager_blackwell, up]",
        "triggerName": "Docker container down",
        "itemName": "Container eager_blackwell up:",
        "triggerUrl": "",
        "triggerStatus": "PROBLEM"
    }
    Status after eager_blackwell goes down


    Please note that hostName and itemKey refer to the same container.

    Along with this, I also get alerts for the other containers that are present in Docker (or that were removed previously but whose "Keep lost resources period" is not over yet), like the two samples below:

    This container is no longer present on the Docker host, but Zabbix still keeps its state because its "Keep lost resources period" is not over.
    Code:
    {
        "eventId": "5988",
        "eventTime": "12:44:56",
        "itemValue": "0",
        "hostName": "happy_franklin",
        "triggerSeverity": "High",
        "eventDate": "2018.04.20",
        "triggerId": "17290",
        "itemKey": "docker.containers[eager_blackwell, up]",
        "triggerName": "Docker container down",
        "itemName": "Container eager_blackwell up:",
        "triggerUrl": "",
        "triggerStatus": "PROBLEM"
    }
    As you can see, the container names in hostName and itemKey are different.

    This one is present on the Docker host and is still online, as can be seen from the screenshot above:

    Code:
    {
        "eventId": "5989",
        "eventTime": "12:45:02",
        "itemValue": "0",
        "hostName": "tender_pasteur",
        "triggerSeverity": "High",
        "eventDate": "2018.04.20",
        "triggerId": "17343",
        "itemKey": "docker.containers[eager_blackwell, up]",
        "triggerName": "Docker container down",
        "itemName": "Container eager_blackwell up:",
        "triggerUrl": "",
        "triggerStatus": "PROBLEM"
    }
    Overall I received 8 alerts while I expected only one.

    I want the alert to be generated only if hostName and itemKey refer to the same container. I am not sure how to specify that.
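
One possible workaround, if the filtering cannot be expressed in Zabbix itself, is to drop the mismatched alerts on the receiving web server. The following is only a minimal sketch, assuming the payload has the same fields as the samples in this post and that each discovered host is named after its container. The function names are hypothetical and not part of Zabbix or the digiapulssi scripts.

```python
import json
import re

# Matches keys such as "docker.containers[eager_blackwell, up]" and captures
# the first key parameter (the container name).
_KEY_RE = re.compile(r"^docker\.containers\[\s*([^,\]]+?)\s*,")


def container_from_key(item_key):
    """Return the container name embedded in the item key, or None."""
    match = _KEY_RE.match(item_key)
    return match.group(1) if match else None


def is_genuine_alert(payload):
    """True only if hostName equals the container named in itemKey."""
    alert = json.loads(payload)
    container = container_from_key(alert.get("itemKey", ""))
    return container is not None and container == alert.get("hostName")
```

This hides the symptom only; the duplicate problems would still be created in Zabbix.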

    My trigger prototype is shown as below.
    Trigger Prototype Screen shot



    I have also attached the template as xml for review. Any help to get this working would be much appreciated.
  • rajlistuser
    Junior Member
    • Mar 2018
    • 11

    #2
    After reading the docs again carefully, it seems that I am missing the correct conditions in the Action. The current action is shown in the following screenshot:

    zabbix template actions


    There is no condition that limits alerts to the host that just went down, so if I am right, it sends an alert for every host that has the template Template App Docker - digiapulssi.

    I only want alerts to be generated for the host that went down and not for the rest. The hosts themselves are created using LLD. I do not know how to write a condition involving a host discovered using LLD (whose host name I cannot predict in advance) and triggered using a trigger prototype.

    • rajlistuser
      Junior Member
      • Mar 2018
      • 11

      #3
      I found two tickets for the issue I am referring to:
      1. https://support.zabbix.com/browse/ZBXNEXT-2064
      2. https://support.zabbix.com/browse/ZBX-5309

      • rajlistuser
        Junior Member
        • Mar 2018
        • 11

        #4
        Another way to look at this is to ask why so many [spurious] problems are being generated for a single host, as seen in the screenshot below:

        [Screenshot: problems_list.png]


        In the above screenshot, only container-ase-1040 was down, while the others were up and running, but Zabbix generated one alert for each host linked to template Template App Docker - digiapulssi. I was expecting only one alert from the trigger, but got nine.

        I am not able to figure out why this happens!

        • kloczek
          Senior Member
          • Jun 2006
          • 1771

          #5
          If there are no dependencies between the triggers, you will see all activated alarms.
          https://www.zabbix.com/documentation...s/dependencies
          Last edited by kloczek; 22-04-2018, 18:54.
          http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
          https://kloczek.wordpress.com/
          zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
          My zabbix templates https://github.com/kloczek/zabbix-templates

          • rajlistuser
            Junior Member
            • Mar 2018
            • 11

            #6
            Thanks for the answer!

            There is only one trigger, which is a trigger prototype activated on the discovered hosts created using LLD. Since there is only one trigger, I do not quite understand how I can use dependencies to activate the alarm only for the host that went down.

            • kloczek
              Senior Member
              • Jun 2006
              • 1771

              #7
              You must design the exact logic of a trigger like "host is down", and then all other triggers should depend on that trigger.
              It doesn't matter whether we are talking about a set of triggers populated by LLD(s) or just a fixed set of triggers.
              If other triggers are activated and they depend on this one, they will not be visible, and the only trigger listed as active will be "host is down".

              Look at the logic of such triggers in my template https://github.com/kloczek/zabbix-te...ter/OS%20Linux.
              In this template, many of the triggers are activated in a critical state, but because of the additional dependencies only "SYS::Host is down" is visible.
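
The suppression effect described above can be modelled roughly as follows. This is only an illustrative sketch, not Zabbix code: it checks direct dependencies only, and the trigger names other than "SYS::Host is down" are made up for the example.

```python
def visible_problems(in_problem, depends_on):
    """Return the triggers in problem state that are not hidden.

    in_problem: iterable of trigger names currently in PROBLEM state.
    depends_on: dict mapping a trigger name to the triggers it depends on.
    A trigger is hidden when at least one trigger it depends on is itself
    in PROBLEM state.
    """
    in_problem = set(in_problem)
    return {
        trigger
        for trigger in in_problem
        if not (set(depends_on.get(trigger, ())) & in_problem)
    }


active = {"SYS::Host is down", "Disk is full", "Service not responding"}
deps = {
    "Disk is full": ["SYS::Host is down"],
    "Service not responding": ["SYS::Host is down"],
}
print(visible_problems(active, deps))  # {'SYS::Host is down'}
```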

              Nevertheless, this is not an issue with Zabbix but an issue with the designed trigger logic, which is either implemented or not in a given template (it doesn't matter whether some triggers rely on monitoring data of a specific host, a group of hosts, or any other metrics data).
              PS. I have not looked at the details of your template and I'm not going to help you design such logic (sorry). Helping you would require understanding what exact logic you are going to implement on the alarming layer.

              • rajlistuser
                Junior Member
                • Mar 2018
                • 11

                #8
                Thanks for the reply. I think I am stuck at the "design the exact logic of a trigger like 'host is down'" phase. There seems to be something about Zabbix alerting that I do not yet understand.

                My Problem expression is {Template App Docker - digiapulssi:docker.containers[{#CONTAINERNAME}, up].last()}=0 which I expect to fire only for a single container when it goes down.

                When a container goes down I can see that this value is indeed 0, and all other containers have the value 1. But it is firing for all containers linked to template Template App Docker - digiapulssi. Is that the expected behavior when an LLD macro like {#CONTAINERNAME} is used?

                In such a case, how can a trigger be written to filter this exact host? I am not looking for specific hand-holding, just pointers to documentation.

                • kloczek
                  Senior Member
                  • Jun 2006
                  • 1771

                  #9
                  As I have not looked at the details of your template, I can only guess that your main problem is that the metric which provides the state of the container is outside of the container monitoring template.
                  If that is true, you may try to move this metric into the container template somehow. As long as the metric and the trigger on top of it are within the same template, constructing the dependency tree of the triggers will be easier.
                  As long as your docker.containers[{#CONTAINERNAME}, up] metric is outside, the problem is how to automatically add dependencies between triggers which are not generated by a single template.

                  Another possible solution is to have the orchestration add inter-host trigger dependencies through the API when a container is created. The orchestration should know where the docker.containers metrics of the given host are, and that this metric should have dependencies added to the given container's triggers.
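
As a rough illustration of the API route, the sketch below only builds a JSON-RPC request body and does not send it. It assumes the trigger.adddependencies method with the triggerid and dependsOnTriggerid parameters, which should be checked against the API reference for the Zabbix version in use. The helper name, the token, and the choice of trigger IDs (borrowed from the sample alerts in post #1) are placeholders.

```python
import json


def build_add_dependency_request(trigger_id, depends_on_id, auth_token, request_id=1):
    """Build a JSON-RPC body making trigger_id depend on depends_on_id.

    Assumption: the server exposes trigger.adddependencies. Nothing is sent
    here; the caller would POST the body to the server's api_jsonrpc.php.
    """
    return {
        "jsonrpc": "2.0",
        "method": "trigger.adddependencies",
        "params": {
            "triggerid": str(trigger_id),
            "dependsOnTriggerid": str(depends_on_id),
        },
        "auth": auth_token,  # placeholder session token
        "id": request_id,
    }


body = build_add_dependency_request("17290", "17315", "<auth token>")
print(json.dumps(body, indent=2))
```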

                  PS. IMO the problem is that currently available containerization technologies are still not mature, and/or none of them fully works (as it should) in a wide enough range of scenarios.
                  Many people are still working on new Linux containerization because of existing bugs (instead of fixing those bugs, many people decide to develop their own new containerization, introducing even more problems).
                  IMO it will take a few years until most of those implementations die and only one or a few remain.
                  Until then it will be really hard to implement consistent monitoring of these types of objects, because at the moment most people focus on individual containerization aspects like HA, orchestration and resource management separately, and real monitoring problems are at the end of the priority list.
                  Proper containerization should solve all those aspects together without putting them on a kind of priority/importance list, because the initial design assumptions must be compliant with all of them.

                  IMO the most mature containerization at the moment is Solaris zoning (it has already been available for almost 14 years). However, a similar technology cannot be developed on Linux because of the typical Linux NIH syndrome (Not Invented Here), and because most containerization software developers have never worked with Solaris zoning, or have based their work only on other people's descriptions of how Solaris zoning works.
                  Look: even the Docker template you use has some logical flaws, because the containers[{#CONTAINERNAME},up] metric at first look is half metric and half alarm (a kind of "not a dog and not an otter" syndrome).

                  • kloczek
                    Senior Member
                    • Jun 2006
                    • 1771

                    #10
                    Just as an exercise, I've imported your template into my Zabbix.
                    I think that you are making a mistake.
                    Your template is itself listed among the linked templates of its "host prototype". That creates a kind of loop.
                    Look at the standard Zabbix VMware templates to see how "host prototypes" are implemented there.
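
One reading of this loop that is consistent with the sample alerts in post #1 (same itemKey, different hostName) is that every discovered host runs the same discovery and so gets its own trigger for every container. The sketch below only models that reading with plain strings; it is an assumption about the template's behaviour, not Zabbix code, and the function names are hypothetical.

```python
def expand_triggers(hosts, containers):
    """Model each host receiving one trigger per discovered container."""
    return {
        (host, name): f"{{{host}:docker.containers[{name}, up].last()}}=0"
        for host in hosts
        for name in containers
    }


def hosts_firing(triggers, down_container):
    """Hosts whose copy of the trigger watches the container that went down."""
    return sorted(host for (host, name) in triggers if name == down_container)


names = ["eager_blackwell", "happy_franklin", "tender_pasteur"]
triggers = expand_triggers(names, names)
print(len(triggers))                              # 9
print(hosts_firing(triggers, "eager_blackwell"))  # all three hosts
```

Under this model a single stopped container raises one problem per discovered host, which would explain receiving several alerts instead of one.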