A proper way to avoid false triggering after server update
  • it01011001 (Junior Member · joined Jul 2021 · 4 posts)

    #1

    Hello all,

    I have several Nginx reverse proxies sending traffic to several upstreams. We had a number of problems where Nginx on the reverse proxies reported that it could not contact the upstreams, so we decided to monitor this in order to isolate the problem (the reverse proxy Nginx itself, the NGFW between the reverse proxies and the upstreams, or the upstreams themselves), and now I am trying to set up proper monitoring of whether the upstreams are reachable from the reverse proxies.

    I cannot use a web scenario or an HTTP agent item for this because those are executed from the Zabbix server or proxy, so I decided to do a simple web.page.get check via the active agent instead and extract the HTTP status code from it, e.g.:
    Code:
    items:
      -
        uuid: 45767972cbfc41e9a55bdc0b676696e2
        name: 'Upstream 1 HTTP Code'
        type: ZABBIX_ACTIVE
        key: 'web.page.get[10.10.10.10,,80]'
        history: 14d
        preprocessing:
          -
            type: LTRIM
            parameters:
              - HTP/1.
          -
            type: REGEX
            parameters:
              - '([0-9]+)'
              - \0
        triggers:
          -
            uuid: f344c18d6f7d497cbf1f1865eb7fbaa3
            expression: 'last(/Reverse Proxy Web Checks/web.page.get[10.10.10.10,,80])<>200'
            name: 'Upstream 1 Non-200 HTTP Code'
            priority: AVERAGE
          -
            uuid: 8b3b2f7fc21e4f6d8d6540dd692c7ba6
            expression: 'nodata(/Reverse Proxy Web Checks/web.page.get[10.10.10.10,,80],3m)=1'
            name: 'Upstream 1 Unavailable for more than 3 minutes'
            priority: AVERAGE
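
    For context (as far as I understand it), web.page.get returns the raw response including the status line and headers, so the LTRIM step strips the characters H, T, P, /, 1 and . from the beginning of "HTTP/1.1 200 ...", and the REGEX step then keeps the first run of digits, leaving only the status code. Roughly like this (headers abbreviated):
    Code:
    HTTP/1.1 200 OK
    Server: nginx
    ...

    after LTRIM "HTP/1."     ->  " 200 OK ..."
    after REGEX ([0-9]+) \0  ->  "200"
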
    It works well most of the time, but when I restart the Zabbix server (not the agents), a lot of triggers about upstream unavailability fire and then quickly resolve. Funnily enough, the OPDATA field of those triggers still shows HTTP code 200, so the upstreams are actually available. Moreover, when I look at the HTTP code graphs for the upstream items, I see that there were no actual nodata periods, i.e. the agents successfully collected all the data and uploaded it to the Zabbix server.

    Obviously, if I created the above item and triggers directly at the host level, I could use the agent.ping item to distinguish the unavailability of an upstream from the unavailability of the host itself (see the sketch below). But if I create a template to monitor a number of servers, I cannot reference agent.ping from it, because that item lives in a template module that is already linked to the hosts.
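
    At the host level that could look something like this (just a sketch, reusing the 3-minute window and the host name from my export above; nodata(...,3m)=0 on agent.ping is meant to express "the agent is still delivering data"):
    Code:
    expression: 'nodata(/Reverse Proxy Web Checks/web.page.get[10.10.10.10,,80],3m)=1 and nodata(/Reverse Proxy Web Checks/agent.ping,3m)=0'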

    Of course, I could clone the entire "OS Linux by Zabbix Agent Active" template and add the above triggers to it. But if I need to monitor the availability of a number of different items on different hosts, I will end up with a lot of cloned templates, and that will ruin all the modularity of the templates I am trying to maintain.

    I wonder if there is a more elegant way to avoid such false positives.

    P.S. I actually think this looks more like a bug than a misconfiguration, but I am a novice in Zabbix and I know I could be wrong.