
**BNC** · 10-12-2019, 12:31

Thanks for your (valuable) input.

You raise a lot of good points, but please allow me to review them to see if I understand them correctly...
Also, for what it's worth, the update interval is the default: 30 seconds.

Adding the results is so simple I'm flabbergasted I didn't think of it (let's call that the learning curve !), yet it might not apply to what I need (see below).
For your proposed trigger to work, it looks like I'd have to use the servers' names with the keys instead of the template's (otherwise, on each server, the trigger will compare the sum of all results to both 3 and 0, which will always return 0). That would give us:
- Code:
```
(({ipsecA1:ipsec.status.last()} + {ipsecA1:keepalived.private.last()}
		+ {ipsecA1:keepalived.public.last()}) <> 3) and (({ipsecA2:ipsec.status.last()}
		+ {ipsecA2:keepalived.private.last()} + {ipsecA2:keepalived.public.last()}) <> 0)
```
  But that means I'll have to create items and triggers on each machine (as I can't call a server item directly from a template)...
  Doesn't that defeat the idea of having templates deploy items, triggers and such on servers ?
  Or am I missing something ?
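To make the logic of that expression concrete, here is a minimal Python sketch of how it evaluates. It only models the boolean arithmetic, not Zabbix itself; the function name `pair_trigger` and the sample values are made up for illustration.

```python
def pair_trigger(a1, a2):
    """Model of: (sum of A1 items <> 3) and (sum of A2 items <> 0).

    a1 and a2 hold the last values of (ipsec.status, keepalived.private,
    keepalived.public) on ipsecA1 and ipsecA2; each value is 0 or 1.
    """
    return sum(a1) != 3 and sum(a2) != 0


# Healthy pair: everything up on A1, everything down on A2.
print(pair_trigger((1, 1, 1), (0, 0, 0)))  # False

# A1 lost its public IP and A2 picked it up.
print(pair_trigger((1, 1, 0), (0, 0, 1)))  # True

# A1 lost its public IP but A2 did not react: the expression stays quiet.
print(pair_trigger((1, 1, 0), (0, 0, 0)))  # False
```

Note that the expression needs values from both hosts at once, which is exactly why it cannot be written against a single template host.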
Last but not least, I'm not sure I understand the part about nodata:
- On one line you propose to use nodata(3m) on triggers, and the next line (and the example after that), you use last()...
  I'm a bit lost here
- Also, if I understand nodata correctly (and I quote its wiki definition):
  - Code:
```
Returns:
			1 - if no data received during the defined period of time
			0 - otherwise
```
  - Thing is, the item is getting data every 30 seconds, be it 0 or 1, so it will always return 0 (unless the server is down, I suppose)
    According to the wiki, it checks for any data, not new data
- What I first understood from your sentence "So we add triggers with nodata(3m) on that items to get that info" was that it was possible to add trigger functions (such as nodata(3m), min(3m), max(3m), avg(3m)...) to items, and then call last() on the triggers to get something akin to what I expected last(3m) to do...
  Is that actually possible, or even desirable ?
  So far, just about all of my items are merely referencing the keys (like ipsec.status), and that's it...
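Going back to the nodata definition quoted above, it can be modelled in a few lines of Python. This is only a sketch of the quoted wording ("1 if no data received during the period, 0 otherwise"); the function signature and the timestamps are assumptions made for the example, not Zabbix internals.

```python
def nodata(sample_times, now, period):
    """Return 1 if no sample arrived in the last `period` seconds, else 0.

    Only the arrival time matters; the sample's value (0 or 1) is ignored.
    """
    received = any(now - period < t <= now for t in sample_times)
    return 0 if received else 1


# Item polled every 30 s and still reporting at t=600.
regular = list(range(0, 601, 30))
print(nodata(regular, now=600, period=180))  # 0

# Same item, but nothing has arrived since t=300.
silent = list(range(0, 301, 30))
print(nodata(silent, now=600, period=180))   # 1
```

Under this reading, nodata(3m) says something about the agent or host being reachable, not about the value the item reports.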

Since the time I posted on SO, I've been trying a different, more targeted approach, for my trigger.
At first I used min(3m) instead of the deceptive last(3m).
I tested it on the master/primary, where items return 1 when OK, and it seemed fine... but obviously it wouldn't work on the secondary.
I then tried delta(3m), but I don't know if it's the best function to use, and there hasn't been enough activity on the servers to give my trigger a real-life test.
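The difference between the two functions on 0/1 items can be sketched as follows. This assumes delta(3m) is the maximum minus the minimum over the window, and that a 3-minute window holds six samples at the 30-second interval; the helper names `zbx_min` and `zbx_delta` are hypothetical.

```python
def zbx_min(values):
    """Lowest value seen in the window."""
    return min(values)


def zbx_delta(values):
    """Assumed semantics: max minus min over the window."""
    return max(values) - min(values)


master_ok = [1, 1, 1, 1, 1, 1]      # primary, everything up
secondary_ok = [0, 0, 0, 0, 0, 0]   # secondary, everything down (normal)
master_blip = [1, 1, 0, 1, 1, 1]    # primary with one bad sample

print(zbx_min(master_ok), zbx_min(secondary_ok))      # 1 0
print(zbx_delta(master_ok), zbx_delta(secondary_ok))  # 0 0
print(zbx_delta(master_blip))                         # 1
```

So a `min(3m)<>1` check treats a healthy secondary as a problem, while delta is 0 in both steady states and only becomes 1 when a value changed inside the window.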

I also added ping checks to both the IPs managed by keepalived, but I'm still wondering if it wouldn't be better to have them issued by the Zabbix server, rather than the clients themselves.

Lastly, once the errors that raised an alarm had aged out, the results settled on the error state (effectively making the alarm disappear), so I added the TRIGGER.VALUE combo (my IPsec master is on the Zabbix Server 2.4.7) to keep the alarm relevant.

That was of course before you showed me the addition approach, and it gave something like this (I exploded the expression to make it readable):

Code:
```
(
    {TRIGGER.VALUE}=0 and (
        (
            {Template IPsec:ipsec.status.delta(3m)}<>{Template IPsec:keepalived.private.delta(3m)}
        ) or (
            {Template IPsec:ipsec.status.delta(3m)}<>{Template IPsec:keepalived.public.delta(3m)}
        ) or (
            {Template IPsec:keepalived.private.delta(3m)}<>{Template IPsec:keepalived.public.delta(3m)}
        )
    ) and (
        (
            {Template IPsec:icmpping[pr.iv.ate.IP].min(3m)}<>1
        ) or (
            {Template IPsec:icmpping[pu.bl.ic.IP].min(3m)}<>1
        )
    )
) or (
    {TRIGGER.VALUE}=1 and (
        (
            {Template IPsec:icmpping[pr.iv.ate.IP].min(3m)}<>1
        ) or (
            {Template IPsec:icmpping[pu.bl.ic.IP].min(3m)}<>1
        )
    )
)
```

The issue with this trigger is that it won't fire on problems with the tunnel or the IPs (sometimes keepalived will detect something that it doesn't like, and enable the IPs on the secondary server, even if they are still present on the primary) unless the IPs are down.
I might have to make another trigger to specifically check if keepalived hasn't gone crazy.
And probably even another to make sure the tunnel is OK on both sides (even if one side should be enough).
I was hoping to have all of this in just one trigger, but is it even possible ?
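The limitation described above can be reproduced with a small Python model of the exploded expression. It is a sketch of the boolean structure only; the function name `ipsec_trigger` and its parameter names are made up, with `prev` standing in for {TRIGGER.VALUE}.

```python
def ipsec_trigger(prev, d_ipsec, d_priv, d_pub, ping_priv_min, ping_pub_min):
    """Model of the trigger: returns 1 (PROBLEM) or 0 (OK).

    prev          -- current trigger state, i.e. {TRIGGER.VALUE}
    d_*           -- delta(3m) of ipsec.status and the two keepalived items
    ping_*_min    -- min(3m) of the two icmpping items
    """
    mismatch = d_ipsec != d_priv or d_ipsec != d_pub or d_priv != d_pub
    ping_bad = ping_priv_min != 1 or ping_pub_min != 1
    if prev == 0:
        # Going to PROBLEM needs both a delta mismatch and a failed ping.
        return int(mismatch and ping_bad)
    # Staying in PROBLEM only needs a failed ping.
    return int(ping_bad)


# Deltas disagree but both IPs still answer: no alarm.
print(ipsec_trigger(0, 1, 0, 0, 1, 1))  # 0

# Deltas disagree and the private IP missed a ping: alarm.
print(ipsec_trigger(0, 1, 0, 0, 0, 1))  # 1

# Already in PROBLEM, deltas back in agreement, ping still failing: stays up.
print(ipsec_trigger(1, 0, 0, 0, 0, 1))  # 1
```

The first case is the one that matters here: as long as both addresses answer pings, nothing the other three items report can raise the alarm.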

NB: the 3 minutes I use in my trigger functions is a side-effect of strongSwan sometimes going AWOL for the whole duration of the retransmission timeout, which is 165 secs.
Since 98% of the time it seems to have no effect on the tunnel (I'll check with the team at strongSwan to clear that up), I set the 3 minutes to avoid unnecessary alarms.
There are also occasional blips that have no impact on the tunnel but will trigger an alarm nonetheless, and the idea is to ignore them.


Proper Zabbix trigger function(s) to check multiple boolean(ish) values ?
