Recovery for BGP sessions - Logic failing

  • stevencnz
    Junior Member
    • Feb 2017
    • 15

    #1

    Hi, I have BGP monitoring set up as follows:

    Problem Trigger
    ================
    {snmptrap["Peer: {#PEER_IP} has transitioned from \w+ to \w+"].str(from Established)}=1
    or
    (not({bgpPeerOperationalStatus[{#PEER_IP}].last()}=6) and {bgpPeerAdminStatus[{#PEER_IP}].last()}=2)

    Meaning... IF you get a trap stating BGP is down OR if the last poll says the BGP state is NOT 6 (Established) and the Admin State is 2 (start... meaning it isn't manually shut down) - THEN trigger.



    Recovery condition
    ====================
    Recovery: {snmptrap["Peer: {#PEER_IP} has transitioned from \w+ to \w+"].str(to Established)}=1
    or
    ({bgpPeerOperationalStatus[{#PEER_IP}].last()}=6)
    or
    ({bgpPeerAdminStatus[{#PEER_IP}].last()}=1)

    Meaning... IF you get a trap saying it's up OR if the last poll says BGP operational state is 6 (Established) OR if the Admin State is 1 (stop... meaning it is manually shut down) - THEN recover.


    Triggers are working fine and Zabbix alerts appropriately: if I shut down a session, the devices on both ends of the BGP session trigger as being down.

    But the recovery fails. On the next polling cycle, Zabbix should see that one of the two BGP sessions is manually shut down (i.e. Admin State is 1). This should recover the trigger on that device, but it doesn't.

    I do manual SNMP walks and check the latest data from Zabbix itself. It all matches. I even test the expression using the expression tester. But Zabbix just will not recover.

    Can anyone assist?
  • stevencnz
    Junior Member
    • Feb 2017
    • 15

    #2
    I think I've managed to figure it out. It seems Zabbix doesn't evaluate the recovery condition until the problem condition no longer evaluates to true. This means the recovery condition is used more to prevent flapping and to make sure recovery only takes place when the problem is truly resolved.

    I will have to try a different approach to achieve what I am after.
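
For anyone hitting the same thing, here is a minimal Python sketch of the evaluation order described above. It is a simplified model, not Zabbix code: the function names, the `Sample` class and the peer address 10.0.0.1 are made up for illustration. It also assumes that the trap item still holds the old "from Established" message as its last value, which would keep the problem expression true on the shut-down device.

```python
from dataclasses import dataclass

ESTABLISHED = 6   # operational status value from the post
ADMIN_STOP = 1    # admin status: manually shut down
ADMIN_START = 2   # admin status: enabled


@dataclass
class Sample:
    """Last values of the three items used in the trigger (hypothetical model)."""
    last_trap: str
    oper_status: int
    admin_status: int


def problem(s: Sample) -> bool:
    # Mirrors the problem expression from post #1.
    return ("from Established" in s.last_trap) or (
        s.oper_status != ESTABLISHED and s.admin_status == ADMIN_START
    )


def recovery(s: Sample) -> bool:
    # Mirrors the recovery expression from post #1.
    return (
        "to Established" in s.last_trap
        or s.oper_status == ESTABLISHED
        or s.admin_status == ADMIN_STOP
    )


def next_state(in_problem: bool, s: Sample) -> bool:
    """Return True if the trigger is in PROBLEM after evaluating sample s."""
    if not in_problem:
        return problem(s)
    if problem(s):
        # Problem expression still true: recovery is not considered at all.
        return True
    # Problem expression false: recover only if the recovery expression is true.
    return not recovery(s)


# Session was shut down manually; the trap item still holds the "down" trap.
stuck = Sample("Peer: 10.0.0.1 has transitioned from Established to Idle", 1, ADMIN_STOP)
# Same polled values, but a later trap has replaced the "from Established" one.
cleared = Sample("Peer: 10.0.0.1 has transitioned from Idle to Active", 2, ADMIN_STOP)

print(recovery(stuck), next_state(True, stuck))
print(recovery(cleared), next_state(True, cleared))
```

In this model the first sample satisfies the recovery expression, yet the trigger stays in PROBLEM because the trap part of the problem expression is still true. The second sample recovers because the problem expression is false and the recovery expression is true.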
