Complex SNMP Traps: Best Practices?

  • lolstrup
    Junior Member
    • Jul 2021
    • 11

    #1

    Complex SNMP Traps: Best Practices?


    Hi,

    Background: Our setup has 2 Zabbix servers (in an HA cluster), and all collection is done by Docker-deployed Zabbix proxies in different network zones.
    We have around 2500 hosts, mostly VMs monitored by agents. We will be adding lots of different network devices, most of which have templates already available, but some platforms will need custom-developed templates.

    With our old monitoring system, we would be able to map interesting fields in the traps to a decent event model and simply display those.
    In Zabbix, it seems we need to create an item for every single event type, meaning we have to preemptively know exactly what devices are able to send.

    The concrete case is a trap that has 5 fields to determine its uniqueness:
    • NodeId
    • AlarmName
    • ApplicationName
    • InstanceId
    • TableKey

    Then of course they have severity, description and a bunch of other fields. So far I've seen >100 different combinations of these fields.
    I've seen you can use tags and recovery expressions to solve this partly.

    Is this really the best way?
    1. Do I need to use tons of regexes to extract the tags for every single trigger?
    2. What if an alarm can have multiple different severities (not just up/down)? Do I need a separate trigger for each of those?
    3. Does it make sense to use a trap parser like SNMPTT, and will this break existing templates using OIDs/SNMP names?
    Any help or comments are appreciated, thanks!
  • tim.mooney
    Senior Member
    • Dec 2012
    • 1427

    #2
    Good background info on your environment -- lots of people asking questions here don't provide any details about their environment, so what you've provided is really helpful.

    You don't say what Zabbix version you're using, but I don't think that will matter for this particular question, as I think the answer will be similar for Zabbix 5.x or 6.x.

    Specifically for question #3, about breaking existing templates: Are you using any templates that have SNMP TRAP items? Or, are all the SNMP templates you're using SNMP polling only?

    With current Zabbix versions you always need some kind of "bridge" between snmptrapd and Zabbix: snmptrapd receives the TRAP and hands it off to the bridge for processing. Without this intermediate layer, I'm not aware of any way to inject TRAPs into Zabbix. Ultimately, I think this means you need to use something, whether it's SNMPTT or something else.

    Going back to the "will this break existing templates", I don't really see how it would. Most existing templates that I'm aware of that use SNMP at all use polling. If you have templates that have TRAP items, then I would think that the templates would require you to set up the "bridge" layer anyway. As I said, I'm not aware of any other way currently to get TRAPs into Zabbix.

    For your question 2, about whether separate triggers are required: I would say separate triggers are definitely the natural way to handle that in Zabbix. There are certainly cases where you could do everything with one trigger, but the trigger expression logic generally would become more complex when you try to do that. It's usually going to be more obvious to others that might have to look at what you've set up if you keep the expressions as simple as you can for your needs. In my experience, that's usually going to mean separate triggers for each severity range.

    I don't really know how to address your first question. Hopefully someone else here who makes heavier use of SNMP TRAPs than my site currently does can chime in and suggest what works for their environment. I can say that you don't need to use tags, but depending on how the rest of your environment is set up, they may be the best option. I think the same can be said for regexes: you don't need to use them, but you have to process the trap somehow and "direct" it to Zabbix.

    Regexes are a common way to do that with the existing scripts, but they aren't the only possibility. For example, I can imagine a big site with an SNMP expert on staff could write their own "bridge" software (in something like Go) that processes the traps directly and gets its "routing" information from a config file, rather than from a bunch of regexes compiled into the program. Until someone writes something like that and shares it with the rest of us, the existing scripts and regexes are kind of the lowest common denominator method.
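To make the config-file idea a bit more concrete, here is a minimal sketch (in Python rather than Go, purely for brevity) of routing driven by data instead of by hard-coded regexes. The field names come from the original question; the "DISK" alarm name, the item keys, and the config layout are all made up for illustration and are not part of any existing bridge script.

```python
import json

# Hypothetical routing table. In a real bridge this would be read from a
# config file; it is inlined here so the sketch is self-contained.
ROUTING_CONFIG = json.loads("""
{
  "routes": [
    {"match": {"AlarmName": "PROCESS"},
     "item_key": "vendor.alarm[process]"},
    {"match": {"AlarmName": "DISK", "ApplicationName": "XMS"},
     "item_key": "vendor.alarm[disk]"}
  ],
  "default_item_key": "vendor.alarm[unmatched]"
}
""")


def route_trap(fields, config=ROUTING_CONFIG):
    """Return the item key of the first route whose match fields all equal
    the corresponding trap fields, or the default key if none match."""
    for route in config["routes"]:
        if all(fields.get(name) == value
               for name, value in route["match"].items()):
            return route["item_key"]
    return config["default_item_key"]
```

Adding a new alarm type then means adding a route to the config, not touching the program.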

    Hopefully some of this helps, and hopefully some others that are heavily using SNMP TRAPs can share what they're doing. My site is only doing a small amount of TRAP processing with Zabbix currently, so the perl script and a few regexes have been easy enough for my site's needs.

    • lolstrup
      Junior Member
      • Jul 2021
      • 11

      #3
      Thank you for the great response!

      I forgot to mention we always keep Zabbix updated, so we're at 6.2.1. But yes, I don't think it makes a difference for this.

      Yes, some of the built-in SNMP templates use snmptrap items as well. Like the "EtherLike-MIB SNMP" template - it has snmptrap prototypes for each interface.
      The SNMP template I'm building is only using traps. That's how this vendor's application sends alarms to us.

      We do use a bridge between snmptrapd and the Zabbix proxies. It comes with the zabbix-snmptraps image. It simply dumps the traps into a file that is read by Zabbix and attached to each host based on the source. We also feed it all our MIBs, which works fine.
      In our old monitoring system, we built this layer ourselves using Perl modules called by snmptrapd, which parsed the trap's fields into more generic fields. Then we handled statefulness in the application.

      I don't have any experience with SNMPTT, but to my understanding it would alter the format of the traps (I don't even know if OIDs persist), so I'm worried existing templates might break if they expect specific patterns in the traps.
      It does seem to carry some benefits, and from what I hear you're supposed to use it in a production environment, but like I said, I don't know much about it.

      Right now, with this template, I'm putting all the AlarmChange traps into one snmptrap item, where I make them a bit more readable with preprocessing (also to avoid extremely long regex patterns).
      Let's take an example trap (after preprocessing):
      NbAlarmName PROCESS
      NbApplicationName XMS
      NbClassId monitor
      NbInstanceId monitor-00
      NbTableKey lcmd-00
      NbSeverity 6
      NbDescription Process is not running.


      Here, the AlarmName is the "type" of alarm. The description tells us what's wrong, but I don't use that.
      According to their documentation, there are multiple different InstanceIds all with 10-15 TableKeys (processes). Several ApplicationNames use the same AlarmName, too. A severity of 6 means it's not running, and 2 would mean it's cleared/now running again.
      I match each one using AlarmName, while I make a uniquely identifying tuple using the other fields (ApplicationName, InstanceId, TableKey) and put that into a tag. This lets me set "PROBLEM event generation mode" to Multiple, and it seems to work alright.
      I think the tags are definitely a must here, since otherwise I would have to make an insane amount of triggers.
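As a rough illustration of the tuple described above, here is a small Python sketch that parses the preprocessed example trap into fields and joins ApplicationName, InstanceId and TableKey into one identifying string. The "/" separator and the function names are assumptions for the sake of the example; inside Zabbix the same extraction would be done with the tag value expression rather than with a script.

```python
# The example trap from this post, after preprocessing.
EXAMPLE_TRAP = """\
NbAlarmName PROCESS
NbApplicationName XMS
NbClassId monitor
NbInstanceId monitor-00
NbTableKey lcmd-00
NbSeverity 6
NbDescription Process is not running.
"""


def parse_trap(text):
    """Split each 'Name value' line on the first space into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, value = line.partition(" ")
        fields[name] = value.strip()
    return fields


def instance_tag(fields):
    """Join the three identifying fields into a single tag value."""
    names = ("NbApplicationName", "NbInstanceId", "NbTableKey")
    return "/".join(fields[name] for name in names)


fields = parse_trap(EXAMPLE_TRAP)
print(fields["NbAlarmName"], instance_tag(fields))
# prints: PROCESS XMS/monitor-00/lcmd-00
```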

      Another issue is that there can be more states than just up/down. Like this PROCESS alarm. It also has a "Process is starting" state, where it sets severity to 3.
      For this specific alarm we don't really care, so I just ignore it, but for others I might have to make multiple triggers that are closed by the same expression, I'm guessing?
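To spell out the "multiple triggers closed by the same expression" idea, here is a minimal Python model of the event logic I have in mind. It is only a sketch of the intended behaviour, not how Zabbix implements it: each problem severity opens its own problem per instance, and the clear severity (2) closes every open problem for that instance. The severity values 2, 3 and 6 are the ones from this alarm; everything else is assumed.

```python
CLEAR_SEVERITY = 2
# Severities that would each get their own trigger in this model.
PROBLEM_SEVERITIES = {3: "starting", 6: "not running"}


def apply_event(open_problems, instance, severity):
    """Update open_problems (instance -> set of open severities) in place
    and return the list of (action, severity) pairs that resulted."""
    current = open_problems.setdefault(instance, set())
    if severity == CLEAR_SEVERITY:
        # One clear event closes every problem open for this instance.
        closed = [("close", s) for s in sorted(current)]
        current.clear()
        return closed
    if severity in PROBLEM_SEVERITIES and severity not in current:
        current.add(severity)
        return [("open", severity)]
    # Repeated or unknown severities change nothing.
    return []
```

For example, feeding severities 3, 6, 6 and then 2 for the same instance opens two problems, ignores the repeat, and closes both on the clear.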

      I really just want to make sure that I'm not missing any obvious shortcuts, before we start making lots of templates in this same manner.
      It does feel like a lot of work compared to just making a small script that can parse all of the different types of traps correctly, but I also understand the importance of relational data (which we didn't have much of before).

      It might be worth it to make our own trap processor (at least for specific TrapOIDs anyway) for some systems, but do we then put it into text files like with normal traps, use zabbix_sender to dump data directly into items, or something entirely different?
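For the zabbix_sender option, this is roughly what I picture: the custom processor writes one line per value in the "hostname key value" layout that zabbix_sender accepts from an input file (its -i option), and the target items are of type "Zabbix trapper". The sketch below only builds the lines. The host name and item key are hypothetical, and the quoting rule (double quotes, with backslash escaping for quotes and backslashes) is my reading of the man page, so it should be checked against the version in use.

```python
def quote(value):
    """Wrap a field in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def sender_line(host, key, value):
    """Build one 'hostname key value' input line for zabbix_sender.
    Assumes single-line values; newlines are not handled here."""
    return f"{quote(host)} {quote(key)} {quote(value)}"


line = sender_line("xms-node-01", "vendor.alarm[process]",
                   "Process is not running.")
print(line)
# prints: "xms-node-01" "vendor.alarm[process]" "Process is not running."
```

The resulting file would then be passed to zabbix_sender with -i, pointed at whichever server or proxy monitors the host.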
