Fake problems/alarms without a corresponding snmp trap

  • chaos.entalpico
    Junior Member
    • Aug 2023
    • 8

    #1

    Hello,

    I'm struggling with a weird problem on Zabbix. Basically, we have a device that acts as a proxy and sends SNMP traps to our Zabbix server. Using some macros and manipulation of the trap's text, we are able to reconstruct the original node and fault that generated the event.
    It works OK for the most part; however, there are times when it generates false alarms. To investigate further, we checked snmptrap.log and noticed that there is no trap related to those events. Sometimes, but not always, a false event is triggered at the same time other traps reach the system, correctly triggering other problems or resolutions for the same node or others.
    I can understand the server losing some traps due to overload, but generating triggers based on nothing is something I have yet to understand.

    Any tips out there?
    Thanx!
  • tim.mooney
    Senior Member
    • Dec 2012
    • 1427

    #2
    Originally posted by chaos.entalpico
    Sometimes, but not always, a false event is triggered at the same time other traps reach the system
    That sounds like some kind of parsing problem with the intermediate script or program you're using with snmptrapd. Is it possible it's getting one TRAP but mishandling it and writing it out as two separate entries in the intermediate file that Zabbix reads? Or is it possible there's some text (line feeds or newlines) in part of the received trap that is causing the intermediate script or program to incorrectly split one incoming TRAP into two separate entries in the file?

    My environment doesn't have a high volume of TRAPs, as we only use TRAP reception for a small number of dumb devices that we can't monitor any other way, but you bring up a point I had never considered with the particular intermediate script we're using: I don't remember if the script my site is using does any locking or anything else to ensure that two nearly simultaneous traps couldn't end up interspersed in the log file. I would have to do some reading to see if that's even a possibility (it's possible snmptrapd forces the scripts to run sequentially without any need for locking in the script itself), but it's not really a problem I would encounter in my environment.

    You may want to add some additional logging to the script or program you're using, to see if there are cases where the script is being run more than once at the same time.
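One way to look for split or interleaved entries after the fact is to scan the intermediate file itself. The following is only a minimal sketch: it assumes each entry starts with a header line containing `ZBXTRAP <address>` and then has exactly one `PDU INFO:` line and one `VARBINDS:` line, which should be checked against what your receiver script actually writes. The function names, addresses, and OIDs in the sample are made up for illustration.

```python
import re

# Assumed header shape: "<timestamp> ZBXTRAP <source address>".
HEADER = re.compile(r"ZBXTRAP\s+(\S+)")


def split_entries(text):
    """Split the trap file text into entries, one per ZBXTRAP header line."""
    entries = []
    for line in text.splitlines():
        if HEADER.search(line):
            entries.append([line])
        elif entries:
            entries[-1].append(line)
    return entries


def suspicious_entries(text):
    """Return (header, reason) pairs for entries that look split or interleaved."""
    findings = []
    for entry in split_entries(text):
        body = [line.strip() for line in entry[1:]]
        pdu = body.count("PDU INFO:")
        varbinds = body.count("VARBINDS:")
        # A well-formed entry (under the assumption above) has one of each.
        if pdu != 1 or varbinds != 1:
            findings.append((entry[0], f"PDU INFO x{pdu}, VARBINDS x{varbinds}"))
    return findings


# Hypothetical sample: the second entry is cut off by the third one.
SAMPLE = """\
10:00:01 2023/08/21 ZBXTRAP 192.0.2.10
PDU INFO:
  version 1
VARBINDS:
  .1.3.6.1.4.1.99999.1 type=2 value=INTEGER: 123
10:00:01 2023/08/21 ZBXTRAP 192.0.2.10
PDU INFO:
10:00:01 2023/08/21 ZBXTRAP 192.0.2.11
PDU INFO:
  version 1
VARBINDS:
  .1.3.6.1.4.1.99999.1 type=2 value=INTEGER: 456
"""

if __name__ == "__main__":
    for header, reason in suspicious_entries(SAMPLE):
        print(f"suspicious entry: {header!r} ({reason})")
```

To use it on real data, read the file configured as the trap file into a string and pass it to `suspicious_entries`. An empty result does not rule out a problem; it only means the assumed structure was intact.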

    • chaos.entalpico
      Junior Member
      • Aug 2023
      • 8

      #3
      Originally posted by tim.mooney
      You may want to add some additional logging to the script or program you're using, to see if there are cases where the script is being run more than once at the same time.
      Hi Tim,
      thanks for the feedback. I've checked all the logs I could find but didn't find anything useful. You mentioned adding some additional logging. We are using the Perl script suggested in the official installation guide... do you know how I can enable more logging?

      • tim.mooney
        Senior Member
        • Dec 2012
        • 1427

        #4
        If you're talking about the script 'zabbix_trap_receiver.pl', then my apologies: now that I look at it, it doesn't currently have any built-in logging. If you don't already know how to program in Perl, adding logging wouldn't be very straightforward for you. It had been a while since I had looked at that script (we use a locally-modified version of that script for our minimal trap reception), and I had forgotten that it didn't have any built-in logging.

        Knowing that, I'm not sure what to suggest. When I'm trying to debug a weird problem like what you're running into, I usually try to make sure that logging is enabled and perhaps temporarily increase the log level, so I get more verbose messages about what's happening. Depending upon the developers of the software, that may or may not be useful -- they may not be doing any logging in the parts of the code that I need more information about. Since there's currently no logging in the Perl receiver script, about all you could do for logging is to try to enable logging for snmptrapd (see the LOGGING section of the snmpcmd(1) man page).

        Sorry I don't have a better suggestion, but for a problem that happens infrequently, trying to be prepared with logging for "information capture" and hoping you get enough info the next time it happens is often the easiest approach to tracking down the problem.

        • chaos.entalpico
          Junior Member
          • Aug 2023
          • 8

          #5
          Hi Tim,
          no problem, thanks for your feedback anyway.

          • chaos.entalpico
            Junior Member
            • Aug 2023
            • 8

            #6
            Hello,

            I couldn't find any hint in any log, so I tried to play around with things.
            I've managed to improve the situation a bit by removing the recovery expression.

            Before I had something like this:

            Problem: last(/whatever/snmptrap.alarm.code)=123 and last(/whatever/snmptrap.alarm.resolution)=1
            Recovery: last(/whatever/snmptrap.alarm.code)=123 and last(/whatever/snmptrap.alarm.resolution)=2

            Now:

            last(/whatever/snmptrap.alarm.code)=123 and (last(/whatever/snmptrap.alarm.resolution)=1 or last(/whatever/snmptrap.alarm.resolution)>2)

            This works almost fine, except that sometimes problems are marked as cleared by Zabbix without the appropriate trap. The logic behind it, I believe, is that when Zabbix receives a trap with an alarm code other than 123 and a resolution value of 2 right after the trigger fires, the condition evaluates to FALSE and the alarm is cleared. So, if a node is generating different traps with different alarm codes, there is a chance that some alarm is marked as cleared. So I have to find a workaround for another workaround.

            I think I'm going insane
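The unwanted clear described above can be reproduced outside Zabbix with a few lines of Python. This is only a sketch of the boolean logic, not of how Zabbix evaluates triggers internally: `trigger` mirrors the single expression, and the sequence of (alarm code, resolution) pairs is a hypothetical trap stream in which alarm 456 is the unrelated one.

```python
def trigger(code, resolution):
    # Mirrors: last(code)=123 and (last(resolution)=1 or last(resolution)>2)
    return code == 123 and (resolution == 1 or resolution > 2)


# Hypothetical stream of (alarm code, resolution) values, one pair per trap:
# alarm 123 raised, unrelated alarm 456 resolved, alarm 123 resolved.
traps = [(123, 1), (456, 2), (123, 2)]

states = [trigger(code, resolution) for code, resolution in traps]

# Index of the first trap after which the problem is no longer active.
first_clear = states.index(False)

if __name__ == "__main__":
    for (code, resolution), state in zip(traps, states):
        print(f"code={code} resolution={resolution} -> problem={state}")
    print(f"cleared by trap {traps[first_clear]}")
```

The problem goes inactive on the second trap, which belongs to alarm 456, before the real resolution for alarm 123 arrives. Because the expression only looks at the last value of each item, any trap with a different alarm code makes it FALSE.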

            • cyber
              Senior Member
              Zabbix Certified Specialist, Zabbix Certified Professional
              • Dec 2006
              • 4807

              #7
              Each trigger is recalculated whenever any of the items it uses receives a new value. The updates can be microseconds apart, but the trigger is recalculated for each new value.
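A small simulation may show why this matters for the original problem expression. It is a sketch under two assumptions that are not confirmed in this thread: that a single trap updates the alarm code item and the resolution item as two separate values, and that the code item is updated first. The starting values and the `replay` helper are hypothetical.

```python
def problem(code, resolution):
    # Original problem expression: last(code)=123 and last(resolution)=1
    return code == 123 and resolution == 1


def replay(updates, code, resolution):
    """Re-evaluate the expression after every single item update."""
    history = []
    for item, value in updates:
        if item == "code":
            code = value
        else:
            resolution = value
        history.append((code, resolution, problem(code, resolution)))
    return history


# Last values left by an earlier trap: alarm 456 raised (resolution 1).
# New trap: alarm 123 resolved (resolution 2), stored as two item updates.
history = replay([("code", 123), ("resolution", 2)], code=456, resolution=1)

# For comparison: the same trap evaluated once, after both values are in place.
atomic = problem(123, 2)

if __name__ == "__main__":
    for code, resolution, state in history:
        print(f"code={code} resolution={resolution} -> problem={state}")
    print(f"evaluated once after both updates -> problem={atomic}")
```

Between the two updates the trigger sees the new code 123 together with the old resolution 1, so a problem for alarm 123 is raised even though no trap ever reported it. Evaluated once with both new values, the same trap never produces that state.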

              • chaos.entalpico
                Junior Member
                • Aug 2023
                • 8

                #8
                Originally posted by cyber
                Each trigger is recalculated whenever any of the items it uses receives a new value. The updates can be microseconds apart, but the trigger is recalculated for each new value.
                Yes, I thought so. Any tips for reaching the light?
                Thanks!
