Log File Monitoring reporting too many lines leading to multiple trigger firing

  • mwildam
    Member
    • Feb 2021
    • 72

    #1

    Log File Monitoring reporting too many lines leading to multiple trigger firing

    I have an item prototype and a trigger prototype in a template and both are created during a low-level-discovery script.
    Item Prototype:
    Code:
    log["{#BATCHLOGDIR}/{#BATCHLOGNAME}.{#BATCHLOGEXT}","{$BATCHLOGOKKEYWORDS}|{$BATCHLOGFAILKEYWORDS}",,,skip]
    Trigger Prototype:
    Code:
    {MyTemplate:log["{#BATCHLOGDIR}/{#BATCHLOGNAME}.{#BATCHLOGEXT}","{$BATCHLOGOKKEYWORDS}|{$BATCHLOGFAILKEYWORDS}",,,skip].regexp("(?i)^.*({$BATCHLOGFAILKEYWORDS}).*$")}=1
    Recovery expression:
    Code:
    {MyTemplate:log["{#BATCHLOGDIR}/{#BATCHLOGNAME}.{#BATCHLOGEXT}","{$BATCHLOGOKKEYWORDS}|{$BATCHLOGFAILKEYWORDS}",,,skip].regexp("(?i)^.*({$BATCHLOGOKKEYWORDS}).*$")}=1
    Item effective example:
    Code:
    log["/var/.../PersonImport.log","{$BATCHLOGOKKEYWORDS}|{$BATCHLOGFAILKEYWORDS}",,,skip]
    Trigger effective example:
    Code:
    {somehost:log["/var/.../PersonImport.log","{$BATCHLOGOKKEYWORDS}|{$BATCHLOGFAILKEYWORDS}",,,skip].regexp("(?i)^.*({$BATCHLOGFAILKEYWORDS}).*$")}=1
    Trigger effective recovery expression:
    Code:
    {somehost:log["/var/.../PersonImport.log","{$BATCHLOGOKKEYWORDS}|{$BATCHLOGFAILKEYWORDS}",,,skip].regexp("(?i)^.*({$BATCHLOGOKKEYWORDS}).*$")}=1
    Everything works fine except that on recovery I get the same message as for the problem event.
    I did not set an individual message for the media type and I have a message template configured for Problem recovery in Media types.
    I am using Zabbix 5.2.4 BTW.

    I have no idea whether or where I could change that.
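
To make the interplay of the three regular expressions in this post easier to follow, here is a minimal Python sketch. The keyword values are hypothetical stand-ins for {$BATCHLOGOKKEYWORDS} and {$BATCHLOGFAILKEYWORDS} (the thread never shows the real ones), `classify` is an illustrative helper, and Python's `re` module only approximates the regex engine Zabbix uses.

```python
import re

# Hypothetical macro values; the real ones are not shown in the thread.
OK_KEYWORDS = "finished successfully|completed"
FAIL_KEYWORDS = "error|failed"

# Item filter: second parameter of log[...], written without (?i).
item_filter = re.compile(f"{OK_KEYWORDS}|{FAIL_KEYWORDS}")
# Trigger problem and recovery expressions, both case-insensitive.
problem_re = re.compile(f"(?i)^.*({FAIL_KEYWORDS}).*$")
recovery_re = re.compile(f"(?i)^.*({OK_KEYWORDS}).*$")


def classify(line):
    """Return how a single log line would be treated in this simplified model."""
    if not item_filter.search(line):
        return "ignored"   # never stored by the item, so no trigger sees it
    if problem_re.search(line):
        return "problem"   # checked first: a line matching both counts as a problem
    if recovery_re.search(line):
        return "recovery"
    return "ignored"


for sample in ("PersonImport failed: timeout",
               "PersonImport finished successfully",
               "Import FAILED"):
    print(sample, "->", classify(sample))
```

In this sketch the item filter is case-sensitive while the trigger expressions are not, so a line such as "Import FAILED" would be dropped before any trigger evaluates it. Whether that matters depends on the actual macro values.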
  • mwildam
    Member
    • Feb 2021
    • 72

    #2
    In my configuration there was first another issue with the recovery notification: it was never sent at all. I received a second notification and took it for the recovery message, but in reality it was caused by a repeated problem event. After fixing that, I got multiple PROBLEM and RESOLVED messages each time I added a line to the logfile.

    After a lot of additional tests (different notification types, switching the notification from "notify all" to "Send Message", reducing the number of lines per second to 5 and 3, and so on), I have to conclude that logfile monitoring always reports too many lines to the server. I suspected that it always runs through the whole file, but I cannot confirm even that.
    The fact is that the item always reports many lines. I add 4 lines to a log, and in the history under Latest data I see a lot more. According to the documentation it should only look at the new lines, which was the whole reason for using the log[...] item.

    On https://www.zabbix.com/documentation...ypes/log_items it says under "Important Notes":
    "On UNIX/GNU/Linux systems it is assumed that the file systems where log files are stored report inode numbers, which can be used to track files."

    So my suspicion was that this might be my problem: I am on AIX [room for compassion].
    I have NFS shares and a journaling file system on the mounted volumes. I do not really understand why it relies on inodes instead of simply using file size and offset.
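
The idea of tracking a file by size and offset can be sketched as follows. This is a minimal Python illustration, not how the Zabbix agent is implemented; `LogTail` and its attributes are hypothetical names. It also shows why some file identity such as the inode is still useful: offset and size alone cannot tell a replaced file from an appended one when the new file is at least as long as the old offset.

```python
import os


class LogTail:
    """Remember inode and byte offset so each poll returns only new lines."""

    def __init__(self, path):
        self.path = path
        self.inode = None   # unknown until the first poll
        self.offset = 0     # bytes already processed

    def poll(self):
        st = os.stat(self.path)
        # Start over if the file was replaced (different inode) or truncated
        # (size smaller than what was already read).
        if st.st_ino != self.inode or st.st_size < self.offset:
            self.inode = st.st_ino
            self.offset = 0
        with open(self.path, "rb") as fh:
            fh.seek(self.offset)
            data = fh.read()
        # Consume complete lines only; a partial last line waits for the next poll.
        end = data.rfind(b"\n") + 1
        self.offset += end
        return data[:end].decode("utf-8", errors="replace").splitlines()
```

The first call to `poll()` returns the whole file, later calls return only what was appended since. On file systems where inode numbers are not stable, the inode comparison in this sketch would trigger a full re-read on every poll, which is one conceivable way to end up with old lines being reported again.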

    Finally, to verify this, I tried a logfile residing directly on the Zabbix server, which fortunately is a Debian buster box. The same issue occurs there!

    I could not believe it and re-watched the video from Dmitry, which was my original inspiration for how to do logfile monitoring right. If you look carefully at the output from time position 12:05, https://youtu.be/3ljYwiVt1CA?t=725, you will see that Dmitry has the same problem! However, the video stops when the data appears in the log for the first time, and no trigger is created either. So he didn't notice the problem.

    I watched other videos on this topic, e.g. https://youtu.be/EUZSI-7Bu4M. There a trigger is created, but no notification is configured, or at least none is shown. That way you won't notice the multiple firing either, as long as the last reported line matches the recovery expression. So basically I could not find any proof of concept indicating that it should work and that I am just doing something wrong.

    If there is anybody who got this to work properly from the beginning (item) until the bitter end (trigger with problem notification and finally recovery notification), please let me know!

    [Attached image: Old-Log-Entries-reported.png]

    • mwildam
      Member
      • Feb 2021
      • 72

      #3
      Oh, and BTW: I also tried it with a plain item without any discovery or prototypes. It doesn't work either!

      • mwildam
        Member
        • Feb 2021
        • 72

        #4
        When I tested this directly on the Zabbix server, the issue was even worse: I got more than 300 notifications after adding two lines to a not-so-big log. With an update interval of 1s (as recommended for log monitoring) the issue is worse than with 1m. Memory usage also goes through the roof, and with several log items the problem multiplies. I really wonder how others implement log monitoring.

        • mwildam
          Member
          • Feb 2021
          • 72

          #5
          I even optimized the LLD so that only the really interesting logfiles are reported, so there are no logrt items any more (I deleted all of them). Still, monitoring even a few logs (~25, though some are bigger) makes the whole thing pretty resource hungry.
          I still hope I am simply doing something wrong. Does anybody have a hint for me?

          • mwildam
            Member
            • Feb 2021
            • 72

            #6
            Unfortunately I cannot change the title of the topic, which no longer matches the latest findings. The issue applies to all log monitoring; it is not limited to prototypes and not related to the recovery notification. This is a general issue with log file monitoring.

            • dimir
              Zabbix developer
              • Apr 2011
              • 1080

              #7
              What would you like to change the topic to?

              • mwildam
                Member
                • Feb 2021
                • 72

                #8
                Originally posted by dimir
                What would you like to change the topic to?
                I would change it to "Log File Monitoring reporting too many lines leading to multiple trigger firing" for example.

                • mwildam
                  Member
                  • Feb 2021
                  • 72

                  #9
                  Thanks for renaming.

                  Originally posted by cyber
                  try with
                  log["/var/.../PersonImport.log","{$BATCHLOGOKKEYWORDS}|{$BATCHLOGFAILKEYWORDS}",,,skip,,,mtime-noreread]
                  I was pretty sure I had tried that already; however, when retrying I noticed that at least in my last tests I had miscounted the commas.
                  Anyway, I retried, and no: same issue. Here is my current item:
                  Code:
                  log["{#BATCHLOGDIR}/{#BATCHLOGNAME}.{#BATCHLOGEXT}","{$BATCHLOGOKKEYWORDS}|{$BATCHLOGFAILKEYWORDS}",,5,skip,,,mtime-noreread]
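
Because the parameters of the log[] key are positional, a miscounted comma silently moves a value into the wrong slot. A small Python helper can build the key from named arguments instead. This is a sketch only: `build_log_key` is a hypothetical helper, and the parameter order (file, regexp, encoding, maxlines, mode, output, maxdelay, options) is assumed from the keys quoted in this thread and should be checked against the documentation for your Zabbix version.

```python
def build_log_key(file, regexp="", encoding="", maxlines="", mode="",
                  output="", maxdelay="", options=""):
    """Assemble a log[...] item key; empty parameters become empty positions."""
    parts = [
        f'"{file}"',                          # quoted, as in the thread's keys
        f'"{regexp}"' if regexp else "",
        str(encoding), str(maxlines), str(mode),
        str(output), str(maxdelay), str(options),
    ]
    # Drop trailing empty parameters so the key ends at the last one that is set.
    while parts and parts[-1] == "":
        parts.pop()
    return "log[" + ",".join(parts) + "]"


# Rebuilds the item prototype key shown above from named arguments.
print(build_log_key(
    "{#BATCHLOGDIR}/{#BATCHLOGNAME}.{#BATCHLOGEXT}",
    "{$BATCHLOGOKKEYWORDS}|{$BATCHLOGFAILKEYWORDS}",
    maxlines=5, mode="skip", options="mtime-noreread",
))
```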

                  • mwildam
                    Member
                    • Feb 2021
                    • 72

                    #10
                    Oh, BTW: I also tried it without the 5 (the limit on lines to read).
                    And I made another test. From my experience it depends on how long the matched keywords have stayed the same. If there was only success for a long while, I get only one problem notification. That is not the case if problem and success alternated multiple times. See the examples:

                    [Attached image: Single-Problem-report.png]
                    [Attached image: Multiple-Problem-report.png]
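
The difference between the two cases can be reproduced with a very simplified model of trigger state, sketched here in Python. It assumes one open problem at a time and that every reported line is evaluated in order; `count_events` is a hypothetical helper, and real Zabbix behaviour depends on the trigger's event generation mode.

```python
def count_events(states):
    """Count PROBLEM and RESOLVED events for a sequence of classified lines.

    `states` holds "problem" or "recovery" for each reported line, in order.
    """
    problem_open = False
    problems = recoveries = 0
    for state in states:
        if state == "problem" and not problem_open:
            problem_open = True
            problems += 1       # a new PROBLEM event would be notified here
        elif state == "recovery" and problem_open:
            problem_open = False
            recoveries += 1     # a RESOLVED event would be notified here
    return problems, recoveries


# Long run of success, then one failure: a single problem event.
print(count_events(["recovery", "recovery", "problem"]))
# Old alternating lines reported again together with the new failure.
print(count_events(["problem", "recovery", "problem", "recovery", "problem"]))
```

Under these assumptions, the number of notifications grows with the number of alternations among the reported lines, which matches the observation above if old lines are indeed being sent again.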
                    Last edited by mwildam; 09-04-2021, 18:15.
