Linux log monitor by occurrences

  • pmurtey
    Member
    • Mar 2020
    • 91

    #1

    Linux log monitor by occurrences

    Hi all. We currently use the following trigger expression to search a log for "Runner has stopped" and raise an alarm, which then clears if there are no further occurrences for 10 minutes:
    {server1:log[/server1log01/logs/server1-cs-3.19.18/server1-cs.1b/server1.cs.main.cdc.log.0,"Runner has stopped",,,skip,,].nodata(10m)}=0
    This works well, but how do we modify the expression so that it only alerts after 5 occurrences are detected within 1 minute, and then clears the alarm if no more are detected within 30 minutes? TIA
  • cyber
    Senior Member
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Dec 2006
    • 4807

    #2
    There's the "log.count" key, which returns the count of occurrences instead of the lines themselves. You can use the sum function to count them over 5m. Instead of nodata() you probably need a recovery expression there, like sum(30m)=0 or something similar. Otherwise the problem closes earlier, as soon as the trigger expression no longer matches.

    • pmurtey
      Member
      • Mar 2020
      • 91

      #3
      Hi Cyber, we have searched for examples of how sum can be used in the context of log monitoring and have not been able to find any. Could you please expand a little further and show how the sum function would fit into the log trigger expression shown above?

      • cyber
        Senior Member
        Zabbix Certified Specialist, Zabbix Certified Professional
        • Dec 2006
        • 4807

        #4
        As I said, log.count returns numbers, so it's just a matter of using the sum function on them. Based on your example (the expression syntax), you seem to be running an older version of Zabbix.
        Your item would be "log.count[/server1log01/logs/server1-cs-3.19.18/server1-cs.1b/server1.cs.main.cdc.log.0,"Runner has stopped",,,skip]", with type of information Numeric (unsigned).
        Your trigger expression would be "{server1:log.count[/server1log01/logs/server1-cs-3.19.18/server1-cs.1b/server1.cs.main.cdc.log.0,"Runner has stopped",,,skip].sum(60)}>=5"
        Your recovery expression would be "{server1:log.count[/server1log01/logs/server1-cs-3.19.18/server1-cs.1b/server1.cs.main.cdc.log.0,"Runner has stopped",,,skip].sum(30m)}=0"
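
To see how the problem and recovery expressions interact, here is a rough Python sketch of the logic (not Zabbix code). It assumes the trigger goes into the problem state when the 60-second sum reaches 5, and only returns to OK once the problem expression is false and the 30-minute sum is 0. The names `window_sum` and `evaluate`, the `(timestamp, count)` sample layout, and the exact window boundaries are illustrative assumptions, not anything Zabbix exposes.

```python
PROBLEM_WINDOW = 60         # seconds, mirrors sum(60) in the trigger expression
RECOVERY_WINDOW = 30 * 60   # seconds, mirrors sum(30m) in the recovery expression
THRESHOLD = 5               # occurrences needed to raise the problem


def window_sum(samples, now, window):
    """Sum the log.count values whose timestamp lies in (now - window, now].

    `samples` is a list of (timestamp_seconds, count) pairs, one per item check.
    The half-open boundary is an assumption made for this sketch.
    """
    return sum(count for ts, count in samples if now - window < ts <= now)


def evaluate(samples, now, in_problem):
    """Return True if the trigger should be in the problem state at time `now`."""
    problem = window_sum(samples, now, PROBLEM_WINDOW) >= THRESHOLD
    quiet = window_sum(samples, now, RECOVERY_WINDOW) == 0
    if problem:
        return True          # problem expression matches: raise or keep the alarm
    if in_problem and quiet:
        return False         # problem expression false and recovery expression true
    return in_problem        # otherwise keep the current state


if __name__ == "__main__":
    samples = [(0, 2), (30, 3)]           # 5 occurrences within one minute
    state = evaluate(samples, 30, False)
    print("t=30s   problem:", state)
    state = evaluate(samples, 120, state)
    print("t=120s  problem:", state)      # burst is over, but the 30m sum is not 0 yet
    state = evaluate(samples, 1831, state)
    print("t=1831s problem:", state)      # 30 quiet minutes have passed
```

Without the recovery expression, the second evaluation would already close the problem, because the 60-second sum has dropped below 5 by then.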

        • pmurtey
          Member
          • Mar 2020
          • 91

          #5
          Awesome, Sir, you're the best.
