Need advice on log monitoring

  • Moebius
    Member
    • Dec 2022
    • 43

    #1

    The scenario is as follows:
    Zabbix 7.4.5 is monitoring a log file on an Ubuntu server, where several remote clients write a status message every 60 seconds.
    The status messages are something like: "STATION xxx - OK", or "STATION xxx - OFFLINE".

    So the log monitoring item is like this:
    [Attached image: immagine.png]

    I'm trying to create a trigger that fires if a station is OFFLINE for 3 (or more) consecutive minutes. So the trigger is like this:

    [Attached image: immagine.png]

    What's wrong with this?
    Previously I used another trigger that fired immediately at the first "OFFLINE" occurrence, using 'find' instead of 'count', and it worked perfectly. Now I want the trigger to fire only if station X stays offline for 3 or more minutes, but the trigger I designed does not seem to work.
  • Moebius
    Member
    • Dec 2022
    • 43

    #2
    I think I figured this out.
    The problem is that, written this way, the trigger cannot tell whether the third "OFFLINE" occurrence comes from the same station. It would fire if three consecutive "OFFLINE" status messages appeared, regardless of the sending station, which is unlikely to happen and is not what I want anyway.

    • cyber
      Senior Member
      Zabbix Certified Specialist, Zabbix Certified Professional
      • Dec 2006
      • 4806

      #3
      You have multiple endpoints writing to that file, so count(..#3,"regexp","OFFLINE")=3 does not mean all three lines came from the same endpoint. You can have 3 different endpoints each reporting that they are offline. #3 is "last 3 checks", and your item has a 1s interval, so that is basically the last 3 seconds.
      In addition, your recovery expression is pretty much useless here. Look up when and how recovery expressions are actually used: a recovery expression is an additional condition that has to be true AFTER the original expression has evaluated to false.
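The point about mixed endpoints can be illustrated outside Zabbix with a small Python sketch. It only mimics the idea of "count OFFLINE matches among the last 3 values"; it is not how Zabbix evaluates triggers, and the station names and the helper `count_offline_in_last` are made up for illustration.

```python
import re

# Hypothetical sample of the shared log, newest line last.
# Station names are invented for this illustration.
log_lines = [
    "STATION A - OK",
    "STATION B - OFFLINE",
    "STATION C - OFFLINE",
    "STATION D - OFFLINE",
]

def count_offline_in_last(lines, n):
    """Count lines matching OFFLINE among the last n values, ignoring the station."""
    return sum(1 for line in lines[-n:] if re.search(r"OFFLINE", line))

# Three different stations each reported OFFLINE once, yet the count reaches 3.
would_fire = count_offline_in_last(log_lines, 3) == 3
print(would_fire)
```

No single station has been offline for three minutes here, but a count over the shared log cannot see that.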

      • Moebius
        Member
        • Dec 2022
        • 43

        #4
        Yes, that's what I have figured out.
        I was trying to build a trigger that would fire if the last 3 entries from the same endpoint are all "OFFLINE". Unfortunately, it seems that in my case there is no way to make the trigger do what I want.
        There are about 150 endpoints that write one status line each every minute. And yes, the recovery expression works fine in the original setup (the one using "find", with the trigger that fires at first OFFLINE occurrence).
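The behaviour being asked for (the last 3 entries from the same endpoint are all "OFFLINE") can be sketched in Python as a per-station streak counter. This is only an illustration of the logic, not a Zabbix configuration; the regular expression assumes the "STATION xxx - OK" / "STATION xxx - OFFLINE" format quoted in the first post with a station name that contains no spaces, and the function name and sample lines are hypothetical.

```python
import re
from collections import defaultdict

# Assumed line format, based on the examples in the first post.
LINE_RE = re.compile(r"STATION (?P<station>\S+) - (?P<status>OK|OFFLINE)")

def stations_offline_for(lines, threshold=3):
    """Return stations whose most recent `threshold` entries are all OFFLINE."""
    streak = defaultdict(int)  # consecutive OFFLINE entries per station
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip lines that do not look like status messages
        if m["status"] == "OFFLINE":
            streak[m["station"]] += 1
        else:
            streak[m["station"]] = 0  # an OK entry resets that station's streak
    return sorted(s for s, n in streak.items() if n >= threshold)

# Hypothetical interleaved log: station 001 stays offline, 002 recovers in between.
sample = [
    "STATION 001 - OFFLINE",
    "STATION 002 - OFFLINE",
    "STATION 001 - OFFLINE",
    "STATION 002 - OK",
    "STATION 001 - OFFLINE",
    "STATION 002 - OFFLINE",
]
print(stations_offline_for(sample))
```

The key difference from a count over the shared log is that the state is kept per station, which is exactly what a single item holding all 150 endpoints does not provide.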

        I was also wondering whether #3 means "last 3 log file checks" or "last 3 log file entries". You state that it refers to checks of the item. Thank you for making this clear to me!

        • cyber
          Senior Member
          Zabbix Certified Specialist, Zabbix Certified Professional
          • Dec 2006
          • 4806

          #5
          #3 is the last 3 values. Whether they are collected in one run or several does not really matter.
          Maybe my example was worded a bit badly, since it assumed 1 entry per second. That can still happen, but with 150 endpoints writing there once a minute you probably get more than 3 entries per check, and not 3 from the same endpoint.
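As a rough back-of-the-envelope check of the numbers in this thread, the short Python sketch below works out how much log traffic the last 3 values cover. It assumes the 150 endpoints' writes are spread evenly over each minute, which is an assumption and not something stated in the thread.

```python
endpoints = 150        # from the thread: about 150 endpoints
interval_s = 60        # each writes one status line every 60 seconds
window_values = 3      # "#3" refers to the last 3 values

# Average number of log lines arriving per second across all endpoints.
lines_per_second = endpoints / interval_s

# Approximate time span covered by the last 3 values, assuming even spacing.
window_seconds = window_values / lines_per_second

print(lines_per_second, round(window_seconds, 2))
```

Under that assumption the last 3 values span only a little over a second of traffic, far less than the 3 minutes the trigger was meant to cover.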
