Monitor logs

  • ptader
    Member
    • Sep 2007
    • 52

    #1

    Monitor logs

    Does Zabbix have the same functionality as the UNIX utility Swatch (i.e., alerting us every time a regular expression matches in a text file)? I've read several posts and the documentation about monitoring log files, all very helpful, but I just can't get it to work. More specifically, I'm watching for the regular expression "TEST" in /var/log/messages. I've set up the Item and Trigger, and I test by running "logger TEST", which adds a line to syslog like:

    Mar 10 13:33:32 server root: TEST

    Permissions on /var/log/messages have been changed to be world-readable so the zabbix user can read it but the fact remains, I'm never alerted when "TEST" appears in the log.
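As a sanity check outside Zabbix, the pattern itself can be tried against the sample line. This is only a sketch: it uses Python's `re` module, which is not the regex engine the Zabbix agent uses, and the helper name `line_matches` is made up for illustration. For a plain literal like "TEST" the two should agree.

```python
import re

def line_matches(pattern, line):
    """Return True if the regular expression occurs anywhere in the line."""
    return re.search(pattern, line) is not None

# The syslog line produced by "logger TEST" in the post above.
sample = "Mar 10 13:33:32 server root: TEST"

matched = line_matches("TEST", sample)
print(matched)
```

If this reports a match, the pattern is not the problem and the Item or Trigger configuration is the more likely place to look.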

    Thank You,
    -ptader
  • cstackpole
    Senior Member
    Zabbix Certified Specialist
    • Oct 2006
    • 225

    #2
    Hey,
    I know that this was posted a while ago. However, I have some experience with this and thought I would post in case ptader or anyone else could use the information I have.

    The Key posted above is:
    log[/var/log/messages,TEST]
    The Expression is:
    {server:log[/var/log/messages,TEST].str(TEST)}=1

    That string will never evaluate into a trigger and even if it does, it will not turn off.

    My Key is the same but my Expression is
    {server:log[/var/log/messages,TEST].nodata(0)}#1

    That expression says something along the lines of "I got a bit of data and have not gotten anything else for at least 1 second" and it becomes True and then will turn False again right away.

    If someone knows a better way of doing this, please post. I have dozens of expressions on dozens of computers, so if there is a better way that works just as well as this does, I would like to use it.

    As for Swatch and the comparisons between Swatch and Zabbix.

    The functionality is very similar but there are a few differences.

    Zabbix will only pull back 10 lines a second (I believe; it's in the PDF and all over the forums, I just don't want to look it up right now). It does the filtering on the Agent side, but that too appears to be capped in how many lines a second it can read.

    I use Zabbix to monitor ALL my log files that are /SLOWLY/ updated, and Swatch to monitor all the other log files.

    Meaning that if the log file is updated regularly, or even if it just has spurts where it dumps a lot of data into the log file, I won't use Zabbix. I have run into too many problems where an application crashed and dumped 100+ lines of data into the log file, and it took Zabbix 10+ minutes to read the log file and send the alert, 10 minutes after I should have been doing something (or worse, after I had already noticed and fixed the problem on my own).

    We also have a few near real-time programs that dump 20-30 lines a second into the log file. I once tried to run that seriously through Zabbix (starting at 7:30am) and by the end of the day (3:30pm) it was still pulling back data from 11am. Swatch never has that problem; it always kept up with the log file, and I clearly saw that Swatch ran with far fewer resources on the same test.
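A back-of-the-envelope check of those numbers, as a rough sketch: in 8 hours of wall time (7:30am to 3:30pm) the reader had only covered 3.5 hours of log (7:30am to 11am). Assuming the log grew at a steady 20-30 lines a second, that works out to roughly 9-13 lines a second actually consumed, which is in the neighborhood of the 10 lines a second limit. The function name and the steady-rate assumption are illustrative only.

```python
def effective_read_rate(start_hour, now_hour, reached_hour, write_rate):
    """Estimate lines/second consumed by a reader that has fallen behind.

    Assumes the log grew at a constant write_rate (lines/second) and that
    the reader started at start_hour and has reached reached_hour by now_hour.
    """
    elapsed = now_hour - start_hour       # wall-clock hours spent reading
    covered = reached_hour - start_hour   # hours of log actually consumed
    return write_rate * covered / elapsed

# 7:30am start, 3:30pm now, reader still at 11am.
low = effective_read_rate(7.5, 15.5, 11.0, 20)
high = effective_read_rate(7.5, 15.5, 11.0, 30)
print(f"{low:.2f} to {high:.2f} lines/second")
```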

    So here is what I did:
    Create an item in Zabbix: Is Swatch running? proc.num[swatch]
    My trigger is set to run a command on the system and I run swatch with a configuration file: swatch -v /etc/swatch.txt
    That configuration file has things I need to watch for and calls out to zabbix_sender to send the message back to Zabbix:
    Code:
    watchfor /ERROR/
    	exec=/usr/local/sbin/zabbix_sender -z ZabbixServer -p 10051 -s host -k ERROR -v -o \"$_\"\&
    watchfor /ALERT/
    	exec=/usr/local/sbin/zabbix_sender -z ZabbixServer -p 10051 -s host -k ALERT -v -o \"$_\"\&
    watchfor /HEARTBEAT/
    	exec=/usr/local/sbin/zabbix_sender -z ZabbixServer -p 10051 -s host -k HEARTBEAT -v -o \"$_\"\&

    So Swatch watches the log files and sends the information I need back to a zabbix trapper. Zabbix also makes sure that swatch is always running.
    If you want more details I can certainly give them, but this works really well for me.
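For anyone who wants to see the same flow outside of Swatch, here is a minimal Python sketch of what the configuration above amounts to: match a line against each pattern and build the corresponding zabbix_sender command. The patterns, item keys, binary path, server name, port and host are copied from the configuration above; the function name `sender_commands` and the sample line are made up for illustration. The sketch only builds the command lines and does not run them, and it returns a command for every pattern that matches, which may differ from how Swatch treats a line that matches several patterns.

```python
import re
import shlex

# Patterns and item keys taken from the swatch configuration above.
WATCHES = [
    (re.compile(r"ERROR"), "ERROR"),
    (re.compile(r"ALERT"), "ALERT"),
    (re.compile(r"HEARTBEAT"), "HEARTBEAT"),
]

SENDER = "/usr/local/sbin/zabbix_sender"

def sender_commands(line, server="ZabbixServer", port=10051, host="host"):
    """Return one zabbix_sender argument list per pattern matching the line."""
    commands = []
    for pattern, key in WATCHES:
        if pattern.search(line):
            commands.append([
                SENDER, "-z", server, "-p", str(port),
                "-s", host, "-k", key, "-v", "-o", line,
            ])
    return commands

if __name__ == "__main__":
    # Hypothetical log line, for illustration only.
    sample = "Mar 10 13:33:32 server app: ERROR example message"
    for argv in sender_commands(sample):
        # shlex.join quotes the matched line safely for display in a shell.
        print(shlex.join(argv))
```

Passing the arguments as a list (for example to `subprocess.run`) avoids the shell quoting that the `\"$_\"` escapes handle in the Swatch version.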

    Hope that helps and answers more questions than it generates.
    Have Fun!

    [Edit] I forgot I had put this into my blog some time ago. http://www.zabbix.com/forum/blog.php?b=13
    That has a somewhat better description than this post, so you might want to read it instead. If you have questions, please post them.
    Last edited by cstackpole; 13-03-2008, 21:54.

    • ptader
      Member
      • Sep 2007
      • 52

      #3
      This is good information. The systems that I want to monitor are not expected to generate a lot of log messages. Even if there were a spike in traffic (say an app or process dumps 20-30 lines into the log all at once), waiting a minute for the alert wouldn't be disastrous considering what we're looking for.

      That said I'd like to use Zabbix's log monitoring functions, but I just can't get it to work. Referencing the screen shot below, if I send 'TEST' to the log I'm never alerted by the trigger that it found the regular expression. Can you see anything misconfigured?

      I like your approach using Swatch, and I had considered the same thing. One question: how do you clear the alert?

      Thanks,
      ptader

      • Alexei
        Founder, CEO
        Zabbix Certified Trainer
        Zabbix Certified Specialist
        Zabbix Certified Professional
        • Sep 2004
        • 5654

        #4
        The 10 lines per second is really an artificial limit, which can be easily changed to something more reasonable, say 100 lines per second (see MAX_LINES_PER_SECOND in active.h).

        The ZABBIX 1.6 communication protocol is much more efficient, so the change will not introduce any network-related problems (high bandwidth usage, etc.).

        Actually I just increased the number to 100 in 1.5.x (pre 1.6).
        Alexei Vladishev
        Creator of Zabbix, Product manager
        New York | Tokyo | Riga

        • Crazy Marty
          Member
          • Sep 2007
          • 75

          #5
          Originally posted by Alexei
          The 10 lines per second is really an artificial limit, which can be easily changed to something more reasonable, say 100 lines per second (see MAX_LINES_PER_SECOND in active.h).

          The ZABBIX 1.6 communication protocol is much more efficient, so the change will not introduce any network-related problems (high bandwidth usage, etc.).

          Actually I just increased the number to 100 in 1.5.x (pre 1.6).
          It sure seems to me that this MAX_LINES_PER_SECOND should be a parameter in the key (the dead giveaway is the above comment that the limit is artificial!). That way, those of us who have rapidly growing logfiles -- and the available horsepower to do the string match on lots of lines -- can actually make use of this feature. For some of my applications today, even 100 is way too small for the feature to be useful.

          Please make it a parameter to the key for 1.5 (preferred), or at least 1.6!
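To make the concern concrete, here is a rough sketch of how a fixed lines-per-second cap interacts with a log that grows faster than the cap. It assumes steady rates and a reader that always consumes up to the cap, which is a simplification and not a description of how the agent actually schedules its reads; the function name is made up for illustration.

```python
def backlog_after(seconds, write_rate, cap):
    """Unread lines after `seconds`, for a log growing at write_rate
    lines/second and a reader consuming at most cap lines/second."""
    return max(0, (write_rate - cap) * seconds)

eight_hours = 8 * 60 * 60  # seconds

# A log growing at 25 lines/second, read with the old and new limits.
print(backlog_after(eight_hours, 25, 10))    # cap of 10: backlog keeps growing
print(backlog_after(eight_hours, 25, 100))   # cap of 100: reader keeps up

# A log growing faster than the new limit still falls behind.
print(backlog_after(eight_hours, 150, 100))
```

Under these assumptions, any fixed cap below the write rate leaves a backlog that grows linearly with time, which is why a per-item setting would help for fast logs.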

          • ptader
            Member
            • Sep 2007
            • 52

            #6
            cstackpole,

            How do you clear the trigger alert once swatch (or Zabbix) trips the trigger?

            Thanks,
            Paul

            • cstackpole
              Senior Member
              Zabbix Certified Specialist
              • Oct 2006
              • 225

              #7
              Sorry for not seeing this message until now...oh well I will post in case someone else stumbles across this...

              I use the:
              {Host:Trigger.nodata(0)}#1

              to turn off the trigger. The downside is that multiple messages may come in at one time and you will only get one alert. I have been unable to find a better method, and no one has suggested a better one yet (I have about 3 other posts asking for better methods).

              Hope this helps.
