count 503s in log file and trigger if > 200?

  • cirrhus9.com
    Member
    • Feb 2012
    • 58

    #1

    count 503s in log file and trigger if > 200?

    zabbix 1.8.10

    Client wants us to monitor 503s in the varnish.log but with this caveat:
    The alert should be triggered by a rate of 200 messages over 1 minute for host1 and host4.

    Here's the entire request:
    Code:
    Need to set up varnish access log monitoring (/mnt/varnish/access.log) for the pattern that 
    detects 503 response code.
    
    IMPORTANT:  the alert should be rate-triggered and rate limited. I.e. during  downtime we 
    can get thousands of 503s and that should not cause us to  receive thousands of alerts.
    There should be a 30 minute falloff time for the alarm.
    I can stare at this for days and bang out "something", but it wouldn't be efficient and would probably have to be 'modified' eventually.

    How can I go about this?

    Thank you for your time.
  • cirrhus9.com
    Member
    • Feb 2012
    • 58

    #2
    Well, I dug around and finally came up with this Item entry:

    Code:
    log["/mnt/varnish/varnish-access.log","HTTP/1.1 503","UTF-8",500]
    and set it to Zabbix agent (active) and "Type of information" is Log

    The item shows "Active" but I get nothing in Latest Data.

    I also have a concern about the "Log time format" field. The log file uses timestamps like [22/Nov/2013:10:12:51 -0800], so I'm not sure how to express that with the correct syntax in the item.
    Would something like
    Code:
    dd/ppp/y:h:m:s
    work?

    My references are



    Thank you.
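
One thing worth ruling out when Latest Data stays empty is a pattern that never matches. The grep used later in this thread searches for `HTTP/1.1" 503`, with a double quote after the protocol, while the item key searches for `HTTP/1.1 503`. The sketch below checks both patterns against a single line. The sample line is hypothetical (an NCSA-style entry is assumed), so test against a real line from the log before drawing conclusions.

```shell
# Hypothetical sample line in NCSA combined format (an assumption; compare it
# with a real line from /mnt/varnish/varnish-access.log).
sample='10.0.0.1 - - [22/Nov/2013:10:12:51 -0800] "GET /index.html HTTP/1.1" 503 419 "-" "curl/7.22"'

# Print "match" or "no match" for a fixed-string pattern against one line.
# $1 = pattern, $2 = log line
pattern_matches() {
    if printf '%s\n' "$2" | grep -qF -- "$1"; then
        echo "match"
    else
        echo "no match"
    fi
}

pattern_matches 'HTTP/1.1 503'  "$sample"   # pattern from the item key: no match
pattern_matches 'HTTP/1.1" 503' "$sample"   # pattern with the closing quote: match
```

If the real log lines look like the sample, a regular expression such as `HTTP/1\.1. 503` may be an easier pattern for the item key, since the `.` stands in for the quote and nothing has to be escaped inside the key. That is a suggestion to try, not something confirmed in this thread.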

    • cirrhus9.com
      Member
      • Feb 2012
      • 58

      #3
      bumpity bump bump?

      • LenR
        Senior Member
        • Sep 2009
        • 1005

        #4
        Check the agent log on the client. The zabbix agent will need read access to the log file, which is often broken by log rotation processes.

        • cirrhus9.com
          Member
          • Feb 2012
          • 58

          #5
          Thanks LenR:

          Perms are good.
          Code:
          sudo -u zabbix test -r /mnt/varnish/varnish-access.log && echo OK || echo NOK
          OK
          The rotated agent log (zabbix_agentd.log.2.gz) on the client shows quite a few of these entries. Nothing else stands out...
          Code:
          /var/log/zabbix-agent/zabbix_agentd.log.2.gz: 23541:20131122:105812.303 End of process_active_checks()
          /var/log/zabbix-agent/zabbix_agentd.log.2.gz: 23541:20131122:105912.324 In process_active_checks('71.19.xxx.xxx',10051)
          I also tried zabbix_sender with two different keys:
          Code:
          zabbix_sender -z 71.19.xxx.xxx -k logrt["/mnt/varnish/varnish-access.log",,500] -s varnish2 -o 1 
          Info from server: "Processed 0 Failed 1 Total 1 Seconds spent 0.000017"
          sent: 1; skipped: 0; total: 1
          root@varnish2:~# zabbix_sender -z 71.19.xxx.xxx -k log["/mnt/varnish/varnish-access.log",,500] -s varnish2 -o 1
          Info from server: "Processed 0 Failed 1 Total 1 Seconds spent 0.000016"
          sent: 1; skipped: 0; total: 1
          No data anywhere.

          Thank you,
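
A note on the zabbix_sender output above: "Failed 1" means the server rejected the value. zabbix_sender only delivers to items of type "Zabbix trapper", so a log[] or logrt[] item configured as an active check will not accept it. Also, without single quotes around the key, the shell strips the double quotes before zabbix_sender sees them. A minimal sketch for checking the response in a script follows; the trapper key `varnish.503.count` in the comment is hypothetical.

```shell
# Read zabbix_sender output on stdin and print how many values the server
# rejected (the number after "Failed" in the "Info from server" line).
sender_failed() {
    sed -n 's/.*Processed [0-9]* Failed \([0-9]*\) Total.*/\1/p'
}

# Intended use (not run here; varnish.503.count is a hypothetical trapper item):
#   zabbix_sender -z 71.19.xxx.xxx -s varnish2 -k 'varnish.503.count' -o 1 | sender_failed
```

A result of 0 means the value was stored; anything else points at the item type, the key, or the host name passed with -s.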

          • cirrhus9.com
            Member
            • Feb 2012
            • 58

            #6
            Code:
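            #!/bin/bash
            # %H is the 24-hour clock, assumed to match log timestamps such as 10:12:51;
            # the 12-hour %I would miss afternoon entries.
            THISHOUR=$(date '+%H:%M')
            LASTHOUR=$(date '+%H:%M' --date="1 hour ago")
            # Print the lines between the first match of each time, then keep only the 503 responses.
            sed -n "/$LASTHOUR/,/$THISHOUR/p" /mnt/varnish/varnish-access.log | grep "HTTP/1.1\" 503"
            #EOF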
            real 0m11.342s
            vs
            logwatch --service varnish --range today --detail medium --print
            real 17m2.425s
            is a no brainer!

            Thanks.

            • avecsi
              Member
              • Nov 2013
              • 40

              #7
              Hi,
              So in the second post you wrote log
              Originally posted by cirrhus9.com
              ...
              Code:
              log["/mnt/varnish/varnish-access.log","HTTP/1.1 503","UTF-8",500]
              and set it to Zabbix agent (active) and "Type of information" is Log

              ....
              for log you don't need ""

              but for logrt you need ""

              • cirrhus9.com
                Member
                • Feb 2012
                • 58

                #8
                Originally posted by avecsi
                Hi,
                So in the second post you wrote log


                for log you don't need ""

                but for logrt you need ""
                Thank you very much.

                • cirrhus9.com
                  Member
                  • Feb 2012
                  • 58

                  #9
                  So I changed things up just a bit...

                  The cron is now every minute:
                  Code:
                  * * * * * /etc/zabbix/count_varnish_503s.sh  > /tmp/503s.out
                  and the script itself is counting "$LASTMINUTE" AND "$THISMINUTE" using:
                  Code:
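                  #!/bin/bash
                  # %H is the 24-hour clock, assumed to match the log timestamps; %I would miss afternoon entries.
                  THISMINUTE=$(date '+%H:%M')
                  LASTMINUTE=$(date '+%H:%M' --date="1 minute ago")
                  # grep -c counts the matching lines directly, so no separate wc -l is needed.
                  sed -n "/$LASTMINUTE/,/$THISMINUTE/p" /mnt/varnish/varnish-access.log | grep -c "HTTP/1.1\" 503"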
                  The item is "503s.every.minute" instead of "503s.last.hour" and the Update Interval is 60s.

                  I think the last piece of the puzzle is the trigger:
                  Code:
                  {Varnish2:503s.every.minute.last(0)}> 199
                  This is the part I'd like help verifying against the original request: "The alert should be triggered by a rate of 200 messages over 1 minute for varnish2"

                  Does that look correct or do I need to revisit the logic of this solution?
                  Thanks.
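
For anyone adapting the script above: a sed range prints nothing when no request was logged in the starting minute, and a bare HH:MM can match other days in the file. A minimal alternative sketch is below. It counts the 503 lines whose timestamp falls in one specific minute, using the full date stamp shown earlier in the thread ([22/Nov/2013:10:12:51 -0800]). The function name and the LOG variable are made up for illustration, GNU date is assumed, and the server clock is assumed to be in the same timezone as the log.

```shell
#!/bin/bash
# Count 503 responses logged during one specific minute.
# $1 = log file, $2 = minute stamp such as "22/Nov/2013:10:12"
count_503s() {
    # -F treats the pattern as a fixed string, so "[" and "/" are literal.
    # grep -c prints the number of matching lines (0 when there are none).
    grep -F -- "[$2:" "$1" | grep -cF -- 'HTTP/1.1" 503'
}

# Previous full minute in the log's own timestamp format (GNU date assumed;
# LC_ALL=C keeps the month abbreviation in English).
last_minute=$(LC_ALL=C date '+%d/%b/%Y:%H:%M' --date='1 minute ago')

# Only scan when the log is readable, so the function can be sourced on its own.
LOG=${LOG:-/mnt/varnish/varnish-access.log}
if [ -r "$LOG" ]; then count_503s "$LOG" "$last_minute"; fi
```

Counting the previous full minute, rather than the minute in progress, means every cron run sees a complete 60 seconds of log. On the trigger itself: `last(0)}> 199` fires at 200 or more, which matches "a rate of 200 messages over 1 minute" if 200 is meant to be inclusive.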

                  • cirrhus9.com
                    Member
                    • Feb 2012
                    • 58

                    #10
                    Solved

                    All done.
                    Trigger is good. I tested it.

                    BTW:
                    Should someone else need it, here's a 'better' way using a varnish-related tool
                    Code:
                    varnishtop -1 -i TxStatus | grep 503
                    that can be used as a userparameter.

                    Enjoy!
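
To use the varnishtop command as a UserParameter, the item needs a single number rather than a matching line. The sketch below pulls the count out of the output, assuming each line has the form "count tag value" (for example `12.00 TxStatus 503`). That format is an assumption about the installed Varnish version, so check it by running the command by hand first. The function name is made up for illustration.

```shell
# Read "count tag value" lines on stdin and print the count for status 503
# as an integer, or 0 when 503 does not appear at all.
extract_503_count() {
    awk '$2 == "TxStatus" && $3 == "503" { n = $1 } END { printf "%d\n", n + 0 }'
}

# Intended use (not run here):
#   varnishtop -1 -i TxStatus | extract_503_count
```

Printing 0 when there are no 503s keeps the item numeric on every poll, which a bare grep would not do. Note that this figure covers whatever is currently in Varnish's shared memory log, which is not necessarily the last 60 seconds.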

                    • cirrhus9.com
                      Member
                      • Feb 2012
                      • 58

                      #11
                      How can I implement this?
                      "There should be a 30 minute falloff time for the alarm."?

                      (current trigger)
                      {Varnish2:503s.every.minute.last(0)}> 199

                      I'm checking into escalations now...
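
One possible reading of the 30 minute falloff, offered as a suggestion and not something confirmed in this thread, is to keep the trigger in PROBLEM until 30 minutes pass without any minute exceeding the threshold, for example with a trigger of the form `{Varnish2:503s.every.minute.max(1800)}> 199`. The sketch below simulates that logic on a list of per-minute counts, so the behaviour can be checked before touching the trigger. The function name is made up for illustration.

```shell
# Read one per-minute 503 count per line (oldest first) and print the state a
# max-over-window trigger would be in after the last sample.
# $1 = threshold (e.g. 199), $2 = window in samples (e.g. 30 for 30 minutes)
alarm_state() {
    tail -n "$2" | awk -v limit="$1" '
        $1 + 0 > limit { hit = 1 }
        END { print (hit ? "PROBLEM" : "OK") }'
}

# A spike three minutes ago still holds the alarm:
printf '%s\n' 0 0 250 0 0 | alarm_state 199 30   # prints PROBLEM
```

Because the trigger changes state only once per outage, an action that is not set to repeat would send one alert when the rate is first exceeded and one recovery message 30 minutes after the last bad minute, which also covers the "rate limited" part of the request.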
