How to prevent error overwriting using a cluster?

  • ira
    Member
    • Nov 2010
    • 39

    #1

    Hello dear all,

    I have a RADIUS cluster of 3 servers that I want to monitor with Zabbix.
    I wrote an external script that sends a RADIUS request to the cluster, and the script works well with Zabbix.

    The problem is that I can't send RADIUS requests to each individual server, only to the cluster, which then passes the request to one of the three servers (load distribution).
    If one RADIUS server fails, the script gets an error, but before the event can escalate and send a message, it is overwritten by the responses from the other two servers. As a result, the event never escalates and I don't receive a message that something is wrong. The error is shown only in the "Events" section.

    I am sure there are parameters in Zabbix that can adjust this behavior. How can I configure it so that the event escalates before the next response arrives from the other servers?

    Thank you for any help!
    Best Regards,
    Ira

    My configuration:
    Zabbix 1.8

    Item
    --------------
    Host: Template_Radius
    Description: Radius Authentication
    Type: External Check
    Key: check_radius3.sh
    Type of information: Numeric
    Data type: Decimal
    Units:
    Use custom multiplier:
    Update interval (in sec): 60
    Flexible intervals (sec): No flexible intervals
    New flexible interval:
      Delay: 50
      Period: 1-7,00:00-23:59
    Keep history (in days): 30
    Keep trends (in days): 365
    Status: Active
    Store value: As is
    Show value: As is

    Script check_radius3.sh
    ---------------------------
    #This script is used to check running radius processes
    #
    # USAGE: check_radius.sh
    # 12.10.2010
    #
    LOG=/tmp/check_radius3.log
    #echo "args:" $* >> $LOG
    PROCESS=`radtest "[email protected]" "pass" Cluster-IP-Address 0 'pass' | grep "rad_recv: Access-Accept" | wc -l`


    if [ $PROCESS -eq 1 ]
    then
        echo 1
        exit 1
    else
        echo 0
        exit 0
    fi

    Trigger
    --------
    Name: Radius Authentication on failed
    Expression: {Template_Radius:check_radius3.sh.last(0)}=0
  • alixen
    Senior Member
    • Apr 2006
    • 474

    #2
    Hi,

    Originally posted by ira
    Trigger
    --------
    Name: Radius Authentication on failed
    Expression: {Template_Radius:check_radius3.sh.last(0)}=0

    You can change your expression to:
    {Template_Radius:check_radius3.sh.min(600)}=0
    The trigger will then stay active as long as any check in the last 10 minutes (600 seconds) returned 0.
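
To illustrate why min(600) behaves differently from last(0), here is a minimal shell sketch. It is not Zabbix code: `window_min` is a hypothetical helper, and the sample values are made up to mimic ten checks at a 60-second interval where one request reached the failed server.

```shell
#!/bin/sh
# Hypothetical helper (not part of Zabbix): print the smallest of its arguments,
# the way min(600) considers every value collected in the last 600 seconds.
window_min() {
    min=$1
    for v in "$@"; do
        if [ "$v" -lt "$min" ]; then
            min=$v
        fi
    done
    echo "$min"
}

# Ten illustrative samples; one request hit the failed server.
samples="1 1 1 0 1 1 1 1 1 1"

# last(0) only sees the newest sample, so the failure is already hidden.
newest=${samples##* }
echo "last(0)  = $newest"

# min(600) still sees the 0, so an expression of the form ...=0 stays true.
# $samples is intentionally unquoted so it splits into separate arguments.
echo "min(600) = $(window_min $samples)"
```

With these samples the newest value is 1 while the window minimum is 0, which is why the min-based trigger is not "overwritten" by the next successful response.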

    Regards,
    Alixen
    http://www.alixen.fr/zabbix.html

    • ira
      Member
      • Nov 2010
      • 39

      #3
      Thank you alixen,

      I changed the expression to what you proposed, but it seems there is a different problem.

      When the authentication request is not successful, the external script returns the status "Unknown". When it is successful, it returns "Ok". I think no message is sent because the status is "Unknown".

      In the Events I can see this:
      2010.Nov.24 12:42:51 xxxx Radius Authentication on xxxx failed OK High 23h 30m 38s No
      -
      2010.Nov.24 12:34:54 xxxx Radius Authentication on xxxx failed UNKNOWN High 7m 57s No

      But in the Dashboard I don't see any failure messages...

      I thought that if I put
      echo 0
      exit 0
      then Zabbix would know that the script failed. Is that right?
      Could it be a problem with the delay? I tried to change the delay for the item with the script from 50 to 120, but after saving it still shows 50.

      Do you have any ideas?

      Regards
      Ira

      • alixen
        Senior Member
        • Apr 2006
        • 474

        #4
        Hi,

        Originally posted by ira
        When the authentication request is not successful, the external script returns the status "Unknown". When it is successful, it returns "Ok". I think no message is sent because the status is "Unknown".

        It is possible that your script takes too long when the request fails.
        Zabbix server sets a timeout on external scripts.
        The default value is 1 second, which may be too short.
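
A complementary approach is to bound the runtime inside the check script itself, so that it always prints a value before the server gives up. The following is only a sketch: it assumes the `timeout` utility from GNU coreutils is installed, and `check_with_timeout` and the limits used are illustrative names and values that do not come from this thread.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with a time limit and print 1 if its
# output contains the given pattern, otherwise 0 (also when it times out).
# Usage: check_with_timeout SECONDS PATTERN COMMAND [ARGS...]
check_with_timeout() {
    limit=$1
    pattern=$2
    shift 2
    if timeout "$limit" "$@" 2>/dev/null | grep -q "$pattern"; then
        echo 1
    else
        echo 0
    fi
}

# Example with a command that answers immediately.
check_with_timeout 2 "Access-Accept" echo "rad_recv: Access-Accept"

# Example with a command that hangs longer than the limit.
check_with_timeout 1 "Access-Accept" sleep 3

# In the real check, the command would be the radtest call from the script
# (USER, PASS and SECRET are placeholders):
# check_with_timeout 5 "Access-Accept" radtest USER PASS Cluster-IP-Address 0 SECRET
```

The limit passed to the wrapper should stay below the server's Timeout value, otherwise the server may still abandon the check first.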

        You can change it with Timeout parameter in /etc/zabbix/zabbix_server.conf.
        You can set it to 30 (maximum allowed value) and restart zabbix_server.
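
Before restarting, it can help to confirm which Timeout value is actually active in the file, since commented-out lines are easy to mistake for real settings. A small sketch, assuming the usual `Name=value` layout of the configuration file; `get_timeout` is a hypothetical helper, not a Zabbix tool.

```shell
#!/bin/sh
# Hypothetical helper: print the last uncommented Timeout value in a config
# file. Prints nothing if the parameter is absent or only commented out.
get_timeout() {
    sed -n 's/^[[:space:]]*Timeout[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}

# Path taken from the post above; skipped if the file is not readable here.
conf=/etc/zabbix/zabbix_server.conf
if [ -r "$conf" ]; then
    echo "Active Timeout: $(get_timeout "$conf")"
fi
```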

        Regards,
        Alixen
        http://www.alixen.fr/zabbix.html

        • ira
          Member
          • Nov 2010
          • 39

          #5
          Thank you!

          It works fine now.

          Have a nice day!
