Notify Admin when Zabbix crashes

  • soarin
    Junior Member
    • May 2006
    • 1

    #1

    Hi,

    I am really impressed with Zabbix, it is a great product.

    Today Zabbix Server (1.1b7) crashed and was down 3 hours. I went to check the screens and the error message was "Database too Many Connections". When I checked the status, I found that the server was not running any more.

    Now I feel like I need a Zabbix to monitor Zabbix to tell me when it is down so I can get it back up again quickly.

    Can we have a monitor that sends a message to the admin when the server is unreachable (for, say, 5 minutes)? The Zabbix agent was still running, and all I had to do was restart the server. If the agent could email me when it can't talk to the server, I could relax more.

    If this is already a feature of Zabbix and I did not enable it, can it be a very prominent first thing to enable for the new admin?

    MySQL threads went to 94 just before Zabbix crashed -- I need to set it to notify on large numbers of threads, but when the Zabbix_Server is down, there is not much notifying that it can do.

    Thanks!
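
The thread-count warning asked for above can be sketched outside Zabbix, so it still fires when zabbix_server is down. This is only a minimal sketch under stated assumptions: the function names, the `ADMIN` address and the script name are hypothetical, and it assumes the `mysql` client can log in without a prompt (for example through `~/.my.cnf`).

```shell
#!/bin/bash
# Sketch: warn when MySQL connections approach max_connections.
# Hypothetical names: ADMIN, percent_used, over_threshold, mysql_value.

ADMIN="${ADMIN:-root}"   # assumption: local mail alias for the admin

# percent_used CURRENT MAX -> prints the integer percentage of MAX in use
percent_used() {
    echo $(( $1 * 100 / $2 ))
}

# over_threshold CURRENT MAX PERCENT -> succeeds if usage >= PERCENT
over_threshold() {
    [ "$(percent_used "$1" "$2")" -ge "$3" ]
}

# mysql_value QUERY -> prints the second column of a two-column result
mysql_value() {
    mysql -N -B -e "$1" | awk '{print $2}'
}

main() {
    local limit="$1" current max
    current=$(mysql_value "SHOW GLOBAL STATUS LIKE 'Threads_connected'")
    max=$(mysql_value "SHOW GLOBAL VARIABLES LIKE 'max_connections'")
    # Give up quietly if MySQL could not be queried at all.
    [ -n "$current" ] && [ -n "$max" ] || return 1
    if over_threshold "$current" "$max" "$limit"; then
        echo "MySQL threads: $current of $max at $(date)" |
            mail -s "MySQL connections above ${limit}%" "$ADMIN"
    fi
}

# Run only when a percentage is given, e.g.: ./mysql_threads.sh 80
if [ "$#" -gt 0 ]; then main "$@"; fi
```

Run from cron with a percentage such as 80, it mails before the "too many connections" point is reached rather than after.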
  • krusty
    Senior Member
    • Oct 2005
    • 222

    #2
    Hi,

    Our zabbix_server process crashes nearly every day. I have written a bash script that restarts the zabbix_server process if none is running. Perhaps you should do the same.

    Comment

    • Alexei
      Founder, CEO
      Zabbix Certified Trainer
      Zabbix Certified Specialist
      Zabbix Certified Professional
      • Sep 2004
      • 5654

      #3
      Originally posted by krusty
      our zabbix_server process crashes nearly every day.
      Every day?! Please report all issues to the ZABBIX team. We are very interested! Don't silently restart ZABBIX.
      Alexei Vladishev
      Creator of Zabbix, Product manager
      New York | Tokyo | Riga
      My Twitter

      Comment

      • Nate Bell
        Senior Member
        • Feb 2005
        • 141

        #4
        soarin, if I were you, I would take krusty's approach, but with a slight tweak. Instead of automatically restarting, have the script send an email to you (and anyone else involved). Something like:
        Code:
        cat zabbix_server.log | mail -s 'Zabbix Server Crashed' [email protected]
        So that way it will dump the server's logs and send the mail with a timestamp of when the crash happened. It could also automatically restart the server to keep the data flowing, but as Alexei said, make sure the restart, and conditions surrounding the restart, are documented.
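
A variation on the one-liner above, as a sketch only: it mails the tail of the log with an explicit timestamp instead of the whole file. The function name `crash_report`, the `ADMIN` address, the default of 50 lines and the script name are assumptions for illustration.

```shell
#!/bin/bash
# Sketch: mail a short crash report instead of the whole log.
# Hypothetical names: crash_report, ADMIN; the log path is passed in.

ADMIN="${ADMIN:-root}"   # assumption: local mail alias for the admin

# crash_report LOGFILE [LINES] -> prints a timestamp and the log tail
crash_report() {
    local logfile="$1" lines="${2:-50}"
    echo "Zabbix server crash report, generated $(date)"
    echo "Last $lines lines of $logfile:"
    tail -n "$lines" "$logfile"
}

main() {
    crash_report "$1" "${2:-50}" | mail -s 'Zabbix Server Crashed' "$ADMIN"
}

# Run only when a log file is given, e.g.: ./crash_mail.sh zabbix_server.log
if [ "$#" -gt 0 ]; then main "$@"; fi
```

Keeping the report short makes the mail readable, and the mails themselves then serve as a record of each crash and restart.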

        It might also be wise to create a similar script on another server that just pings the Zabbix server every few minutes to make sure it's still up. That way, if the Zabbix box itself crashes, you will be notified.

        Nate
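
The second-server check suggested above could look roughly like the following. It is a sketch, not a tested monitoring tool: it probes TCP port 10051 (the port mentioned later in this thread) through bash's `/dev/tcp` and GNU `timeout`, and mails once after a number of consecutive failed checks. The names `port_open`, `record_result`, `STATE`, `ADMIN`, the script name and the host `zabbix.example.com` are all hypothetical.

```shell
#!/bin/bash
# Sketch: run from cron on a second machine; alert after N failed checks.
# Hypothetical names: port_open, record_result, STATE, ADMIN.

ADMIN="${ADMIN:-root}"                      # assumption: admin mail alias
STATE="${STATE:-/tmp/zabbix_check.fails}"   # assumption: counter file

# port_open HOST PORT -> succeeds if a TCP connection opens within 5 s
port_open() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# record_result ok|fail -> updates the counter file and prints the new
# number of consecutive failures (0 after a success)
record_result() {
    local fails=0
    [ -f "$STATE" ] && fails=$(cat "$STATE")
    if [ "$1" = ok ]; then fails=0; else fails=$((fails + 1)); fi
    echo "$fails" > "$STATE"
    echo "$fails"
}

main() {
    local host="$1" port="${2:-10051}" limit="${3:-5}" fails
    if port_open "$host" "$port"; then
        record_result ok > /dev/null
    else
        fails=$(record_result fail)
        # Mail once, when the limit is first reached, not on every run.
        if [ "$fails" -eq "$limit" ]; then
            echo "$host:$port unreachable for $fails checks, $(date)" |
                mail -s "Zabbix server unreachable" "$ADMIN"
        fi
    fi
}

# Run only when a host is given, e.g.: ./check_zabbix.sh zabbix.example.com
if [ "$#" -gt 0 ]; then main "$@"; fi
```

Scheduled once a minute with the default limit of 5, this approximates the "unreachable for 5 minutes" condition from the first post.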

        Comment

        • DiedX
          Senior Member
          • Oct 2004
          • 106

          #5
          VERY dirty:

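          #!/bin/bash

          # Count matching process lines. The grep itself is counted too, hence "-lt 2".
          A=`ps aux | grep -c "/usr/local/bin/zabbix_server"`
          if [ "$A" -lt 2 ]; then
              # Remove a stale PID file (if any), then start the server again.
              rm -f /var/tmp/zabbix_server.pid
              #/etc/init.d/zabbixd start
              /usr/local/bin/zabbix_server
          fi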

          ---

          Also, if you like, I can check port 10051 from my Zabbix machine and email you when it stops responding.
          https://www.diederik.nl

          Comment

          • krusty
            Senior Member
            • Oct 2005
            • 222

            #6
            Btw, here is my testing and debugging script.

            Code:
            #!/bin/bash
            
            #################################################################################
            #                                                                               #
            # Please look at first how many processes zabbix_server and zabbix_agentd have. #
            # Then change the entries in the lines.                                         #
            # Change the email address!                                                     #
            #        created by Krusty                                                      #
            #                                                                               #
            #################################################################################
            
            date=`date`
            # Count zabbix_server processes, ignoring the grep itself and any "tail" of the log.
            prozessserver=`ps -ef | grep zabbix_server | grep -v grep | grep -v tail | wc -l`
            serverpid="/home/zabbix/tmp/zabbix_server.pid"
            #echo "Debug server: $prozessserver"
            # 11 is the normal number of zabbix_server processes here; change it to match yours.
            if [ "$prozessserver" -lt 11 ];
            then
                    echo "There is no zabbix_server process running. I will start this process now! $date"
                    if [ -f "$serverpid" ]
                    then
                            echo "File $serverpid exists. I will remove this file now!"
                            rm "$serverpid"
                    fi
                    /usr/local/bin/zabbix_server
                    echo "At $date the zabbix_server process was not running. But the server will be started immediately." | mail -s "Zabbix Server crashed at $date" [email protected]
            fi
            
            # Same check for the agent; 6 is the normal number of zabbix_agentd processes here.
            prozessagentd=`ps -ef | grep zabbix_agentd | grep -v grep | grep -v tail | wc -l`
            agentdpid="/home/zabbix/tmp/zabbix_agentd.pid"
            #echo "Debug agentd: $prozessagentd"
            if [ "$prozessagentd" -lt 6 ];
            then
                    echo "There is no zabbix_agentd process running. I will start this process now! $date"
                    if [ -f "$agentdpid" ]
                    then
                            echo "File $agentdpid exists. I will remove this file now!"
                            rm "$agentdpid"
                    fi
                    /usr/local/bin/zabbix_agentd
                    echo "At $date the zabbix_agentd process was not running. But the agentd will be started immediately." | mail -s "Zabbix Agentd crashed at $date" [email protected]
            fi
            
            exit 0
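
As a side note on the counting step in the script above: the `ps | grep | grep -v grep` chain can be swapped for `pgrep`, which never matches itself. The sketch below assumes the procps version of `pgrep` found on most Linux systems (its `-c` option is not available everywhere); `count_procs`, `below_expected` and the script name are hypothetical names, and the expected count of 11 in the example call is the value from the script above.

```shell
#!/bin/bash
# Sketch: count daemon processes with pgrep instead of ps | grep.
# Hypothetical names: count_procs, below_expected.

# count_procs NAME -> prints how many processes are named exactly NAME
count_procs() {
    # pgrep exits non-zero when nothing matches but still prints 0
    pgrep -c -x "$1" || true
}

# below_expected COUNT EXPECTED -> succeeds if COUNT is less than EXPECTED
below_expected() {
    [ "$1" -lt "$2" ]
}

main() {
    local name="$1" expected="$2" count
    count=$(count_procs "$name")
    if below_expected "$count" "$expected"; then
        echo "$name: $count of $expected processes running at $(date)"
        return 1
    fi
}

# Run only with arguments, e.g.: ./count_procs.sh zabbix_server 11
if [ "$#" -gt 1 ]; then main "$@"; fi
```

The non-zero exit status lets a wrapper decide whether to mail, restart, or both.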

            Comment

            • ruckus37
              Member
              • Oct 2004
              • 57

              #7
              Take a deep breath and look at your system health; most *nix systems run Zabbix without a hiccup.

              Comment

              • DiedX
                Senior Member
                • Oct 2004
                • 106

                #8
                ruckus,

                I fully agree, but what if, for instance, MySQL crashes and takes Zabbix down with it?
                https://www.diederik.nl

                Comment

                • ruckus37
                  Member
                  • Oct 2004
                  • 57

                  #9
                  OK, take a double breath and look at the reason MySQL crashes. I am not trying to be smart, but there is a reason your system is crashing, and I bet 100 to none that it is not the Zabbix database or binary crashing your system.

                  Comment

                  • Alexei
                    Founder, CEO
                    Zabbix Certified Trainer
                    Zabbix Certified Specialist
                    Zabbix Certified Professional
                    • Sep 2004
                    • 5654

                    #10
                    Originally posted by ruckus37
                    OK, take a double breath and look at the reason MySQL crashes. I am not trying to be smart, but there is a reason your system is crashing, and I bet 100 to none that it is not the Zabbix database or binary crashing your system.
                    This is a very valid point. Please try to get to the root of the crashes. Would you trust a system if it crashes for an unknown reason?!

                    Comment

                    • krusty
                      Senior Member
                      • Oct 2005
                      • 222

                      #11
                      Originally posted by Alexei
                      Every day?! Please report all issues to the ZABBIX team. We are very interested! Don't silently restart ZABBIX.
                      Hi Alexei,
                      Today two of the zabbix_agentd processes died. I have looked into the log files but I can't see any reason why. Sometimes the server process dies and other times the agentd dies, and the log file says nothing.

                      Comment

                      • DiedX
                        Senior Member
                        • Oct 2004
                        • 106

                        #12
                        OK, some more information. You still have good points.

                        However, I do not fully agree. I compiled Zabbix under Debian and put everything in /usr/local/bin (everything is standard).

                        Then I start zabbix_server.

                        Then I use apt-get update, which wants to upgrade mysql-server. Ok! Cool, so I upgrade mysql-server. Zabbix-server can't connect, and dies with a large scream in a pool of blood. My update works perfectly, monitoring is down.

                        I've taken care of this with 2 things:

                        - the bad cronscript I gave above (if dead, then reanimate, till true)
                        - monitoring from outside (TCP-port 10051)
                        https://www.diederik.nl
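
For the upgrade scenario described above, a restart script may do better if it waits for MySQL to come back before starting zabbix_server. The following is a sketch under assumptions, not the cron script from this thread: `retry` and the script name are hypothetical, the binary path is the one used earlier in the thread, and `mysqladmin ping` is assumed to be able to reach the local server.

```shell
#!/bin/bash
# Sketch: wait for the database before restarting zabbix_server.
# Hypothetical names: retry; the binary path is the one used in this thread.

# retry TRIES DELAY COMMAND... -> runs COMMAND until it succeeds or
# TRIES attempts have failed, sleeping DELAY seconds between attempts
retry() {
    local tries="$1" delay="$2" i
    shift 2
    for ((i = 1; i <= tries; i++)); do
        "$@" && return 0
        [ "$i" -lt "$tries" ] && sleep "$delay"
    done
    return 1
}

main() {
    # mysqladmin ping exits 0 once the MySQL server answers
    if retry "$1" "$2" mysqladmin --silent ping; then
        /usr/local/bin/zabbix_server
    else
        echo "MySQL still down after $1 attempts, not starting Zabbix" >&2
        return 1
    fi
}

# Run only with arguments, e.g.: ./start_after_db.sh 30 10
if [ "$#" -gt 1 ]; then main "$@"; fi
```

With 30 attempts 10 seconds apart, the script tolerates a database restart of up to about five minutes before giving up.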

                        Comment
