When agentd stops, I want script to execute

  • mudbone333
    Junior Member
    • Sep 2011
    • 7

    #1

    Hey, I am trying to get a script to execute when the agentd process stops running. I have the remote command, the action, and the trigger all set up, but the script executes when I start Zabbix, not when it stops, which is exactly the opposite of what I want. What I want is that whenever the agentd process stops, this script kicks off and restarts Zabbix. The trigger I am using is {zabbixhost:proc.num[zabbix_server].last(0)}<1. I've tried different triggers, but no matter which one I use, the script kicks off when agentd starts. I've even tried with no trigger and the same thing happens. I can't find any examples of how to set this up. All I am trying to do is start Zabbix via a script when the agentd process stops. Does anyone feel my pain? All help is appreciated.
  • frater
    Senior Member
    • Oct 2010
    • 340

    #2
    If Zabbix isn't running anymore, how can you have it do things?
    Could you maybe rephrase your question and, instead of "zabbix", use either "zabbix_server" or "zabbix_agentd"?

    Maybe this will answer your question.
    It's my solution to keep zabbix_server/zabbix_agentd running:

    /etc/crontab
    Code:
    # Watchdog for Zabbix-server
    * * * * * root netstat -lntp | grep -q zabbix_server || /etc/init.d/zabbix-server start
    # Watchdog for Zabbix-agent
    * * * * * root netstat -lntp | grep -q zabbix_agentd || /etc/init.d/zabbix-agent start

    On top of that I created a trigger that will check the log and tell me if the server is started....
    Last edited by frater; 07-10-2011, 22:41.
    Zabbix agents on Linux, FreeBSD, Windows, AVM-Fritz!box, DD-WRT and QNAP


    • mudbone333
      Junior Member
      • Sep 2011
      • 7

      #3
      Let me see if I can word this correctly. When agentd stops running for whatever reason, let's say the server was rebooted, I want the agentd startup script to execute. I thought you could create a trigger that checks whether agentd is running and, if it is not, an action based on that trigger that executes a remote command to kick off the agentd start script. I hope that's clear; if not, I'll be happy to try again. I know there are alternative ways to do this, and your suggestion sounds good, but since Zabbix has so many built-in functions, I want to take advantage of them. I have seen other posts where system admins set up triggers on log files, get alerts, and do so many other things, so why can't a trigger tell me that agentd is not running, so that I can create an action based on that trigger to run my remote command? Sorry to be so long-winded, but I wanted to make sure you see what I want done. I'm sure there is a solution to this.
      Thanks for your reply.


      • frater
        Senior Member
        • Oct 2010
        • 340

        #4
        You can use Zabbix to detect that the agent isn't running anymore, but you can't take any action on that host, because remote commands are passed to the agent and that's not running.


        You can't use a dead doctor to cure himself.

        BTW. I haven't used remote actions thus far (the webif confuses me)...
        It might be possible to start the agent on the server itself, but that's of very limited use (it will only keep the agent alive on the machine that's running the zabbix_server.. IF it is possible at all)
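
One way around the dead-doctor problem is to let a different machine do the checking. Here is a minimal sketch, assuming the `zabbix_get` utility is installed on the Zabbix server, the agent's `Server=` setting allows that machine, and key-based SSH access to the monitored host exists. The host name, the SSH access, and the remote init script path are assumptions, not something taken from the setup in this thread:

```shell
#!/bin/sh
# Sketch: probe a remote agent from the server side and restart it over SSH.
# Assumptions: zabbix_get is installed, root SSH keys are set up, and the
# init script path matches the one used in the cron examples above.

agent_alive() {
    # agent.ping returns 1 when the agent answers on its listen port
    [ "$(zabbix_get -s "$1" -k agent.ping 2>/dev/null)" = "1" ]
}

restart_remote_agent() {
    ssh "root@$1" /etc/init.d/zabbix-agent restart
}

check_host() {
    # $1 = host name or IP of the monitored machine
    if agent_alive "$1"; then
        echo "$1: agent ok"
    else
        echo "$1: agent down, restarting"
        restart_remote_agent "$1"
    fi
}

# Example (hypothetical host name): check_host solaris-box-01
```

Run from cron on the server, this keeps the check and the cure on a machine that is still alive, which is the part a remote command sent to the stopped agent cannot do.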
        Last edited by frater; 10-10-2011, 17:28.

        • mudbone333
          Junior Member
          • Sep 2011
          • 7

          #5
          Okay, that takes care of that; you know a lot more about this than I do. I guess I'll try your method of using netstat to detect when agentd has stopped running and restart it. I tried the "-lntp" options with netstat, but there are no "-l" and "-t" options on a Solaris 10 operating system. I stopped agentd and then used "netstat -np | grep -q zabbix_agentd" just to see if it would find agentd, but nothing came up. Are the "-l" and "-t" options to netstat important when looking for agentd? If they are, then I can't use your method. Is there any other way if I can't use those options?
          Thanks for your help.


          • frater
            Senior Member
            • Oct 2010
            • 340

            #6
            This will probably work...
            Code:
            # Watchdog for Zabbix-server
            * * * * * root ps -e | grep -q '[z]abbix_server' || /etc/init.d/zabbix-server start
            # Watchdog for Zabbix-agent
            * * * * * root ps -e | grep -q '[z]abbix_agentd' || /etc/init.d/zabbix-agent start
            Maybe you need to replace "-e" with "ax"
            I have never used Solaris, but I believe it doesn't support one of these....
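
If the `grep` trick feels fragile, an exact match on the command name avoids both matching `grep` itself and matching similarly named processes. This is a rough sketch in POSIX sh. The `-o comm=` output column is an assumption about the local `ps` (its exact format can differ between systems), a POSIX awk is assumed (on older Solaris that may mean `nawk`), and `has_process` is a made-up helper name:

```shell
#!/bin/sh
# has_process NAME: reads a list of command names on stdin (one per line)
# and succeeds if any of them, with the directory part stripped, equals NAME.
has_process() {
    awk -F/ -v name="$1" '
        { sub(/[ \t]+$/, "", $NF) }      # drop trailing padding, if any
        $NF == name { found = 1 }
        END { exit !found }
    '
}

# Intended use in a watchdog (not run here):
#   ps -e -o comm= | has_process zabbix_agentd || /etc/init.d/zabbix-agent start
```

Because the function only reads stdin, it can be tried against the output of whatever `ps` invocation works on the system at hand before it goes into cron.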
            Last edited by frater; 10-10-2011, 17:26.

            • mudbone333
              Junior Member
              • Sep 2011
              • 7

              #7
              That's a good idea. I did think of using ps in the past, but this is the problem I am having with it. I have found that if Zabbix isn't writing to the Zabbix log files, then no matter what the ps command shows, agentd isn't running. I have run across situations where you run the ps command, see output, and think that agentd is running and your host is being monitored, only to find that there is no entry in your log file, or no log file at all. I was looking at old processes with the ps command, but nothing was writing to the log files. So I know that if agentd isn't writing to the log files, the ps command isn't going to prove anything.
              I have also found out that you can have entries in that log file, and you are thinking, my agentd is running because my log file has something in it. But I have seen the message "Child process stopped", in the log file and you find out that the agentd isn't running, ps or no ps.
              This is why I was hoping that I could use the trigger, and the action, because of the conditions above. Sorry to be so lengthy, but I wanted to make sure that you understand the conditions.
              Once again thanks.
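
The check described above, "is anything still writing to the log?", can be scripted by comparing the log's modification time against a marker file left behind by the previous watchdog run. This is a minimal sketch, assuming the agent logs often enough for the comparison to be meaningful (that depends on the configured debug level). The function name and both paths are placeholders:

```shell
#!/bin/sh
# log_is_fresh LOG STAMP
# Succeeds if LOG was modified after STAMP was last touched.
# STAMP is a marker file that the watchdog touches at the end of each run.
log_is_fresh() {
    [ -f "$1" ] || return 1          # no log file at all: treat as stale
    [ -f "$2" ] || return 0          # first run: nothing to compare against
    [ -n "$(find "$1" -newer "$2" 2>/dev/null)" ]
}

# Intended use (paths are hypothetical):
#   log_is_fresh /var/log/zabbix_agentd.log /var/run/zabbix_watchdog.stamp \
#       || /etc/init.d/zabbix-agent restart
#   touch /var/run/zabbix_watchdog.stamp
```

`find -newer` is used instead of `-mmin` because the latter is not available in every `find`. This only covers the "nothing is being written" case; a log that still grows but contains "Child process stopped" would need a separate check.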


              • frater
                Senior Member
                • Oct 2010
                • 340

                #8
                Then try this:

                Code:
                # Watchdog for Zabbix-agent
                * * * * * root expr `netstat -n | grep -c ':10050 '` \< 6 >/dev/null && /etc/init.d/zabbix-agent restart
                # Watchdog for Zabbix-server
                * * * * * root expr `netstat -n | grep -c ':10051 '` \< 6 >/dev/null && /etc/init.d/zabbix-server restart
                Last edited by frater; 10-10-2011, 19:25.
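
The `expr` one-liners pack a lot into one line: count the `netstat` lines that mention the port, and restart if the count is below a threshold. Here is the same idea spelled out as a small function, as a sketch only. `port_conn_count` is a made-up name, and the threshold of 6 is simply the value used above, which may be too high for a quiet agent. The pattern also accepts the `address.port` notation that some systems print instead of `address:port`:

```shell
#!/bin/sh
# port_conn_count PORT: reads `netstat -n` style output on stdin and prints
# how many lines contain PORT as a port number (":PORT " or ".PORT ").
port_conn_count() {
    # grep -c still prints 0 when nothing matches; "|| true" hides its
    # non-zero exit status in that case.
    grep -c "[:.]$1[[:space:]]" || true
}

# Intended use in a watchdog (not run here):
#   n=$(netstat -n | port_conn_count 10050)
#   [ "$n" -lt 6 ] && /etc/init.d/zabbix-agent restart
```

Keeping the count in a variable makes it easy to log the number for a while and pick a threshold that fits the host before letting the watchdog restart anything.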

                • mudbone333
                  Junior Member
                  • Sep 2011
                  • 7

                  #9
                  Hey Frater,
                  That one-liner does work, up to a point. I put it in cron earlier today to test it. I stopped Zabbix, and a minute or two later it did restart Zabbix. So I left it in cron, but when I came back this evening my log file was extremely large. It keeps writing to the log file after it has restarted Zabbix. I had to stop cron, remove the one-liner, and restart cron so that it would stop writing to my log file. The log file contained lines like "ERROR: File [zabbix_agentd.pid] exists and is locked. Is this process already running ?" I must have had 500 of these.
                  I see your point in having an entry in every time slot, because you want cron to run the check all the time; it's just that no matter how the time is set up in cron, it will continuously write to the log file, filling it up. And I don't know any way around that.
                  Do you think nulling out the log file after it reaches a certain length, or having the script stop writing to the log file once it sees those errors, would work?
                  Thanks for your help.
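
The "exists and is locked" errors suggest the watchdog keeps launching a second agent while the first one is still alive. A guard that only starts the agent when the recorded PID is really gone might stop that. This is a minimal sketch, assuming the agent writes a PID file (the path below is a guess; the real one is set by `PidFile` in zabbix_agentd.conf) and that the watchdog runs as root:

```shell
#!/bin/sh
# pid_alive PIDFILE: succeeds if PIDFILE holds the PID of a live process.
pid_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;     # empty or not a number
    esac
    kill -0 "$pid" 2>/dev/null       # signal 0 only tests for existence
}

# Intended use (hypothetical path):
#   pid_alive /var/run/zabbix_agentd.pid || /etc/init.d/zabbix-agent start
```

This only answers "is the process there?", so it would not catch an agent that is alive but no longer doing any work.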


                  • frater
                    Senior Member
                    • Oct 2010
                    • 340

                    #10
                    At the moment I'm running the 1.9.6 beta, which is quite unstable on 64-bit Ubuntu... I had it running for 2 days on 32-bit Ubuntu, where it ran fine...

                    I experimented with different watchdogs and eventually I came up with this.
                    Let me know if it does it for you too...


                    Code:
                    #!/bin/sh
                    
                    if netstat -lntp | grep -q zabbix_server ; then
                     # Zabbix-server is running... but is it really working???
                     # Let's try and discover more than 1 connection....
                    
                     CONNS=`netstat -natp | grep -c zabbix_server`
                     n=1
                     while sleep .2 ; do
                      NCONNS=`netstat -natp | grep -c zabbix_server`
                      [ $CONNS -lt $NCONNS ] && CONNS=$NCONNS
                      [ $CONNS -gt 1 ] && break
                      [ $n -gt 50 ] && break
                      n=$((n + 1))
                     done
                    
                     # Zabbix Server is running....
                     if [ $CONNS -lt 2 ] ; then
                       killall zabbix_server
                       n=1
                       while sleep 1 ; do
                        ps -ef | grep -q '[z]abbix_server' || break
                        [ $n -gt 20 ]                      && break
                        n=$((n + 1))
                       done
                       killall -9 zabbix_server 2>/dev/null
                       sleep 1
                       /etc/init.d/zabbix-server start
                     fi
                    else
                     /etc/init.d/zabbix-server start
                    fi
                    
                    netstat -lntp | grep -q zabbix_agentd || /etc/init.d/zabbix-agent start