
**Alexei** · 22-12-2008, 20:49

Originally posted by jarek

In my apps, I do it this way: I have a master process whose main loop contains only fork and waitpid. If something crashes, it is simply restarted.

Who restarts the master process?

**jarek** · 27-02-2009, 21:13

Originally posted by Alexei

Who restarts the master process?

The master process is quite simple, so there is very little risk of it crashing.
Let's see how it could look:

Code:

int main(int argc, char **argv)
{
        zbx_task_t      task  = ZBX_TASK_START;
        char    ch      = '\0';

        int     nodeid = 0;
        pid_t   pid;

        progname = argv[0];

        while( 1 )
        {
                pid = fork();
                if( pid == 0 )
                        break; // We are the child, go on with startup
                if( pid > 0 )
                        waitpid( pid, NULL, 0 ); // Parent: wait for the child to finish
                sleep(5); // Prevent too fast respawning (also retries a failed fork)
        }

        /* Parse the command-line. */
        while ((ch = (char)zbx_getopt_long(argc, argv, shortopts, longopts,NULL)) != (char)EOF)
        switch (ch) {

Of course, for a production solution, while(1) could be replaced with a variable that can be changed, e.g. by an interrupt signal (SIGINT). Some signal handling would also be helpful.
If you like this idea, I can write a more efficient solution.
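The refinements mentioned above (a stop flag instead of while(1), plus signal handling) might look roughly like the following minimal sketch. It is not Zabbix code: `supervise`, `handle_stop`, `keep_running` and the `max_spawns`/`delay_sec` parameters are hypothetical names chosen for illustration. SIGINT/SIGTERM clear the flag, the parent forwards the stop request to the running child, and the child gets default signal handling back before it continues with normal startup.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Replaces while(1): cleared by the signal handler to stop respawning. */
static volatile sig_atomic_t keep_running = 1;

static void handle_stop(int sig)
{
    (void)sig;
    keep_running = 0;
}

/*
 * Hypothetical supervisor (not Zabbix code). Returns 0 in the child, which
 * should continue with normal startup, and 1 in the parent once it stops
 * respawning. max_spawns == 0 means "respawn forever"; delay_sec throttles
 * respawning. *spawned receives the number of children started.
 */
static int supervise(unsigned max_spawns, unsigned delay_sec, unsigned *spawned)
{
    struct sigaction sa;

    assert(spawned != NULL);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_stop;   /* no SA_RESTART, so waitpid() sees EINTR */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, NULL);
    sigaction(SIGTERM, &sa, NULL);

    *spawned = 0;
    while (keep_running && (max_spawns == 0 || *spawned < max_spawns)) {
        pid_t pid = fork();

        if (pid == 0) {
            /* Child: restore default signal handling and go on. */
            signal(SIGINT, SIG_DFL);
            signal(SIGTERM, SIG_DFL);
            return 0;
        }
        if (pid > 0) {
            (*spawned)++;
            /* Wait for the child; on a stop signal, pass it on and keep waiting. */
            while (waitpid(pid, NULL, 0) == -1 && errno == EINTR) {
                if (!keep_running)
                    kill(pid, SIGTERM);
            }
        }
        /* Prevent too fast respawning (also retries a failed fork()). */
        if (keep_running)
            sleep(delay_sec);
    }
    return 1;
}
```

In the main() above, the call would replace the while( 1 ) loop and stay ahead of option parsing and any database connection, e.g. `unsigned n; if (supervise(0, 5, &n) != 0) return 0;`. One known gap in this sketch: a signal arriving just before fork() is not forwarded to the new child; closing that window would need sigprocmask()/sigsuspend().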

Best regards
Jarek

**nelsonab** · 02-03-2009, 05:14

Automagic respawning is, to me, a bad idea. If the server stops because of an error, the program should stop. If it's stopping due to aberrant behavior, then OK, maybe there is a point to a restart; however, there is the risk that the previous crash left the application in an unstable state, i.e. there is bad data in the DB that will cause subsequent restarts to fail. Yes, Zabbix does like to quit on rare occasions, and yes, this is not very good; however, there are other ways to restart the app than having the master process fork itself.

A better solution might be to have a heartbeat script tied to a cron job. Every 5 minutes you check whether a zabbix process is running; if not, fire off an email. This way you can check whether something is truly gorked before you restart the server.
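A heartbeat check of this kind could be as small as the following sketch. It assumes the server writes its PID to a PID file (Zabbix's `PidFile` setting; the path used below is an assumption, adjust it to your config), and `process_alive`/`heartbeat` are hypothetical names. It relies on `kill(pid, 0)`, which only checks whether the PID exists, and prints a message only when the server is down, because most cron implementations mail any output of a job to its owner (or `MAILTO`).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Returns 1 if the process whose PID is stored in pid_path exists,
 * 0 if the PID file is missing/unreadable or the process is gone.
 */
static int process_alive(const char *pid_path)
{
    FILE *f;
    long pid = 0;
    int ok;

    assert(pid_path != NULL);
    f = fopen(pid_path, "r");
    if (f == NULL)
        return 0;
    ok = fscanf(f, "%ld", &pid) == 1 && pid > 0;
    fclose(f);
    if (!ok)
        return 0;
    /* Signal 0 only checks whether the PID exists; nothing is delivered. */
    if (kill((pid_t)pid, 0) == 0)
        return 1;
    return errno == EPERM; /* exists, but belongs to another user */
}

/* Prints a message only when the server is down; cron mails any job output. */
static int heartbeat(const char *pid_path)
{
    if (process_alive(pid_path))
        return 0;
    printf("zabbix_server is not running (PID file: %s)\n", pid_path);
    return 1;
}
```

A main() could simply `return heartbeat("/tmp/zabbix_server.pid");` (assumed path), with a crontab entry such as `*/5 * * * * /usr/local/bin/zbx_heartbeat` (hypothetical binary name). A stale PID file whose PID has since been reused by another process would be reported as "alive", so this is a coarse check.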

Though if you really want to get crazy, I did read about a solution from one of the early timesharing systems of the 1960s/70s called Robin Hood and the Sheriff. Both processes would be running concurrently and looking for each other. If you killed one process, say Robin Hood, the Sheriff would restart Robin Hood. If you killed the Sheriff process, Robin Hood would restart the Sheriff process. Extrapolate that to Zabbix and you have an answer to Alexei's question... There is no single watcher... so a watcher watches a watcher who watches zabbix... while watching the watcher.....

OK... I'll step away from the computer now...

**pesadilla** · 02-03-2009, 10:42

Originally posted by nelsonab

Automagic respawning is, to me, a bad idea. If the server stops because of an error, the program should stop. If it's stopping due to aberrant behavior, then OK, maybe there is a point to a restart; however, there is the risk that the previous crash left the application in an unstable state, i.e. there is bad data in the DB that will cause subsequent restarts to fail. Yes, Zabbix does like to quit on rare occasions, and yes, this is not very good; however, there are other ways to restart the app than having the master process fork itself.

A better solution might be to have a heartbeat script tied to a cron job. Every 5 minutes you check whether a zabbix process is running; if not, fire off an email. This way you can check whether something is truly gorked before you restart the server.
...

I agree with this idea.

**Tenzer** · 02-03-2009, 10:58

You can also set up Monit to monitor the Zabbix server. It can be configured to e-mail you when the server goes down, it can automatically restart the Zabbix server, and you can specify thresholds for how many times it may restart the Zabbix server before it gives up.

**jarek** · 02-03-2009, 14:47

I agree that autorespawning is not a cure for all bugs, but it is a simple solution that decreases the risk of data loss.
External watchdogs can be more flexible, but in general they do exactly the same thing.
Regarding the database: the respawning should be done before the database connection is initialized; in that case there is no risk of any data corruption.
If you don't like the idea, it is quite simple to add a configuration parameter that disables the feature.
Most of you are using Apache, which has the same autorespawning. Is that bad?

**Saftnase** · 05-03-2009, 17:30

Originally posted by Tenzer

You can also set up Monit to monitor the Zabbix server. It can be configured to e-mail you when the server goes down, it can automatically restart the Zabbix server, and you can specify thresholds for how many times it may restart the Zabbix server before it gives up.

I agree with you, Tenzer, but I did it with the Zabbix server itself.

When I had the problem with the net-snmp memory leak, I monitored the free swap space, and when it dropped below the trigger threshold I just restarted the whole machine. (It takes only 5 minutes, so it's no problem for the monitored hosts.)


Immortal zabbix_server