Immortal zabbix_server

Collapse
X
 
  • jarek
    Member
    • May 2005
    • 35

    #1

    Hello developers!
    It seems I'm not the only one experiencing crashes of the zabbix_server process. Could you consider changing the server's behavior so that it respawns automatically if something crashes?
    Of course it is a good idea to have perfect, bug-free code, but in reality that is very difficult to achieve.
    In my apps, I do it this way: I have a master process whose main loop contains only fork() and waitpid(). If something crashes, it is simply restarted.
  • Alexei
    Founder, CEO
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Sep 2004
    • 5654

    #2
    Originally posted by jarek
    In my apps, I do it this way: I have a master process whose main loop contains only fork() and waitpid(). If something crashes, it is simply restarted.
    Who restarts the master process?
    Alexei Vladishev
    Creator of Zabbix, Product manager
    New York | Tokyo | Riga
    My Twitter

    • jarek
      Member
      • May 2005
      • 35

      #3
      Originally posted by Alexei
      Who restarts the master process?
      The master process is very simple, so there is very little risk of it crashing.
      Let's see how it could look:

      Code:
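      #include <stdio.h>
      #include <sys/types.h>
      #include <sys/wait.h>
      #include <unistd.h>
      
      static const char *progname;
      
      int main(int argc, char **argv)
      {
              pid_t   pid;
      
              progname = argv[0];
      
              /* Respawn loop: runs before option parsing and database setup. */
              while (1)
              {
                      pid = fork();
                      if (pid == 0)
                              break;          /* we are the child, go on */
                      if (pid < 0)
                      {
                              perror("fork"); /* fork failed: retry after a pause */
                              sleep(5);
                              continue;
                      }
                      waitpid(pid, NULL, 0);  /* wait for the child to finish */
                      sleep(5);               /* prohibit too fast respawning */
              }
      
              /* Child only: the original excerpt continues here with the usual
               * zabbix_server start-up, parsing the command line via
               * zbx_getopt_long(argc, argv, shortopts, longopts, NULL)
               * (the rest of the code was cut off in the post). */
              (void)argc;
              return 0;
      }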
      Of course, for a production solution, while(1) can be replaced with a variable that is changed, e.g., by an interrupt signal (SIGINT). Some signal handling would also be helpful.
      If you like this idea, I can write a more efficient solution.

      Best regards
      Jarek
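For illustration only, here is one way the production variant described in the post (a stop flag instead of while(1), plus signal handling) could be sketched. This is not Zabbix code: `run_supervised()`, its parameters, and the two demo child functions are hypothetical. It assumes POSIX `fork()`, `waitpid()` and `sigaction()`, stops respawning once the child exits cleanly with status 0, and caps the total number of starts.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by SIGINT/SIGTERM; replaces the bare while (1) condition. */
static volatile sig_atomic_t stop_requested = 0;

static void on_stop_signal(int signo)
{
    (void)signo;
    stop_requested = 1;
}

/* Hypothetical helper: run child_main() in a forked child and respawn it
 * whenever it dies abnormally, up to max_starts starts in total.
 * Returns the number of starts, or -1 if fork() fails. */
int run_supervised(int (*child_main)(void), int max_starts, unsigned delay_sec)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_stop_signal;
    sigemptyset(&sa.sa_mask);
    /* No SA_RESTART, so a signal interrupts waitpid() with EINTR. */
    sigaction(SIGINT, &sa, NULL);
    sigaction(SIGTERM, &sa, NULL);

    int starts = 0;
    while (!stop_requested && starts < max_starts) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            /* Child: restore default signal behaviour, then do the real work. */
            signal(SIGINT, SIG_DFL);
            signal(SIGTERM, SIG_DFL);
            _exit(child_main());
        }
        starts++;

        int status = 0;
        while (waitpid(pid, &status, 0) < 0) {
            if (errno != EINTR)
                return starts;
            if (stop_requested)
                kill(pid, SIGTERM);  /* pass the stop request on to the child */
        }

        /* A clean exit means the child finished on purpose: do not respawn. */
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            break;

        if (!stop_requested && starts < max_starts)
            sleep(delay_sec);        /* prohibit too fast respawning */
    }
    return starts;
}

/* Demo child bodies for trying the helper out. */
static int demo_crash(void)
{
    raise(SIGKILL);  /* simulate a crash (SIGKILL leaves no core file) */
    return 1;
}

static int demo_clean_exit(void)
{
    return 0;
}
```

With these, `run_supervised(demo_crash, 3, 0)` starts the child three times and then gives up, while `run_supervised(demo_clean_exit, 3, 0)` starts it only once. Treating exit status 0 as "do not respawn" is one possible design choice; it keeps a deliberate shutdown from turning into an endless restart loop.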

      • nelsonab
        Senior Member
        Zabbix Certified Specialist
        Zabbix Certified Professional
        • Sep 2006
        • 1233

        #4
        Automagic respawning is, to me, a bad idea. If the server stops for a reason such as an error, the program should stop. If it's stopping due to aberrant behavior, then OK, maybe there is a point to a restart; however, there is a risk that the previous crash left the application in an unstable state, i.e., there is bad data in the DB that will cause subsequent restarts to fail. Yes, Zabbix does like to quit on rare occasions, and yes, this is not very good; however, there are other ways to restart the app than having the master thread fork itself.

        A better solution might be a heartbeat script tied to a cron job. Every 5 minutes you check whether a zabbix process is running; if it isn't, fire off an email. This way you can check whether something is truly gorked before you restart the server.
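A minimal sketch of the check side of such a heartbeat, assuming the server writes its PID to a pid file (the path is whatever your configuration uses; `process_alive()` is a hypothetical name). It only reports and leaves the restart decision to a human, as suggested above. Calling `kill()` with signal 0 tests whether the process exists without actually sending it a signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper. Returns 1 if the PID stored in pidfile belongs to a
 * running process, 0 if that process is gone, and -1 if the file is
 * missing or does not contain a positive PID. */
int process_alive(const char *pidfile)
{
    FILE *f = fopen(pidfile, "r");
    if (f == NULL)
        return -1;

    long pid = 0;
    int ok = fscanf(f, "%ld", &pid) == 1 && pid > 0;
    fclose(f);
    if (!ok)
        return -1;

    /* Signal 0: existence/permission check only, nothing is delivered.
     * EPERM means the process exists but belongs to another user. */
    if (kill((pid_t)pid, 0) == 0 || errno == EPERM)
        return 1;
    return 0;
}
```

A small wrapper run from cron every 5 minutes (`*/5 * * * *`) could print a message whenever the result is not 1; cron normally mails any output of a job to the crontab owner (or to `MAILTO`), which takes care of the email. One caveat: if the server died and its PID was later reused by another process, a stale pid file would still look alive.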

        Though if you really want to get crazy, I did read about a solution from one of the early timesharing systems of the '60s/'70s called Robin Hood and Friar Tuck. Both processes would run concurrently, each watching for the other. If you killed one process, say Robin Hood, Friar Tuck would restart Robin Hood; if you killed Friar Tuck, Robin Hood would restart Friar Tuck. Extrapolate that to Zabbix and you have an answer to Alexei's question... There is no single watcher... so a watcher watches a watcher who watches Zabbix... while watching the watcher.....

        OK... I'll step away from the computer now...
        RHCE, author of zbxapi
        Ansible, the missing piece (Zabconf 2017): https://www.youtube.com/watch?v=R5T9NidjjDE
        Zabbix and SNMP on Linux (Zabconf 2015): https://www.youtube.com/watch?v=98PEHpLFVHM

        • pesadilla
          Member
          • Nov 2006
          • 69

          #5
          Originally posted by nelsonab
          Automagic respawning is, to me, a bad idea. If the server stops for a reason such as an error, the program should stop. If it's stopping due to aberrant behavior, then OK, maybe there is a point to a restart; however, there is a risk that the previous crash left the application in an unstable state, i.e., there is bad data in the DB that will cause subsequent restarts to fail. Yes, Zabbix does like to quit on rare occasions, and yes, this is not very good; however, there are other ways to restart the app than having the master thread fork itself.

          A better solution might be a heartbeat script tied to a cron job. Every 5 minutes you check whether a zabbix process is running; if it isn't, fire off an email. This way you can check whether something is truly gorked before you restart the server.
          ...
          I agree with this idea.

          • Tenzer
            Senior Member
            • Nov 2007
            • 316

            #6
            You can also set up Monit to monitor the Zabbix server. It can be configured to e-mail you when the server goes down, it can automatically restart the Zabbix server, and you can specify thresholds for how many times it may restart the Zabbix server before it gives up.
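The "give up after N restarts" threshold can be illustrated in C, the language used earlier in this thread. The sketch below is not Monit's implementation; it only shows the idea of a restart budget: allow at most `MAX_RESTARTS` restarts within a sliding time window and refuse further ones after that. `struct restart_budget`, `budget_init()`, `budget_allow()` and the threshold of 5 are all hypothetical.

```c
#include <stdbool.h>
#include <time.h>

#define MAX_RESTARTS 5  /* hypothetical threshold */

/* Times of the most recent restarts, kept in a small ring buffer. */
struct restart_budget {
    time_t stamps[MAX_RESTARTS];
    int count;      /* valid entries, never more than MAX_RESTARTS */
    int next;       /* slot the next restart overwrites (the oldest one) */
    time_t window;  /* length of the sliding window in seconds */
};

void budget_init(struct restart_budget *b, time_t window)
{
    b->count = 0;
    b->next = 0;
    b->window = window;
}

/* Returns true and records the restart if fewer than MAX_RESTARTS restarts
 * happened in the `window` seconds before `now`; false means "give up".
 * Assumes `now` never goes backwards between calls. */
bool budget_allow(struct restart_budget *b, time_t now)
{
    int recent = 0;
    for (int i = 0; i < b->count; i++)
        if (now - b->stamps[i] < b->window)
            recent++;

    if (recent >= MAX_RESTARTS)
        return false;

    b->stamps[b->next] = now;
    b->next = (b->next + 1) % MAX_RESTARTS;
    if (b->count < MAX_RESTARTS)
        b->count++;
    return true;
}
```

With a 60-second window, five restarts at t = 100 through 104 are allowed, a sixth at t = 105 is refused, and restarting becomes possible again once the oldest entries fall out of the window (for example at t = 161).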

            • jarek
              Member
              • May 2005
              • 35

              #7
              I agree that auto-respawning is not a cure for all bugs, but it is a simple solution that decreases the risk of data loss.
              External watchdogs can be more flexible, but in general they do exactly the same thing.
              Regarding the database: the respawning should be done before the database connection is initialized; in that case there is no risk of any data corruption.
              If you don't like the idea, it is quite simple to add a configuration parameter that disables the feature.
              Most of you are using Apache, which has the same auto-respawning. Is that bad?

              • Saftnase
                Member
                • Jul 2006
                • 30

                #8
                Originally posted by Tenzer
                You can also set up Monit to monitor the Zabbix server. It can be configured to e-mail you when the server goes down, it can automatically restart the Zabbix server, and you can specify thresholds for how many times it may restart the Zabbix server before it gives up.
                I agree with you, Tenzer, but I did it with the Zabbix server itself.

                When I had the problem with the net-snmp memory leak, I monitored the free swap space, and when it dropped below the trigger threshold I just restarted the whole machine. (It takes only 5 minutes, so it's no problem for the monitored hosts.)
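For reference, the free-swap value such a trigger watches can also be read directly on Linux. This is a minimal, Linux-only sketch assuming the usual `/proc/meminfo` line format (`SwapFree:   <n> kB`); `swap_free_kb()` is a hypothetical helper, and what to do when the value drops below a threshold (alert, restart) is left to the caller.

```c
#include <stdio.h>

/* Hypothetical helper. Returns the SwapFree value in kB from a
 * /proc/meminfo-style file, or -1 if the file cannot be read or
 * contains no SwapFree line. */
long swap_free_kb(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char line[256];
    long kb = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Lines such as "SwapTotal:" fail the literal match and are skipped. */
        if (sscanf(line, "SwapFree: %ld kB", &kb) == 1)
            break;
    }
    fclose(f);
    return kb;
}
```

On a real system the call would be `swap_free_kb("/proc/meminfo")`, compared against whatever threshold the trigger uses.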
