Watchdog doesn't work in v1.4.2

  • gustav
    Junior Member
    • Sep 2007
    • 10

    #1

    Watchdog doesn't work in v1.4.2

    I shut down my MySQL server and zabbix_server went down more or less immediately (in under 2 seconds).

    I was expecting zabbix_server to send me an e-mail and then reconnect to MySQL as soon as it came back up. As I read it, that is what the Zabbix watchdog should do for me.

    [root@spik zabbix]# zabbix_server --version
    ZABBIX Server (daemon) v1.4.2 (20 August 2007)
    Compilation time: Sep 5 2007 14:38:19

    Extract from the server log:
    7373:20070911:173538 Query failed:MySQL server has gone away [2006]
    7391:20070911:173543 Query::select druleid,iprange,delay,nextcheck,name,status from drules where status=0 and nextcheck<=1189524943 and mod(druleid,1)=0 and druleid>=100000000000000*0 and druleid<=(100000000000000*0+99999999999999)
    7391:20070911:173543 Query failed:MySQL server has gone away [2006]
    7385:20070911:173543 Failed to connect to database: Error: Can't connect to MySQL server on 'bi.sthlm.se.eds.com' (111) [2003]
    7369:20070911:173543 One child process died. Exiting ...
    7369:20070911:173545 ZABBIX Server stopped

    Our MySQL server is actually very stable and even runs on an HA setup. Nevertheless, it would be a nice feature if this worked. It would make it easier for us to maintain our MySQL server, which by the way doesn't reside on the same machine as the Zabbix server.

    What do you guys say, any suggestions?

    /Gustav Karlman
  • alj
    Senior Member
    • Aug 2006
    • 188

    #2
    It would take a lot of time to clean up all the code to avoid crashes and leaks. What would make sense is for zabbix not to die if one of the children exits. In fact, as in Apache, every child could be configured to exit by itself after 100k requests or so, to avoid memory fragmentation and the consequences of memory leaks.

    The process nanny should restart children as they die.

    The easiest way would be to just copy that piece of code from the Apache prefork model. It has neat features like avoiding fork storms (it creates only 1 process per second).
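
A minimal sketch of the throttled restart idea described above. This is not Apache's or Zabbix's actual code. `child_slot`, `nanny_state` and `nanny_pick_respawn` are hypothetical names, and the real `fork()` call is left to the caller so that only the decision logic is shown:

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical bookkeeping for one supervised child. */
typedef struct {
    int alive;          /* 1 while the child runs, 0 after it has exited */
} child_slot;

/* Hypothetical nanny state: remembers when the last fork happened. */
typedef struct {
    time_t last_fork;   /* 0 means "nothing forked yet" */
} nanny_state;

/*
 * Pick at most one dead slot to restart, and only if at least one second
 * has passed since the previous fork. Returns the index of the slot to
 * respawn, or -1 if nothing should be forked right now.
 */
int nanny_pick_respawn(nanny_state *st, const child_slot *slots, size_t n,
                       time_t now)
{
    size_t i;

    if (st->last_fork != 0 && now - st->last_fork < 1)
        return -1;                  /* throttle: one fork per second */

    for (i = 0; i < n; i++) {
        if (!slots[i].alive) {
            st->last_fork = now;    /* the caller is expected to fork() here */
            return (int)i;
        }
    }
    return -1;                      /* every child is still running */
}
```

The parent would call this from its main loop after reaping exited children (for example with `waitpid`), fork for the returned slot, and mark it alive again. If several children die at once, they come back one per second instead of all at the same moment.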


    The next step would be to implement smart children management, i.e. the config would have only one option, the maximum number of children (so as not to trip the database connection limit), and zabbix would then dynamically decide how many pollers/trappers/HTTP pollers to fork or not to fork (after they exit) based on recent statistics.
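
One way the "single option" idea could be sketched: split the maximum number of children across the process types in proportion to their recent load, keeping at least one process of each type. The function name, the load metric and the rounding rule are all assumptions made for illustration, nothing here comes from Zabbix:

```c
#include <stddef.h>

/*
 * Split max_children across n process types in proportion to recent_load[],
 * giving every type at least one process. Does nothing if n is 0 or if
 * max_children is smaller than n. All names are hypothetical.
 */
void plan_children(const unsigned *recent_load, size_t n,
                   unsigned max_children, unsigned *out)
{
    unsigned long long total = 0;
    unsigned spare, used = 0;
    size_t i, busiest = 0;

    if (n == 0 || max_children < n)
        return;

    for (i = 0; i < n; i++) {
        total += recent_load[i];
        if (recent_load[i] > recent_load[busiest])
            busiest = i;
    }

    spare = max_children - (unsigned)n;   /* left over after one per type */
    for (i = 0; i < n; i++) {
        unsigned extra = 0;
        if (total != 0)
            extra = (unsigned)((unsigned long long)spare * recent_load[i] / total);
        out[i] = 1 + extra;
        used += extra;
    }
    out[busiest] += spare - used;         /* rounding remainder goes to the busiest type */
}
```

With recent loads of 60, 30 and 10 for pollers, trappers and HTTP pollers and a limit of 13 children, this plan gives 7, 4 and 2 processes. The total never exceeds the limit, so the database connection cap stays respected.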


    • gustav
      Junior Member
      • Sep 2007
      • 10

      #3
      I suppose it should work already.

      As I understand it, it is supposed to work... It is listed as a new feature in this release, so I assume I did something wrong, or is it a bug?

      /Gustav


      • gustav
        Junior Member
        • Sep 2007
        • 10

        #4
        Found the problem in the source code...

        I found and corrected the problem in the source code, or at least I assume so, since I haven't studied it in detail.

        The problem was in db_connect: the errno from the MySQL connect call wasn't handled.

        The errno was 2003.
        #define CR_CONN_HOST_ERROR 2003

        I just added it to the switch and set ret to ZBX_DB_DOWN.

        Alexei, can you tell me if this is correct?

        Anyway, it works now. Whether I lose any transactions, I didn't analyze that carefully...
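
A rough sketch of the kind of change described above, not the actual Zabbix source. The thread does not show the real switch or which cases it already contains, so the list of "database is down" codes below is an assumption. `classify_mysql_errno` and the `SKETCH_DB_*` values are hypothetical stand-ins for the real function and for `ZBX_DB_DOWN` and its siblings. The `CR_*` numbers are the MySQL client error codes from `errmsg.h`, defined locally so that the sketch compiles without the MySQL headers:

```c
/* MySQL client error numbers, as found in the client library's errmsg.h. */
#define CR_CONNECTION_ERROR   2002  /* can't connect through local socket */
#define CR_CONN_HOST_ERROR    2003  /* can't connect to server on host */
#define CR_SERVER_GONE_ERROR  2006  /* MySQL server has gone away */
#define CR_SERVER_LOST        2013  /* lost connection during query */

/* Hypothetical result codes; SKETCH_DB_DOWN plays the role of ZBX_DB_DOWN. */
typedef enum {
    SKETCH_DB_OK,
    SKETCH_DB_FAIL,   /* real error: give up */
    SKETCH_DB_DOWN    /* server unreachable: wait and retry */
} sketch_db_result;

/* Map a MySQL client errno to "ok", "fatal" or "database is down". */
sketch_db_result classify_mysql_errno(unsigned int err)
{
    switch (err) {
    case 0:
        return SKETCH_DB_OK;
    case CR_CONNECTION_ERROR:
    case CR_CONN_HOST_ERROR:      /* the case that was missing, per this post */
    case CR_SERVER_GONE_ERROR:
    case CR_SERVER_LOST:
        return SKETCH_DB_DOWN;
    default:
        return SKETCH_DB_FAIL;
    }
}
```

Both codes from the log extract in the first post (2003 and 2006) land in the "down" branch here, so a caller could keep retrying the connection instead of exiting.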
