ARGH!! One child process died. Exiting ...

  • erozen
    Junior Member
    Zabbix Certified Specialist
    • Apr 2007
    • 18

    #1

    ARGH!! One child process died. Exiting ...

    In my logs today, the ubiquitous:
    988:20081007:164842 One child process died. Exiting ...


    I still can't figure out why this makes sense.

    To paraphrase:
    'I detected a problem. You no longer have monitoring.'
    'I couldn't have a lollipop, so I threw all my toys out of the pram and held my breath until I turned blue.'

    What's the rationale behind this??? Why doesn't it try to restart the thread, and only if it continually fails, stop respawning that _particular_ thread and send out a notification?

    In fact, that's a good point: why doesn't it notify when this happens, much like it does when the DB dies? I might write a patch to do this, unless anyone can give me a good reason not to.
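The behaviour asked for above (restart a dead child a couple of times, then stop respawning only that child and send a notification) can be sketched roughly as follows. This is a minimal illustration in Python, not Zabbix code: `RestartPolicy`, `on_child_exit`, `max_restarts` and `notify` are all hypothetical names, and a real server would hook this into whatever code reaps its child processes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RestartPolicy:
    """Hypothetical bounded-restart bookkeeping kept by a parent process."""
    max_restarts: int = 2                  # assumed limit: "once or twice"
    notify: Callable[[str], None] = print  # assumed notification hook
    restarts: Dict[str, int] = field(default_factory=dict)
    disabled: List[str] = field(default_factory=list)

    def on_child_exit(self, name: str) -> bool:
        """Return True if the named child should be respawned."""
        if name in self.disabled:
            # Already given up on this child; stay quiet and do not respawn.
            return False
        used = self.restarts.get(name, 0)
        if used < self.max_restarts:
            self.restarts[name] = used + 1
            self.notify(f"{name} died; restart {used + 1}/{self.max_restarts}")
            return True
        # Limit reached: stop respawning this child only, and say so.
        self.disabled.append(name)
        self.notify(f"{name} keeps dying; no longer respawning it")
        return False
```

The counters are kept per child name, so a trapper that keeps failing is disabled without affecting the decision for any other child.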


    I understand most of Zabbix now (once learnt, it makes sense), but I find this perpetually perplexing.
  • Alexei
    Founder, CEO
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Sep 2004
    • 5654

    #2
    Would you like to find ZABBIX Server trying to restart a failing thread indefinitely? I do not think so.

    I agree with you on some better reporting in case of a server failure. Currently it is nearly impossible to understand why this happened due to lack of additional details. This must be improved and I already saw contributed patches to fix this.

    Anyway, registered as a problem, ZBX-539.
    Alexei Vladishev
    Creator of Zabbix, Product manager
    New York | Tokyo | Riga

    • erozen
      Junior Member
      Zabbix Certified Specialist
      • Apr 2007
      • 18

      #3
      Originally posted by Alexei
      Would you like to find ZABBIX Server trying to restart a failing thread indefinitely? I do not think so.
      Indefinitely? Of course not. Once or twice? Absolutely!
      Most of the time these are edge cases that happen once every 6 months or so, and restarting the thread to see whether it keeps running is a valid reaction. Certainly, that's what I do manually whenever this happens.

      And if it's happening in a trapper thread, why must my pollers be taken down? Diminished is better than nothing, especially if the server could communicate its degraded state to agents and have them adapt. For example, in the above case, it could convert all active items to polled.
      Well, perhaps not all; maybe only the ones used in triggers with severity > average, or some such?
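To make the selection rule concrete, here is a small Python sketch of picking which active items would be switched to polling when the trapper is down. It is purely illustrative: `items_to_poll` and its arguments are hypothetical, and the only thing taken from Zabbix itself is the trigger severity scale, on which Average is 3, High is 4 and Disaster is 5.

```python
from typing import Dict, Iterable, List

AVERAGE = 3  # Zabbix trigger severity: 3 = Average, 4 = High, 5 = Disaster


def items_to_poll(active_items: Iterable[str],
                  trigger_severities: Dict[str, List[int]]) -> List[str]:
    """Pick active items referenced by at least one trigger above Average.

    trigger_severities maps an item key to the severities of the triggers
    that use it (a hypothetical structure for this sketch).
    """
    selected = []
    for key in active_items:
        severities = trigger_severities.get(key, [])
        if any(sev > AVERAGE for sev in severities):
            selected.append(key)
    return selected


severities = {"cpu.load": [2, 4], "disk.free": [3], "log.errors": [5]}
chosen = items_to_poll(["cpu.load", "disk.free", "log.errors", "uptime"], severities)
```

Items with no triggers, or only triggers at Average or below, are left alone, so the extra polling load stays limited to what matters most.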
