zabbix_agentd instability on Solaris

  • untergeek
    Senior Member
    Zabbix Certified Specialist
    • Jun 2009
    • 512

    #1

    I have a large number of servers that fail to keep zabbix_agentd alive. Even DebugLevel=4 doesn't reveal a reason; the agent seems to die silently. Does anyone else have this problem?

    I've had to put a cron job in to check every minute and restart the zabbix_agentd process if it's not running. I've grepped the number of restarts since the 29th of January as follows:

    Code:
      1755:20100129:075701.226 zabbix_agentd active check started [REDACTED:10051]
     10870:20100129:095607.040 zabbix_agentd active check started [REDACTED:10051]
     12575:20100129:103305.450 zabbix_agentd active check started [REDACTED:10051]
     13877:20100129:105808.606 zabbix_agentd active check started [REDACTED:10051]
      2293:20100129:151300.744 zabbix_agentd active check started [REDACTED:10051]
      8747:20100129:165605.471 zabbix_agentd active check started [REDACTED:10051]
     20017:20100129:184801.321 zabbix_agentd active check started [REDACTED:10051]
      9760:20100129:224601.437 zabbix_agentd active check started [REDACTED:10051]
     11031:20100130:045201.142 zabbix_agentd active check started [REDACTED:10051]
      8426:20100130:221101.552 zabbix_agentd active check started [REDACTED:10051]
     20998:20100201:064301.607 zabbix_agentd active check started [REDACTED:10051]
     11816:20100201:105300.887 zabbix_agentd active check started [REDACTED:10051]
     11976:20100201:165301.157 zabbix_agentd active check started [REDACTED:10051]
      2774:20100201:205201.353 zabbix_agentd active check started [REDACTED:10051]
      9759:20100201:221400.898 zabbix_agentd active check started [REDACTED:10051]
     19800:20100202:001003.929 zabbix_agentd active check started [REDACTED:10051]
     12193:20100202:160920.099 zabbix_agentd active check started [REDACTED:10051]
     15250:20100202:162906.687 zabbix_agentd active check started [REDACTED:10051]
     20341:20100202:171125.565 zabbix_agentd active check started [REDACTED:10051]
     20971:20100202:171701.694 zabbix_agentd active check started [REDACTED:10051]
     24990:20100202:175914.278 zabbix_agentd active check started [REDACTED:10051]
     27230:20100202:181033.783 zabbix_agentd active check started [REDACTED:10051]
     27631:20100202:181314.174 zabbix_agentd active check started [REDACTED:10051]
     28203:20100202:181502.648 zabbix_agentd active check started [REDACTED:10051]
     29283:20100202:181805.318 zabbix_agentd active check started [REDACTED:10051]
      3289:20100202:183619.808 zabbix_agentd active check started [REDACTED:10051]
      4964:20100202:184307.865 zabbix_agentd active check started [REDACTED:10051]
      6323:20100202:184701.774 zabbix_agentd active check started [REDACTED:10051]
      7446:20100202:185458.747 zabbix_agentd active check started [REDACTED:10051]
      7838:20100202:185815.595 zabbix_agentd active check started [REDACTED:10051]
      8518:20100202:190113.273 zabbix_agentd active check started [REDACTED:10051]
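For anyone who wants to tally restarts per day rather than eyeball the listing, here is a minimal sketch. It assumes the agent log uses the `PID:YYYYMMDD:HHMMSS.mmm message` layout seen above; the function name `restarts_per_day` is made up for illustration.

```python
import re
from collections import Counter

# Matches the "active check started" lines shown in the listing above.
# Group 1 is the PID, group 2 is the date as YYYYMMDD.
START_LINE = re.compile(
    r"^\s*(\d+):(\d{8}):\d{6}\.\d{3} zabbix_agentd active check started"
)

def restarts_per_day(lines):
    """Count agent start-up lines per calendar day (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = START_LINE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts

# Small sample taken from the listing; the last entry is deliberately unrelated.
sample = [
    "  1755:20100129:075701.226 zabbix_agentd active check started [REDACTED:10051]",
    " 11031:20100130:045201.142 zabbix_agentd active check started [REDACTED:10051]",
    "  8426:20100130:221101.552 zabbix_agentd active check started [REDACTED:10051]",
    "some unrelated log line",
]

for day, count in sorted(restarts_per_day(sample).items()):
    print(day, count)
```

Feeding it the real log file (for example `restarts_per_day(open(path))`, with `path` pointing at wherever the agent logs) gives one count per day, which makes the jump on 2 February easier to see.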
    These are run-of-the-mill Solaris boxes, from older Ultra IIi boxes to spankin' new 5120s with 16G of RAM, running Solaris 8, 9 and 10. This is with the provided binaries (from zabbix.com) and the new 1.8.1 binaries do not behave any differently.

    This did NOT happen with 1.6.x. Why is 1.8.x so unstable?
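The every-minute cron check mentioned earlier in this post could look roughly like the sketch below. It is not the exact script in use: the binary path in `AGENT_CMD` is an assumption, the helper names are invented, and it assumes `pgrep` is on the PATH (it ships with Solaris and most Linux systems).

```python
import subprocess

AGENT_NAME = "zabbix_agentd"  # process name from the post
# Assumption: install location of the agent binary; adjust for the local layout.
AGENT_CMD = ["/usr/local/sbin/zabbix_agentd"]

def agent_running(name=AGENT_NAME):
    """Return True if pgrep finds a process whose name matches exactly."""
    result = subprocess.run(["pgrep", "-x", name], stdout=subprocess.DEVNULL)
    return result.returncode == 0

def start_agent(cmd=AGENT_CMD):
    """Launch the agent; it daemonizes itself, so this returns quickly."""
    subprocess.run(cmd, check=True)

def ensure_running(is_running=agent_running, start=start_agent):
    """Start the agent only when it is not running.

    Returns True if a start was attempted, False if the agent was already up.
    The two callables are injectable so the logic can be tested without pgrep.
    """
    if is_running():
        return False
    start()
    return True

# A script run from cron once a minute would simply call:
# ensure_running()
```

This only papers over the problem, of course. It keeps monitoring alive but says nothing about why the child processes die.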
  • douglaz
    Junior Member
    • May 2009
    • 9

    #2
    I can confirm the instability with 1.8 agents on Solaris. My workaround was to keep using 1.6.x agents on Solaris hosts.

    But my agents show a message that they are exiting because some child died (sorry, I don't have the logs right now).
    Last edited by douglaz; 03-02-2010, 20:26.


    • untergeek
      Senior Member
      Zabbix Certified Specialist
      • Jun 2009
      • 512

      #3
      Yeah, I think I do see "child died" messages, but with no reason WHY, which was my complaint.

      I would use 1.6 agents, but I LOVE the new Include ability of 1.8 and built an entire package and configuration around it. I'd have to start from scratch again without that, which I do not want to do.

      I've also noticed that I see more errors/restarts on my Solaris 8 and 9 boxes than Solaris 10 ones, but I still see it on all of them.

      It's also darned near impossible to find the 1.6.8 agent tarball that Zabbix used to provide. Did that release include the Include functionality?
