I have a large number of servers on which zabbix_agentd will not stay running. Even DebugLevel=4 doesn't reveal any reason for this; the agent seems to die silently. Does anyone else have this problem?
These are run-of-the-mill Solaris machines, from older Ultra IIi boxes to spankin' new 5120s with 16 GB of RAM, running Solaris 8, 9 and 10. This is with the binaries provided on zabbix.com, and the new 1.8.1 binaries do not behave any differently.
I've had to put a cron job in to check every minute and restart the zabbix_agentd process if it's not running. I've grepped the number of restarts since the 29th of January as follows:
Code:
1755:20100129:075701.226 zabbix_agentd active check started [REDACTED:10051]
10870:20100129:095607.040 zabbix_agentd active check started [REDACTED:10051]
12575:20100129:103305.450 zabbix_agentd active check started [REDACTED:10051]
13877:20100129:105808.606 zabbix_agentd active check started [REDACTED:10051]
2293:20100129:151300.744 zabbix_agentd active check started [REDACTED:10051]
8747:20100129:165605.471 zabbix_agentd active check started [REDACTED:10051]
20017:20100129:184801.321 zabbix_agentd active check started [REDACTED:10051]
9760:20100129:224601.437 zabbix_agentd active check started [REDACTED:10051]
11031:20100130:045201.142 zabbix_agentd active check started [REDACTED:10051]
8426:20100130:221101.552 zabbix_agentd active check started [REDACTED:10051]
20998:20100201:064301.607 zabbix_agentd active check started [REDACTED:10051]
11816:20100201:105300.887 zabbix_agentd active check started [REDACTED:10051]
11976:20100201:165301.157 zabbix_agentd active check started [REDACTED:10051]
2774:20100201:205201.353 zabbix_agentd active check started [REDACTED:10051]
9759:20100201:221400.898 zabbix_agentd active check started [REDACTED:10051]
19800:20100202:001003.929 zabbix_agentd active check started [REDACTED:10051]
12193:20100202:160920.099 zabbix_agentd active check started [REDACTED:10051]
15250:20100202:162906.687 zabbix_agentd active check started [REDACTED:10051]
20341:20100202:171125.565 zabbix_agentd active check started [REDACTED:10051]
20971:20100202:171701.694 zabbix_agentd active check started [REDACTED:10051]
24990:20100202:175914.278 zabbix_agentd active check started [REDACTED:10051]
27230:20100202:181033.783 zabbix_agentd active check started [REDACTED:10051]
27631:20100202:181314.174 zabbix_agentd active check started [REDACTED:10051]
28203:20100202:181502.648 zabbix_agentd active check started [REDACTED:10051]
29283:20100202:181805.318 zabbix_agentd active check started [REDACTED:10051]
3289:20100202:183619.808 zabbix_agentd active check started [REDACTED:10051]
4964:20100202:184307.865 zabbix_agentd active check started [REDACTED:10051]
6323:20100202:184701.774 zabbix_agentd active check started [REDACTED:10051]
7446:20100202:185458.747 zabbix_agentd active check started [REDACTED:10051]
7838:20100202:185815.595 zabbix_agentd active check started [REDACTED:10051]
8518:20100202:190113.273 zabbix_agentd active check started [REDACTED:10051]
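In case it helps anyone compare notes, log lines like the ones above can be tallied per day to see whether the restarts are getting more frequent. This is only a rough sketch in Python, not anything shipped with Zabbix: the function name restarts_per_day is made up for illustration, and it assumes every "active check started" line corresponds to one agent start in the PID:YYYYMMDD:HHMMSS.mmm format shown above.

```python
import re
from collections import Counter

# Assumed log prefix: "PID:YYYYMMDD:HHMMSS.mmm", followed by the start message.
# \s+ before "started" tolerates a line wrap in the middle of an entry.
START_RE = re.compile(
    r"(\d+):(\d{8}):(\d{6})\.(\d{3}) zabbix_agentd active check\s+started"
)


def restarts_per_day(log_text):
    """Count 'active check started' entries per date string (YYYYMMDD)."""
    return Counter(m.group(2) for m in START_RE.finditer(log_text))


# A few of the lines from the log excerpt, used as sample input.
sample = """\
1755:20100129:075701.226 zabbix_agentd active check started [REDACTED:10051]
10870:20100129:095607.040 zabbix_agentd active check started [REDACTED:10051]
11031:20100130:045201.142 zabbix_agentd active check started [REDACTED:10051]
8518:20100202:190113.273 zabbix_agentd active check started [REDACTED:10051]
"""

for day, count in sorted(restarts_per_day(sample).items()):
    print(day, count)
```

Feeding it the whole agent log instead of the sample gives one count per date. Note that the first entry on a host may be the original start rather than a cron-triggered restart.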
This did NOT happen with 1.6.x. Why is 1.8.x so unstable?