Hi,
We have about 100 machines with agents and a single server. Everything ran stably on 1.1.7 until we migrated the server to 1.4.5; the agents stayed on 1.1.7.
After the migration, the server process started crashing at random under heavy bursts of activity, e.g. agents going down, many alerts triggering at once, or latency on connections to agents.
Initially, the log showed database connection errors (MySQL 5.0.27) with "[Interrupted system call]", and the same message for timeouts on connections to agents. After reading the related posts here, we changed the trappers value in the conf file, configured MySQL to allow more connections, and increased the timeouts. Those errors went away, but the sporadic server crashes remained. The error log doesn't help much beyond "One child process died. Exiting ..."
There was a post here about a bad 1.4.5 version that got cached after release. I checked, and we are using the correct one.
Setup is CentOS 5 (2.6.18-8.1.8.el5), 1 GB RAM, dual 3 GHz Xeon.
The machine itself is not under heavy load when the crashes happen. Conf files, error log, and my.cnf are attached.
Any clues?

