I am very new to Linux, so forgive my lack of some basic knowledge.
We have had this happen on two separate deployments now. Both are still up and in the process of being troubleshot.
We first start the server up and everything is fine. We add all our servers to manage and set up items and triggers, also fine.
On the first server, email alerts were set up and worked fine for a while; then one day the server went offline. In that case the disk had run out of space, so we presumed something became corrupted when space ran out, even after we allocated it more space.
The second server went offline as I was setting up email alerts. The only difference in this setup is that I had hooked in LDAP about 3 hours before the crash. Disk space is fine:
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root 50G 12G 39G 23% /
devtmpfs 1.9G 0 1.9G 0% /dev
tmpfs 1.9G 84K 1.9G 1% /dev/shm
tmpfs 1.9G 88M 1.8G 5% /run
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/mapper/centos-home 46G 37M 46G 1% /home
/dev/sda1 497M 251M 246M 51% /boot
tmpfs 380M 16K 380M 1% /run/user/42
tmpfs 380M 0 380M 0% /run/user/0
Both servers are doing the same thing: the zabbix-server service is stuck in a constant loaded / activating (auto-restart) loop. When we refresh the UI it sometimes shows as online, but mostly offline. We have tried various suggested fixes, but to no avail. Here is what we get from systemctl status zabbix-server:
[root@tor-zabbix02 tmp]# systemctl status zabbix-server.service
● zabbix-server.service - Zabbix Server
Loaded: loaded (/usr/lib/systemd/system/zabbix-server.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Wed 2016-09-07 10:44:14 EDT; 8s ago
Process: 14673 ExecStop=/bin/kill -SIGTERM $MAINPID (code=exited, status=1/FAILURE)
Process: 14614 ExecStart=/usr/sbin/zabbix_server -c $CONFFILE (code=exited, status=0/SUCCESS)
Main PID: 14616 (code=exited, status=0/SUCCESS)
Sep 07 10:44:14 servername kill[14673]: -a, --all do not restrict the name-to-pid conversion t...esses
Sep 07 10:44:14 servername kill[14673]: with the same uid as the present process
Sep 07 10:44:14 servername kill[14673]: -s, --signal <sig> send specified signal
Sep 07 10:44:14 servername kill[14673]: -q, --queue <sig> use sigqueue(2) rather than kill(2)
Sep 07 10:44:14 servername kill[14673]: -p, --pid print pids without signaling them
Sep 07 10:44:14 servername kill[14673]: -l, --list [=<signal>] list signal names, or convert one to a name
Sep 07 10:44:14 servername kill[14673]: -L, --table list signal names and numbers
Sep 07 10:44:14 servername kill[14673]: -h, --help display this help and exit
Sep 07 10:44:14 servername kill[14673]: -V, --version output version information and exit
Sep 07 10:44:14 servername kill[14673]: For more details see kill(1).
Hint: Some lines were ellipsized, use -l to show in full.
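Note that the ExecStop failure above is kill(1) printing its usage text, which typically happens when $MAINPID is empty by the time ExecStop runs, i.e. the zabbix_server process has already exited on its own. The real error is probably in the full journal or in the Zabbix server log. Something like the following may help to see it (the log path is an assumption; check the LogFile setting in /etc/zabbix/zabbix_server.conf on your system):

```shell
# Full, non-ellipsized status output (the "-l" the hint above refers to)
systemctl status -l zabbix-server.service

# Recent journal entries for the unit; this usually shows why
# zabbix_server exited before systemd tried to restart it
journalctl -u zabbix-server.service --since today --no-pager

# The Zabbix server's own log is often more informative
# (path assumed; verify against LogFile in zabbix_server.conf)
tail -n 100 /var/log/zabbix/zabbix_server.log
```

The last lines written before each exit are the ones worth posting; database connection errors and cache exhaustion messages both show up there.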
We have set SELinux to permissive and raised max_connections in MySQL to 300, plus a few other things we found, but so far nothing works.
It has been suggested that perhaps one of the Zabbix caches is filling up, but this is where my lack of Linux knowledge is showing.
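If the cache theory is worth chasing, the cache sizes live in the server config. A quick way to see what they are currently set to (file path and parameter defaults assumed for a stock CentOS package install):

```shell
# List the cache-related parameters in the Zabbix server config,
# including commented-out defaults; if a line is absent or commented,
# the compiled-in default (e.g. CacheSize=8M) applies
grep -E '^#?\s*(CacheSize|HistoryCacheSize|TrendCacheSize|ValueCacheSize)=' \
    /etc/zabbix/zabbix_server.conf
```

If the server stays up long enough, the internal items `zabbix[rcache,buffer,pfree]` and `zabbix[wcache,history,pfree]` can also be graphed in the frontend to watch cache free space over time.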