Hi All,
We run a couple of Zabbix servers in a primary/secondary setup, and today our primary server suddenly went down.
We run a PostgreSQL database on the same box as the primary server,
which is also the database the secondary server connects to.
On each server we also run the Java gateway to support JMX metrics.
I checked that the PostgreSQL server was running normally and found nothing untoward in /var/log/messages or in the dmesg output:
-bash-4.1$ dmesg | grep -i memory
initial memory mapped : 0 - 20000000
init_memory_mapping: 0000000000000000-0000000075ddf000
init_memory_mapping: 0000000100000000-000000307ffff000
Reserving 141MB of memory at 48MB for crashkernel (System RAM: 198655MB)
PM: Registered nosave memory: 0000000000095000 - 0000000000096000
PM: Registered nosave memory: 0000000000096000 - 0000000000098000
PM: Registered nosave memory: 0000000000098000 - 00000000000a0000
PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
PM: Registered nosave memory: 0000000075dcc000 - 0000000075dde000
PM: Registered nosave memory: 0000000075ddf000 - 0000000090000000
PM: Registered nosave memory: 0000000090000000 - 00000000fec00000
PM: Registered nosave memory: 00000000fec00000 - 00000000fee10000
PM: Registered nosave memory: 00000000fee10000 - 00000000ff800000
PM: Registered nosave memory: 00000000ff800000 - 0000000100000000
Memory: 198153016k/203423740k available (5336k kernel code, 2263676k absent, 3007048k reserved, 7016k data, 1292k init)
please try 'cgroup_disable=memory' option if you don't want memory cgroups
Initializing cgroup subsys memory
Freeing initrd memory: 18885k freed
Non-volatile memory driver v1.3
crash memory driver: version 1.1
Freeing unused kernel memory: 1292k freed
Freeing unused kernel memory: 788k freed
Freeing unused kernel memory: 1568k freed
IPVS: Connection hash table configured (size=4096, memory=64Kbytes
The following are the last lines of zabbix_server.log. They suggest that a memory allocation in dbconfig.c failed with an out-of-memory error, which brought down the server:
40531:20151104:135839.323 fping failed: cdc-hpcblx009-14.myCompany.com_10.224.162.19 address not found
40499:20151104:135839.625 __mem_malloc: skipped 0 asked 24 skip_min 4294967295 skip_max 0
40499:20151104:135839.625 file:dbconfig.c,line:446 zbx_mem_realloc(): out of memory (requested 16 bytes)
40499:20151104:135839.625 file:dbconfig.c,line:446 zbx_mem_realloc(): please increase CacheSize configuration parameter
40494:20151104:135839.628 One child process died (PID:40499,exitcode/signal:1). Exiting ...
40494:20151104:135841.631 syncing history data...
40494:20151104:135841.751 syncing history data done
40494:20151104:135841.751 syncing trends data...
40494:20151104:135844.508 syncing trends data done
40494:20151104:135844.508 Zabbix Server stopped. Zabbix 2.4.4 (revision 52341).
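In case it helps anyone searching a large zabbix_server.log for the same symptom, here is a minimal Python sketch that pulls out the "please increase ... configuration parameter" hints. It assumes the `PID:YYYYMMDD:HHMMSS.mmm message` line layout seen in the excerpt above; the function name `find_memory_hints` and the regular expressions are my own illustration, not anything shipped with Zabbix.

```python
import re

# Assumed line layout, taken from the excerpt: "PID:YYYYMMDD:HHMMSS.mmm message"
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}\.\d{3})\s+(.*)$")
# The hint wording as it appears in the log excerpt
HINT_RE = re.compile(r"please increase (\w+) configuration parameter")


def find_memory_hints(lines):
    """Return (pid, timestamp, parameter) for each 'please increase ...' hint."""
    hints = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not follow the assumed layout
        pid, date, time, message = m.groups()
        hint = HINT_RE.search(message)
        if hint:
            hints.append((int(pid), f"{date}:{time}", hint.group(1)))
    return hints


# Sample lines copied from the log excerpt above
sample = [
    "40499:20151104:135839.625 file:dbconfig.c,line:446 zbx_mem_realloc(): out of memory (requested 16 bytes)",
    "40499:20151104:135839.625 file:dbconfig.c,line:446 zbx_mem_realloc(): please increase CacheSize configuration parameter",
    "40494:20151104:135839.628 One child process died (PID:40499,exitcode/signal:1). Exiting ...",
]

print(find_memory_hints(sample))
```

On the sample lines this reports PID 40499 and the parameter name CacheSize, which is set in zabbix_server.conf.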
Someone reported a similar abrupt termination of the server earlier in https://support.zabbix.com/browse/ZBX-4415
Has anyone seen this behavior, and if so, how did you fix it?