Hi,
Fairly regularly (around once every 6 weeks), we see our monitoring system melt down. By meltdown, I mean data collection stops working correctly. We have a node and several proxies. We see many hosts slip behind on their data (we have triggers for this, so it lights up like a Christmas tree).
I had a look when this problem last happened and I see this:
Although I am not 100% sure that the problem we see is related to this, all of that does not look healthy to me. We're running zabbix_server and zabbix_agent on this host and this is where our other proxies feed their data.
Shutting down zabbix_server and agent, zapping the shared memory and restarting leaves us with:
This looks pretty healthy to me, although I do not understand why there are 3 segments and not 2.
Does anyone have a related issue or know the details of how the shared memory allocation works? Perhaps I'm barking up the wrong tree, but when I was installing this originally, data collection would randomly stop without any errors logged, and the fix was to increase the maximum shared memory available. So from my perspective it appears that there is some issue with zabbix_server's handling of shared memory.
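For anyone wanting to compare segment sizes against the kernel limits, here is a minimal sketch. It assumes a Linux host where the limits are exposed as `/proc/sys/kernel/shmmax` (largest single segment, in bytes) and `/proc/sys/kernel/shmall` (system-wide total, in pages). The function names `shm_limits` and `fits` are hypothetical helpers, not part of Zabbix, and the check is approximate because the kernel accounts for segments in whole pages.

```python
import os


def shm_limits(proc_dir="/proc/sys/kernel"):
    """Return (shmmax_bytes, shmall_bytes) read from the Linux sysctl files."""
    with open(os.path.join(proc_dir, "shmmax")) as f:
        shmmax = int(f.read().strip())
    with open(os.path.join(proc_dir, "shmall")) as f:
        shmall_pages = int(f.read().strip())
    # shmall is expressed in pages, so convert it to bytes.
    page_size = os.sysconf("SC_PAGE_SIZE")
    return shmmax, shmall_pages * page_size


def fits(segment_sizes, shmmax, shmall_bytes):
    """Rough check: every segment fits shmmax and the total fits shmall."""
    if not segment_sizes:
        return True
    return max(segment_sizes) <= shmmax and sum(segment_sizes) <= shmall_bytes
```

Calling `fits([877448, 29361128, 1073741824], *shm_limits())` would test the three live segment sizes listed in this post against the current limits. Adding the sizes of any leftover segments to the list shows whether stale allocations could be pushing the total over `shmall`.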
Matt
Server v1.8.2
Agent v1.8.3
Proxies v1.8.2
I had a look when this problem last happened and I see this:
Code:
root@node03:~# ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x7a0563c5 1114112    zabbix     666        877448     0
0x63055ef2 557057     zabbix     666        29361128   0
0x6b055ef2 589826     zabbix     666        1073741824 0
0x630563ce 983043     zabbix     666        29361128   0
0x6b0563ce 1015812    zabbix     666        1073741824 0
0x630563cf 851973     zabbix     666        29361128   0
0x6b0563cf 884742     zabbix     666        1073741824 0
0x630563c9 917511     zabbix     666        29361128   0
0x6b0563c9 950280     zabbix     666        1073741824 0
0x630560de 1146889    zabbix     666        29361128   233
0x6b0560de 1179658    zabbix     666        1073741824 233
0x7a056038 1212427    zabbix     666        877448     7
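The rows with `nattch` 0 in that listing are segments no process is attached to any more. A minimal sketch for picking them out and totalling their size is below. It assumes the six-column `ipcs -m` layout shown in this post. `parse_ipcs`, `orphaned` and `ipcrm_commands` are hypothetical helper names, and the script only prints `ipcrm -m <shmid>` commands rather than running them.

```python
# Sample copied from the `ipcs -m` listing in this post.
SAMPLE = """\
key        shmid      owner      perms      bytes      nattch     status
0x7a0563c5 1114112    zabbix     666        877448     0
0x63055ef2 557057     zabbix     666        29361128   0
0x6b055ef2 589826     zabbix     666        1073741824 0
0x630563ce 983043     zabbix     666        29361128   0
0x6b0563ce 1015812    zabbix     666        1073741824 0
0x630563cf 851973     zabbix     666        29361128   0
0x6b0563cf 884742     zabbix     666        1073741824 0
0x630563c9 917511     zabbix     666        29361128   0
0x6b0563c9 950280     zabbix     666        1073741824 0
0x630560de 1146889    zabbix     666        29361128   233
0x6b0560de 1179658    zabbix     666        1073741824 233
0x7a056038 1212427    zabbix     666        877448     7
"""


def parse_ipcs(text):
    """Parse `ipcs -m` text into a list of dicts, one per segment."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a hex key; headers and blank lines are skipped.
        if len(fields) < 6 or not fields[0].startswith("0x"):
            continue
        segments.append({
            "key": fields[0],
            "shmid": int(fields[1]),
            "owner": fields[2],
            "bytes": int(fields[4]),
            "nattch": int(fields[5]),
        })
    return segments


def orphaned(segments, owner="zabbix"):
    """Segments owned by `owner` that have no attached processes."""
    return [s for s in segments if s["owner"] == owner and s["nattch"] == 0]


def ipcrm_commands(segments):
    """Build (but do not run) one removal command per segment."""
    return ["ipcrm -m %d" % s["shmid"] for s in segments]


if __name__ == "__main__":
    stale = orphaned(parse_ipcs(SAMPLE))
    total = sum(s["bytes"] for s in stale)
    print("%d orphaned segments, %d bytes" % (len(stale), total))
    for cmd in ipcrm_commands(stale):
        print(cmd)
```

On the sample this reports 9 orphaned segments totalling 4413289256 bytes (about 4.1 GiB), which is four leftover pairs of the two large segments plus one small one. To use it on a live host, the text of `ipcs -m` would be passed to `parse_ipcs` in place of `SAMPLE`.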
Shutting down zabbix_server and agent, zapping the shared memory and restarting leaves us with:
Code:
root@node03:~# ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x7a056038 1245184    zabbix     666        877448     7
0x630560de 1277953    zabbix     666        29361128   233
0x6b0560de 1310722    zabbix     666        1073741824 233