data collection stops, suspect shared memory issue

  • fascinatedcow
    Junior Member
    • Mar 2010
    • 20

    #1

    Hi,

    Fairly regularly (around once every 6 weeks), our monitoring system melts down. By "melts down", I mean that data collection stops working correctly. We have a node and several proxies. Many hosts fall behind on their data (we have triggers for this, so it lights up like a Christmas tree).

    I had a look the last time this happened and saw this:

    Code:
    root@node03:~# ipcs -m
    
    ------ Shared Memory Segments --------
    key        shmid      owner      perms      bytes      nattch     status      
    0x7a0563c5 1114112    zabbix    666        877448     0                       
    0x63055ef2 557057     zabbix    666        29361128   0                       
    0x6b055ef2 589826     zabbix    666        1073741824 0                       
    0x630563ce 983043     zabbix    666        29361128   0                       
    0x6b0563ce 1015812    zabbix    666        1073741824 0                       
    0x630563cf 851973     zabbix    666        29361128   0                       
    0x6b0563cf 884742     zabbix    666        1073741824 0                       
    0x630563c9 917511     zabbix    666        29361128   0                       
    0x6b0563c9 950280     zabbix    666        1073741824 0                       
    0x630560de 1146889    zabbix    666        29361128   233                     
    0x6b0560de 1179658    zabbix    666        1073741824 233                     
    0x7a056038 1212427    zabbix    666        877448     7
    Although I am not 100% sure that the problem we see is related to this, none of it looks healthy to me: most of these segments have no processes attached (nattch 0). We run zabbix_server and zabbix_agent on this host, and this is where our proxies send their data.
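
A System V shared memory segment persists until it is explicitly removed, even after every process that used it has exited, which is why segments with nattch 0 can pile up in the listing. As a rough sketch (the helper names parse_ipcs and orphaned are hypothetical, not part of any Zabbix tool), the Python below parses `ipcs -m` output, picks the zabbix-owned segments with no attached processes and prints matching `ipcrm -m <shmid>` commands. It only prints the commands; reviewing them before running anything is left to the operator.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    key: int
    shmid: int
    owner: str
    perms: str
    size: int
    nattch: int


# Matches a data row of `ipcs -m`: key, shmid, owner, perms, bytes, nattch.
# The header and separator lines do not start with "0x", so they are skipped.
_ROW = re.compile(r"^(0x[0-9a-fA-F]+)\s+(\d+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)")


def parse_ipcs(text):
    """Turn `ipcs -m` output into a list of Segment records."""
    segments = []
    for line in text.splitlines():
        m = _ROW.match(line.strip())
        if m:
            key, shmid, owner, perms, size, nattch = m.groups()
            segments.append(
                Segment(int(key, 16), int(shmid), owner, perms, int(size), int(nattch))
            )
    return segments


def orphaned(segments, owner="zabbix"):
    """Segments owned by `owner` that no process currently has attached."""
    return [s for s in segments if s.owner == owner and s.nattch == 0]


# Three rows copied from the first listing in this post.
SAMPLE = """\
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x7a0563c5 1114112    zabbix    666        877448     0
0x630560de 1146889    zabbix    666        29361128   233
0x7a056038 1212427    zabbix    666        877448     7
"""

for seg in orphaned(parse_ipcs(SAMPLE)):
    print(f"ipcrm -m {seg.shmid}")
```

On the sample this prints a single command for shmid 1114112, the only row with nattch 0. Matching on owner as well as nattch is a deliberate choice so that segments belonging to other software on the host are left alone.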

    After shutting down zabbix_server and the agent, removing the shared memory segments and restarting, we are left with:

    Code:
    root@node03:~# ipcs -m
    
    ------ Shared Memory Segments --------
    key        shmid      owner      perms      bytes      nattch     status      
    0x7a056038 1245184    zabbix    666        877448     7                       
    0x630560de 1277953    zabbix    666        29361128   233                     
    0x6b0560de 1310722    zabbix    666        1073741824 233
    This looks pretty healthy to me, although I do not understand why there are 3 segments and not 2.
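
The "3 segments rather than 2" question may become clearer by decoding the keys. This is a minimal sketch that assumes the keys were produced by ftok() on Linux with glibc, which packs the low 16 bits of a file's inode, the low 8 bits of its device number and the low 8 bits of the project id into one 32-bit key. Decoded that way, the three live keys carry three different project ids ('z', 'c' and 'k'), and the 'z' key points at a different inode than the 'c'/'k' pair. That could mean the 'z' segment was keyed on a different file, possibly by another daemon on the host such as the agent, but this is a guess that the thread does not confirm.

```python
def decode_ftok_key(key):
    """Split a SysV IPC key built by glibc's ftok() into its parts.

    glibc computes:
        (st_ino & 0xffff) | ((st_dev & 0xff) << 16) | ((proj_id & 0xff) << 24)
    so the parts can be recovered by shifting and masking.
    """
    proj_id = (key >> 24) & 0xFF
    dev_low = (key >> 16) & 0xFF
    ino_low = key & 0xFFFF
    return chr(proj_id), dev_low, ino_low


# The three keys from the listing taken after the restart.
for key in (0x7a056038, 0x630560de, 0x6b0560de):
    proj, dev, ino = decode_ftok_key(key)
    print(f"{key:#010x}: proj_id={proj!r} dev=0x{dev:02x} inode=0x{ino:04x}")
```

Applying the same decoding to the stale keys in the first listing shows the same 'c'/'k' project ids with different inode bits, which would fit the keyed file having had a different inode at earlier starts. Again, that is an inference from the key layout only.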

    Has anyone seen a similar issue, or does anyone know the details of how the shared memory allocation works? Perhaps I'm barking up the wrong tree, but when I originally installed this, data collection would stop at random without any errors being logged, and the fix was to increase the maximum shared memory available. So from my perspective, there appears to be some issue with how zabbix_server handles shared memory.
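
Since raising the maximum shared memory fixed the earlier stalls, it may be worth checking the current limits against the segment sizes. On Linux, kernel.shmmax caps the size of a single segment in bytes and kernel.shmall caps the total of all segments in pages, and segments left behind with nattch 0 still count toward shmall until they are removed. The sketch below assumes a 4096-byte page (check with `getconf PAGE_SIZE`) and uses the sizes from the first listing; the function names are hypothetical.

```python
from pathlib import Path

PAGE_SIZE = 4096  # assumption: common x86-64 page size


def read_kernel_sysctl(name):
    """Read an integer kernel parameter such as shmmax or shmall (Linux only)."""
    return int(Path("/proc/sys/kernel", name).read_text().split()[0])


def check_shm_limits(segment_sizes, shmmax, shmall_pages, page_size=PAGE_SIZE):
    """Return a list of problems if the given segments would not fit the limits.

    shmmax caps a single segment in bytes; shmall caps the total of all
    segments in pages. Only the segments passed in are counted here, so
    shared memory used by other software on the host is ignored.
    """
    problems = []
    for size in segment_sizes:
        if size > shmmax:
            problems.append(f"segment of {size} bytes exceeds shmmax={shmmax}")
    # Each segment occupies a whole number of pages (ceiling division).
    total_pages = sum(-(-size // page_size) for size in segment_sizes)
    if total_pages > shmall_pages:
        problems.append(f"segments need {total_pages} pages, shmall={shmall_pages}")
    return problems


# Sizes from the first ipcs listing: two 'z', five 'c' and five 'k' segments.
SIZES = [877448] * 2 + [29361128] * 5 + [1073741824] * 5

if __name__ == "__main__" and Path("/proc/sys/kernel/shmmax").exists():
    limits = (read_kernel_sysctl("shmmax"), read_kernel_sysctl("shmall"))
    for problem in check_shm_limits(SIZES, *limits):
        print(problem)
```

Feeding in the full first listing rather than just the live segments is intentional: it shows how much of the shmall budget the stale copies consume on their own.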


    Matt

    Server v1.8.2
    Agent v1.8.3
    Proxies v1.8.2
  • richlv
    Senior Member
    Zabbix Certified Trainer
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Oct 2005
    • 3112

    #2
    While there are multiple possible reasons, my first guess would be the caches filling up. See the internal items section of the item documentation and add items to monitor the configuration, history and trends caches. How full are they?

    Also check the server log file for error messages.

    And consider upgrading to 1.8.3 or 1.8.4 (once it's out); if I recall correctly, it could reduce the amount of (shared) memory used by the caches.
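
To act on the cache suggestion above, the percent-free internal items can be watched for caches that are running low. The item keys below are our reading of the Zabbix 1.8 internal item syntax and should be checked against the item documentation for your version; the 10% threshold, the helper name nearly_full and the readings are made up for illustration.

```python
# Assumption: Zabbix 1.8 internal item keys for the caches discussed here.
CACHE_ITEMS = {
    "history cache": "zabbix[wcache,history,pfree]",
    "trend cache": "zabbix[wcache,trend,pfree]",
    "text history cache": "zabbix[wcache,text,pfree]",
    "configuration cache": "zabbix[rcache,buffer,pfree]",
}


def nearly_full(pfree_by_key, threshold=10.0):
    """Return the names of caches whose percent-free reading is below threshold.

    Keys missing from pfree_by_key (no data collected yet) are skipped.
    """
    return sorted(
        name
        for name, key in CACHE_ITEMS.items()
        if key in pfree_by_key and pfree_by_key[key] < threshold
    )


# Hypothetical readings, not taken from this thread.
readings = {
    "zabbix[wcache,history,pfree]": 4.2,
    "zabbix[wcache,trend,pfree]": 61.0,
    "zabbix[wcache,text,pfree]": 88.5,
    "zabbix[rcache,buffer,pfree]": 7.9,
}
print(nearly_full(readings))
```

With these made-up numbers the history and configuration caches are flagged. In practice the same comparison would more naturally live in a trigger on each item rather than in a script.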
    Zabbix 3.0 Network Monitoring book

    • fascinatedcow
      Junior Member
      • Mar 2010
      • 20

      #3
      Thanks very much Richlv!

      We were not monitoring these before, but now we are. However, I could only find history, trends and text under wcache, and buffer under rcache. Should I be looking somewhere else to monitor the configuration cache?

      Many thanks again.

      Matt

      • richlv
        Senior Member
        Zabbix Certified Trainer
        Zabbix Certified Specialist, Zabbix Certified Professional
        • Oct 2005
        • 3112

        #4
        The rcache buffer is the configuration cache, so you should be all set.