trigger not working, but expression looks right

  • groknaut
    Junior Member
    • Feb 2012
    • 7

    #1

    hello,

    could someone help me with this trigger? this trigger compares current JVM memory usage (RSS) with the current total memory on the system. if the RSS memory of the JVM is greater than 70% of total memory, it should trigger an average severity alert. but it's not triggering.

    OS: CentOS 5.7
    zabbix: 1.8.10

    here's the expression:

    {Template_Java_OS_Resources:check_health[mem,java,rss].last(0)}>({Template_Linux:vm.memory.size[total].last(0)}*0.7)

    check_health is a script on the system, called as a UserParameter in zabbix-agent.conf.

    UserParameter=check_health[*],sudo /usr/local/bin/check_health.rb $1 $2 $3

    we do indeed get data from this check:

    webapp# su - zabbix
    webapp$ sudo check_health.rb mem java rss
    5541860

    and given the total memory on the system,

    webapp$ free | grep ^Mem | awk '{print $2}'
    7347752

    that trigger should be engaged, as 5541860 is greater than 70% of total memory.

    thanks,
    kallen
    Last edited by groknaut; 30-04-2012, 22:17. Reason: provide version numbers
  • JBo
    Senior Member
    • Jan 2011
    • 310

    #2
    Hi,

    Are you getting these data in Zabbix ?
    You can check them in Monitoring / Latest data.

    Regards,
    JBo

    • heaje
      Senior Member
      Zabbix Certified Specialist
      • Sep 2009
      • 325

      #3
      Another thing to check is that you're comparing like units. It looks like your check_health[mem,java,rss] item is collecting data in kilobytes. However, if my memory serves me well, vm.memory.size[total] is in bytes. If so, the kilobyte figure would have to be enormous to exceed 70% of total memory expressed in bytes.

      A simple fix would be to multiply the value of check_health[mem,java,rss] by 1024 to get the value in bytes (assuming it is in kilobytes).
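
A minimal sketch of the unit mismatch, using the two figures from post #1. It assumes check_health reports kilobytes and vm.memory.size[total] reports bytes, as suggested above; the function and variable names are made up for illustration and are not part of Zabbix.

```python
KIB = 1024  # bytes per kilobyte, as reported by tools like `free`


def exceeds_threshold(rss, total, fraction):
    """Mimic the trigger logic: rss > total * fraction."""
    return rss > total * fraction


# Values taken from post #1 (both printed in kilobytes on the host).
rss_kb = 5541860
total_kb = 7347752

# Assumption: the Zabbix item delivers the total in bytes.
total_bytes = total_kb * KIB

# Kilobytes compared against bytes: the threshold is never reached.
mismatched = exceeds_threshold(rss_kb, total_bytes, 0.7)

# Both sides in bytes: the comparison behaves as intended.
converted = exceeds_threshold(rss_kb * KIB, total_bytes, 0.7)

print(mismatched, converted)
```

With mixed units the left side is roughly a thousand times too small, so the comparison cannot come out true at any realistic memory usage.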

      • groknaut
        Junior Member
        • Feb 2012
        • 7

        #4
        yep, i'm definitely getting the data in zabbix for both measurements.

        and AHA! units! i just fixed that bit by changing how check_health reports the data. it now reports it in bytes, which is how vm.memory.size[total] is measured.

        but my trigger still isn't firing. i've even changed it to fire at 20%, but no joy.
        any help?

        trigger expression:
        Code:
        {Template_Java_OS_Resources:check_health[mem,java,rss].last(0)}>({Template_Linux:vm.memory.size[total].last(0)}*0.2)
        right now, java pid's RSS is 63% of total mem:
        Code:
        4749471744 / 7524098048 = .63123469599953836220
        evidence on the wire of what's being sent:

        Code:
        [root@webapp ~]# tcpflow -c port 10050 | egrep -A 4 'vm.memory.size\[total\]|check_health'
        tcpflow[21279]: listening on eth0
        010.0xx.038.143.50816-010.192.103.031.10050: check_health[mem,java,rss]
        
        010.1xx.013.031.10050-010.0xx.018.143.50816: ZBXD.
        010.1xx.013.031.10050-010.0xx.018.143.50816:
        .......4749471744
        --
        010.0xx.018.143.54239-010.1xx.013.031.10050: vm.memory.size[total]
        
        010.1xx.103.031.10050-010.0xx.018.143.54239: ZBXD.
        010.1xx.103.031.10050-010.0xx.018.143.54239:
        .......7524098048
        Last edited by groknaut; 03-05-2012, 01:08. Reason: typo

        • heaje
          Senior Member
          Zabbix Certified Specialist
          • Sep 2009
          • 325

          #5
          Your check looks good based on what you posted. The last thing I can think to try is to modify your trigger slightly. Try changing it to look like this (it accomplishes the same thing):

          100*({Template_Java_OS_Resources:check_health[mem,java,rss].last(0)}/{Template_Linux:vm.memory.size[total].last(0)})>20

          I know it's just another way of calculating the same thing, but I have many checks that use that exact same logic for triggers (and they work fine).
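
As a quick sanity check, the two trigger forms can be compared outside Zabbix with the byte values captured in post #4. This is only a sketch of the arithmetic; the function names are hypothetical, and it says nothing about how the Zabbix server evaluates the expression.

```python
def fires_product_form(rss, total, percent):
    """Original form: rss > total * 0.2 (for percent = 20)."""
    return rss > total * (percent / 100)


def fires_ratio_form(rss, total, percent):
    """Rewritten form: 100 * (rss / total) > 20 (for percent = 20)."""
    return 100 * (rss / total) > percent


# Byte values seen on the wire in post #4.
rss_bytes = 4749471744
total_bytes = 7524098048

usage_percent = 100 * rss_bytes / total_bytes

print(round(usage_percent, 2))
print(fires_product_form(rss_bytes, total_bytes, 20))
print(fires_ratio_form(rss_bytes, total_bytes, 20))
```

Both forms agree on these numbers, which suggests the arithmetic itself is sound and that a trigger which stays silent has some cause other than the comparison.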

          • groknaut
            Junior Member
            • Feb 2012
            • 7

            #6
            i tried what you suggested, but that too is not working.

            anyone have more ideas?

            is there a server debugging level that would expose how this logic is playing out? i don't think the standard debug level on the server shows me this.

            • groknaut
              Junior Member
              • Feb 2012
              • 7

              #7
              still not working. and i wonder if i'm running into this unsolved bug:
