How to debug Zabbix agent on AIX

  • geniepage
    Member
    • Sep 2015
    • 34

    #1

    Hello Zabbix community,
    I'm trying to use Zabbix as our main monitoring tool, but I've run into a problem.
    The Zabbix agent on AIX has trouble collecting CPU data: on some AIX servers it fails, on others it works.
    The AIX version is the same everywhere:
    7100-02-02-1316
    Same physical server, different LPARs.
    Performance information collection is also allowed.
    lparstat -h 5

    System configuration: type=Shared mode=Uncapped smt=4 lcpu=16 mem=36864MB psize=4 ent=2.00

    %user  %sys  %wait  %idle physc %entc  lbusy  app  vcsw phint  %hypv hcalls
    ----- ----- ------ ------ ----- ----- ------  --- ----- ----- ------ ------
      1.3   2.7    0.0   96.0  0.13   6.6    1.0 3.77   858     0    4.0   3673
      0.5   3.1    0.0   96.4  0.12   6.1    0.8 3.40   981     2    4.3   3174
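One way to tell whether the agent or the LPAR is at fault is to compare the agent's value with what lparstat itself reports. The item is named "CPU entitled capacity consumed", so the assumption here is that system.stat[cpu,ec] should track the %entc column (percentage of entitled capacity consumed). The shell function below is a minimal sketch with a hypothetical name: it reads `lparstat -h` output on stdin, locates the %entc column by its header name, and prints that column for every sample.

```shell
# Hypothetical helper: print the %entc column from `lparstat -h <interval> <count>` output.
# Usage: lparstat -h 5 2 | entc_from_lparstat
entc_from_lparstat() {
  awk '
    # Header row: remember which field holds %entc, then skip the row.
    /%entc/ { for (i = 1; i <= NF; i++) if ($i == "%entc") col = i; next }
    # Sample rows start with a number; print the %entc field of each one.
    col && $1 ~ /^[0-9.]+$/ { print $col }
  '
}
```

Finding the column by name rather than by position keeps the sketch working if the column set differs between lparstat options or AIX levels.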

    But in Zabbix I see this next to the problematic item:
    No data available in collector.
    There is nothing about it in the Zabbix server log.
    With the agent at debug level 4, the agent log shows only:
    9502956:20150910:133550.721 listener #1 [processing request]
    9502956:20150910:133550.721 Requested [system.stat[cpu,ec]]
    20971696:20150910:133550.726 listener #2 [processing request]
    20971696:20150910:133550.726 Requested [vfs.fs.inode[/sapmnt/HBP,pfree]]
    20971696:20150910:133550.727 Sending back [96.274226]
    20971696:20150910:133550.727 listener #2 [waiting for connection]
    15270034:20150910:133550.768 In send_buffer() host:'192.168.25.201' port:10051 values:0/100
    15270034:20150910:133550.768 End of send_buffer():SUCCEED
    15270034:20150910:133550.768 active checks #1 [idle 1 sec]
    23003288:20150910:133550.807 collector [processing data]
    23003288:20150910:133550.807 In update_cpustats()
    23003288:20150910:133550.807 End of update_cpustats()

    Nothing else.
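The log above already contains a clue: process 9502956 (listener #1) logs `Requested [system.stat[cpu,ec]]` but, unlike process 20971696, never logs a matching `Sending back`. A minimal awk sketch with a hypothetical function name that lists every agent process whose last request has no reply yet. It assumes the `PID:date:time message` line layout shown above, which may differ between agent versions; the log path in the usage comment is a placeholder.

```shell
# Hypothetical helper: report agent processes whose last "Requested [...]"
# line has no later "Sending back [...]" line from the same PID.
# Usage: unanswered_requests < /path/to/zabbix_agentd.log   (placeholder path)
unanswered_requests() {
  awk '
    {
      # The first field looks like PID:YYYYMMDD:HHMMSS.mmm
      split($1, f, ":")
      pid = f[1]
    }
    / Requested \[/ {
      key = $0
      sub(/.* Requested \[/, "", key)   # drop everything up to the key
      sub(/][ \t]*$/, "", key)          # drop the closing bracket
      pending[pid] = key
      next
    }
    / Sending back \[/ { delete pending[pid] }
    END { for (p in pending) print p, pending[p] }
  '
}
```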
    The zabbix_get output is empty, but the exit code is 0:
    #./zabbix_get -s 127.0.0.1 -p 10050 -k system.stat[cpu,ec]

    # echo $?
    0
    #
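Because zabbix_get returned exit status 0 with an empty reply, the exit status alone cannot detect this failure. A minimal sketch of a check that also inspects the reply text. The function name and the `ZBX_GET` override variable are hypothetical; `ZBX_NOTSUPPORTED` is the prefix zabbix_get prints for unsupported items.

```shell
# Hypothetical helper: query one item key and classify the reply.
# ZBX_GET can point at another binary (or a stub function for testing).
ZBX_GET=${ZBX_GET:-zabbix_get}

check_key() {
  key=$1
  value=$("$ZBX_GET" -s 127.0.0.1 -p 10050 -k "$key")
  rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "ERROR rc=$rc $key"
    return 2
  fi
  case $value in
    "")                echo "EMPTY $key"; return 1 ;;   # the case seen in this thread
    ZBX_NOTSUPPORTED*) echo "NOTSUPPORTED $key: $value"; return 1 ;;
    *)                 echo "OK $key = $value" ;;
  esac
}
```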

    I can run trace or truss on the Zabbix agent, but I don't know what to look for.
    Does anyone have an idea how to find out what is wrong?

    Thanks a lot
  • geniepage
    Member
    • Sep 2015
    • 34

    #2
    Zabbix hangs on AIX

    Hello community,
    I found a problem with the Zabbix agent on AIX 7100-02-02-1316.
    I see this output:
    ps -ef |grep zabbix
    zabbix 9502958 1 0 Sep 11 - 0:00 /opt/zabbix/sbin/zabbix_agentd -c /opt/zabbix/conf/zabbix_agentd.conf
    zabbix 15270036 9502958 0 Sep 11 - 0:28 /opt/zabbix/sbin/zabbix_agentd: listener #2 [waiting for connection]
    zabbix 15401134 9502958 0 Sep 11 - 0:28 /opt/zabbix/sbin/zabbix_agentd: listener #1 [processing request]
    zabbix 16646384 9502958 0 Sep 11 - 0:15 /opt/zabbix/sbin/zabbix_agentd: active checks #1 [idle 1 sec]
    zabbix 20971698 9502958 0 Sep 11 - 47:41 /opt/zabbix/sbin/zabbix_agentd: collector [processing data]
    zabbix 23003296 9502958 0 Sep 11 - 0:28 /opt/zabbix/sbin/zabbix_agentd: listener #3 [waiting for connection]
    root 24969314 24379514 0 11:17:35 pts/0 0:00 grep zabbix
    zabbix 25165846 15401134 20 0:00 <defunct>

    The <defunct> process stays there; it is always the same PID.
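In the listing above, the defunct child 25165846 has parent 15401134, which is listener #1 and is still shown as [processing request]. A zombie remains until its parent collects it, so this may mean the listener is stuck before it reaps its child. A minimal sketch with a hypothetical function name that lists defunct processes of a given user together with their parent PID, so they can be matched against the listener and collector lines. It assumes the `ps -ef` column order shown above (user, PID, PPID, ...).

```shell
# Hypothetical helper: print "PID PPID" for every defunct process owned by a user.
# Usage: ps -ef | defunct_children zabbix
defunct_children() {
  user=${1:-zabbix}
  # Field 1 is the owner, field 2 the PID, field 3 the parent PID.
  awk -v u="$user" '$1 == u && /<defunct>/ { print $2, $3 }'
}
```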

    Do you know how to check what is wrong with the agent?

    Thanks


    • kloczek
      Senior Member
      • Jun 2006
      • 1771

      #3
      Originally posted by geniepage
      zabbix get output is empty, but with error code = 0.
      #./zabbix_get -s 127.0.0.1 -p 10050 -k system.stat[cpu,ec]

      # echo $?
      0
      #
      RTFM. There is no such key as system.stat[cpu,ec] (see https://www.zabbix.com/documentation...s#zabbix_agent)
      Try system.stat[ec] instead.
      http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
      https://kloczek.wordpress.com/
      zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
      My zabbix templates https://github.com/kloczek/zabbix-templates


      • geniepage
        Member
        • Sep 2015
        • 34

        #4
        OK, but
        =====
        CPU entitled capacity consumed system.stat[cpu,ec] 60 7 365 Zabbix agent CPU, Logical partitions, Performance Enabled
        =====
        comes from the default AIX template.

        Also, the documented definition is
        system.stat[resource,<type>], so system.stat[cpu,ec] looks OK.

        What I want to find out is how the agent collects this information, so I can check what is wrong: does it call some C code or a kernel interface, or does it execute an external AIX command, etc.?
        Last edited by geniepage; 16-09-2015, 19:25.
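To narrow down which part of the CPU statistics fails, it can help to query every system.stat cpu type on an affected LPAR and on a working one and compare the results. A minimal sketch; the type list (us, sy, id, wa, pc, ec, lbusy, app) is taken from my reading of the Zabbix AIX item documentation for system.stat[resource,<type>] and should be checked against the documentation for the installed agent version. The function name and the `ZBX_GET` override variable are hypothetical.

```shell
# Hypothetical helper: query every system.stat[cpu,<type>] key and print key=value.
# The type list is an assumption based on the Zabbix AIX item documentation.
ZBX_GET=${ZBX_GET:-zabbix_get}

probe_cpu_types() {
  host=${1:-127.0.0.1}
  for t in us sy id wa pc ec lbusy app; do
    key="system.stat[cpu,$t]"
    value=$("$ZBX_GET" -s "$host" -p 10050 -k "$key")
    # An empty reply is printed as <empty> so it stands out in the comparison.
    echo "$key=${value:-<empty>}"
  done
}
```

Running it on a good and a bad LPAR and diffing the two outputs shows whether only ec is affected or the whole cpu resource.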


        • geniepage
          Member
          • Sep 2015
          • 34

          #5
          I also want to find out why some Zabbix processes hang in the defunct state. It happens very often on AIX. I wrote a script that cleans this up, but it is not a final solution; it would be better to have a stable Zabbix agent on AIX.


          • geniepage
            Member
            • Sep 2015
            • 34

            #6
            A few days ago I also found that the Zabbix agent on AIX sometimes "sleeps" and does not answer the server or the zabbix_get command. truss shows that the process is sleeping in the kernel in semop. Has anyone seen the same behaviour on AIX 7.1? Do you know how to handle it or what causes it? For now I have a watchdog script that clears these problematic states.
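When a process is found sleeping in semop, a snapshot of its state is more useful for a bug report than the truss output alone. A minimal sketch of a capture routine using standard AIX tools: `procstack` for the user-level stack of each thread and `ipcs -s` for the System V semaphore sets. That the agent's locks are System V semaphores is an assumption based on the semop call seen in truss. The function names and the output directory are hypothetical, and `DRY_RUN=1` only prints the commands.

```shell
# Hypothetical helper: run one diagnostic command, saving its output as $outdir/NAME.txt.
# With DRY_RUN=1 the command line is printed instead of executed.
capture() {
  name=$1; shift
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@" >"$outdir/$name.txt" 2>&1
  fi
}

# Hypothetical helper: snapshot a hung agent process (run as root).
collect_hang_info() {
  pid=$1
  outdir=${2:-/tmp/zbx_hang.$pid}     # placeholder output directory
  [ "${DRY_RUN:-0}" = 1 ] || mkdir -p "$outdir"
  capture ps ps -f -p "$pid"          # state and parent of the process
  capture procstack procstack "$pid"  # user-level stack of each thread
  capture ipcs ipcs -s                # System V semaphore sets
}
```

`truss -p <pid>` is left out on purpose because it keeps running until interrupted; it can be started by hand next to the snapshot.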


            • geniepage
              Member
              • Sep 2015
              • 34

              #7
              Hints for finding the root cause

              Hello guys,
              does nobody know how to debug a Zabbix agent that has trouble on AIX and "hangs" (becomes a zombie, stops acting as a live agent, crashes, etc.)?
              Thanks for any hints on how to start debugging this problem.


              • darkwolf29a
                Junior Member
                • Nov 2015
                • 12

                #8
                We're using an older version, 1.8, so we don't actually have a client installed on the AIX servers. We just use zabbix_get and a few scripts to get our data back to the Zabbix server.

                I wonder whether you need that client at all.


                • geniepage
                  Member
                  • Sep 2015
                  • 34

                  #9
                  Zabbix agent problem on AIX

                  OK. I could use cron and send the data with zabbix_sender. But the benefit of the agent is that I can schedule data collection via templates and use the same concept on every OS, whether Linux or Windows. That works best for me: install the agent with all necessary scripts, add the customised configuration to zabbix_agentd.conf.d/*.conf, and it runs. That is simpler than doing it via cron etc.
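For comparison, the cron alternative mentioned here would look roughly like the sketch below. It is only an illustration: the item key `aix.cpu.entc` is hypothetical and would need a matching Zabbix trapper item on the host, the script path in the crontab comment is a placeholder, and `ZBX_SENDER` / `ZBX_SERVER` are hypothetical variables. `-z`, `-s`, `-k` and `-o` are the zabbix_sender options for server, host name, key and value.

```shell
# Hypothetical helper: push one value to a Zabbix trapper item.
# Example crontab entry (placeholder script path):
#   * * * * * /opt/zabbix/scripts/send_entc.sh
ZBX_SENDER=${ZBX_SENDER:-zabbix_sender}

send_value() {
  host=$1 key=$2 value=$3
  "$ZBX_SENDER" -z "${ZBX_SERVER:?set ZBX_SERVER to the Zabbix server address}" \
    -s "$host" -k "$key" -o "$value"
}
```

The trade-off is the one described in the post: every such value needs its own cron entry and trapper item instead of being scheduled centrally through a template.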

                  I have also developed many low-level discovery rules for AIX and VIOS servers. It is an easy way to monitor servers where everything can be modified, and the agent is a simple way to manage it.

                  That is why I'm looking for a way to debug the Zabbix agent on AIX and find where the problem is: why it hangs in a zombie state, crashes, consumes more CPU than normal, or "hangs" and does not react to any command. I have written a script that checks for these states and restarts the Zabbix agent, but the problems occur only on some servers and only sometimes.

                  I hope that answers your question.

                  Back to the problem.
                  Who knows how to debug this on AIX when the agent hangs? I can provide everything: a process dump, kdb output from the kernel, etc. That is no problem, but I don't know what to look for or where to look.

                  Thanks for help.
                  Last edited by geniepage; 27-11-2015, 11:16.
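For the watchdog described above, a cheap liveness test is the built-in agent.ping key, which a healthy agent answers with 1. A minimal sketch; the function name and the `ZBX_GET` override variable are hypothetical, and it assumes zabbix_get gives up by itself when the listener never replies, so a hung agent shows up as a failed check rather than a blocked script.

```shell
# Hypothetical watchdog check: succeed only if the agent answers agent.ping with 1.
# Assumes zabbix_get times out on its own if the listener never replies.
ZBX_GET=${ZBX_GET:-zabbix_get}

agent_alive() {
  reply=$("$ZBX_GET" -s "${1:-127.0.0.1}" -p 10050 -k agent.ping 2>/dev/null)
  [ "$reply" = 1 ]
}

# Example use; the restart action is site-specific and left as a placeholder:
#   agent_alive || echo "agent.ping failed, restart the agent here" >&2
```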


                  • yasith
                    Junior Member
                    • Aug 2019
                    • 3

                    #10
                    Dear geniepage,

                    I am facing this problem right now as well. In my environment we monitor 20 AIX 7 servers. On the Zabbix server most of them report all their data, but some of them do not show any CPU data. Did you fix the issue?

                    If you did, please share how you fixed it.

                    Thanks,
                    Best Regards,
                    Yasith.
