agent active check losing

  • Dmitriy Kirhlarov
    Member
    • Jun 2007
    • 40

    #1

    I use Zabbix agent active checks to get disk space information (by parsing 'df' output).
    The problem: one volume periodically goes into the "Unsupported" state.

    zabbix_agentd.conf:
    UserParameter=vfs.fs.free[*],df | sed -nE 's@^([^ ]+)([ ]+)([^ ]+)([ ]+)([^ ]+)([ ]+)([^ ]+)([ ]+)([^ ]+)([ ]+)$1$@\7@p'
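For readers who find the sed expression hard to follow: it appears to match a df line with six space-separated fields whose last field equals the first key parameter ($1), and to print the fourth field (the available blocks). A rough Python equivalent is sketched below. The function name `df_avail` and the sample df text are made up for illustration; only the 295453184 figure for /data is taken from the agent log in this post.

```python
from typing import Optional

def df_avail(df_output: str, mount_point: str) -> Optional[int]:
    """Return the 'Avail' column for mount_point, or None if it is not listed."""
    for line in df_output.splitlines():
        fields = line.split()
        # Expected columns: Filesystem, 1K-blocks, Used, Avail, Capacity, Mounted on.
        # The header line splits into seven fields and is skipped by the length check.
        if len(fields) == 6 and fields[5] == mount_point:
            return int(fields[3])
    return None

# Hypothetical df output; all numbers except the /data 'Avail' value are invented.
SAMPLE = """\
Filesystem  1K-blocks      Used     Avail Capacity  Mounted on
/dev/ad4s1a   1012974    312450    619488    34%    /
tank/data   400000000 104546816 295453184    26%    /data
"""

print(df_avail(SAMPLE, "/data"))         # 295453184
print(df_avail(SAMPLE, "/nonexistent"))  # None
```

Unlike the sed version, this sketch also accepts tabs as separators, so it is an approximation rather than an exact translation.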

    zabbix_agent.log:
    66479:20080206:091646 For key [vfs.fs.size[/data,free]] received value [295453184]
    66479:20080206:091646 XML before sending [<req><host>aW5mcmEubWdtdC52ZWdhLnJ1</host><key>dmZzLmZzLnNpemVbL2RhdGEsZnJlZV0=</key><data>Mjk1NDUzMTg0</data></req>]
    66479:20080206:091646 OK
    66479:20080206:091646 In get_min_nextcheck()
    66479:20080206:091646 Sleeping for 57 seconds
    66479:20080206:091743 In refresh_metrics('10.25.0.254',10051)
    66479:20080206:091743 get_active_checks('10.25.0.254',10051)
    66479:20080206:091743 Sending [ZBX_GET_ACTIVE_CHECKS
    infra.mgmt.vega.ru
    ]
    66479:20080206:091743 Before read
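The fields inside the "XML before sending" line are Base64-encoded. Decoding them is a quick way to confirm which host name, key and value the agent actually sent. A minimal sketch follows; the helper name `decode_req` is an assumption made for this example, and the XML string is copied from the log line above.

```python
import base64
import re

# Payload copied from the "XML before sending" log line.
XML = ("<req><host>aW5mcmEubWdtdC52ZWdhLnJ1</host>"
       "<key>dmZzLmZzLnNpemVbL2RhdGEsZnJlZV0=</key>"
       "<data>Mjk1NDUzMTg0</data></req>")

def decode_req(xml: str) -> dict:
    """Base64-decode the text of the <host>, <key> and <data> elements."""
    fields = re.findall(r"<(host|key|data)>([^<]*)</\1>", xml)
    return {tag: base64.b64decode(value).decode("utf-8") for tag, value in fields}

print(decode_req(XML))
# {'host': 'infra.mgmt.vega.ru', 'key': 'vfs.fs.size[/data,free]', 'data': '295453184'}
```

The decoded values match the "For key [...] received value [...]" line, and the host field shows that the agent identifies itself to the server by its configured host name.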

    zabbix_server.log:
    73203:20080206:091653 Active parameter [vfs.fs.size[/data,free]] is not supported by agent on host [infra.mgmt.vega.ru]
    73205:20080206:091653 Active parameter [vfs.fs.size[/data,used]] is not supported by agent on host [infra.mgmt.vega.ru]

    Last value of "vfs.fs.size[/data,free]" on the server (the servers use the UTC timezone; my workstation uses MSK):
    2008-02-06 12:16:46 1202289406 302544060416

    In other words, zabbix_server receives the correct value from the agent, adds it to the database and, only after that, marks the item as "unsupported".
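The numbers in this post can be cross-checked against each other. The sketch below converts the Unix timestamp from the last-value row and compares the stored value with the one in the agent log. It assumes MSK was UTC+3 on that date, which agrees with the two times quoted in the post. The factor of 1024 would be consistent with a multiplier configured on the item, but the thread does not say so.

```python
from datetime import datetime, timedelta, timezone

AGENT_VALUE = 295453184        # value reported in the agent log
SERVER_CLOCK = 1202289406      # Unix timestamp from the server's last-value row
SERVER_VALUE = 302544060416    # value from the server's last-value row

MSK = timezone(timedelta(hours=3))  # assumption: MSK = UTC+3 in February 2008

utc_time = datetime.fromtimestamp(SERVER_CLOCK, tz=timezone.utc)
fmt = "%Y-%m-%d %H:%M:%S"

print(utc_time.strftime(fmt))                  # 2008-02-06 09:16:46
print(utc_time.astimezone(MSK).strftime(fmt))  # 2008-02-06 12:16:46
print(SERVER_VALUE == AGENT_VALUE * 1024)      # True
```

The stored value therefore corresponds to the sample the agent sent at 09:16:46, seven seconds before the server logged the item as not supported.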

    My system:
    infra# uname -rs; pkg_info -Ix zabbix; pkg_info -Ix postgres
    FreeBSD 7.0-20071107-SNAP
    zabbix-1.4.4,1 Application and network monitoring solution
    zabbix-agent-1.4.4,1 Application and network monitoring solution
    postgresql-client-8.2.5_1 PostgreSQL database (client)
    postgresql-server-8.2.5_2 The most advanced open-source database available anywhere

    I really need Zabbix working and can provide any additional information.
  • Alexei
    Founder, CEO
    Zabbix Certified Trainer
    Zabbix Certified Specialist
    Zabbix Certified Professional
    • Sep 2004
    • 5654

    #2
    Perhaps the value type of the item is not compatible with the received values?
    Alexei Vladishev
    Creator of Zabbix, Product manager
    New York | Tokyo | Riga
    • Dmitriy Kirhlarov
      Member
      • Jun 2007
      • 40

      #3
      Numeric (integer 64bit).
      It works correctly for the other volumes, and for the '/data' volume when the item type is 'ZABBIX agent' (not 'ZABBIX agent (active)').

      • Dmitriy Kirhlarov
        Member
        • Jun 2007
        • 40

        #4
        I see the same behaviour with an attached log item.

        zabbix_agentd.log:
        75136:20080206:130943 In process log (/var/log/rsnapshot.log,96)
        75136:20080206:130943 In get_min_nextcheck()
        75136:20080206:130943 Sleeping for 9 seconds
        75136:20080206:130952 In refresh_metrics('10.25.0.254',10051)
        75136:20080206:130952 get_active_checks('10.25.0.254',10051)
        75136:20080206:130952 Sending [ZBX_GET_ACTIVE_CHECKS
        infra.mgmt.vega.ru
        ]
        75136:20080206:130952 Before read
        75136:20080206:130952 In parse_list_of_checks() [vfs.dev.gmirror:600:0
        log[/var/log/maillog,SYSERR]:10:0
        vfs.dev.zpool:600:0
        system.uptime:60:0
        ZBX_EOF
        ]
        75136:20080206:130952 In disable_all_metrics()
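The list printed by parse_list_of_checks() can be split into fields to see exactly which items the server handed to this agent. The sketch below is an assumption-laden illustration: it treats each line as key:delay:third-field, where the third field is presumed to be the log offset, and the names `ActiveCheck` and `parse_checks` are invented for this example.

```python
from typing import List, NamedTuple

class ActiveCheck(NamedTuple):
    key: str
    delay: int
    last_field: int  # third field; assumed to be the log offset

def parse_checks(payload: str) -> List[ActiveCheck]:
    """Parse 'key:delay:n' lines up to the ZBX_EOF terminator."""
    checks = []
    for line in payload.splitlines():
        line = line.strip()
        if line == "ZBX_EOF":
            break
        if not line:
            continue
        # Split from the right so that a ':' inside the key cannot break parsing.
        key, delay, last = line.rsplit(":", 2)
        checks.append(ActiveCheck(key, int(delay), int(last)))
    return checks

# Payload copied from the agent log above.
PAYLOAD = """vfs.dev.gmirror:600:0
log[/var/log/maillog,SYSERR]:10:0
vfs.dev.zpool:600:0
system.uptime:60:0
ZBX_EOF
"""

checks = parse_checks(PAYLOAD)
print([c.key for c in checks])
# ['vfs.dev.gmirror', 'log[/var/log/maillog,SYSERR]', 'vfs.dev.zpool', 'system.uptime']
print(any(c.key.startswith("log[/var/log/rsnapshot.log") for c in checks))  # False
```

The rsnapshot log item is absent from the list even though the agent was processing /var/log/rsnapshot.log moments earlier, which fits the server having already marked it as not supported.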

        zabbix_server.log:
        73205:20080206:130818 Active parameter [log[/var/log/rsnapshot.log,ERROR]] is not supported by agent on host [infra.mgmt.vega.ru]
        • Alexei
          Founder, CEO
          Zabbix Certified Trainer
          Zabbix Certified Specialist
          Zabbix Certified Professional
          • Sep 2004
          • 5654

          #5
          Originally posted by Dmitriy Kirhlarov
          The problem: one volume periodically goes into the "Unsupported" state.
          Well, this means that the user parameter periodically fails. It could be because of timeouts.
          • Dmitriy Kirhlarov
            Member
            • Jun 2007
            • 40

            #6
            That's strange, because zabbix_server and zabbix_agent run on the same host, and CPU idle is about 85%.

            • Alexei
              Founder, CEO
              Zabbix Certified Trainer
              Zabbix Certified Specialist
              Zabbix Certified Professional
              • Sep 2004
              • 5654

              #7
              I see nothing strange here. The df command can be slow!
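One way to test the timeout theory is to time the command repeatedly and see whether it ever comes close to the limit. The helper below is only a sketch: the name `time_command` is hypothetical, and the timeout passed in should be whatever limit applies to the agent being checked.

```python
import subprocess
import time
from typing import Sequence, Tuple

def time_command(cmd: Sequence[str], timeout: float) -> Tuple[float, bool]:
    """Run cmd once; return (elapsed seconds, whether it finished within timeout)."""
    start = time.monotonic()
    try:
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
                       timeout=timeout, check=False)
        finished = True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        finished = False
    return time.monotonic() - start, finished
```

For example, calling `time_command(["df"], timeout=3.0)` in a loop over a few minutes would show whether df occasionally stalls, for instance on a busy or network-mounted filesystem.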
              • Dmitriy Kirhlarov
                Member
                • Jun 2007
                • 40

                #8
                I added Timeout=10 to zabbix_agentd.conf.
                It didn't help.

                Any other hints?

                • Dmitriy Kirhlarov
                  Member
                  • Jun 2007
                  • 40

                  #9
                  After enabling log monitoring, I have twice hit a situation where zabbix_server eats 100% CPU. After restarting zabbix_server, CPU idle returns to the normal ~85%.

                  What I did:
                  1. Looked at top
                  2. ps -auxww | grep $PID
                  zabbix 86125 86.9 0.1 28028 1420 ?? RN 7:21PM 827:46.87 zabbix_server: processing data (zabbix_server)

                  • Dmitriy Kirhlarov
                    Member
                    • Jun 2007
                    • 40

                    #10
                    Originally posted by Dmitriy Kirhlarov
                    I added Timeout=10 to zabbix_agentd.conf.
                    It didn't help.

                    Any other hints?
                    The issue is fixed now.
                    It was a stupid error: several hosts had the same hostname in zabbix_agentd.conf.
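The agent log earlier in the thread shows the agent identifying itself only by host name when it sends ZBX_GET_ACTIVE_CHECKS, so machines sharing a name would presumably all report against the same host's items, including keys that do not exist on them. A small check for duplicate names across collected configs might look like the sketch below. The function names, machine labels and config contents are hypothetical; only the `Hostname` and `Server` parameter names and the two values shown in the logs come from the thread.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Matches an uncommented "Hostname=value" line.
HOSTNAME_RE = re.compile(r"^[ \t]*Hostname[ \t]*=[ \t]*(\S+)[ \t]*$", re.MULTILINE)

def hostname_of(config_text: str) -> Optional[str]:
    """Return the last uncommented Hostname value in a config, if any."""
    matches = HOSTNAME_RE.findall(config_text)
    return matches[-1] if matches else None

def find_duplicates(configs: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each Hostname used by more than one machine to those machines."""
    by_name = defaultdict(list)
    for machine, text in configs.items():
        name = hostname_of(text)
        if name is not None:
            by_name[name].append(machine)
    return {name: sorted(ms) for name, ms in by_name.items() if len(ms) > 1}

# Hypothetical config contents gathered from three machines.
configs = {
    "machine-a": "Server=10.25.0.254\nHostname=infra.mgmt.vega.ru\n",
    "machine-b": "Server=10.25.0.254\nHostname=infra.mgmt.vega.ru\n",
    "machine-c": "Server=10.25.0.254\n# Hostname=old.example\nHostname=other.example\n",
}

print(find_duplicates(configs))
# {'infra.mgmt.vega.ru': ['machine-a', 'machine-b']}
```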

                    • Alexei
                      Founder, CEO
                      Zabbix Certified Trainer
                      Zabbix Certified Specialist
                      Zabbix Certified Professional
                      • Sep 2004
                      • 5654

                      #11
                      Thanks for the follow up!