Agent mysteriously dies

  • bdub
    Junior Member
    • Apr 2006
    • 24

    #1

    Agent mysteriously dies

    Hello everyone,

    I'm running 1.1b9 with agentd on 10 servers, all of them Debian, and only one of them is giving me any issues. On this particular server I'm trying to monitor the Apache processes (memory used by httpd, number of processes, etc., mostly by watching the www-user), and it is the only server experiencing this problem.

    The agentd will run for 8-10 hours without a problem, then mysteriously die in the middle of the night. I've switched logging to debug mode and pasted the last 20 lines of the log below. I promise you're not missing anything spectacular from the bajillion lines before them.

    brian@web1:/tmp$ tail -n20 zabbix_agentd.log
    016847:20060509:140416 Before read()
    016847:20060509:140416 After read() 2 [22]
    016847:20060509:140416 Got line:net.if.in[eth0,bytes]
    016847:20060509:140416 Sending back:1901590598
    016841:20060509:140416 In check_security()
    016841:20060509:140416 Connection from [10.5.5.10]. Allowed servers [10.5.5.10]
    016841:20060509:140416 Before read()
    016841:20060509:140416 After read() 2 [17]
    016841:20060509:140416 Got line:net.if.out[eth0]
    016841:20060509:140416 Sending back:3583571922
    016845:20060509:140416 In check_security()
    016845:20060509:140416 Connection from [10.5.5.10]. Allowed servers [10.5.5.10]
    016845:20060509:140416 Before read()
    016845:20060509:140416 After read() 2 [20]
    016845:20060509:140416 Got line:proc.mem[,www-data]
    016840:20060509:140416 One child process died. Exiting ...
    016840:20060509:140416 Got signal. Exiting ...
    016841:20060509:140416 Got signal. Exiting ...
    016846:20060509:140416 Got signal. Exiting ...
    016847:20060509:140416 Got signal. Exiting ...
    Thanks for any help you guys can throw my way.


    brian
  • bdub
    Junior Member
    • Apr 2006
    • 24

    #2
    Just giving this thread a bump, as I'm still having this issue...

    • Alexei
      Founder, CEO
      Zabbix Certified Trainer
      Zabbix Certified Specialist
      Zabbix Certified Professional
      • Sep 2004
      • 5654

      #3
      Thanks for reporting this. We're working on this issue.
      Alexei Vladishev
      Creator of Zabbix, Product manager
      New York | Tokyo | Riga

      • krusty
        Senior Member
        • Oct 2005
        • 222

        #4
        I have the same issue.

        I have already updated our Zabbix server from 1.1beta6 to 1.1beta8 and now to 1.1beta9. The zabbix_agentd process dies nearly every day, but the log file shows no errors. Take a look at this example from the log:
        012550:20060510:091045 Got signal. Exiting ...
        012656:20060510:091046 zabbix_agentd started. ZABBIX 1.1beta9.
        012657:20060510:091046 zabbix_agentd 12657 started
        012658:20060510:091046 zabbix_agentd 12658 started
        012659:20060510:091046 zabbix_agentd 12659 started
        012660:20060510:091046 zabbix_agentd 12660 started
        012661:20060510:091046 zabbix_agentd 12661 started
        012656:20060512:115953 Got signal. Exiting ...
        012661:20060512:115953 Got signal. Exiting ...
        017615:20060512:120001 zabbix_agentd started. ZABBIX 1.1beta9.

        • Alexei
          Founder, CEO
          Zabbix Certified Trainer
          Zabbix Certified Specialist
          Zabbix Certified Professional
          • Sep 2004
          • 5654

          #5
          Please set DebugLevel=4 in the zabbix_agentd.conf file and restart the agent. Post an extract of the debug log the next time the agent crashes. Thanks.
          Alexei Vladishev
          Creator of Zabbix, Product manager
          New York | Tokyo | Riga

          • krusty
            Senior Member
            • Oct 2005
            • 222

            #6
            Hi Alexei, I have changed the debug level. Yesterday the agentd died again.

            Here is the log file. I'm only posting the part from around the time the problem occurred. I have sent the whole log file to your email account.

            Code:
            014686:20060517:160115 Connection from [127.0.0.1]. Allowed servers [127.0.0.1] 
            014686:20060517:160115 Before read()
            014686:20060517:160115 After read() 2 [26]
            014686:20060517:160115 Got line:proc.mem[zabbix_server,,]
            014686:20060517:160115 Sending back:217108480
            014686:20060517:160115 In check_security()
            014686:20060517:160115 Connection from [127.0.0.1]. Allowed servers [127.0.0.1] 
            014686:20060517:160115 Before read()
            014686:20060517:160115 After read() 2 [28]
            014686:20060517:160115 Got line:system.cpu.util[,idle,avg1]
            014686:20060517:160115 Sending back:66
            014686:20060517:160116 In check_security()
            014686:20060517:160116 Connection from [127.0.0.1]. Allowed servers [127.0.0.1] 
            014686:20060517:160116 Before read()
            014686:20060517:160116 After read() 2 [26]
            014686:20060517:160116 Got line:proc.mem[zabbix_agentd,,]
            014687:20060517:160156 Sleeping for 60 seconds
            014687:20060517:160256 In refresh_metrics()
            014687:20060517:160256 get_active_checks: host[127.0.0.1] port[10051]
            014687:20060517:160256 Sending [ZBX_GET_ACTIVE_CHECKS
            localhost
            ]
            014687:20060517:160256 Before read
            014687:20060517:160256 In delete_all_metrics()
            014687:20060517:160256 Parsed [ZBX_EOF]
            014687:20060517:160256 Sleeping for 60 seconds
            014687:20060517:160356 Sleeping for 60 seconds
            014687:20060517:160456 In refresh_metrics()
            014687:20060517:160456 get_active_checks: host[127.0.0.1] port[10051]
            014687:20060517:160456 Sending [ZBX_GET_ACTIVE_CHECKS
            localhost
            ]
            014687:20060517:160456 Before read
            014687:20060517:160456 In delete_all_metrics()
            014687:20060517:160456 Parsed [ZBX_EOF]
            014687:20060517:160456 Sleeping for 60 seconds
            014687:20060517:160556 Sleeping for 60 seconds
            014687:20060517:160656 In refresh_metrics()
            014687:20060517:160656 get_active_checks: host[127.0.0.1] port[10051]
            014687:20060517:160656 Sending [ZBX_GET_ACTIVE_CHECKS
            localhost
            ]
            014687:20060517:160656 Before read
            014687:20060517:160656 In delete_all_metrics()
            014687:20060517:160656 Parsed [ZBX_EOF]
            014687:20060517:160656 Sleeping for 60 seconds
            014687:20060517:160756 Sleeping for 60 seconds
            014687:20060517:160856 In refresh_metrics()
            014687:20060517:160856 get_active_checks: host[127.0.0.1] port[10051]
            014687:20060517:160856 Sending [ZBX_GET_ACTIVE_CHECKS
            localhost
            ]
            014687:20060517:160856 Before read
            014687:20060517:160856 In delete_all_metrics()
            014687:20060517:160856 Parsed [ZBX_EOF]
            014687:20060517:160856 Sleeping for 60 seconds
            014687:20060517:160956 Sleeping for 60 seconds
            014687:20060517:161056 In refresh_metrics()
            014687:20060517:161056 get_active_checks: host[127.0.0.1] port[10051]
            014687:20060517:161056 Sending [ZBX_GET_ACTIVE_CHECKS
            localhost
            ]
            I can't see any problems in the log file. Any ideas?
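
            A minimal sketch for narrowing this down, assuming the debug log keeps the `PID:YYYYMMDD:HHMMSS message` layout seen in the excerpts in this thread: for each agent process, report the last `Got line:` request that was never followed by a `Sending back:` reply. The function name `unanswered_requests` and the regular expression are illustrative assumptions, not part of Zabbix.

```python
import re
from typing import Dict, Iterable

# Assumed line layout, taken from the excerpts posted in this thread:
#   PID:YYYYMMDD:HHMMSS message
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\s(.*)$")


def unanswered_requests(lines: Iterable[str]) -> Dict[str, str]:
    """Map each PID to its last 'Got line:' request with no 'Sending back:' after it."""
    pending: Dict[str, str] = {}
    for raw in lines:
        match = LINE_RE.match(raw)
        if not match:
            # Continuation lines such as 'localhost' or ']' carry no PID prefix.
            continue
        pid, _date, _time, message = match.groups()
        if message.startswith("Got line:"):
            # Remember the item key this process was asked for.
            pending[pid] = message[len("Got line:"):].strip()
        elif message.startswith("Sending back:"):
            # The request was answered, so it is no longer a suspect.
            pending.pop(pid, None)
    return pending
```

            Passing an open log file (or any iterable of lines) to `unanswered_requests` returns a dictionary keyed by PID. Run against the excerpt in #6, it should report only PID 014686 with `proc.mem[zabbix_agentd,,]`. In both debug excerpts posted here, the last request logged before the exit is a `proc.mem` item, so that key may be worth a closer look; this is only an observation from the logs, not a confirmed cause.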
