zabbix_agentd is dying

  • zeratul
    Junior Member
    • Jun 2006
    • 7

    #1

    zabbix_agentd is dying

    Hi

    I installed the server on a CentOS 4.2 machine and it is working well, but I have problems with zabbix_agentd, which keeps dying on the various other servers where I installed it. There is no specific hour or event when this happens (I tried to correlate the agent deaths with a cron job or something similar, but they die at random).
    The list of servers includes Fedora Core 2 and 4, CentOS 4.2 and Mandriva 2006. I tried the agents from 1.1 stable, 1.1 beta 11 and 1.1 beta 12. The servers themselves are working well (they are production machines), and I have no reason to suspect anything wrong on that side.

    The agents die in 2 ways:
    - the zabbix_agentd processes just vanish without any error in the log files (by the way, why is /tmp/zabbix_agentd.log always empty, even with DebugLevel=4 in /etc/zabbix/zabbix_agentd.conf?)
    - the zabbix_agentd processes still exist, but the server can't get any data from them, and syslog shows "*** glibc detected *** double free or corruption". I have to use "killall -9 ..." to get rid of the processes, and then wait 1-2 minutes before I can start zabbix_agentd again. I found a suggestion to use "export MALLOC_CHECK_=0", but it does not seem to help in any way.

    I even tried a precompiled binary for Debian. It works fine as far as data collection goes, but it dies like the rest.

    Maybe there is something obvious that I'm missing or doing wrong on all servers. Does anyone have any idea? Or could someone who has a working zabbix_agentd compiled on or for one of the systems above send me the binary?
    Thank you.

    Daniel
  • Alexei
    Founder, CEO
    Zabbix Certified Trainer
    Zabbix Certified Specialist, Zabbix Certified Professional
    • Sep 2004
    • 5654

    #2
    It seems that there is a problem with ZABBIX agent when doing proc.mem under Linux. We even managed to reproduce the problem once or twice but still didn't see why it happens.

    Give us some time to have it fixed.
    Alexei Vladishev
    Creator of Zabbix, Product manager
    New York | Tokyo | Riga
    My Twitter


    • zeratul
      Junior Member
      • Jun 2006
      • 7

      #3
      Originally posted by Alexei
      It seems that there is a problem with ZABBIX agent when doing proc.mem under Linux.
      Thank you for your answer. Until you find a solution, is there a way for me to disable the "proc.mem" item?

      Daniel


      • Alexei
        Founder, CEO
        Zabbix Certified Trainer
        Zabbix Certified Specialist, Zabbix Certified Professional
        • Sep 2004
        • 5654

        #4
        You may disable monitoring of all items having key proc.mem[...]. Let me know if it makes agent stay alive.


        • zeratul
          Junior Member
          • Jun 2006
          • 7

          #5
          Originally posted by Alexei
          You may disable monitoring of all items having key proc.mem[...]. Let me know if it makes agent stay alive.
          You're right. Since I disabled all items with the key proc.mem[], following your answer, no agent has died.

          I also found the reason why nothing was logged to /tmp/zabbix_agentd.log. The file is created when zabbix_agentd starts up, with owner and group root:root and permissions 644:
          [root@server tmp]# ls -l /tmp/zabbix_agentd.log
          -rw-r--r-- 1 root root 0 Jun 26 17:54 /tmp/zabbix_agentd.log

          zabbix_agentd then tries to write to this file as user zabbix, and this fails. So I changed the owner of the file, and now I get logs from the agent as well.
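
For anyone who wants to see why the write fails, here is a minimal sketch of the classic Unix permission check applied to the listing above (mode 644, owner root:root). It is only an illustration: the helper name `can_write` and the numeric uid/gid chosen for the zabbix user are assumptions, and ACLs and capabilities are ignored.

```python
import stat


def can_write(mode, file_uid, file_gid, proc_uid, proc_gids):
    """Classic Unix write-permission check (ignores ACLs and capabilities)."""
    if proc_uid == 0:
        return True  # root bypasses the permission bits
    if proc_uid == file_uid:
        return bool(mode & stat.S_IWUSR)  # owner write bit
    if file_gid in proc_gids:
        return bool(mode & stat.S_IWGRP)  # group write bit
    return bool(mode & stat.S_IWOTH)  # "other" write bit


LOG_MODE = 0o644   # -rw-r--r--, as in the ls -l output
ROOT = 0           # uid and gid of root
ZABBIX_UID = 500   # hypothetical uid of the zabbix user
ZABBIX_GID = 500   # hypothetical gid of the zabbix group

# File owned by root:root, agent running as zabbix: only "other" bits apply.
print(can_write(LOG_MODE, ROOT, ROOT, ZABBIX_UID, {ZABBIX_GID}))

# Same mode after changing the owner to zabbix: the owner write bit applies.
print(can_write(LOG_MODE, ZABBIX_UID, ZABBIX_GID, ZABBIX_UID, {ZABBIX_GID}))
```

The first call prints False and the second prints True, which matches what happened here: with mode 644 only the owner may write, so changing the owner to zabbix was enough.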

          Daniel


          • Alexei
            Founder, CEO
            Zabbix Certified Trainer
            Zabbix Certified Specialist, Zabbix Certified Professional
            • Sep 2004
            • 5654

            #6
            Thanks for your post. We are still trying to understand why it crashes while evaluating proc.mem. No progress so far.


            • bgd
              Junior Member
              • May 2006
              • 9

              #7
              The problem is probably in common.c->process()

              Hi Alexei,

              I've had this problem as well. The last lines in the Agent's log file are:
              032351:20060627:110001 Got line:proc.mem[zabbix_agentd]
              032350:20060627:110001 One child process died. Exiting ...
              032352:20060627:110001 Got signal. Exiting ...
              032353:20060627:110001 Got signal. Exiting ...

              "Got line:" is probably logged in the zabbix_agentd.c -> process_child() function, so the problem is most likely in common.c -> process() (the function called immediately after the "Got line:" entry is written to the log file).
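
If you want to check your own agent logs for the same pattern, a rough sketch along these lines may help. It assumes the log format shown above (pid:date:time followed by the message) and the "Got line:" prefix mentioned in this post; the function name `last_key_before_death` is made up for illustration.

```python
import re

# pid:YYYYMMDD:HHMMSS message, as in the excerpt from the agent log
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\s+(.*)$")
GOT_LINE = "Got line:"
CHILD_DIED = "One child process died"


def last_key_before_death(log_text):
    """Return the last item key received before a child died, or None."""
    last_key = None
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw)
        if not match:
            continue  # skip lines that do not look like agent log entries
        message = match.group(4)
        if message.startswith(GOT_LINE):
            last_key = message[len(GOT_LINE):].strip()
        elif message.startswith(CHILD_DIED):
            return last_key
    return None  # no child death recorded in this log


SAMPLE = """\
032351:20060627:110001 Got line:proc.mem[zabbix_agentd]
032350:20060627:110001 One child process died. Exiting ...
032352:20060627:110001 Got signal. Exiting ...
032353:20060627:110001 Got signal. Exiting ...
"""

print(last_key_before_death(SAMPLE))
```

On the excerpt above this reports proc.mem[zabbix_agentd] as the last key requested before the child died. It only looks at the most recent "Got line:" entry overall, not per child process, so treat the result as a hint rather than proof.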

              Hope this helps,
              bogdan


              • Alexei
                Founder, CEO
                Zabbix Certified Trainer
                Zabbix Certified Specialist, Zabbix Certified Professional
                • Sep 2004
                • 5654

                #8
                Fixed, finally we found what's wrong

                See http://www.zabbix.com/forum/showthread.php?t=3317
