zabbix server process 100% cpu every 5 hours?

  • Tristan
    Senior Member
    • Feb 2008
    • 110

    #16
    Problem is finally gone :S

    A few days ago I wrote that I had moved my Zabbix server to a physical box instead of a virtual machine on an ESX farm. I thought that had fixed the problem, but after a day it was back. Too bad. So I disabled housekeeping. It was just a guess, but the problem went away.

    Last week I deleted a lot of items from my templates and removed a few hosts.

    On Thursday I re-enabled housekeeping and rebooted the server. Strangely, the problem is still gone. I think something in my database was f*cking up the whole thing.

    So try disabling housekeeping and see whether the problem goes away for you too.
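To try this, housekeeping can be switched off in the server configuration file. A minimal sketch, assuming the `DisableHousekeeping` parameter found in Zabbix 1.x `zabbix_server.conf` files (1 disables housekeeping, 0 enables it), GNU sed, and a config path of `/etc/zabbix/zabbix_server.conf`; all three are assumptions to check against your own installation, and the helper name `set_housekeeping` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: toggle housekeeping in zabbix_server.conf.
# Assumes the Zabbix 1.x "DisableHousekeeping" parameter (1 = off, 0 = on)
# and GNU sed (for -i). The default path below is an assumption.
CONF=${CONF:-/etc/zabbix/zabbix_server.conf}

set_housekeeping() {    # usage: set_housekeeping on|off [config-file]
    local value file=${2:-$CONF}
    case $1 in
        on)  value=0 ;;
        off) value=1 ;;
        *)   echo "usage: set_housekeeping on|off [config-file]" >&2; return 2 ;;
    esac
    local pattern='^[[:space:]]*#*[[:space:]]*DisableHousekeeping='
    if grep -q "$pattern" "$file"; then
        # Rewrite the existing line, uncommenting it if necessary.
        sed -i "s/${pattern}.*/DisableHousekeeping=$value/" "$file"
    else
        # Parameter not present at all: append it.
        echo "DisableHousekeeping=$value" >> "$file"
    fi
}
```

Usage: run `set_housekeeping off`, restart zabbix_server so it rereads the file, and later run `set_housekeeping on` and restart again.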

    • Alexei
      Founder, CEO
      Zabbix Certified Trainer
      Zabbix Certified Specialist, Zabbix Certified Professional
      • Sep 2004
      • 5654

      #17
      I still think the problem does not exist in 1.4.5. You must have been running a non-1.4.5 version of the ZABBIX server when it happened.
      Alexei Vladishev
      Creator of Zabbix, Product manager
      New York | Tokyo | Riga
      My Twitter

      • stefanw
        Junior Member
        • Aug 2007
        • 7

        #18
        Well, I did run ./configure and then make install.
        I only have one file called zabbix_server on the system, and when I run:

        # whereis zabbix_server
        zabbix_server: /usr/local/sbin/zabbix_server
        # zabbix_server -V
        ZABBIX Server (daemon) v1.4.5 (25 March 2008)
        Compilation time: Apr 9 2008 10:15:10


        So how could it not be 1.4.5?
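One way to rule out an old build is to compare every zabbix_server binary on the PATH with the binary each running process was actually started from. A minimal sketch, assuming Linux (`/proc`) and procps `pgrep`; the function names `list_in_path` and `show_exe` are hypothetical. If `readlink` prints a path ending in ` (deleted)`, the file was replaced or removed after the process started, so that process is still running the previous build until it is restarted.

```shell
#!/bin/bash
# Hypothetical helpers for checking which zabbix_server build is in use.
# Assumes Linux, where /proc/<pid>/exe links to the running binary.

# Print every executable called NAME found in the directories of $PATH.
list_in_path() {
    local name=$1 dir
    local IFS=:
    for dir in $PATH; do
        [ -x "$dir/$name" ] && echo "$dir/$name"
    done
    return 0
}

# For each PID given, print the binary the kernel says it is running.
# A target ending in " (deleted)" means the file changed after startup.
show_exe() {
    local pid
    for pid in "$@"; do
        echo "$pid -> $(readlink "/proc/$pid/exe")"
    done
}
```

Usage: `for b in $(list_in_path zabbix_server); do "$b" -V; done` prints the version of every copy on the PATH, and `show_exe $(pgrep -x zabbix_server)` shows what the running processes were started from.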

        • stefanw
          Junior Member
          • Aug 2007
          • 7

          #19
          I still have this problem in 1.4.5.

          To be sure I wasn't using some old 1.4.x code, I removed all old source directories, ran make clean, a fresh configure, and then a fresh make install.
          This time zabbix_server ran for about 10 days before it happened.

          # zabbix_server -V
          ZABBIX Server (daemon) v1.4.5 (25 March 2008)
          Compilation time: Apr 15 2008 09:05:33


          # strace -p 3289
          Process 3289 attached - interrupt to quit
          --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
          --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
          recvfrom(0, 0x67be00, 65536, 0, 0x638010, 0x7fff347b2aac) = -1 ENOTSOCK (Socket operation on non-socket)
          select(1, [0], NULL, NULL, {0, 0}) = 1 (in [0], left {0, 0})
          recvfrom(0, 0x67be00, 65536, 0, 0x638010, 0x7fff347b2aac) = -1 ENOTSOCK (Socket operation on non-socket)
          select(1, [0], NULL, NULL, {0, 0}) = 1 (in [0], left {0, 0})
          recvfrom(0, 0x67be00, 65536, 0, 0x638010, 0x7fff347b2aac) = -1 ENOTSOCK (Socket operation on non-socket)
          select(1, [0], NULL, NULL, {0, 0}) = 1 (in [0], left {0, 0})
          recvfrom(0, 0x67be00, 65536, 0, 0x638010, 0x7fff347b2aac) = -1 ENOTSOCK (Socket operation on non-socket)
          select(1, [0], NULL, NULL, {0, 0}) = 1 (in [0], left {0, 0})
          recvfrom(0, 0x67be00, 65536, 0, 0x638010, 0x7fff347b2aac) = -1 ENOTSOCK (Socket operation on non-socket)
          select(1, [0], NULL, NULL, {0, 0}) = 1 (in [0], left {0, 0})
          recvfrom(0, 0x67be00, 65536, 0, 0x638010, 0x7fff347b2aac) = -1 ENOTSOCK (Socket operation on non-socket)
          select(1, [0], NULL, NULL, {0, 0}) = 1 (in [0], left {0, 0})
          recvfrom(0, 0x67be00, 65536, 0, 0x638010, 0x7fff347b2aac) = -1 ENOTSOCK (Socket operation on non-socket)
          select(1, [0], NULL, NULL, {0, 0}) = 1 (in [0], left {0, 0})
          recvfrom(0, 0x67be00, 65536, 0, 0x638010, 0x7fff347b2aac) = -1 ENOTSOCK (Socket operation on non-socket)
          select(1, [0], NULL, NULL, {0, 0}) = 1 (in [0], left {0, 0})
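Read literally, the trace shows a loop that never blocks: `recvfrom()` is called on file descriptor 0, which is not a socket, so it fails at once with ENOTSOCK, and `select()` with a zero timeout immediately reports descriptor 0 as readable again. That pattern is consistent with a process stuck at 100% CPU, although the trace alone does not show why descriptor 0 ended up that way. For a quick comparison between processes, `strace -c -p PID` prints a per-syscall count summary when interrupted; for a log saved with `strace -o`, a hypothetical helper like the following counts the two calls seen here.

```shell
#!/bin/bash
# Hypothetical helper: count the busy-loop signature in a saved strace log,
# e.g. one captured with:  strace -p <PID> -o /tmp/zbx.trace  (Ctrl-C after a few seconds)
# Assumes one process per log (no -f), so each line starts with the syscall name.
summarise_trace() {     # usage: summarise_trace < logfile
    awk '
        /^recvfrom\(0,/ && /ENOTSOCK/ { enotsock++ }
        /^select\(/                   { selects++ }
        END { printf "recvfrom-ENOTSOCK=%d select=%d\n", enotsock, selects }
    '
}
```

A healthy poller should show few or no ENOTSOCK failures; thousands of each call within a few seconds points to the loop above.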

          • stefanw
            Junior Member
            • Aug 2007
            • 7

            #20
            This is starting to become very annoying. Right now I see 3 zabbix_server processes eating 100% CPU (on a 4-processor server). This will force me to restart zabbix_server every week, which makes any comments the operators have written about current problems disappear, which is also annoying.

            So do you have any insight on how to troubleshoot this problem?

            The system is a 4-CPU server with 16 GB of memory running SUSE Linux Enterprise Server 10 SP1.

            We poll 1800+ hosts and have 8000+ triggers.

            We have no other performance issues on the server; all DB queries and the web server are running just fine.

            # mysqladmin status
            Uptime: 1206622 Threads: 16 Questions: 585036768 Slow queries: 8049 Opens: 10251 Flush tables: 15 Open tables: 109 Queries per second avg: 484.855
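The `mysqladmin status` line supports the point that the database is not the bottleneck: 1206622 seconds of uptime is about 14 days, 585036768 questions over that period is the 484.855 queries per second that mysqladmin already reports, and the 8049 slow queries are roughly 0.0014% of all questions. A minimal sketch of that arithmetic in awk; the function name `mysql_status_summary` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: summarise one "mysqladmin status" line read on stdin.
mysql_status_summary() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "Uptime:")    up = $(i + 1)
            if ($i == "Questions:") q  = $(i + 1)
            if ($i == "Slow" && $(i + 1) == "queries:") slow = $(i + 2)
        }
        if (up > 0 && q > 0)
            printf "uptime_days=%.1f qps=%.1f slow_pct=%.4f\n", up / 86400, q / up, 100 * slow / q
        else
            print "could not find Uptime/Questions in input"
    }'
}
```

Usage: `mysqladmin status | mysql_status_summary`; for the line posted here it prints `uptime_days=14.0 qps=484.9 slow_pct=0.0014`.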

            • Tristan
              Senior Member
              • Feb 2008
              • 110

              #21
              Replying to stefanw (#20):
              Strange... I had the same problem and it just disappeared. The only thing I did was disable housekeeping for a week, and the problem was gone. After that I deleted a few hosts and items from my database. A while ago I re-enabled housekeeping and the problem didn't return.
              Strange...

              BTW, I just saw that my zabbix_server didn't collect some data, but the CPU was working normally. Just a simple Zabbix crash, I thought.

              I also use SLES 10 SP1!

              • Tristan
                Senior Member
                • Feb 2008
                • 110

                #22
                Follow-up to my previous post (#21):
                I thought my CPU was running normally, but when I looked at a graph in Zabbix, system time was at 25%. Strange, because normally the problem shows up as 100% system time.
                I'm going to write a script that restarts zabbix_server when MySQL is not accepting data.
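A minimal sketch of such a watchdog, meant to be run from cron. It assumes `mysqladmin ping` as the health check (it exits non-zero when the server cannot be reached; credentials can come from `~/.my.cnf`) and a hypothetical `/etc/init.d/zabbix_server restart` as the restart command; adjust both for your system. A successful ping only shows that mysqld answers, not that it is accepting inserts, so this only approximates "not accepting data".

```shell
#!/bin/bash
# Hypothetical watchdog: restart zabbix_server when MySQL stops answering.
# Both commands are assumptions; override them via the environment.
CHECK_CMD=${CHECK_CMD:-"mysqladmin ping"}
RESTART_CMD=${RESTART_CMD:-"/etc/init.d/zabbix_server restart"}

watchdog_once() {
    # Word splitting of the unquoted variables is intentional (command + args).
    if ! $CHECK_CMD >/dev/null 2>&1; then
        echo "$(date '+%F %T') MySQL check failed, restarting zabbix_server" >&2
        $RESTART_CMD
    fi
}
```

Usage: add a final `watchdog_once` line to the script and schedule it from cron every few minutes.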

                • Robert Wagnon
                  Member
                  • Jan 2008
                  • 47

                  #23
                  Problem in 1.4.5

                  I'm also having this problem. I posted diagnostic information in a separate thread; it can be found by searching for CPU or ENOTSOCK.

                  I thought it might be VM-related, but I rebuilt on a physical machine. I've run updatedb with locate, as well as which, and cannot find a non-1.4.5 version. I've also rebuilt from the nightlies.

                  I was thinking it might be related to Cisco router NAT handling, but I've tweaked that a lot, so I don't think that's it...

                  I just restart the services a few times a day...

                  • Tristan
                    Senior Member
                    • Feb 2008
                    • 110

                    #24
                    My solution...

                    I found that it happens only occasionally (about once a month), so I use a script that sends me an email when the Zabbix process has crashed and is using more than 90% CPU.

                    You can modify this script so that it restarts the service!

                    Edit the e-mail address and put this in a crontab. I hope the problem is gone in version 1.6.

                    #!/bin/bash

                    # March-13-2006
                    # CPUuse trigger script by Noel
                    #
                    # Bash code to watch a running program's CPU usage.
                    # If it is above a set value, it sends an email automatically.
                    # Run this script from a cron job every xx minutes.
                    # Note: ps reports %CPU averaged over the process's whole lifetime.
                    #
                    # Set some needed things:
                    #
                    processToWatch="zabbix_server" # process to watch (Noel's original watched "convert")
                    emailAddress="you@example.com" # edit: your own e-mail address
                    triggerValue=90 # if the CPU use is above 90%, send an email. DO NOT USE a DOT or COMMA!

                    # ps -C selects processes by exact command name, so no grep/"grep -v grep"
                    # pipeline and no temp file are needed. Columns: PID, %CPU, command line.
                    ps -C "$processToWatch" -o pid=,pcpu=,args= |
                    while read -r pid cpu args
                    do
                        cpuInt=${cpu%%.*} # drop the decimal part: "97.3" -> "97"
                        if [ "$cpuInt" -gt "$triggerValue" ]; then
                            printf '%s\n' \
                                "This is to inform you that the following process: $processToWatch with PID (Process ID) $pid is now using more than your preset $triggerValue value." \
                                "" \
                                "Process: $processToWatch is using: $cpu% of CPU power!" \
                                "The command used is: $args" |
                            mail -s "CPU message alert for: $processToWatch" "$emailAddress"
                        fi
                    done
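One limitation of a `ps`-based check: the %CPU column is CPU time divided by the time the process has been running, so a zabbix_server that starts spinning after days of normal operation can stay below 90% for a long time. A minimal sketch of an alternative, assuming Linux: read the process's user and system CPU time (fields 14 and 15 of `/proc/<pid>/stat`, in clock ticks) twice, a few seconds apart, and turn the difference into a percentage of one CPU. The function names `proc_ticks`, `cpu_percent` and `sample_pid` are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers: CPU use of one process over a short interval,
# instead of the lifetime average shown by ps. Linux-only.

# Print utime+stime (fields 14 and 15 of /proc/<pid>/stat, in clock ticks).
proc_ticks() {
    local stat rest
    stat=$(cat "/proc/$1/stat") || return 1
    rest=${stat##*") "}        # drop "pid (comm) " even if comm contains spaces
    set -- $rest               # now $1 is field 3, so field N is $((N - 2))
    echo $(( ${12} + ${13} ))
}

# Percent of one CPU used between two tick readings INTERVAL seconds apart.
cpu_percent() {   # usage: cpu_percent START_TICKS END_TICKS INTERVAL CLK_TCK
    echo $(( ($2 - $1) * 100 / ($3 * $4) ))
}

# Sample one PID for INTERVAL seconds (default 5) and print "PID N%".
sample_pid() {
    local pid=$1 interval=${2:-5} hz t0 t1
    hz=$(getconf CLK_TCK)
    t0=$(proc_ticks "$pid") || return 1
    sleep "$interval"
    t1=$(proc_ticks "$pid") || return 1
    echo "$pid $(cpu_percent "$t0" "$t1" "$interval" "$hz")%"
}
```

Usage: `for pid in $(pgrep -x zabbix_server); do sample_pid "$pid" 5; done`; a process stuck in a busy loop should show close to 100%.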

                    • Alexei
                      Founder, CEO
                      Zabbix Certified Trainer
      Zabbix Certified Specialist, Zabbix Certified Professional
                      • Sep 2004
                      • 5654

                      #25
                      The problem has been fixed in the latest 1.4.x code. Thanks for all the details!

                      • bbrendon
                        Senior Member
                        • Sep 2005
                        • 870

                        #26
                        What triggers this bug? I'd like to know so I can determine whether a recompile is necessary.

                        TIA
                        Unofficial Zabbix Expert
                        Blog, Corporate Site

                        • stefanw
                          Junior Member
                          • Aug 2007
                          • 7

                          #27
                          This problem is occurring less and less frequently for me now; it will soon be 4 weeks. Getting my hopes up that it's gone forever.

                          • Robert Wagnon
                            Member
                            • Jan 2008
                            • 47

                            #28
                            Source of problem determined

                            We found that the problem seemed to be caused by a printer with SNMP capability. When Zabbix queries that printer, it eventually stops working correctly. We told the system not to query this device, and the problem went away.
