[1.4.4] zabbix_server doesn't crash, but no longer collects data

  • pascalp
    Junior Member
    • Jan 2008
    • 5

    #1

    [1.4.4] zabbix_server doesn't crash, but no longer collects data

    Hello,

    I have searched the existing threads but haven't found a match so far, so I opened a new thread. If I'm mistaken and there is already a thread handling this, please do not hit me

    My problem: Sometimes, a few minutes after I start zabbix_server, the server stops collecting data. When I look at a graph, e.g. the processor load of a client, it simply stops at a certain moment, but the zabbix_server process continues running. At the moment the server stops collecting data, the processor load of the machine running zabbix_server drops to about 0.40, which is lower than the value while zabbix_server is collecting data.
    All my clients are running zabbix_agentd with active checks. There are about 15 clients, each with about 15 items. While zabbix_server is collecting data, the average load never goes higher than 2.
    The server that runs zabbix_server is a virtual Debian machine with 1500 MHz (Xeon) guaranteed and 800 MB RAM. The virtualization software is Virtuozzo.

    Has anyone seen the same problem, or is it probably a mistake on my side?

    Regards,
    Pascal

    N.B. I forgot to say that I recently fixed the bug described in http://www.zabbix.com/forum/showthread.php?t=8703, though I have no idea whether this has anything to do with the current problem.
    Last edited by pascalp; 23-01-2008, 17:08.
  • Petya
    Member
    • Dec 2007
    • 37

    #2
    Do you mean "Zabbix Agent (active)" when saying
    "zabbix_agentd with active checks"?

    If yes, then I'm another one who has a similar problem
    (I don't have this problem when items are of type "Zabbix Agent").

    Try changing the item types (you can use the "Mass update" button) --
    this works well when you don't have many hosts (and it's actually the default).

    Also, there's a similar issue here:
    http://www.zabbix.com/forum/showthread.php?t=8718


    • pascalp
      Junior Member
      • Jan 2008
      • 5

      #3
      Originally posted by Petya
      Do you mean "Zabbix Agent (active)" when saying
      "zabbix_agentd with active checks"?

      If yes, then I'm another one who has a similar problem
      (I don't have this problem when items are of type "Zabbix Agent").

      Try changing the item types (you can use the "Mass update" button) --
      this works well when you don't have many hosts (and it's actually the default).

      Exactly. I can't use passive checks because all my servers sit behind routers that I am not responsible for maintaining.

      Originally posted by Petya
      Also there's similar issue here:
      http://www.zabbix.com/forum/showthread.php?t=8718
      but in fact, my zabbix_server.log in /tmp is filled with the statement
      "(...) Error while sending list of active checks"
      This message appears at the place in the source code where the patch described in the thread http://www.zabbix.com/forum/showthread.php?t=8703 (this thread is mentioned in your link) is applied. Could the patch that fixes the average load problem cause this problem? I'm not sure at all, because my server already throws these messages while it is still collecting data.

      Regards,
      Pascal
      Last edited by pascalp; 24-01-2008, 17:58.


      • torti-
        Junior Member
        • Mar 2007
        • 18

        #4
        This is exactly the situation I have. Has anyone already solved this?


        • xs-
          Senior Member
          Zabbix Certified Specialist
          • Dec 2007
          • 393

          #5
          I believe this is fixed in 1.4.5-pre (1.4.4 nightly build, on website -> developer)


          • Alexei
            Founder, CEO
            Zabbix Certified Trainer
            Zabbix Certified Specialist
            Zabbix Certified Professional
            • Sep 2004
            • 5654

            #6
            It is fixed in pre 1.4.5.
            Alexei Vladishev
            Creator of Zabbix, Product manager
            New York | Tokyo | Riga


            • torti-
              Junior Member
              • Mar 2007
              • 18

              #7
              Well if the mentioned archive is http://www.zabbix.com/downloads/nigh...bix-1.4.tar.gz then it is not fixed

              zabbix_server still stops responding (and collecting data) without an error. The only thing I can see in the logs is the following.

              This is how I guess it should look while the server process is still OK:
              Code:
                2016:20080228:223009 In process_httptests()
                2016:20080228:223009 Query [select httptestid,name,applicationid,nextcheck,status,delay,macros,agent from httptest where status=0 and nextcheck<=1204234209 and mod(httptestid,5)=2 and  httptestid>=100000000000000*0 and httptestid<=(100000000000000*0+99999999999999) ]
                2016:20080228:223009 End process_httptests()
                2016:20080228:223009 Spent 0 seconds while processing HTTP tests
                2016:20080228:223009 Query [select count(*),min(nextcheck) from httptest t where t.status=0 and mod(t.httptestid,5)=2 and  t.httptestid>=100000000000000*0 and t.httptestid<=(100000000000000*0+99999999999999) ]
                2016:20080228:223009 Nextcheck:1204234259 Time:1204234209
                2016:20080228:223009 Sleeping for 5 seconds
              and this is what I get when the server process hangs:
              Code:
                2015:20080228:223009 In process_httptests()
                2015:20080228:223009 Query [select httptestid,name,applicationid,nextcheck,status,delay,macros,agent from httptest where status=0 and nextcheck<=1204234209 and mod(httptestid,5)=1 and  httptestid>=100000000000000*0 and httptestid<=(100000000000000*0+99999999999999) ]
                2015:20080228:223009 End process_httptests()
                2015:20080228:223009 Spent 0 seconds while processing HTTP tests
                2015:20080228:223009 Query [select count(*),min(nextcheck) from httptest t where t.status=0 and mod(t.httptestid,5)=1 and  t.httptestid>=100000000000000*0 and t.httptestid<=(100000000000000*0+99999999999999) ]
                2015:20080228:223009 No httptests to process in get_minnextcheck.
                2015:20080228:223009 Nextcheck:-1 Time:1204234209
                2015:20080228:223009 Sleeping for 5 seconds
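
For anyone comparing excerpts like the two above, here is a minimal Python sketch. It assumes every log line has the `PID:YYYYMMDD:HHMMSS message` layout seen in these excerpts; the function names are hypothetical and not part of Zabbix. It reports the last time each server process wrote to the log, which may help spot a child process that went quiet.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the excerpts above: PID:YYYYMMDD:HHMMSS message
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\s+(.*)$")


def parse_line(line):
    """Split one log line into (pid, timestamp, message), or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    pid, day, clock, message = match.groups()
    return int(pid), datetime.strptime(day + clock, "%Y%m%d%H%M%S"), message


def last_seen_per_pid(lines):
    """Map each PID to the newest timestamp it logged."""
    last = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that do not follow the assumed layout
        pid, stamp, _ = parsed
        if pid not in last or stamp > last[pid]:
            last[pid] = stamp
    return last
```

Feeding it the whole zabbix_server.log and comparing the per-PID times against the current time would show which processes are still logging.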


              • xs-
                Senior Member
                Zabbix Certified Specialist
                • Dec 2007
                • 393

                #8
                Heh, well yesterday we had a similar thing again.
                It very much looked like the problems we had before (trapper not receiving data) but this time the load was 0, no zabbix threads going haywire.

                After not finding anything to blame, we restarted zabbix_server (master node in a distributed setup) and all was well again.
                Shortly after that we saw that one of the distributed nodes had its zabbix_server stopped (connection to the DB lost; the database is local and didn't stop). After inspection we saw it had stopped around the same time the master node stopped receiving data.

                Maybe this is related, maybe not. worth looking into tho.
                It might be possible the trapper part of zabbix can experience problems when another server node dies during a send or action (or vice versa).

                -- Edit
                We are running 1.4.5-pre on the main node, 1.4.4 on the child nodes
                Last edited by xs-; 18-03-2008, 12:47.


                • torti-
                  Junior Member
                  • Mar 2007
                  • 18

                  #9
                  Hm, you might be right that the problem is in the DB connection part of zabbix.

                  I am currently not running a distributed setup of zabbix_server, so I don't think it is a problem related to multiple servers.


                  • bbrendon
                    Senior Member
                    • Sep 2005
                    • 870

                    #10


                    Seems to be related to the MySQL server being very busy, which sometimes seems to be caused by the web monitoring. I don't use that in production, so I deleted all web monitoring.

                    We'll see if things improve. My zabbix has been down for the past week because of this.
                    Unofficial Zabbix Expert
                    Blog, Corporate Site


                    • torti-
                      Junior Member
                      • Mar 2007
                      • 18

                      #11
                      I have thought about that too, but disabling web monitoring didn't help at all. I tried various 1.4.* versions, including the developer pre-1.4.5 build from Monday.

                      Actually, the problem arose when I started using active agents.

                      This is a major issue for me, because at this point zabbix isn't useful at all if you need to use active agents and the zabbix_server process has stability issues.

                      PLEASE fix this as soon as possible, Alexei


                      • bbrendon
                        Senior Member
                        • Sep 2005
                        • 870

                        #12
                        Originally posted by torti-
                        I have thought about that too, but disabling web monitoring didn't help at all. I tried various 1.4.* versions, including the developer pre-1.4.5 build from Monday.

                        Actually, the problem arose when I started using active agents.

                        This is a major issue for me, because at this point zabbix isn't useful at all if you need to use active agents and the zabbix_server process has stability issues.

                        PLEASE fix this as soon as possible, Alexei

                        FYI:
                        - My zabbix seems to die between 3:50 AM and 4:10 AM (almost every night, but not quite)
                        - I only use active agents.
                        - I'm running 1.4.4 with the load patch
                        - I disabled web monitoring last night
                        - I looked at the mysql-slow logs and it seems that the problem is related to a busy mysql server
                        - Non-active agent related items appear to get data, while active agent items don't. 90% of my items are active agent items though.
                        - I updated SNMP to 5.4.1 hoping it was SNMP lib related, recompiled, and no change
                        - server didn't stop recording data last night. We'll see how long it lasts...
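
To pin down exactly when collection stops, something like the following might help. It is a minimal sketch, assuming you have exported the Unix timestamps of one item's stored values (how you export them is up to you); `find_gaps` is a hypothetical helper, not a Zabbix function.

```python
def find_gaps(clocks, max_gap):
    """Return (last_before, first_after) pairs where two consecutive
    timestamps are more than max_gap seconds apart."""
    ordered = sorted(clocks)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > max_gap:
            gaps.append((earlier, later))
    return gaps


# Example: values every 30 seconds, then nothing for two hours.
clocks = [1204234209, 1204234239, 1204234269, 1204241469]
print(find_gaps(clocks, 90))  # [(1204234269, 1204241469)]
```

The first element of each pair is the last value stored before the stall, which can then be compared with the MySQL slow log or the cron schedule.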

                        That's about it here.
                        Last edited by bbrendon; 19-03-2008, 18:43.
                        Unofficial Zabbix Expert
                        Blog, Corporate Site


                        • bbrendon
                          Senior Member
                          • Sep 2005
                          • 870

                          #13
                          Okay. I have a fix! ...You'll love it, I swear!

                          Code:
                          # tail -2 crontab
                          # disable zabbix actions before zabbix_server breaks at 4 AM
                          22 1 * * * root mysql --user=zabbix --password=mypass zabbix -e "update actions set status = 1"
                          Unofficial Zabbix Expert
                          Blog, Corporate Site


                          • torti-
                            Junior Member
                            • Mar 2007
                            • 18

                            #14
                            well that is not my definition of a 'fix'
                            last night it broke at 22:05 or so. Restarting the server process works fine but this is no solution for serious use of a program.

                            I have attached the server logfile with debuglevel 4. maybe someone more familiar with zabbix might look over it?

                            I'm not really sure that the server breaks at the same time every time...

                            thanks,
                            torti-

                            ps:
                            please increase the maximum size of the zip attachment - Your file of 262.5 KB bytes exceeds the forum's limit of 97.7 KB for this filetype.
                            I have renamed the archive for now to .c
                            Attached Files


                            • bbrendon
                              Senior Member
                              • Sep 2005
                              • 870

                              #15
                              I have narrowed it down to plain old busy server. It doesn't appear to have anything to do with mysql. Mysql just has long queries because the server gets very busy, causing zabbix to malfunction.
                              Unofficial Zabbix Expert
                              Blog, Corporate Site
