1.4.5 Crashes under heavy activity load...

  • makini
    Member
    • Jul 2006
    • 59

    #1

    1.4.5 Crashes under heavy activity load...

    Hi,

    We have about 100 machines with agents and a single server. Everything was stable on 1.1.7 until we migrated the server to 1.4.5; the agents stayed on 1.1.7.
    After the migration, the server process started crashing randomly under heavy activity, e.g. agents going down, many alerts triggering at once, or latency on connections to agents.

    Initially, the log showed database connection errors (MySQL 5.0.27) with "[Interrupted system call]", and the same message for timeouts on connections to agents. After reading the posts here about this, we changed the trappers value in the conf file, configured MySQL to allow more connections, and increased the timeouts. Those errors went away, but the sporadic server crashes remained. The error log doesn't help much beyond "One child process died. Exiting ..."

    There was a post here regarding a bad 1.4.5 version that got cached after release - checked it, the correct one was used.

    Setup is CentOS 5 (2.6.18-8.1.8.el5), 1 GB RAM, 3 GHz dual Xeon.

    The machine itself is not under a heavy load during the crash. Conf files, error log and my.cnf are attached.

    Any clues?
    Last edited by makini; 17-04-2008, 14:14. Reason: typo...
  • vinny
    Senior Member
    • Jan 2008
    • 145

    #2
    Have you tried with up-to-date agents?

    Agent version 1.1.7 may be a little too old...
    -------
    Zabbix 1.8.3, 1200+ Hosts, 40 000+ Items...zabbix's everywhere


    • makini
      Member
      • Jul 2006
      • 59

      #3
      We're in the process...

      We are in the process of upgrading the agents.

      But still, the 1.4 manual does say they are compatible. Those are stable agents that worked perfectly with the 1.1.7 server... It's the server that crashes.

      What's more troubling is that even with debug logging enabled, it's impossible to see which child process of the server died first, so we have no clue about the reason...


      • bbrendon
        Senior Member
        • Sep 2005
        • 870

        #4
        I went through a lot of effort troubleshooting strange behaviors as you saw.

        Make sure to save the log after you start zabbix and then again when it crashes. I ended up scripting this because it was such a common occurrence. After alex figured out that the trappers had to be increased, I was golden until this morning.
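        A minimal sketch of that kind of log-saving step, not the actual script mentioned above. It copies the server log to a timestamped file so that a post-start copy and a post-crash copy can be compared later. The function name `snapshot_log`, the label values, and any paths passed in are assumptions for illustration only.

```python
import shutil
import time
from pathlib import Path


def snapshot_log(log_path, dest_dir, label):
    """Copy log_path into dest_dir under a timestamped name; return the new path.

    label is a free-form tag such as "after-start" or "after-crash"
    (hypothetical values, pick whatever suits your setup).
    """
    src = Path(log_path)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = dest / f"{src.name}.{label}.{stamp}"
    # copy2 preserves the modification time, which helps when comparing copies.
    shutil.copy2(src, target)
    return target
```

        Calling it once right after starting the server and once from whatever detects the crash leaves two copies side by side; the log location itself depends on the LogFile setting of your install.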

        My zabbix server monitoring cron job notified me that zabbix_server wasn't running. I haven't been running in debug mode lately. It probably ran for a week before I got the "one process has exited".
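        For anyone wanting a similar check, here is a rough sketch of one way a cron job could test whether the server is still alive, assuming the server writes a PID file. This is not the cron job described above; the helper name `pid_alive` is made up, and the PID file path is whatever your own configuration uses.

```python
import os


def pid_alive(pid_file):
    """Return True if the PID stored in pid_file belongs to a live process."""
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable, or garbled PID file: treat as not running.
        return False
    if pid <= 0:
        # Never pass 0 or negative values to kill(); they address process groups.
        return False
    try:
        # Signal 0 performs the existence and permission checks without sending anything.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but is owned by another user.
        return True
    return True
```

        A cron wrapper would call this every minute or so and send a notification when it returns False. Note that a stale PID file whose number has been reused by another process would give a false positive.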

        We might be experiencing the same issue!

        Don't worry about the agents being down-rev. That's a supported configuration and we have a few 1.1.x agents still as well.
        Unofficial Zabbix Expert


        • makini
          Member
          • Jul 2006
          • 59

          #5
          Another server crash...

          We just had another server crash...

          This time there was at least a clue:
          Code:
          4335:20080423:070033 Query failed:MySQL server has gone away [2006]
          Seems like the server in 1.4.5 is not as close to stable as 1.1.7 was... Besides, isn't a DB server crash supposed to be handled by the Zabbix server now? There is even a setting to notify a user group about it.


          • Alexei
            Founder, CEO
            Zabbix Certified Trainer
            Zabbix Certified Specialist
            Zabbix Certified Professional
            • Sep 2004
            • 5654

            #6
            Please send complete after-crash server log files to a l e x @ z a b b i x . c o m. Thank you.
            Last edited by Alexei; 24-04-2008, 08:50.
            Alexei Vladishev
            Creator of Zabbix, Product manager
            New York | Tokyo | Riga


            • makini
              Member
              • Jul 2006
              • 59

              #7
              It looks like a DB problem....

              Well,

              After many recent crashes we had, it looks like a DB problem. Zabbix server however is NOT indicating it in any way whatsoever - logs or DB failure mail alerts. It just crashes...

              There was this clue in mysql logs:
              Code:
              080429  2:14:02 [Warning] Aborted connection 296517 to db: 'zabbix' user: 'zabbix' host: 'localhost' (Got an error reading communication packets)
              080429  2:14:02 [Warning] Aborted connection 296505 to db: 'zabbix' user: 'zabbix' host: 'localhost' (Got an error reading communication packets)
              080429  2:14:02 [Warning] Aborted connection 296447 to db: 'zabbix' user: 'zabbix' host: 'localhost' (Got an error reading communication packets)
              080429  2:14:02 [Warning] Aborted connection 296506 to db: 'zabbix' user: 'zabbix' host: 'localhost' (Got an error reading communication packets)
              080429  2:14:02 [Warning] Aborted connection 296438 to db: 'zabbix' user: 'zabbix' host: 'localhost' (Got an error reading communication packets)
              P.S.
              MySQL is configured to handle more connections than by default - see my.cnf attached in the first post here.
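              To see how the aborted connections cluster in time, a small parser like the one below could count them per second. This is only a sketch: the regular expression is written against the exact line format quoted above, and the names `ABORTED` and `aborted_per_second` are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
# 080429  2:14:02 [Warning] Aborted connection 296517 to db: 'zabbix'
#   user: 'zabbix' host: 'localhost' (Got an error reading communication packets)
ABORTED = re.compile(
    r"^\s*(?P<date>\d{6})\s+(?P<time>\d{1,2}:\d{2}:\d{2})\s+\[Warning\] "
    r"Aborted connection (?P<conn_id>\d+) to db: '(?P<db>[^']*)' "
    r"user: '(?P<user>[^']*)' host: '(?P<host>[^']*)' \((?P<reason>.*)\)\s*$"
)


def aborted_per_second(lines):
    """Count 'Aborted connection' warnings per (date, time) stamp."""
    counts = Counter()
    for line in lines:
        match = ABORTED.match(line)
        if match:
            counts[(match["date"], match["time"])] += 1
    return counts
```

              Many connections dropping within the same second, as in the excerpt above, suggests they were all cut at once rather than timing out one by one; the counts alone do not say which side closed them.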
              Last edited by makini; 29-04-2008, 13:11.


              • screeble
                Member
                • Dec 2011
                • 34

                #8
                Greetings,

                We have the same problem on our zabbix server. Do you have any ideas how to resolve it?

                Code:
                120113  3:00:01 [Warning] Aborted connection 39 to db: 'zabbix' user: 'zabbix' host: 'localhost' (Got an error reading communication packets)
                120113  3:00:01 [Warning] Aborted connection 153 to db: 'zabbix' user: 'zabbix' host: 'localhost' (Got an error reading communication packets)
                120113  3:00:01 [Warning] Aborted connection 148 to db: 'zabbix' user: 'zabbix' host: 'localhost' (Got an error reading communication packets)
                120113  3:00:01 [Warning] Aborted connection 24 to db: 'zabbix' user: 'zabbix' host: 'localhost' (Got an error reading communication packets)
                120113  3:00:01 [Warning] Aborted connection 149 to db: 'zabbix' user: 'zabbix' host: 'localhost' (Got an error reading communication packets)
                120113  3:00:01 [Warning] Aborted connection 35 to db: 'zabbix' user: 'zabbix' host: 'localhost' (Got an error reading communication packets)
                120113  3:00:01 [Warning] Aborted connection 150 to db: 'zabbix' user: 'zabbix' host: 'localhost' (Got an error reading communication packets)
                120113  3:00:01 [Warning] Aborted connection 151 to db: 'zabbix' user: 'zabbix' host: 'localhost' (Got an error reading communication packets)
                Looking forward to hearing from you. Thank you.
