
Zabbix server 2.2.2 on CentOS 6.5 crashes/stops every couple of hours.

  • arg
    Junior Member
    • Mar 2014
    • 2

    #1

    Since updating to 2.2.2 (RPMs from the official Zabbix repository), the Zabbix server process stops after 1 to 3 hours.

    Downgrading to 2.2.1 solves the crash issue.

    The message in the log is:

    Code:
      9892:20140311:090037.769 [Z3005] query failed: [0] PGRES_FATAL_ERROR:could not receive data from server: Connection timed out
     [begin;]
      9892:20140311:090037.770 [Z3005] query failed: [0] result is NULL [rollback;]
      9892:20140311:090037.779 Cannot connect to the database. Exiting...
      9848:20140311:090037.781 One child process died (PID:9892,exitcode/signal:255). Exiting ...
      9848:20140311:090039.792 syncing history data...
      9848:20140311:090039.792 syncing history data done
      9848:20140311:090039.792 syncing trends data...
      9848:20140311:090041.121 syncing trends data done
      9848:20140311:090041.122 Zabbix Server stopped. Zabbix 2.2.2 (revision 42525).
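Each entry in this log follows the layout `PID:YYYYMMDD:HHMMSS.mmm message`, and a wrapped PostgreSQL error continues on the next line (` [begin;]`). A minimal Python sketch, assuming that layout holds for every entry, that parses the log and finds which process hit the database error right before the shutdown; `parse_line` and `shutdown_chain` are hypothetical helper names:

```python
import re
from datetime import datetime

# Assumed layout of a zabbix_server.log entry: "PID:YYYYMMDD:HHMMSS.mmm message"
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)$")


def parse_line(line):
    """Return (pid, timestamp, message), or None for continuation lines such as ' [begin;]'."""
    m = LINE_RE.match(line)
    if not m:
        return None
    pid, day, clock, msg = m.groups()
    ts = datetime.strptime(day + clock, "%Y%m%d%H%M%S.%f")
    return int(pid), ts, msg


def shutdown_chain(lines):
    """Find the first 'Cannot connect to the database' entry and the
    [Z3005] query failures logged by the same PID before it."""
    entries = [e for e in map(parse_line, lines) if e]
    for i, (pid, ts, msg) in enumerate(entries):
        if "Cannot connect to the database" in msg:
            failures = [e for e in entries[:i] if e[0] == pid and "[Z3005]" in e[2]]
            return pid, ts, failures
    return None
```

Run against the excerpt above, it should report PID 9892, the 09:00:37.779 "Cannot connect" entry, and the two `[Z3005]` failures that preceded it.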
    Installed Zabbix packages:

    Code:
    zabbix-web-pgsql-2.2.2-1.el6.noarch
    zabbix-agent-2.2.2-1.el6.x86_64
    zabbix-web-2.2.2-1.el6.noarch
    zabbix-2.2.2-1.el6.x86_64
    zabbix-server-pgsql-2.2.2-1.el6.x86_64
    zabbix-server-2.2.2-1.el6.x86_64
    zabbix-get-2.2.2-1.el6.x86_64
    zabbix-sender-2.2.2-1.el6.x86_64
    The database is PostgreSQL 9.3.3 on a different server.

    The PostgreSQL server logs don't show any problems.

    Also: shutting down the server when a connection times out seems a bit ... drastic.
    Last edited by arg; 14-03-2014, 10:12. Reason: Definitely a regression
  • bsdtux
    Junior Member
    • Mar 2014
    • 9

    #2
    Try running a continuous ping from the Zabbix server to the database server, and at the same time run
    Code:
    tail -f /var/log/zabbix/zabbix_server.log
    and check whether the ping drops packets when the errors appear.

    You may also want to run tcpdump on the Zabbix server and check for dropped packets or TCP resets, which can indicate either a congested network or a failing NIC.

    These are the two steps I would take.
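To put a number on "dropped packets" from the ping test, the summary line printed when a Linux ping is stopped can be parsed. A small sketch, assuming the iputils summary format `N packets transmitted, M received, ...`; other ping implementations word it differently, and `packet_loss` is a hypothetical helper:

```python
import re

# Assumed iputils-style summary, e.g.
# "600 packets transmitted, 600 received, 0% packet loss, time 599812ms"
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")


def packet_loss(ping_output):
    """Return (sent, received, loss_percent) from ping's summary line, or None if absent."""
    m = SUMMARY_RE.search(ping_output)
    if not m:
        return None
    sent, received = int(m.group(1)), int(m.group(2))
    loss = 100.0 * (sent - received) / sent if sent else 0.0
    return sent, received, loss
```

Computing the loss from the two counters, instead of reading the printed percentage, keeps the result independent of how a given ping version rounds or formats that field.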

    On a side note, I am not sure why they don't handle a failed child process so that it doesn't kill the parent process. That would keep the Zabbix server from dying just because one child process died. I had similar issues when trying to get VMware monitoring to work.
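The alternative bsdtux describes, restarting a failed worker instead of taking the whole server down, can be sketched generically. This is a hypothetical Python supervisor loop for illustration only, not how Zabbix's own process management works; `supervise`, `max_restarts` and `backoff` are made-up names:

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, backoff=1.0):
    """Run cmd; on a non-zero exit, restart it with a growing delay,
    up to max_restarts times. Returns (last_exit_code, restarts_used)."""
    restarts = 0
    while True:
        rc = subprocess.call(cmd)
        if rc == 0 or restarts >= max_restarts:
            return rc, restarts
        restarts += 1
        time.sleep(backoff * restarts)  # linear backoff before respawning


if __name__ == "__main__":
    # A child that always exits with 255, like the Zabbix DB worker in the log above
    failing_child = [sys.executable, "-c", "import sys; sys.exit(255)"]
    print(supervise(failing_child, max_restarts=2, backoff=0.1))
```

The restart cap matters: if the database is down for a long time, endless respawning only hides the problem, so a real design would still give up (or alert) at some point.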


    • arg
      Junior Member
      • Mar 2014
      • 2

      #3
      Thanks for your reply. As you can see in the edited post, this issue is definitely a regression.
      I downgraded to 2.2.1, and the server hasn't crashed since.

      PostgreSQL errors still appear in the server log; however, they do not lead to a server shutdown:

      Code:
        1954:20140314:090425.624 [Z3005] query failed: [0] PGRES_FATAL_ERROR:could not receive data from server: Connection timed out
       [begin;]
      ...
      
        1954:20140314:090425.624 [Z3005] query failed: [0] result is NULL [rollback;]
        2005:20140314:090425.626 Sending configuration data to proxy '******'. Datalen 130855
      NOTICE:  there is no transaction in progress
        1955:20140314:090558.643 [Z3005] query failed: [0] PGRES_FATAL_ERROR:could not receive data from server: Connection timed out
       [begin;]
        1955:20140314:090558.643 [Z3005] query failed: [0] result is NULL [rollback;]
      NOTICE:  there is no transaction in progress
        1952:20140314:090625.624 [Z3005] query failed: [0] PGRES_FATAL_ERROR:could not receive data from server: Connection timed out
       [begin;]
        1952:20140314:090625.624 [Z3005] query failed: [0] result is NULL [rollback;]
      NOTICE:  there is no transaction in progress
        1961:20140314:090700.629 [Z3005] query failed: [0] PGRES_FATAL_ERROR:could not receive data from server: Connection timed out
       [begin;]
        1961:20140314:090700.629 [Z3005] query failed: [0] result is NULL [rollback;]
      NOTICE:  there is no transaction in progress
      
      ...
      There are no network issues between the Zabbix and PostgreSQL servers: no congestion and no packet loss. The only issues logged by the firewall(s) are connections dropped by the server shutdown. Hardware issues are ruled out: both servers are VMs, and migrating them to different hosts made no difference.
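To see whether timeouts like these recur on a fixed schedule or at irregular intervals, the gaps between them can be computed from the server log. A sketch assuming the `PID:YYYYMMDD:HHMMSS.mmm` layout of the excerpts; `timeout_gaps` is a hypothetical helper:

```python
import re
from datetime import datetime

# Assumed entry layout "PID:YYYYMMDD:HHMMSS.mmm ..." ending in the libpq timeout text
TIMEOUT_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}\.\d{3}) .*Connection timed out")


def timeout_gaps(lines):
    """Return the seconds between consecutive 'Connection timed out' entries (any PID)."""
    stamps = []
    for line in lines:
        m = TIMEOUT_RE.match(line)
        if m:
            stamps.append(datetime.strptime(m.group(2) + m.group(3), "%Y%m%d%H%M%S.%f"))
    stamps.sort()
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
```

Applied to the four timeout entries shown in the excerpt above (keeping in mind that parts of the log were elided), it gives gaps of roughly 93, 27 and 35 seconds, spread across different PIDs.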


      • aero
        Senior Member
        • Apr 2013
        • 152

        #4
        I have exactly the same problem: the Zabbix server runs on CentOS 6.5, and the database (PostgreSQL 9.3.5) is on a dedicated server.
        Several times a day, the Zabbix server crashes with the following errors:
        Code:
        27507:20141118:172714.275 [Z3005] query failed: [0] PGRES_FATAL_ERROR:could not receive data from server: Connection timed out
         [begin;]
         27507:20141118:172714.275 [Z3005] query failed: [0] result is NULL [rollback;]
         27507:20141118:172714.283 Cannot connect to the database. Exiting...
         27495:20141118:172714.288 One child process died (PID:27507,exitcode/signal:1). Exiting ...
         27495:20141118:172716.301 syncing history data...
         27495:20141118:172716.372 syncing history data done
         27495:20141118:172716.372 syncing trends data...
         27495:20141118:172717.310 syncing trends data done
         27495:20141118:172717.311 Zabbix Server stopped. Zabbix 2.4.2 (revision 50419).
        I tried upgrading the Zabbix server (from 2.4.0 to 2.4.1 and finally 2.4.2) as well as the PostgreSQL database, but the crashes keep happening.
        Is there a way to fix this problem without downgrading the Zabbix server?
        Thanks for any reply.
        Last edited by aero; 18-11-2014, 19:17.


        • kloczek
          Senior Member
          • Jun 2006
          • 1771

          #5
          Originally posted by aero
          Several times a day, the Zabbix server crashes with the following errors:
          Code:
          27507:20141118:172714.275 [Z3005] query failed: [0] PGRES_FATAL_ERROR:could not receive data from server: Connection timed out
          I tried upgrading the Zabbix server (from 2.4.0 to 2.4.1 and finally 2.4.2) as well as the PostgreSQL database, but the crashes keep happening.
          Is there a way to fix this problem without downgrading the Zabbix server?
          Thanks for any reply.
          Why are you trying to upgrade anything when the logs clearly show that Zabbix is not crashing but stopping because of a connectivity issue with your PostgreSQL DB backend?
          Check what you have in the PostgreSQL logs. Check access to the Zabbix DB from the Zabbix server using the psql command. Check that the PostgreSQL port is reachable using the telnet command.
          It is not a Zabbix issue.
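The telnet test only checks whether a TCP connection to the PostgreSQL port can be opened. An equivalent, scriptable sketch in Python; 5432 is PostgreSQL's default port, and the host and port should match whatever DBHost/DBPort are set to. `tcp_probe` is a hypothetical helper:

```python
import socket
import time


def tcp_probe(host, port=5432, timeout=5.0):
    """Try to open a TCP connection, like 'telnet host port'.

    Returns (connected, seconds_taken, error_text_or_None).
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, None
    except OSError as exc:  # refused, timed out, unreachable, name resolution failure
        return False, time.monotonic() - start, str(exc)
```

For example, `tcp_probe("db.example.com")`, where the host name is a placeholder. A successful connect only proves the TCP path works at that moment; it does not exercise an established session the way the psql check does.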
          http://uk.linkedin.com/pub/tomasz-k%...zko/6/940/430/
          https://kloczek.wordpress.com/
          zapish - Zabbix API SHell binding https://github.com/kloczek/zapish
          My zabbix templates https://github.com/kloczek/zabbix-templates


          • aero
            Senior Member
            • Apr 2013
            • 152

            #6
            Thank you for your answer.
            Code:
            Why are you trying to upgrade anything when the logs clearly show that Zabbix is not crashing but stopping because of a connectivity issue with your PostgreSQL DB backend?
            Because I have had this error ever since I upgraded the Zabbix server to 2.4.0, and I was hoping that one of the upgrades would fix it. Before the error appeared, the Zabbix version was 2.2.6.
            arg said that downgrading Zabbix from 2.2.2 to 2.2.1 solved his crash issue, so Zabbix seems to be at least partly involved.
            I have recurring timeout errors in the PostgreSQL logs (every 30 minutes), but not at the same times as the crashes (which occur 1 to 3 times a day).
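Whether the PostgreSQL-side timeouts line up with the Zabbix shutdowns can be checked mechanically by matching each shutdown to the nearest database timeout. A sketch, assuming both logs' timestamps have already been parsed into `datetime` objects on the same clock and time zone (PostgreSQL's `log_timezone` setting may differ from the Zabbix host's local time); `nearest_timeouts` and the 60-second window are illustrative choices:

```python
from datetime import datetime, timedelta


def nearest_timeouts(shutdowns, db_timeouts, window=timedelta(seconds=60)):
    """For each shutdown time, return (shutdown, closest DB timeout or None, within_window)."""
    results = []
    for s in shutdowns:
        closest = min(db_timeouts, key=lambda t: abs(t - s), default=None)
        within = closest is not None and abs(closest - s) <= window
        results.append((s, closest, within))
    return results
```

A shutdown with no database timeout inside the window points away from the PostgreSQL server's own timeouts and toward something else along the path.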
            I just changed the DBHost parameter in the Zabbix config file to use the IP address instead of the domain name.
            Edit: with the database IP address in the DBHost parameter, the issue is still the same.
            Last edited by aero; 19-11-2014, 17:34.


            • kloczek
              Senior Member
              • Jun 2006
              • 1771

              #7
              Originally posted by aero
              Thank you for your answer.
              Code:
              Why are you trying to upgrade anything when the logs clearly show that Zabbix is not crashing but stopping because of a connectivity issue with your PostgreSQL DB backend?
              Because I have had this error ever since I upgraded the Zabbix server to 2.4.0, and I was hoping that one of the upgrades would fix it. Before the error appeared, the Zabbix version was 2.2.6.
              arg said that downgrading Zabbix from 2.2.2 to 2.2.1 solved his crash issue, so Zabbix seems to be at least partly involved.
              I have recurring timeout errors in the PostgreSQL logs (every 30 minutes), but not at the same times as the crashes (which occur 1 to 3 times a day).
              So again: why, since upgrading to 2.4, have you been ignoring these errors instead of trying to diagnose the database issue?
              And again: this is not a Zabbix crash. It is normal Zabbix behavior to stop working when it cannot communicate with its DB backend.

              If you know that you have PostgreSQL timeouts every 30 minutes, why are you fiddling with Zabbix?

              I just changed the DBHost parameter in the Zabbix config file to use the IP address instead of the domain name.
              Edit: with the database IP address in the DBHost parameter, the issue is still the same.
              Aaaa... so you have some kind of DNS issue.
              Please try to solve that first. A Zabbix upgrade will not solve your non-Zabbix issues.
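Whether resolving the DBHost name is slow or intermittently failing can be measured from the Zabbix server with a sketch like the one below. It uses `getaddrinfo()`, the standard resolver call; `resolve_timing` is a hypothetical helper, and the name to pass is whatever DBHost contains:

```python
import socket
import time


def resolve_timing(name, port=5432):
    """Resolve name via getaddrinfo() and time the lookup.

    Returns (sorted list of addresses, seconds taken); raises socket.gaierror on failure.
    """
    start = time.monotonic()
    infos = socket.getaddrinfo(name, port, type=socket.SOCK_STREAM)
    elapsed = time.monotonic() - start
    addresses = sorted({info[4][0] for info in infos})
    return addresses, elapsed
```

Calling it repeatedly and logging the elapsed time and any `socket.gaierror` would show whether lookups stall; with a literal IP address in DBHost, as in aero's last test, no lookup against a DNS server should be needed.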

