Zabbix 6.2 - Postgresql shut down...

  • Alfista
    Senior Member
    • Mar 2017
    • 136

    #1

    Zabbix 6.2 - Postgresql shut down...

    Hi,

    I have done a clean install of Zabbix 6.2 with PostgreSQL and TimescaleDB.
    Everything worked for a while without any issues, but after some time I started getting these errors in PostgreSQL:

    Code:
    2022-07-19 19:30:52.426 CEST [1574296] WARNING: worker took too long to start; canceled
    2022-07-19 19:31:30.009 CEST [1574296] WARNING: worker took too long to start; canceled
    2022-07-19 19:31:35.171 CEST [1989699] WARNING: autovacuum worker started without a worker entry
    2022-07-19 19:31:42.757 CEST [1989707] WARNING: autovacuum worker started without a worker entry
    2022-07-19 19:32:04.331 CEST [1574296] WARNING: worker took too long to start; canceled
    2022-07-19 19:32:25.280 CEST [1574296] WARNING: worker took too long to start; canceled
    2022-07-19 19:33:09.268 CEST [1989734] WARNING: autovacuum worker started without a worker entry
    2022-07-19 19:33:46.726 CEST [1574288] LOG: server process (PID 1574305) was terminated by signal 9: Killed
    2022-07-19 19:33:46.726 CEST [1574288] DETAIL: Failed process was running: select item_conditionid,macro,value,operator from item_condition where itemid=44364
    2022-07-19 19:33:46.726 CEST [1574288] LOG: terminating any other active server processes
    2022-07-19 19:33:48.881 CEST [1574288] LOG: received fast shutdown request
    2022-07-19 19:33:51.903 CEST [1574288] LOG: issuing SIGKILL to recalcitrant children
    2022-07-19 19:33:56.971 CEST [1574288] LOG: issuing SIGKILL to recalcitrant children
    2022-07-19 19:34:55.003 CEST [1574288] LOG: issuing SIGKILL to recalcitrant children
    2022-07-19 19:35:00.054 CEST [1574288] LOG: issuing SIGKILL to recalcitrant children
    2022-07-19 19:35:01.013 CEST [1989843] FATAL: the database system is shutting down
    2022-07-19 19:35:01.014 CEST [1989851] FATAL: the database system is shutting down
    2022-07-19 19:35:01.014 CEST [1989847] FATAL: the database system is shutting down
    2022-07-19 19:35:01.017 CEST [1989846] FATAL: the database system is shutting down
    2022-07-19 19:35:01.435 CEST [1574288] LOG: abnormal database system shutdown
    2022-07-19 19:35:01.686 CEST [1574288] LOG: database system is shut down
    And then it shuts down.
    I don't know why it does this.
    I have moved my templates and hosts over from the older installation and am now preparing it for use, but while the database keeps shutting down it can't be moved to production.

    Could someone please help me find where the problem might be?

    Thanks.

  • Answer selected by Alfista at 04-08-2022, 15:49.
    Alfista
    Senior Member
    • Mar 2017
    • 136

    Hi,
    Maybe I have found the problem.
    I don't know why or how it happened, but from the beginning (clean install) Zabbix accepted TimescaleDB 2.7.1 even though it works only with version 2.6.x.
    After a full system update, TimescaleDB was also updated to the latest version, 2.7.2, and that is how I found out that Zabbix doesn't support or work with version 2.7.x.
    When I downgraded it to 2.6.1, it started working and the memory usage looks normal. I am still monitoring it.


    • Alfista
      Senior Member
      • Mar 2017
      • 136

      #2
      Hi,

      The database crashed again, now with these errors:

      Code:
      2022-07-23 23:58:38.711 CEST [1047] LOG: server process (PID 2863) was terminated by signal 9: Killed
      2022-07-23 23:58:38.711 CEST [1047] DETAIL: Failed process was running: select distinct i.itemid,i.flags from items i,functions f where i.itemid=f.itemid and f.triggerid in (28839,28840,28841,28842,28843,28844)
      2022-07-23 23:58:38.741 CEST [1047] LOG: terminating any other active server processes
      2022-07-23 23:58:43.899 CEST [1047] LOG: issuing SIGKILL to recalcitrant children
      2022-07-23 23:58:56.349 CEST [1047] LOG: issuing SIGKILL to recalcitrant children
      2022-07-23 23:58:58.507 CEST [260592] FATAL: the database system is in recovery mode
      
      ...
      
      2022-07-24 00:01:41.964 CEST [261616] FATAL: the database system is in recovery mode
      2022-07-24 00:01:42.015 CEST [1047] LOG: received fast shutdown request
      2022-07-24 00:01:42.094 CEST [1047] LOG: abnormal database system shutdown
      2022-07-24 00:01:42.573 CEST [1047] LOG: database system is shut down
      [root@ZabbixServer ~]# cat /var/log/postresql/postgresql_2022-07-23.log
      2022-07-23 23:58:38.711 CEST [1047] LOG: server process (PID 2863) was terminated by signal 9: Killed
      2022-07-23 23:58:38.711 CEST [1047] DETAIL: Failed process was running: select distinct i.itemid,i.flags from items i,functions f where i.itemid=f.itemid and f.triggerid in (28839,28840,28841,28842,28843,28844)
      2022-07-23 23:58:38.741 CEST [1047] LOG: terminating any other active server processes
      2022-07-23 23:58:43.899 CEST [1047] LOG: issuing SIGKILL to recalcitrant children
      2022-07-23 23:58:56.349 CEST [1047] LOG: issuing SIGKILL to recalcitrant children
      2022-07-23 23:58:58.507 CEST [260592] FATAL: the database system is in recovery mode
      I'm new to PostgreSQL; can anybody please help me?

      Thanks.
      Last edited by Alfista; 25-07-2022, 08:41.


      • vladimir_lv
        Senior Member
        • May 2022
        • 240

        #3
        What is the Timescale version that you have used?
        Is this your case?
        https://support.zabbix.com/browse/ZBX-19328


        • Alfista
          Senior Member
          • Mar 2017
          • 136

          #4
          And after starting PostgreSQL I get this in the logs:

          Code:
          2022-07-25 07:56:23.720 CEST [336795] LOG: starting PostgreSQL 14.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.2.1 20220127 (Red Hat 11.2.1-9), 64-bit
          2022-07-25 07:56:23.722 CEST [336795] LOG: listening on IPv6 address "::1", port 5432
          2022-07-25 07:56:23.722 CEST [336795] LOG: listening on IPv4 address "127.0.0.1", port 5432
          2022-07-25 07:56:23.724 CEST [336795] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
          2022-07-25 07:56:23.728 CEST [336795] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
          2022-07-25 07:56:23.794 CEST [336797] LOG: database system was interrupted; last known up at 2022-07-23 23:54:56 CEST
          2022-07-25 07:56:24.300 CEST [336798] FATAL: the database system is starting up
          2022-07-25 07:56:24.306 CEST [336800] FATAL: the database system is starting up
          2022-07-25 07:56:24.316 CEST [336801] FATAL: the database system is starting up
          2022-07-25 07:56:24.350 CEST [336803] FATAL: the database system is starting up
          2022-07-25 07:56:24.414 CEST [336797] LOG: database system was not properly shut down; automatic recovery in progress
          2022-07-25 07:56:24.442 CEST [336797] LOG: redo starts at A/C713EC78
          2022-07-25 07:56:24.756 CEST [336804] FATAL: the database system is starting up
          2022-07-25 07:56:24.772 CEST [336805] FATAL: the database system is starting up
          2022-07-25 07:56:25.358 CEST [336806] FATAL: the database system is starting up
          2022-07-25 07:56:25.372 CEST [336807] FATAL: the database system is starting up
          2022-07-25 07:56:25.422 CEST [336808] FATAL: the database system is starting up
          2022-07-25 07:56:25.582 CEST [336809] FATAL: the database system is starting up
          2022-07-25 07:56:25.719 CEST [336810] FATAL: the database system is starting up
          2022-07-25 07:56:25.993 CEST [336812] FATAL: the database system is starting up
          2022-07-25 07:56:26.055 CEST [336811] FATAL: the database system is starting up
          2022-07-25 07:56:26.362 CEST [336813] FATAL: the database system is starting up
          2022-07-25 07:56:27.021 CEST [336814] FATAL: the database system is starting up
          2022-07-25 07:56:27.366 CEST [336815] FATAL: the database system is starting up
          2022-07-25 07:56:27.393 CEST [336816] FATAL: the database system is starting up
          2022-07-25 07:56:27.444 CEST [336818] FATAL: the database system is starting up
          2022-07-25 07:56:27.448 CEST [336817] FATAL: the database system is starting up
          2022-07-25 07:56:27.776 CEST [336819] FATAL: the database system is starting up
          2022-07-25 07:56:27.837 CEST [336820] FATAL: the database system is starting up
          2022-07-25 07:56:27.854 CEST [336821] FATAL: the database system is starting up
          2022-07-25 07:56:28.185 CEST [336797] LOG: invalid record length at A/CADD37D8: wanted 24, got 0
          2022-07-25 07:56:28.185 CEST [336797] LOG: redo done at A/CADD37A0 system usage: CPU: user: 0.24 s, system: 0.22 s, elapsed: 3.74 s
          2022-07-25 07:56:28.285 CEST [336822] FATAL: the database system is starting up
          2022-07-25 07:56:28.362 CEST [336823] FATAL: the database system is starting up
          2022-07-25 07:56:28.497 CEST [336824] FATAL: the database system is starting up
          2022-07-25 07:56:28.657 CEST [336827] FATAL: the database system is starting up
          2022-07-25 07:56:28.801 CEST [336828] FATAL: the database system is starting up
          2022-07-25 07:56:28.913 CEST [336795] LOG: database system is ready to accept connections
          2022-07-25 07:56:28.931 CEST [336834] LOG: TimescaleDB background worker launcher connected to shared catalogs
          2022-07-25 07:56:30.015 CEST [336837] WARNING: failed to launch job 1000 "Compression Policy [1000]": failed to start a background worker
          2022-07-25 07:56:32.140 CEST [336862] WARNING: there is no transaction in progress
          2022-07-25 07:56:54.790 CEST [336862] ERROR: duplicate key value violates unique constraint "4_4_trends_pkey"
          2022-07-25 07:56:54.790 CEST [336862] DETAIL: Key (itemid, clock)=(29823, 1658610000) already exists.
          2022-07-25 07:56:54.790 CEST [336862] STATEMENT: insert into trends (itemid,clock,num,value_min,value_avg,value_max) values (29823,1658610000,1,96.671458586752181,96.671458586752181,96.671458586752181),(52740,1658610000,1,4046.5901899999999,4046.5901899999999,4046.5901899999999),(52744,1658610000,1,1.194151,1.194151,1.194151),(52865,1658610000,1,31.685005972131329,31.685005972131329,31.685005972131329),(58680,1658610000,
          ...
          (67681,1658610000,1,27.753805,27.753805,27.753805),(67682,1658610000,1,19077.347321000001,19077.347321000001,19077.347321000001),(67686,1658610000,1,22.402808,22.402808,22.402808);
          
          2022-07-25 07:56:54.790 CEST [336839] ERROR: duplicate key value violates unique constraint "4_4_trends_pkey"
          2022-07-25 07:56:54.790 CEST [336839] DETAIL: Key (itemid, clock)=(43560, 1658610000) already exists.
          2022-07-25 07:56:54.790 CEST [336839] STATEMENT: insert into trends (itemid,clock,num,value_min,value_avg,value_max) values (43560,1658610000,1,9.4953987730061353,9.4953987730061353,9.4953987730061353),(52754,1658610000,2,
          ...
          ,-0.00013500000000021828,-0.00013500000000021828),(67271,1658610000,1,0.2387599999999992,0.2387599999999992,0.2387599999999992),(67696,1658610000,2,69.94811470975317,69.94811470975317,69.94811470975317);
          
          2022-07-25 07:56:54.791 CEST [336857] ERROR: duplicate key value violates unique constraint "5_5_trends_uint_pkey"
          2022-07-25 07:56:54.791 CEST [336857] DETAIL: Key (itemid, clock)=(52677, 1658610000) already exists.
          2022-07-25 07:56:54.791 CEST [336857] STATEMENT: insert into trends_uint (itemid,clock,num,value_min,value_avg,value_max) values (52677,1658610000,2,1,1,1),(52788,1658610000,2,1,1,1),(58698,1658610000,2,0,0,0),(58699,1658610000,2,30757879808,30791958528,30826037248),(58700,1658610000,2,715776,18705920,36696064),(58701,1658610000,1,8861696,8861696,8861696),(58818,1658610000,2,0,0,0),(58819,1658610000,2,0,0,0),(58820,1658610000,2,16378630144,16404619264,16430608384),
          ...
          ,8202240,8202240,8202240),(67639,1658610000,2,1768816640,1768816640,1768816640),(67640,1658610000,2,378902922,378902922,378902922),(67698,1658610000,2,8,8,8),(67699,1658610000,2,1,1,1);
          Thanks.


          • Alfista
            Senior Member
            • Mar 2017
            • 136

            #5
            Originally posted by vladimir_lv
            What is the Timescale version that you have used?
            Is this your case?
            https://support.zabbix.com/browse/ZBX-19328
            I don't know; I installed it from the repo, using the package for PostgreSQL 14.
            I used this command to check the version:
            Code:
            SELECT default_version, installed_version FROM pg_available_extensions where name = 'timescaledb';
            and it looks like I have this version:
            Code:
             default_version | installed_version
            -----------------+-------------------
             2.7.1           |
            (1 row)
            Thanks.


            • vladimir_lv
              Senior Member
              • May 2022
              • 240

              #6
              Did you complete all steps that are described there?


              • Alfista
                Senior Member
                • Mar 2017
                • 136

                #7
                Originally posted by vladimir_lv
                Did you complete all steps that are described there?
                https://www.zabbix.com/documentation...%2Ctimescaledb
                Yes, and I also checked it against the installation help video on the Zabbix install site.

                I don't know why, but it works for only a few days and then crashes, as you can see in the logs, even though I tuned the settings as shown in the video and also added more workers and memory, as needed for a small installation.
                Last edited by Alfista; 25-07-2022, 09:14.


                • adrian_c
                  Junior Member
                  • Nov 2018
                  • 12

                  #8
                  Have you checked your syslog at the time of the crash? It could be that the process is being killed by the system due to memory allocation errors.
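
A rough sketch of that check, assuming the log formats quoted elsewhere in this thread. The function names (find_sigkill_events, find_oom_lines) and the sample constants are hypothetical and only for illustration; the patterns are copied from the quoted PostgreSQL and systemd messages and may not cover other wordings.

```python
import re

# Patterns copied from the log lines quoted in this thread (assumption:
# other PostgreSQL or systemd versions may word these messages differently).
PG_KILLED = re.compile(r"server process \(PID (\d+)\) was terminated by signal 9")
PG_QUERY = re.compile(r"DETAIL:\s+Failed process was running: (.*)")
SYSLOG_OOM = re.compile(r"killed by the OOM killer|Failed with result 'oom-kill'")


def find_sigkill_events(pg_log_lines):
    """Return (pid, query) pairs for backends terminated by signal 9.

    query is None when no DETAIL line follows the termination message.
    """
    events = []
    pending_pid = None
    for line in pg_log_lines:
        killed = PG_KILLED.search(line)
        if killed:
            if pending_pid is not None:
                events.append((pending_pid, None))
            pending_pid = int(killed.group(1))
            continue
        query = PG_QUERY.search(line)
        if query and pending_pid is not None:
            events.append((pending_pid, query.group(1)))
            pending_pid = None
    if pending_pid is not None:
        events.append((pending_pid, None))
    return events


def find_oom_lines(syslog_lines):
    """Return the syslog lines that mention the kernel OOM killer."""
    return [line for line in syslog_lines if SYSLOG_OOM.search(line)]


# Sample lines taken from the logs posted in this thread.
SAMPLE_PG_LOG = [
    "2022-07-19 19:33:46.726 CEST [1574288] LOG: server process (PID 1574305) was terminated by signal 9: Killed",
    "2022-07-19 19:33:46.726 CEST [1574288] DETAIL: Failed process was running: select item_conditionid,macro,value,operator from item_condition where itemid=44364",
    "2022-07-19 19:33:46.726 CEST [1574288] LOG: terminating any other active server processes",
]
SAMPLE_SYSLOG = [
    "Jul 24 00:01:42 ZabbixServer systemd[1]: postgresql-14.service: A process of this unit has been killed by the OOM killer.",
    "Jul 24 00:01:42 ZabbixServer systemd[1]: Starting Rotate log files...",
]

if __name__ == "__main__":
    for pid, query in find_sigkill_events(SAMPLE_PG_LOG):
        print(f"backend {pid} was killed while running: {query}")
    for line in find_oom_lines(SAMPLE_SYSLOG):
        print("OOM evidence:", line)
```

If both checks produce hits with matching timestamps, that is consistent with the kernel OOM killer terminating a PostgreSQL backend, though it does not by itself say what consumed the memory.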


                  • Alfista
                    Senior Member
                    • Mar 2017
                    • 136

                    #9
                    Originally posted by adrian_c
                    Have you checked your syslog at the time of the crash? It could be that the process is being killed by the system due to memory allocation errors.
                    Hi,

                    I have looked in the syslog and found only this:

                    Code:
                    Jul 24 00:01:07 ZabbixServer pcp-pmie[2603]: Severe demand for real memory 223pgsout/s@ZabbixServer
                    Jul 24 00:01:07 ZabbixServer pcp-pmie[2603]: High per disk average queue length 9.7aveq[sda]@ZabbixServer
                    Jul 24 00:01:07 ZabbixServer pcp-pmie[2603]: High per disk average queue length 0.68%await[sda]@ZabbixServer
                    Jul 24 00:01:42 ZabbixServer systemd[1]: postgresql-14.service: A process of this unit has been killed by the OOM killer.
                    Jul 24 00:01:42 ZabbixServer systemd[1]: pmie_check.service: Deactivated successfully.
                    Jul 24 00:01:42 ZabbixServer systemd[1]: pmie_check.service: Consumed 3.838s CPU time.
                    Jul 24 00:01:42 ZabbixServer systemd[1]: pmie_farm_check.service: Deactivated successfully.
                    Jul 24 00:01:42 ZabbixServer systemd[1]: pmie_farm_check.service: Consumed 1.656s CPU time.
                    Jul 24 00:01:42 ZabbixServer systemd[1]: Starting system activity accounting tool...
                    Jul 24 00:01:42 ZabbixServer systemd[1]: Starting update of the root trust anchor for DNSSEC validation in unbound...
                    Jul 24 00:01:42 ZabbixServer systemd[1]: Starting Rotate log files...
                    Jul 24 00:01:42 ZabbixServer systemd[1]: Started Update a database for mlocate.
                    Jul 24 00:01:42 ZabbixServer systemd[1]: sysstat-collect.service: Deactivated successfully.
                    Jul 24 00:01:42 ZabbixServer systemd[1]: Finished system activity accounting tool.
                    Jul 24 00:01:42 ZabbixServer systemd[1]: postgresql-14.service: Main process exited, code=exited, status=1/FAILURE
                    Jul 24 00:01:42 ZabbixServer systemd[1]: postgresql-14.service: Killing process 1156 (postmaster) with signal SIGKILL.
                    Jul 24 00:01:42 ZabbixServer systemd[1]: postgresql-14.service: Failed with result 'oom-kill'.
                    Jul 24 00:01:42 ZabbixServer systemd[1]: postgresql-14.service: Unit process 1156 (postmaster) remains running after unit stopped.
                    Jul 24 00:01:42 ZabbixServer systemd[1]: postgresql-14.service: Consumed 2h 15min 54.995s CPU time.
                    Jul 24 00:01:42 ZabbixServer systemd[1]: system.slice: A process of this unit has been killed by the OOM killer.
                    Jul 24 00:01:43 ZabbixServer systemd[1]: unbound-anchor.service: Deactivated successfully.
                    Jul 24 00:01:43 ZabbixServer systemd[1]: Finished update of the root trust anchor for DNSSEC validation in unbound.
                    Jul 24 00:01:43 ZabbixServer systemd[1]: Reloading The Apache HTTP Server...
                    But I found in top that the Zabbix connections to PostgreSQL use a lot of memory, around 90%:
                    Code:
                    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
                    342372 postgres 20 0 10.8g 3.8g 75592 S 0.3 24.7 7:36.30 postgres: zabbix zabbix ::1(34504) idle
                    342384 postgres 20 0 10.8g 3.3g 75932 S 0.0 22.0 7:33.82 postgres: zabbix zabbix ::1(53092) idle
                    342367 postgres 20 0 5259740 1.4g 720660 S 0.3 9.0 2:13.24 postgres: zabbix zabbix ::1(34488) idle
                    342373 postgres 20 0 5247096 1.4g 722340 S 0.3 9.0 2:11.20 postgres: zabbix zabbix ::1(34506) idle
                    342377 postgres 20 0 5248920 1.3g 719008 S 0.0 8.8 2:11.13 postgres: zabbix zabbix ::1(34532) idle
                    342368 postgres 20 0 5253832 1.3g 718812 S 0.7 8.6 2:11.85 postgres: zabbix zabbix ::1(34490) idle
                    342376 postgres 20 0 7021668 1.3g 55720 S 0.3 8.5 2:25.33 postgres: zabbix zabbix ::1(34516) idle
                    342358 postgres 20 0 4393360 820040 819340 S 0.3 5.1 2:14.02 postgres: checkpointer
                    342399 postgres 20 0 5137340 409600 47296 S 0.0 2.6 2:13.00 postgres: zabbix zabbix ::1(53164) idle
                    342374 postgres 20 0 4933904 315112 47028 S 0.0 2.0 0:46.44 postgres: zabbix zabbix ::1(34510) idle
                    342369 postgres 20 0 4932796 299236 46960 S 0.0 1.9 0:46.63 postgres: zabbix zabbix ::1(34494) idle
                    342380 postgres 20 0 4922080 287892 47052 S 0.0 1.8 0:45.05 postgres: zabbix zabbix ::1(34562) idle
                    342379 postgres 20 0 4689660 178416 45988 S 0.0 1.1 0:42.56 postgres: zabbix zabbix ::1(34548) idle
                    342433 postgres 20 0 4588856 166076 71060 S 0.0 1.0 1:01.44 postgres: zabbix zabbix ::1(50046) idle
                    342409 postgres 20 0 4546404 127720 45216 S 0.0 0.8 0:12.24 postgres: zabbix zabbix ::1(53238) idle

                    And I don't know why.
                    Maybe this is the problem. Is it possible to reduce it somehow, so that it doesn't use such a large amount of memory?

                    Thanks.
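
One caveat when reading top output like the above: RES includes the shared memory pages a backend has touched (the SHR column), so adding up RES or %MEM across PostgreSQL backends can overcount. A minimal sketch, assuming the default top column order shown above and sizes in KiB unless suffixed with m, g or t; the helper names (to_kib, private_kib) are hypothetical and RES minus SHR is only an approximation of per-backend private memory.

```python
# Multipliers for top's scaled size suffixes, expressed in KiB.
UNITS_KIB = {"k": 1, "m": 1024, "g": 1024 ** 2, "t": 1024 ** 3}


def to_kib(field):
    """Convert a top memory field such as '820040' or '3.8g' to KiB."""
    suffix = field[-1].lower()
    if suffix in UNITS_KIB:
        return float(field[:-1]) * UNITS_KIB[suffix]
    return float(field)


def private_kib(top_line):
    """Approximate non-shared resident memory (RES - SHR) for one top row.

    Assumes the column order PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND.
    """
    parts = top_line.split(None, 11)
    res, shr = to_kib(parts[5]), to_kib(parts[6])
    return max(res - shr, 0.0)


# Two rows copied from the top output posted above.
SAMPLE_ROWS = [
    "342372 postgres 20 0 10.8g 3.8g 75592 S 0.3 24.7 7:36.30 postgres: zabbix zabbix ::1(34504) idle",
    "342358 postgres 20 0 4393360 820040 819340 S 0.3 5.1 2:14.02 postgres: checkpointer",
]

if __name__ == "__main__":
    for row in SAMPLE_ROWS:
        pid = row.split(None, 1)[0]
        print(f"PID {pid}: about {private_kib(row) / 1024:.0f} MiB not shared")
```

On these two rows the checkpointer turns out to be almost entirely shared memory, while the 3.8g backend is mostly private, which is the kind of process worth looking at more closely.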



                    • Alfista
                      Senior Member
                      • Mar 2017
                      • 136

                      #10
                      Hi,
                      Maybe I have found the problem.
                      I don't know why or how it happened, but from the beginning (clean install) Zabbix accepted TimescaleDB 2.7.1 even though it works only with version 2.6.x.
                      After a full system update, TimescaleDB was also updated to the latest version, 2.7.2, and that is how I found out that Zabbix doesn't support or work with version 2.7.x.
                      When I downgraded it to 2.6.1, it started working and the memory usage looks normal. I am still monitoring it.


                      • vladimir_lv
                        Senior Member
                        • May 2022
                        • 240

                        #11
                        Yes, you are right. Even for the new 6.2.1 version, the maximum supported version of TimescaleDB is currently 2.6. As ever, the old adage holds true: just read the f..ng manual. )))
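
A small sketch of how such a version check could be scripted before a system update, assuming only what this thread states: a maximum supported TimescaleDB release of 2.6 for Zabbix 6.2.x. The function names are hypothetical, no minimum version is checked, and the limit should be confirmed against the Zabbix documentation for the release in use.

```python
def parse_version(text):
    """Turn a plain dotted version string such as '2.7.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_supported(installed, max_major_minor=(2, 6)):
    """Return True if the installed major.minor does not exceed the maximum.

    max_major_minor defaults to (2, 6), the limit mentioned in this thread
    (assumption: verify it for your own Zabbix release).
    """
    return parse_version(installed)[:2] <= max_major_minor


if __name__ == "__main__":
    for version in ("2.6.1", "2.7.1", "2.7.2"):
        status = "supported" if is_supported(version) else "NOT supported"
        print(f"TimescaleDB {version}: {status}")
```

The version string itself would come from a query such as the one in post #5; this sketch handles only plain numeric versions, not suffixed ones.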
