Ad Widget

Collapse

housekeeping of events is very slow

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • krej
    Junior Member
    • Jul 2019
    • 4

    #1

    housekeeping of events is very slow

    I'm having problems understanding why deleting events is so slow.

    Zabbix: 4.2 (never mind the name of the db - it is 4.2)
    Postgresql: 10.8
    nvps: ~400
    hosts: ~600
    items: ~45000

    event table: ~25 million rows
    event_recovery table: ~12 million rows
    alerts table: ~600000 rows

    We've already partitioned all history* and trends* tables - which has worked fine and without problems for a year. I don't think I've seen a definitive guide on partitioning of events, and the since the db schema is a bit complex I haven't tried to figure out how to transition with partitioning of events. I would be happy for any pointers on this.

    Now we've seen that housekeeping of events is very slow - a single (hourly) run can take more than one day to finish, so events keep stacking up. A typical slow delete statement, from the postgres log:

    postgresql-10-20190717-031404.log:2019-07-17 03:37:43 CEST [80965]: [4-1] user=zabbix,db=zabbix_34,app=[unknown],client=[local]: LOG: duration: 27298798.930 ms statement: delete from events where (eventid between 5580621 and 5580681 or eventid between 5580689 and 5580762 or eventid between 5580769 and 5580844 or eventid between 5580851 and 5580867 or eventid between 5580869 and 5580926 or eventid between 5580933 and 5580949 or eventid between 5580963 and 5581024
    --- 8< --- a lot of similar eventids snipped away -----
    or eventid between 5586799 and 5586839 or eventid in (5581385,5581389,5581561,5581563,5581564,5581580,5 581582,5581584,5581585,5581635))



    I've analyzed the deletion of a single row in events:

    zabbix_34=> begin;
    BEGIN
    zabbix_34=> explain (analyze,buffers,timing)
    delete from events where eventid = 5600001;
    QUERY PLAN
    --------------------------------------------------------------------------------------------------------------------------------
    Delete on events (cost=0.44..8.46 rows=1 width=6) (actual time=0.260..0.260 rows=0 loops=1)
    Buffers: shared hit=6 dirtied=1
    -> Index Scan using idx_29337_primary on events (cost=0.44..8.46 rows=1 width=6) (actual time=0.043..0.044 rows=1 loops=1)
    Index Cond: (eventid = '5600001'::numeric)
    Buffers: shared hit=4 dirtied=1
    Planning time: 0.060 ms
    Trigger for constraint c_alerts_5: time=330.938 calls=1
    Trigger for constraint c_event_suppress_1: time=0.102 calls=1
    Trigger for constraint c_acknowledges_2: time=0.081 calls=1
    Trigger for constraint c_alerts_2: time=0.080 calls=1
    Trigger for constraint c_event_recovery_1: time=1821.648 calls=1
    Trigger for constraint c_event_recovery_2: time=1866.104 calls=1
    Trigger for constraint c_event_recovery_3: time=762.721 calls=1

    Trigger for constraint c_event_tag_1: time=3.526 calls=1
    Trigger for constraint c_problem_1: time=2.016 calls=1
    Trigger for constraint c_problem_2: time=1.162 calls=1
    Execution time: 4788.672 ms
    (17 rows)
    zabbix_34=> rollback;


    5 seconds to delete a single row, wow! This shows that it is the foreign key constraints on event_recovery and alerts that take a lot of time. But why? In this case there is no row with eventid=5600001 in either event_recovery or alerts. There are indexes or primary keys (as per the schema for Zabbix 4.2) on all the referencing columns. The database is analyzed and vacuumed nightly. Querying event_recovery for this eventid is fast

    zabbix_34=> explain (analyze,timing) select * from event_recovery where eventid = 5600001;
    QUERY PLAN
    --------------------------------------------------------------------------------------------------------------------------------------------
    Index Scan using event_recovery_rb_test_100 on event_recovery (cost=0.43..8.45 rows=1 width=40) (actual time=0.055..0.055 rows=0 loops=1)
    Index Cond: (eventid = 5600001)
    Planning time: 0.157 ms
    Execution time: 0.068 ms
    (4 rows)

    zabbix_34=> select * from event_recovery where eventid = 5600001;
    eventid | r_eventid | c_eventid | correlationid | userid
    ---------+-----------+-----------+---------------+--------
    (0 rows)



    Any ideas?
  • vso
    Zabbix developer
    • Aug 2016
    • 190

    #2
    Please show create table for event_recovery

    Comment

    • krej
      Junior Member
      • Jul 2019
      • 4

      #3
      Certainly, this is fresh from the database:


      Code:
      CREATE TABLE zabbix.event_recovery (
          eventid bigint NOT NULL,
          r_eventid bigint NOT NULL,
          c_eventid bigint,
          correlationid bigint,
          userid bigint
      );
      
      ALTER TABLE zabbix.event_recovery OWNER TO zabbix;
      
      ALTER TABLE ONLY zabbix.event_recovery
          ADD CONSTRAINT event_recovery_pkey PRIMARY KEY (eventid);
      
      CREATE INDEX event_recovery_1 ON zabbix.event_recovery USING btree (r_eventid);
      CREATE INDEX event_recovery_2 ON zabbix.event_recovery USING btree (c_eventid);
      
      ALTER TABLE ONLY zabbix.event_recovery
          ADD CONSTRAINT c_event_recovery_1 FOREIGN KEY (eventid) REFERENCES zabbix.events(eventid) ON DELETE CASCADE;
      
      ALTER TABLE ONLY zabbix.event_recovery
          ADD CONSTRAINT c_event_recovery_2 FOREIGN KEY (r_eventid) REFERENCES zabbix.events(eventid) ON DELETE CASCADE;
      
      ALTER TABLE ONLY zabbix.event_recovery
          ADD CONSTRAINT c_event_recovery_3 FOREIGN KEY (c_eventid) REFERENCES zabbix.events(eventid) ON DELETE CASCADE;
      ===================

      Code:
      CREATE TABLE zabbix.events (
          eventid numeric NOT NULL,
          source bigint DEFAULT '0'::bigint NOT NULL,
          object bigint DEFAULT '0'::bigint NOT NULL,
          objectid numeric DEFAULT '0'::numeric NOT NULL,
          clock bigint DEFAULT '0'::bigint NOT NULL,
          value bigint DEFAULT '0'::bigint NOT NULL,
          acknowledged bigint DEFAULT '0'::bigint NOT NULL,
          ns bigint DEFAULT '0'::bigint NOT NULL,
          name character varying(2048) DEFAULT ''::character varying NOT NULL,
          severity integer DEFAULT 0 NOT NULL
      );
      
      ALTER TABLE zabbix.events OWNER TO zabbix;
      
      ALTER TABLE ONLY zabbix.events
          ADD CONSTRAINT idx_29337_primary PRIMARY KEY (eventid);
      
      CREATE INDEX events_1 ON zabbix.events USING btree (source, object, objectid, clock);
      CREATE INDEX events_2 ON zabbix.events USING btree (source, object, clock);
      CREATE INDEX events_clk_3 ON zabbix.events USING btree (clock);
      ===================

      Code:
      CREATE TABLE zabbix.alerts (
          alertid numeric NOT NULL,
          actionid numeric NOT NULL,
          eventid numeric NOT NULL,
          userid numeric,
          clock bigint DEFAULT '0'::bigint NOT NULL,
          mediatypeid numeric,
          sendto character varying(1024) DEFAULT ''::character varying NOT NULL,
          subject character varying(255) DEFAULT ''::character varying NOT NULL,
          message text DEFAULT ''::text NOT NULL,
          status bigint DEFAULT '0'::bigint NOT NULL,
          retries bigint DEFAULT '0'::bigint NOT NULL,
          error character varying(2048) DEFAULT ''::character varying NOT NULL,
          esc_step bigint DEFAULT '0'::bigint NOT NULL,
          alerttype bigint DEFAULT '0'::bigint NOT NULL,
          p_eventid bigint,
          acknowledgeid bigint
      );
      
      ALTER TABLE zabbix.alerts OWNER TO zabbix;
      
      ALTER TABLE ONLY zabbix.alerts
          ADD CONSTRAINT idx_29120_primary PRIMARY KEY (alertid);
      
      CREATE INDEX alerts_1 ON zabbix.alerts USING btree (actionid);
      CREATE INDEX alerts_2 ON zabbix.alerts USING btree (clock);
      CREATE INDEX alerts_3 ON zabbix.alerts USING btree (eventid);
      CREATE INDEX alerts_4 ON zabbix.alerts USING btree (status);
      CREATE INDEX alerts_5 ON zabbix.alerts USING btree (mediatypeid);
      CREATE INDEX alerts_6 ON zabbix.alerts USING btree (userid);
      CREATE INDEX alerts_7 ON zabbix.alerts USING btree (p_eventid);
      
      ALTER TABLE ONLY zabbix.alerts
          ADD CONSTRAINT c_alerts_1 FOREIGN KEY (actionid) REFERENCES zabbix.actions(actionid) ON UPDATE RESTRICT ON DELETE CASCADE;
      
      ALTER TABLE ONLY zabbix.alerts
          ADD CONSTRAINT c_alerts_2 FOREIGN KEY (eventid) REFERENCES zabbix.events(eventid) ON UPDATE RESTRICT ON DELETE CASCADE;
      
      ALTER TABLE ONLY zabbix.alerts
          ADD CONSTRAINT c_alerts_3 FOREIGN KEY (userid) REFERENCES zabbix.users(userid) ON UPDATE RESTRICT ON DELETE CASCADE;
      
      ALTER TABLE ONLY zabbix.alerts
          ADD CONSTRAINT c_alerts_4 FOREIGN KEY (mediatypeid) REFERENCES zabbix.media_type(mediatypeid) ON UPDATE RESTRICT ON DELETE CASCADE;
      
      ALTER TABLE ONLY zabbix.alerts
          ADD CONSTRAINT c_alerts_5 FOREIGN KEY (p_eventid) REFERENCES zabbix.events(eventid) ON DELETE CASCADE;
      
      ALTER TABLE ONLY zabbix.alerts
          ADD CONSTRAINT c_alerts_6 FOREIGN KEY (acknowledgeid) REFERENCES zabbix.acknowledges(acknowledgeid) ON DELETE CASCADE;

      Comment

      • vso
        Zabbix developer
        • Aug 2016
        • 190

        #4
        Tables look fine to me, was analyze performed with Zabbix server running ?

        Comment

        • krej
          Junior Member
          • Jul 2019
          • 4

          #5
          Originally posted by vso
          Tables look fine to me, was analyze performed with Zabbix server running ?
          Yes.

          xxx
          xxx
          xxx

          Comment

          • vso
            Zabbix developer
            • Aug 2016
            • 190

            #6
            Maybe explain was slow due to delete query in progress, there could be another issue, maybe there are simply too many events.

            Comment

            • krej
              Junior Member
              • Jul 2019
              • 4

              #7
              We've found the issue, which was that the tables have different types (numeric vs bigint) on the foreign key columns. Thus the index on the referencing tables was not used when deleting from the referenced table. This was probably caused by a migration from MySQL to PostgreSQL a few years ago.

              Comment

              Working...