Description
Currently, TiCDC replicates all data changes for the tables configured in a changefeed. However, users often have scenarios where they do not want all data to be synchronized. For example:
- Irrelevant Data: Tables may contain temporary or "soft-deleted" rows (e.g., `is_deleted = true`) that are not needed by the downstream application.
- Operational DDLs: Certain DDL operations, like adding a non-critical index or minor table alterations, might not need to be propagated.
- Workload Segregation: Users may want to replicate only high-value data (e.g., `orders where amount > 1000`) while ignoring low-value or operational data within the same table.
Without a built-in, granular filtering mechanism at the source, unneeded events still cross the network and raise storage and processing costs for the downstream consumer, and users are forced to implement complex and inefficient filtering logic in their own applications.
We propose the implementation of a flexible and powerful event filtering system directly within TiCDC. This system should allow users to define rules in the changefeed configuration to exclude specific DML and DDL events before they are sent to a sink.
The filtering capabilities should include:
- Event Type Filtering: The ability to filter events based on their fundamental type.
  - DML: `INSERT`, `UPDATE`, `DELETE`.
  - DDL: `CREATE TABLE`, `ALTER TABLE`, `DROP TABLE`, `TRUNCATE TABLE`, etc.
- DDL Content Filtering: The ability to filter DDL events by matching regular expressions against the full SQL query string. This is useful for ignoring specific patterns, like `ALTER TABLE ... ADD CACHE`.
- DML Value Filtering: The ability to filter DML events based on expressions evaluated against the row's column values. For example, a user should be able to specify a rule to:
  - Ignore `INSERT` events where `status = 'archived'`.
  - Ignore `UPDATE` events where the new value of `amount` is less than `10`.
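
To make the three capabilities concrete, here is a minimal Python sketch of how a single rule might be evaluated against an event. It is an illustration only, not TiCDC code: `Event`, `FilterRule`, `should_ignore`, and all field names are hypothetical, and plain Python predicates stand in for the value expressions the proposal describes.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Event:
    """Simplified change event (hypothetical shape, not a TiCDC type)."""
    kind: str                                    # e.g. "INSERT", "UPDATE", "ALTER TABLE"
    query: str = ""                              # full SQL text, set for DDL events
    new_row: dict = field(default_factory=dict)  # column values after the change (DML)


@dataclass
class FilterRule:
    """One filter rule; every field is an assumption made for illustration."""
    ignore_kinds: frozenset = frozenset()           # event types to drop outright
    ignore_sql: tuple = ()                          # regexes matched against DDL text
    ignore_row: dict = field(default_factory=dict)  # DML kind -> predicate over new_row


def should_ignore(event: Event, rule: FilterRule) -> bool:
    """Return True if the event should be dropped before reaching the sink."""
    # 1. Event type filtering.
    if event.kind in rule.ignore_kinds:
        return True
    # 2. DDL content filtering: regex search over the full query string.
    if event.query and any(
        re.search(pattern, event.query, re.IGNORECASE) for pattern in rule.ignore_sql
    ):
        return True
    # 3. DML value filtering: predicate evaluated against the row's column values.
    predicate = rule.ignore_row.get(event.kind)
    return predicate is not None and predicate(event.new_row)


# Example rule mirroring the cases listed above.
rule = FilterRule(
    ignore_kinds=frozenset({"TRUNCATE TABLE"}),
    ignore_sql=(r"ALTER\s+TABLE\s+\S+\s+ADD\s+CACHE",),
    ignore_row={
        "INSERT": lambda row: row.get("status") == "archived",
        "UPDATE": lambda row: row.get("amount") is not None and row["amount"] < 10,
    },
)

events = [
    Event("INSERT", new_row={"status": "archived"}),
    Event("INSERT", new_row={"status": "active"}),
    Event("UPDATE", new_row={"amount": 5}),
    Event("ALTER TABLE", query="ALTER TABLE t ADD CACHE"),
    Event("ALTER TABLE", query="ALTER TABLE t ADD COLUMN c INT"),
]
# Only events that no rule component matches are forwarded to the sink.
forwarded = [e for e in events if not should_ignore(e, rule)]
```

In this sketch the checks run from cheapest to most expensive (type lookup, then regex, then row predicate), so an event dropped by type never reaches the value evaluation. A real implementation would presumably parse user-supplied expressions from the changefeed configuration rather than accept callables.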