Skip to content

puller(ticdc): should split all update kv entries in puller when sink module is in safe mode #11231

@lidezhu

Description

@lidezhu

Previous Problem

In #10919, we addresses the issue of downstream data inconsistencies caused by the potentially incorrect order of UPDATE events received by TiCDC.

Take the following SQL statements as an example:

CREATE TABLE t (a INT PRIMARY KEY, b INT);
INSERT INTO t VALUES (1, 1);
INSERT INTO t VALUES (2, 2);

BEGIN;
UPDATE t SET a = 3 WHERE a = 2;
UPDATE t SET a = 2 WHERE a = 1;
COMMIT;

In this example, the two UPDATE statements within the transaction have a sequential dependency on execution. The primary key a is changed from 2 to 3, and then the primary key a is changed from 1 to 2. After this transaction is executed, the records in the upstream database are (2, 1) and (3, 2).

However, the order of UPDATE events received by TiCDC might differ from the actual execution order of the upstream transaction. For example:

UPDATE t SET a = 2 WHERE a = 1;
UPDATE t SET a = 3 WHERE a = 2;

Previously, TiCDC just splits these UPDATE events into DELETE and INSERT events before sending them to downstream. After the split, the actual execution order of these events in the downstream is as follows:

BEGIN;
DELETE FROM t WHERE a = 1;
REPLACE INTO t VALUES (2, 1);
DELETE FROM t WHERE a = 2;
REPLACE INTO t VALUES (3, 2);
COMMIT;

After the downstream executes the transaction, the records in the database are (3, 2), which are different from the records in the upstream database ((2, 1) and (3, 2)), indicating a data inconsistency issue.

To fix this problem, we need a mechanism to reorder the DELETE and INSERT events after the split.

For example, if we can guarantee all DELETE events are executing before the INSERT events in the same transaction. The data inconsistency problem can be avoided.

 BEGIN;
 DELETE FROM t WHERE a = 1;
 DELETE FROM t WHERE a = 2;
 REPLACE INTO t VALUES (2, 1);
 REPLACE INTO t VALUES (3, 2);
 COMMIT;

After the downstream executes the transaction, the records in the downstream database are the same as those in the upstream database, which are (2, 1) and (3, 2), ensuring data consistency.

Previous Solution

But the goal to reorder the DELETE and INSERT events after the split is not easy to accomplish.

The whole process of TiCDC processing a row of data and sending it to the MySQL compatible downstream is as follows:

  1. TiCDC obtains KV data from TiKV and stores it in the local hard disk and sorts it according to the transaction CommitTS and event type (DELETE > UPDATE > INSERT);
  2. The Sink module pulls KV data from the local hard disk in units of transactions and decodes the KV data into Event data (including the column name, type, data, etc. of each column) according to the Table Schema;
  3. The Sink module translates the Event data into SQL and sends it to the downstream;

Splitting the UPDATE event in the sink module can ensure that only the events that update the primary key or the unique non-null unique key are split. But in sink module the event data has been fully loaded into the memory, so it is impossible to load all its data into the memory for splitting and then sorting in large transaction scenario;

If the UPDATE event is split before TiCDC writes the KV data to the local disk, it is hard to know whether the event updates the primary key or the unique non-null unique key, so we can only choose to split all UPDATE events. However, splitting all UPDATE events will lead to performance degradation in some scenarios (sysbench oltp_update_non_index performance degradation of about 41%).

Therefore, we adopted a compromise solution in #10919: When a new table sink start to write data to downstream, we fetch the current timestamp from pd as replicateTS, and we just split all update KV entries which commitTS is smaller than the replicateTS.

After the fix, when using the MySQL sink, TiCDC does not split the UPDATE event in most cases. Consequently, there might be primary key or unique key conflicts during changefeed runtime, causing the changefeed to restart automatically. After the restart, TiCDC will split the conflicting UPDATE events into DELETE and INSERT events before writing them to the local disk. This ensures that all events within the same transaction are correctly ordered, with all DELETE events preceding INSERT events, thus correctly completing data replication.

Current Issue

In previous solution, we rely on restarting changefeed automatically to fix data conflicting error and prevent data inconsistency problem caused by incorrect order of UPDATE events in the same transaction received by TiCDC.

Although after restart, puller can split the conflict UPDATE events and continue to run correctly, and from test we can see that the restart has no noticeable impact on latency, some users may be unhappy with this behavior if their workload has many conflict data which cause changefeed restart occasionally. So we need a workaround to avoid restart if need.

TiCDC has a config known as safe-mode and is default as false currently.
If safe-mode is true,

  1. All INSERT events are translated to REPLACE sql;
  2. All UPDATE events are split into DELETE and INSERT events in sink module(This behavior removed in *(ticdc): split old update kv entry after restarting changefeed #10919 as described in the previous Problem part);

Because in safe-mode, performance degradation is already expected(REPLACE is much slower than INSERT), so we suggest a solution to split all UPDATE KV entries before write them to disk when safe-mode is true. Users can enable safe-mode to prevent changefeed from restarting due to data conflicts while also prevent data inconsistency problem to happen.

Metadata

Metadata

Assignees

Labels

affects-6.5This bug affects the 6.5.x(LTS) versions.affects-7.1This bug affects the 7.1.x(LTS) versions.affects-7.5This bug affects the 7.5.x(LTS) versions.affects-8.1This bug affects the 8.1.x(LTS) versions.area/ticdcIssues or PRs related to TiCDC.type/enhancementThe issue or PR belongs to an enhancement.type/regression

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions