
Commit 4e63851

PatrickRen authored and leonardBang committed
[mysql] Update docs of specifying starting offset feature of MySQL CDC source
1 parent c074518 commit 4e63851


2 files changed, 146 additions and 9 deletions


docs/content/connectors/mysql-cdc(ZH).md

Lines changed: 73 additions & 4 deletions
@@ -215,9 +215,44 @@ Flink SQL> SELECT * FROM orders;
215215
<td style="word-wrap: break-word;">initial</td>
216216
<td>String</td>
217217
<td>Optional startup mode for MySQL CDC consumer,
218-
valid enumerations are "initial" and "latest-offset".
218+
valid enumerations are "initial", "earliest-offset", "latest-offset", "specific-offset" and "timestamp".
219219
Please see the <a href="#a-name-id-002-a">Startup Mode</a> section for more detailed information.</td>
220-
</tr>
220+
</tr>
221+
<tr>
222+
<td>scan.startup.specific-offset.file</td>
223+
<td>optional</td>
224+
<td style="word-wrap: break-word;">(none)</td>
225+
<td>String</td>
226+
<td>Binlog file name of the starting offset in "specific-offset" startup mode.</td>
227+
</tr>
228+
<tr>
229+
<td>scan.startup.specific-offset.pos</td>
230+
<td>optional</td>
231+
<td style="word-wrap: break-word;">(none)</td>
232+
<td>Long</td>
233+
<td>Binlog file position of the starting offset in "specific-offset" startup mode.</td>
234+
</tr>
235+
<tr>
236+
<td>scan.startup.specific-offset.gtid-set</td>
237+
<td>optional</td>
238+
<td style="word-wrap: break-word;">(none)</td>
239+
<td>String</td>
239+
<td>GTID set of the starting offset in "specific-offset" startup mode.</td>
241+
</tr>
242+
<tr>
243+
<td>scan.startup.specific-offset.skip-events</td>
244+
<td>optional</td>
245+
<td style="word-wrap: break-word;">(none)</td>
246+
<td>Long</td>
247+
<td>Number of events to skip after the specified starting offset.</td>
248+
</tr>
249+
<tr>
250+
<td>scan.startup.specific-offset.skip-rows</td>
251+
<td>optional</td>
252+
<td style="word-wrap: break-word;">(none)</td>
253+
<td>Long</td>
254+
<td>Number of rows to skip after the specified starting offset.</td>
255+
</tr>
221256
<tr>
222257
<td>server-time-zone</td>
223258
<td>optional</td>
@@ -520,9 +555,43 @@ The MySQL CDC connector is a Flink Source connector that will first read the table snap
520555
The config option `scan.startup.mode` specifies the startup mode for the MySQL CDC consumer. The valid enumerations are:
521556

522557
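- `initial` (default): Performs an initial snapshot on the monitored database tables upon first startup, and continues to read the latest binlog.
523-
- `latest-offset`: Never performs a snapshot on the monitored database tables upon first startup; the connector only reads from the end of the binlog, which means it can only read data changes made after the connector started.
558+
- `earliest-offset`: Skips the snapshot phase and starts reading from the earliest accessible binlog offset.
559+
- `latest-offset`: Never performs a snapshot on the monitored database tables upon first startup; the connector only reads from the end of the binlog, which means it can only read data changes made after the connector started.
560+
- `specific-offset`: Skips the snapshot phase and starts reading from a specified binlog offset. The offset can be specified by binlog file name and position, or by a GTID set if GTID is enabled on the cluster.
561+
- `timestamp`: Skips the snapshot phase and starts reading binlog events from a specified timestamp.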
562+
563+
For example, with the DataStream API:
564+
```java
MySqlSource.<String>builder()
    // Pick exactly one of the following startup options
    .startupOptions(StartupOptions.earliest()) // Start from the earliest offset
    // .startupOptions(StartupOptions.latest()) // Start from the latest offset
    // .startupOptions(StartupOptions.specificOffset("mysql-bin.000003", 4L)) // Start from a binlog file name and position
    // .startupOptions(StartupOptions.specificOffset("24DA167-0C0C-11E8-8442-00059A3C7B00:1-19")) // Start from a GTID set
    // .startupOptions(StartupOptions.timestamp(1667232000000L)) // Start from a timestamp
    ...
    .build()
```
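The file-and-position form of `specific-offset` above points at a byte position inside a binlog file. The following is a minimal sketch of how such positions order: by the numeric suffix of the file name, then by position. The `FilePosition` record and the `earliest` helper are hypothetical and not part of the connector's API; the sketch also assumes, matching the `4L` in the example above, that a binlog file's first event begins at position 4, right after the file's 4-byte magic header.

```java
import java.util.Comparator;
import java.util.List;

public class BinlogPositionDemo {

    /** Hypothetical value type: a binlog file name such as "mysql-bin.000003" plus a byte position in it. */
    record FilePosition(String file, long pos) implements Comparable<FilePosition> {

        // Order by numeric file suffix first, then by position inside the file.
        private static final Comparator<FilePosition> ORDER =
                Comparator.comparingLong(FilePosition::fileIndex).thenComparingLong(FilePosition::pos);

        /** Numeric suffix of the file name, e.g. 3 for "mysql-bin.000003". */
        long fileIndex() {
            return Long.parseLong(file.substring(file.lastIndexOf('.') + 1));
        }

        @Override
        public int compareTo(FilePosition other) {
            return ORDER.compare(this, other);
        }
    }

    /** A binlog file starts with a 4-byte magic header, so its first event begins at position 4. */
    static final long FIRST_EVENT_POS = 4L;

    /** Earliest position among the binlog files still available (e.g. the names listed by SHOW BINARY LOGS). */
    static FilePosition earliest(List<String> binlogFiles) {
        return binlogFiles.stream()
                .map(file -> new FilePosition(file, FIRST_EVENT_POS))
                .min(Comparator.naturalOrder())
                .orElseThrow();
    }

    public static void main(String[] args) {
        List<String> files = List.of("mysql-bin.000004", "mysql-bin.000003", "mysql-bin.000005");
        System.out.println(earliest(files)); // FilePosition[file=mysql-bin.000003, pos=4]
    }
}
```

Comparing the parsed suffix rather than the raw string keeps the ordering numeric even if two file names differ in digit count.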
574+
575+
With SQL:
576+
577+
```SQL
CREATE TABLE mysql_source (...) WITH (
    'connector' = 'mysql-cdc',
    -- Set exactly one startup mode:
    -- 'scan.startup.mode' = 'earliest-offset', -- Start from the earliest offset
    -- 'scan.startup.mode' = 'latest-offset', -- Start from the latest offset
    -- 'scan.startup.mode' = 'timestamp', -- Start from a timestamp
    'scan.startup.mode' = 'specific-offset', -- Start from a specific offset
    'scan.startup.specific-offset.file' = 'mysql-bin.000003', -- Binlog file name in specific-offset startup mode
    'scan.startup.specific-offset.pos' = '4', -- Binlog position in specific-offset startup mode
    -- 'scan.startup.specific-offset.gtid-set' = '24DA167-0C0C-11E8-8442-00059A3C7B00:1-19', -- GTID set in specific-offset startup mode (alternative to file name and position)
    -- 'scan.startup.timestamp-millis' = '1667232000000', -- Startup timestamp in timestamp startup mode
    ...
)
```
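The `timestamp` mode takes its start time as a plain number, and the example value `1667232000000` corresponds to 2022-11-01 00:00:00 at UTC+8 (2022-10-31 16:00:00 UTC). A minimal sketch of deriving such a value with `java.time`, assuming the option expects milliseconds since the Unix epoch as the name `scan.startup.timestamp-millis` suggests; the class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class StartupTimestampDemo {

    /** Epoch milliseconds for a wall-clock time observed at the given UTC offset. */
    static long toEpochMillis(LocalDateTime wallClock, ZoneOffset offset) {
        return wallClock.toInstant(offset).toEpochMilli();
    }

    public static void main(String[] args) {
        // 2022-11-01 00:00:00 at UTC+8, the instant behind the example value above.
        long millis = toEpochMillis(LocalDateTime.of(2022, 11, 1, 0, 0), ZoneOffset.ofHours(8));
        System.out.println("'scan.startup.timestamp-millis' = '" + millis + "'");
        // prints: 'scan.startup.timestamp-millis' = '1667232000000'
    }
}
```

Passing the offset explicitly avoids silently depending on the JVM's default time zone when computing the value.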
591+
592+
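**Note:** On each checkpoint, the MySQL source prints the current offset to the log at INFO level, with the prefix "Binlog offset on checkpoint {checkpoint-id}".
593+
This log is useful if you want to start a job from the offset of a specific checkpoint.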
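The GTID form of `specific-offset` takes a GTID set string such as `24DA167-0C0C-11E8-8442-00059A3C7B00:1-19`: a source UUID followed by one or more inclusive transaction-number intervals, with several sources separated by commas. The sketch below is a hypothetical parser for that classic `uuid:interval[:interval]` syntax, handy for checking which transactions a configured GTID set covers. It is not the connector's own parsing code and does not validate the UUID.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class GtidSetDemo {

    /** Inclusive range of transaction numbers, e.g. 1-19. */
    record Interval(long start, long end) {
        boolean contains(long txn) {
            return txn >= start && txn <= end;
        }
    }

    /** Parses "uuid:1-19:25,uuid2:3" into source UUID (lower-cased) -> intervals. */
    static Map<String, List<Interval>> parse(String gtidSet) {
        Map<String, List<Interval>> result = new LinkedHashMap<>();
        for (String entry : gtidSet.split(",")) {
            String trimmed = entry.trim(); // MySQL may print ",\n" between sources
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split(":");
            List<Interval> intervals =
                    result.computeIfAbsent(parts[0].toLowerCase(Locale.ROOT), k -> new ArrayList<>());
            for (int i = 1; i < parts.length; i++) {
                String[] bounds = parts[i].split("-");
                long start = Long.parseLong(bounds[0]);
                long end = bounds.length > 1 ? Long.parseLong(bounds[1]) : start; // "7" means 7-7
                intervals.add(new Interval(start, end));
            }
        }
        return result;
    }

    /** True if transaction uuid:txn falls inside the parsed set. */
    static boolean contains(Map<String, List<Interval>> gtids, String uuid, long txn) {
        return gtids.getOrDefault(uuid.toLowerCase(Locale.ROOT), List.of()).stream()
                .anyMatch(interval -> interval.contains(txn));
    }

    public static void main(String[] args) {
        Map<String, List<Interval>> gtids = parse("24DA167-0C0C-11E8-8442-00059A3C7B00:1-19");
        System.out.println(gtids); // {24da167-0c0c-11e8-8442-00059a3c7b00=[Interval[start=1, end=19]]}
        System.out.println(contains(gtids, "24DA167-0C0C-11E8-8442-00059A3C7B00", 20)); // false
    }
}
```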
524594

525-
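_Note: The `scan.startup.mode` option relies on Debezium's snapshot mode configuration, so do not use the two together. If you specify both `scan.startup.mode` and `debezium.snapshot.mode` in the table DDL, `scan.startup.mode` may not take effect._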
526595

527596
### DataStream Source
528597

docs/content/connectors/mysql-cdc.md

Lines changed: 73 additions & 5 deletions
@@ -218,10 +218,44 @@ Connector Options
218218
<td>optional</td>
219219
<td style="word-wrap: break-word;">initial</td>
220220
<td>String</td>
221-
<td>Optional startup mode for MySQL CDC consumer, valid enumerations are "initial"
222-
and "latest-offset".
223-
Please see <a href="#startup-reading-position">Startup Reading Position</a>section for more detailed information.</td>
224-
</tr>
221+
<td>Optional startup mode for MySQL CDC consumer, valid enumerations are "initial", "earliest-offset", "latest-offset", "specific-offset" and "timestamp".
222+
Please see <a href="#startup-reading-position">Startup Reading Position</a> section for more detailed information.</td>
223+
</tr>
224+
<tr>
225+
<td>scan.startup.specific-offset.file</td>
226+
<td>optional</td>
227+
<td style="word-wrap: break-word;">(none)</td>
228+
<td>String</td>
229+
<td>Optional binlog file name used in case of "specific-offset" startup mode</td>
230+
</tr>
231+
<tr>
232+
<td>scan.startup.specific-offset.pos</td>
233+
<td>optional</td>
234+
<td style="word-wrap: break-word;">(none)</td>
235+
<td>Long</td>
236+
<td>Optional binlog file position used in case of "specific-offset" startup mode</td>
237+
</tr>
238+
<tr>
239+
<td>scan.startup.specific-offset.gtid-set</td>
240+
<td>optional</td>
241+
<td style="word-wrap: break-word;">(none)</td>
242+
<td>String</td>
243+
<td>Optional GTID set used in case of "specific-offset" startup mode</td>
244+
</tr>
245+
<tr>
246+
<td>scan.startup.specific-offset.skip-events</td>
247+
<td>optional</td>
248+
<td style="word-wrap: break-word;">(none)</td>
249+
<td>Long</td>
250+
<td>Optional number of events to skip after the specific starting offset</td>
251+
</tr>
252+
<tr>
253+
<td>scan.startup.specific-offset.skip-rows</td>
254+
<td>optional</td>
255+
<td style="word-wrap: break-word;">(none)</td>
256+
<td>Long</td>
257+
<td>Optional number of rows to skip after the specific starting offset</td>
258+
</tr>
225259
<tr>
226260
<td>server-time-zone</td>
227261
<td>optional</td>
@@ -527,10 +561,44 @@ both snapshot phase and binlog phase, MySQL CDC connector read with **exactly-on
527561
The config option `scan.startup.mode` specifies the startup mode for MySQL CDC consumer. The valid enumerations are:
528562

529563
- `initial` (default): Performs an initial snapshot on the monitored database tables upon first startup, and continues to read the latest binlog.
564+
- `earliest-offset`: Skip snapshot phase and start reading binlog events from the earliest accessible binlog offset.
530565
- `latest-offset`: Never performs a snapshot on the monitored database tables upon first startup; the connector just reads from
531566
the end of the binlog, which means it only captures changes made after the connector was started.
567+
- `specific-offset`: Skip snapshot phase and start reading binlog events from a specific offset. The offset can be
568+
specified by binlog file name and position, or by a GTID set if GTID is enabled on the server.
569+
- `timestamp`: Skip snapshot phase and start reading binlog events from a specific timestamp.
570+
571+
For example, with the DataStream API:
572+
```java
MySqlSource.<String>builder()
    // Pick exactly one of the following startup options
    .startupOptions(StartupOptions.earliest()) // Start from earliest offset
    // .startupOptions(StartupOptions.latest()) // Start from latest offset
    // .startupOptions(StartupOptions.specificOffset("mysql-bin.000003", 4L)) // Start from binlog file and offset
    // .startupOptions(StartupOptions.specificOffset("24DA167-0C0C-11E8-8442-00059A3C7B00:1-19")) // Start from GTID set
    // .startupOptions(StartupOptions.timestamp(1667232000000L)) // Start from timestamp
    ...
    .build()
```
582+
583+
and with SQL:
584+
585+
```SQL
CREATE TABLE mysql_source (...) WITH (
    'connector' = 'mysql-cdc',
    -- Set exactly one startup mode:
    -- 'scan.startup.mode' = 'earliest-offset', -- Start from earliest offset
    -- 'scan.startup.mode' = 'latest-offset', -- Start from latest offset
    -- 'scan.startup.mode' = 'timestamp', -- Start from timestamp
    'scan.startup.mode' = 'specific-offset', -- Start from specific offset
    'scan.startup.specific-offset.file' = 'mysql-bin.000003', -- Binlog filename under specific offset startup mode
    'scan.startup.specific-offset.pos' = '4', -- Binlog position under specific offset startup mode
    -- 'scan.startup.specific-offset.gtid-set' = '24DA167-0C0C-11E8-8442-00059A3C7B00:1-19', -- GTID set under specific offset startup mode (alternative to filename and position)
    -- 'scan.startup.timestamp-millis' = '1667232000000', -- Timestamp under timestamp startup mode
    ...
)
```
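Because each mode reads a different subset of these options, a missing companion option is easy to overlook. Below is a minimal pre-flight check for a map of table options. The `StartupOptionsCheck` class is hypothetical and not part of the connector, and its rules are inferred from the mode descriptions above rather than taken from the connector's own validation: it flags an unknown mode, a `specific-offset` mode with neither file + position nor a GTID set, and a `timestamp` mode without a numeric `scan.startup.timestamp-millis`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StartupOptionsCheck {

    static final Set<String> MODES =
            Set.of("initial", "earliest-offset", "latest-offset", "specific-offset", "timestamp");

    /** Returns the problems found; an empty list means the combination looks usable. */
    static List<String> check(Map<String, String> options) {
        List<String> problems = new ArrayList<>();
        // "initial" is the documented default when scan.startup.mode is absent.
        String mode = options.getOrDefault("scan.startup.mode", "initial");
        if (!MODES.contains(mode)) {
            problems.add("unknown scan.startup.mode: " + mode);
            return problems;
        }
        switch (mode) {
            case "specific-offset" -> {
                boolean hasFilePos = options.containsKey("scan.startup.specific-offset.file")
                        && options.containsKey("scan.startup.specific-offset.pos");
                boolean hasGtidSet = options.containsKey("scan.startup.specific-offset.gtid-set");
                if (!hasFilePos && !hasGtidSet) {
                    problems.add("specific-offset needs either file and pos or gtid-set");
                }
                String pos = options.get("scan.startup.specific-offset.pos");
                if (pos != null && !isNonNegativeLong(pos)) {
                    problems.add("scan.startup.specific-offset.pos is not a non-negative number: " + pos);
                }
            }
            case "timestamp" -> {
                if (!isNonNegativeLong(options.get("scan.startup.timestamp-millis"))) {
                    problems.add("timestamp mode needs a numeric scan.startup.timestamp-millis");
                }
            }
            default -> {
                // initial, earliest-offset and latest-offset need no companion options.
            }
        }
        return problems;
    }

    static boolean isNonNegativeLong(String value) {
        if (value == null) {
            return false;
        }
        try {
            return Long.parseLong(value) >= 0;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> options = Map.of(
                "scan.startup.mode", "specific-offset",
                "scan.startup.specific-offset.file", "mysql-bin.000003");
        System.out.println(check(options)); // [specific-offset needs either file and pos or gtid-set]
    }
}
```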
532599

533-
_Note: the `scan.startup.mode` option relies on Debezium's `snapshot.mode` configuration, so please do not use them together. If you specify both `scan.startup.mode` and `debezium.snapshot.mode` in the table DDL, `scan.startup.mode` may not take effect._
600+
**Note:** On each checkpoint, the MySQL source logs the current binlog position at INFO level with the prefix
601+
"Binlog offset on checkpoint {checkpoint-id}". This is useful if you want to restart the job from a specific checkpointed position.
534602
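When restarting from a recorded position, the `scan.startup.specific-offset.skip-events` and `scan.startup.specific-offset.skip-rows` options from the table above control how much of the binlog right after that position is skipped. A minimal sketch, assuming Debezium-style restart semantics (skip that many whole events, then that many rows of the next event); `SkipAfterOffsetDemo` and its in-memory list of events are hypothetical stand-ins for a real binlog reader.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipAfterOffsetDemo {

    /**
     * Rows that would still be emitted after the starting offset.
     * Each inner list is one binlog event (e.g. one multi-row insert).
     */
    static List<String> rowsToEmit(List<List<String>> eventsAfterOffset, long skipEvents, long skipRows) {
        List<String> emitted = new ArrayList<>();
        for (int i = 0; i < eventsAfterOffset.size(); i++) {
            if (i < skipEvents) {
                continue; // whole event already processed before the restart
            }
            List<String> rows = eventsAfterOffset.get(i);
            // Only the first event after the skipped ones is partially skipped.
            int firstRow = (i == skipEvents) ? (int) Math.min(skipRows, rows.size()) : 0;
            emitted.addAll(rows.subList(firstRow, rows.size()));
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<List<String>> events = List.of(
                List.of("a1", "a2"),
                List.of("b1", "b2", "b3"),
                List.of("c1"));
        System.out.println(rowsToEmit(events, 1, 2)); // [b3, c1]
    }
}
```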

535603
### DataStream Source
536604
