Skip to content

Schema version update failure may cause DDL job to get stuck #43755

@pcqz

Description

@pcqz

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Import data using tidb-lightning while some tidb server is under high load. DDL task is found to be stuck and there are some warnings.

[2023/05/11 01:25:54.712 +08:00] [WARN] [util.go:299] ["[ddl] etcd-cli put kv failed"] [key=/tidb/ddl/all_schema_by_job_versions/130222/f296efb9-ed7c-4d77-b740-0692a2592685] [value=173784] [error="context deadline exceeded"] [retryCnt=0]
[2023/05/11 01:25:54.748 +08:00] [WARN] [domain.go:734] ["update self version failed"] [error="context deadline exceeded"]

As seen in the DDL owner log, the above tidb server can't sync the schema version for many hours, and therefore unable to get the mdl lock. Check the logs and find no running transactions block DDL.

[2023/05/11 01:25:57.307 +08:00] [INFO] [syncer.go:333] ["[ddl] syncer check all versions, someone is not synced"] [info="instance ip xx.xx.xx.xx, port 4000, id f296efb9-ed7c-4d77-b740-0692a2592685"] ["ddl id"=130222] [ver=173784]
......
[2023/05/11 02:17:46.351 +08:00] [INFO] [syncer.go:333] ["[ddl] syncer check all versions, someone is not synced"] [info="instance ip xx.xx.xx.xx, port 4000, id f296efb9-ed7c-4d77-b740-0692a2592685"] ["ddl id"=130222] [ver=173784]

2. What did you expect to see? (Required)

DDL executes normally when there is no running transaction.

3. What did you see instead (Required)

DDL execution is stuck.

4. What is your TiDB version? (Required)

v6.5.2

Metadata

Metadata

Assignees

Labels

affects-6.5This bug affects the 6.5.x(LTS) versions.affects-7.1This bug affects the 7.1.x(LTS) versions.severity/majorsig/sql-infraSIG: SQL Infratype/bugThe issue is confirmed as a bug.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions