
Fail to scan regions when using Follower Handle #60085

@3pointer

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

During a restore from a checkpoint in a large-scale cluster (3M regions), BR continuously reported the following error:

[2025/03/14 05:58:30.092 +00:00] [WARN] [split.go:254] ["failed to scan region, retrying"] [error="region 15872724's endKey not equal to next region 15872732's startKey, endKey: 

However, checking with pd-ctl showed that the region immediately after 15872724 was actually 15872728, not 15872732, and there was no specific region role.

This issue started after pingcap/tidb#59783 was merged, which introduced the use of followerHandle when the PD client scans regions.

After disabling the followerHandle option, the error no longer occurred.

This suggests that the root cause is likely an inconsistency in region tree synchronization between the PD leader and its followers. The follower's region information appears not to be fully up to date, so a scan served by a follower can miss a region that the leader already knows about, and the continuity check then fails. This synchronization issue needs to be addressed.
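
To illustrate how a stale region view could trigger this kind of warning, here is a minimal sketch of a continuity check over scanned regions. It is not BR's actual code: the `Region` struct, `checkContinuity`, the two view helpers, the single-letter keys, and the error wording are all hypothetical. Only the three region IDs come from the report above.

```go
package main

import (
	"bytes"
	"fmt"
)

// Region is a hypothetical, simplified stand-in for a region record
// returned by a scan; real region metadata carries more fields.
type Region struct {
	ID       uint64
	StartKey []byte
	EndKey   []byte
}

// checkContinuity returns an error if any region's EndKey differs from
// the StartKey of the region that follows it in the scan result.
// The error wording is illustrative, not BR's actual message.
func checkContinuity(regions []Region) error {
	for i := 1; i < len(regions); i++ {
		prev, cur := regions[i-1], regions[i]
		if !bytes.Equal(prev.EndKey, cur.StartKey) {
			return fmt.Errorf("gap between region %d and region %d", prev.ID, cur.ID)
		}
	}
	return nil
}

// leaderView models an up-to-date region tree (keys are made up).
func leaderView() []Region {
	return []Region{
		{ID: 15872724, StartKey: []byte("a"), EndKey: []byte("b")},
		{ID: 15872728, StartKey: []byte("b"), EndKey: []byte("c")},
		{ID: 15872732, StartKey: []byte("c"), EndKey: []byte("d")},
	}
}

// staleFollowerView models a view that has not yet learned about
// region 15872728, so the scan result skips it.
func staleFollowerView() []Region {
	all := leaderView()
	return []Region{all[0], all[2]}
}

func main() {
	fmt.Println("leader view:  ", checkContinuity(leaderView()))
	fmt.Println("follower view:", checkContinuity(staleFollowerView()))
}
```

In this sketch the leader view passes the check, while the follower view fails between 15872724 and 15872732. That matches the pair of region IDs in the warning, even though 15872728 exists on the leader.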

2. What did you expect to see? (Required)

No error

3. What did you see instead (Required)

The region scan keeps retrying and keeps failing.

4. What is your TiDB version? (Required)

master
