
Recovery time may be very long if TiKV enables hibernate regions #34906

@sticnarf

Description


Enhancement

Suppose one TiKV store becomes unhealthy: all requests sent to that store will time out. However, because of hibernate regions, a region whose leader is on the unhealthy store does not elect a new leader until a request is sent to one of its followers.
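The behavior described above can be illustrated with a toy model. This is a sketch under stated assumptions, not TiKV's Raft implementation: `Region`, `tick`, `onFollowerRequest`, and `electNewLeader` are hypothetical names, and an election is reduced to "pick the first peer on a healthy store". The point it shows is that ticking a hibernating region never triggers an election, while a single request reaching a follower does.

```go
package main

import "fmt"

// Region is a toy model of a Raft region (hypothetical, not TiKV's types).
type Region struct {
	ID          uint64
	LeaderStore uint64   // store ID hosting the current leader
	Peers       []uint64 // store IDs hosting a replica
	Hibernating bool
}

// tick models election timeouts passing. An awake region whose leader is on a
// down store elects a new leader; a hibernating region does nothing, because
// its followers are not expecting heartbeats and never notice the dead leader.
func tick(r *Region, down map[uint64]bool) {
	if r.Hibernating {
		return
	}
	if down[r.LeaderStore] {
		electNewLeader(r, down)
	}
}

// onFollowerRequest models a request reaching a follower: the region wakes up,
// and the followers then detect that the leader is unreachable.
func onFollowerRequest(r *Region, down map[uint64]bool) {
	r.Hibernating = false
	tick(r, down)
}

// electNewLeader picks the first peer on a healthy store (a stand-in for Raft).
func electNewLeader(r *Region, down map[uint64]bool) {
	for _, s := range r.Peers {
		if !down[s] {
			r.LeaderStore = s
			return
		}
	}
}

func main() {
	down := map[uint64]bool{1: true} // store 1 is unhealthy
	r := &Region{ID: 10, LeaderStore: 1, Peers: []uint64{1, 2, 3}, Hibernating: true}
	for i := 0; i < 100; i++ {
		tick(r, down)
	}
	fmt.Println("after ticks, leader on store", r.LeaderStore) // still 1
	onFollowerRequest(r, down)
	fmt.Println("after a follower request, leader on store", r.LeaderStore) // 2
}
```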

If there are thousands of such regions, every time we access a new region, the request is first sent to its original leader on the unhealthy TiKV, because that leader has not changed. As a result, users keep experiencing very long request durations until every affected region has been touched and has elected a new leader.
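A rough way to see the cost: if each affected region pays one request timeout on its first access, the total time spent waiting grows linearly with the number of affected regions. The sketch below assumes a hypothetical 2-second timeout, 3000 regions with leaders spread evenly over three stores, and a client that always tries the cached leader first; none of these numbers come from a real deployment, and `simulateNaive` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// requestTimeout is a hypothetical per-request timeout, not a TiDB/TiKV default.
const requestTimeout = 2 * time.Second

// simulateNaive counts the timeouts a client pays when the first request for a
// region always goes to the cached leader. leaderOf maps region ID to the store
// hosting its (possibly stale) leader; down marks unhealthy stores.
func simulateNaive(accesses []uint64, leaderOf map[uint64]uint64, down map[uint64]bool) (int, time.Duration) {
	recovered := make(map[uint64]bool) // regions that already elected a new leader
	timeouts := 0
	var wasted time.Duration
	for _, region := range accesses {
		if recovered[region] || !down[leaderOf[region]] {
			continue // leader is reachable, or the region already failed over
		}
		// The request to the old leader times out; the retry on a follower
		// wakes the hibernating region, which then elects a new leader.
		timeouts++
		wasted += requestTimeout
		recovered[region] = true
	}
	return timeouts, wasted
}

func main() {
	leaderOf := make(map[uint64]uint64)
	var accesses []uint64
	for id := uint64(0); id < 3000; id++ {
		leaderOf[id] = id%3 + 1 // leaders spread evenly over stores 1, 2, 3
		accesses = append(accesses, id)
	}
	down := map[uint64]bool{1: true} // store 1 is unhealthy

	// Touch every region twice: only the first touch of an affected region costs a timeout.
	n, wasted := simulateNaive(append(accesses, accesses...), leaderOf, down)
	fmt.Printf("%d timeouts, %v spent waiting\n", n, wasted) // 1000 timeouts, 33m20s spent waiting
}
```

With store 1 down, 1000 regions are affected, so the aggregate wait is 1000 × 2s. The second pass adds nothing, which matches the observation that latency only returns to normal once every affected region has been touched.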

We can use TiKV's health check service to get the serving status of each TiKV. By filtering out unhealthy stores, we can recover from a TiKV failure more quickly.
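A minimal sketch of the proposed filtering, assuming a client that periodically probes each store and skips stores that are not serving when choosing where to send a request. `HealthChecker`, `HealthCache`, `pickTarget`, and `fakeChecker` are hypothetical names. In a real client the probe could be backed by the standard gRPC health checking protocol (`grpc.health.v1.Health/Check`), assuming TiKV's health check service follows it; the sketch itself does not depend on gRPC.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ServingStatus mirrors the SERVING / NOT_SERVING states of a health probe.
type ServingStatus int

const (
	Unknown ServingStatus = iota
	Serving
	NotServing
)

// HealthChecker abstracts a single per-store probe (hypothetical interface).
type HealthChecker interface {
	Check(addr string) (ServingStatus, error)
}

// HealthCache holds the most recent probe result for each store address.
type HealthCache struct {
	mu      sync.RWMutex
	healthy map[string]bool
}

// Refresh probes every address and swaps in the new results under the lock.
// A failed probe counts as unhealthy.
func (c *HealthCache) Refresh(addrs []string, hc HealthChecker) {
	next := make(map[string]bool, len(addrs))
	for _, a := range addrs {
		st, err := hc.Check(a)
		next[a] = err == nil && st == Serving
	}
	c.mu.Lock()
	c.healthy = next
	c.mu.Unlock()
}

// IsHealthy reports the cached status; stores never probed are assumed healthy.
func (c *HealthCache) IsHealthy(addr string) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	h, ok := c.healthy[addr]
	return !ok || h
}

type Store struct {
	ID   uint64
	Addr string
}

// pickTarget returns the first replica (leader first) on a healthy store, so a
// leader on a dead store is skipped without waiting for a timeout.
func pickTarget(replicas []Store, cache *HealthCache) (Store, error) {
	for _, s := range replicas {
		if cache.IsHealthy(s.Addr) {
			return s, nil
		}
	}
	return Store{}, errors.New("no healthy replica")
}

// fakeChecker is an in-memory stand-in for a real health probe.
type fakeChecker map[string]ServingStatus

func (f fakeChecker) Check(addr string) (ServingStatus, error) {
	st, ok := f[addr]
	if !ok {
		return Unknown, errors.New("unreachable")
	}
	return st, nil
}

func main() {
	hc := fakeChecker{"tikv-1:20160": NotServing, "tikv-2:20160": Serving, "tikv-3:20160": Serving}
	replicas := []Store{
		{ID: 1, Addr: "tikv-1:20160"}, // cached leader, on the unhealthy store
		{ID: 2, Addr: "tikv-2:20160"},
		{ID: 3, Addr: "tikv-3:20160"},
	}
	addrs := make([]string, 0, len(replicas))
	for _, s := range replicas {
		addrs = append(addrs, s.Addr)
	}
	cache := &HealthCache{}
	cache.Refresh(addrs, hc)
	target, err := pickTarget(replicas, cache)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("send to store", target.ID) // send to store 2
}
```

Probing on the request path would add latency of its own, which is why the sketch separates `Refresh` (meant to run in the background, for example on a ticker) from the cheap `IsHealthy` lookup. Stores with no cached status are treated as healthy, so a missing probe result never blocks traffic.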

Metadata


Labels

affects-5.3: This bug affects 5.3.x versions.
affects-5.4: This bug affects the 5.4.x (LTS) versions.
affects-6.1: This bug affects the 6.1.x (LTS) versions.
sig/transaction: SIG Transaction
type/enhancement: The issue or PR belongs to an enhancement.
