
Lightning might get stuck for hours when importing Parquet files from cloud storage #56104

@D3Hunter

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

After #46984 (included in >= v7.5.x and 7.1.3+), Lightning always samples Parquet files, and it does so serially. This is very slow and might take hours if the user has a large number of Parquet files, all before Lightning starts the import. The time spent sampling the files might even exceed the time of the real import work.

We only need this size to display progress more accurately and as a reference when splitting engines, but slowing the import down this much is unacceptable.
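One possible direction, shown here only as a minimal sketch and not as the fix adopted in TiDB: since the size is only an estimate, sample a bounded number of files with a bounded number of goroutines and apply the mean ratio to every file. All names below (`FileInfo`, `SampleFunc`, `EstimateTotalSize`) are hypothetical and are not Lightning APIs; `SampleFunc` stands in for whatever reads part of a Parquet file and returns the ratio of decoded size to file size.

```go
package main

import "sync"

// FileInfo is a hypothetical stand-in for a data file found on cloud storage.
type FileInfo struct {
	Path string
	Size int64 // size of the file on storage, in bytes
}

// SampleFunc is a hypothetical callback that reads a small part of one file
// and returns the estimated ratio of decoded size to file size.
type SampleFunc func(f FileInfo) (float64, error)

// EstimateTotalSize samples at most maxSamples evenly spaced files, using at
// most workers goroutines at a time, and applies the mean ratio to all files.
// If no sample succeeds, it falls back to a ratio of 1.0 (raw file sizes).
func EstimateTotalSize(files []FileInfo, maxSamples, workers int, sample SampleFunc) int64 {
	if maxSamples > len(files) {
		maxSamples = len(files)
	}
	if maxSamples < 0 {
		maxSamples = 0
	}
	if workers < 1 {
		workers = 1
	}

	ratios := make([]float64, maxSamples)
	valid := make([]bool, maxSamples)
	sem := make(chan struct{}, workers) // limits concurrent samples
	var wg sync.WaitGroup

	for i := 0; i < maxSamples; i++ {
		idx := i * len(files) / maxSamples // evenly spaced pick
		wg.Add(1)
		sem <- struct{}{}
		go func(slot int, f FileInfo) {
			defer wg.Done()
			defer func() { <-sem }()
			r, err := sample(f)
			if err == nil && r > 0 {
				// Each goroutine writes only its own slot, so no lock is needed.
				ratios[slot] = r
				valid[slot] = true
			}
		}(i, files[idx])
	}
	wg.Wait()

	sum, n := 0.0, 0
	for i := range ratios {
		if valid[i] {
			sum += ratios[i]
			n++
		}
	}
	ratio := 1.0
	if n > 0 {
		ratio = sum / float64(n)
	}

	var total int64
	for _, f := range files {
		total += int64(float64(f.Size) * ratio)
	}
	return total
}
```

With this shape, the cost of the estimate is capped by `maxSamples` rather than growing with the number of files, at the price of a less accurate size for progress display and engine splitting.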

2. What did you expect to see? (Required)

The import starts quickly.

3. What did you see instead (Required)

It might take hours before Lightning starts doing any work.

4. What is your TiDB version? (Required)

Labels

affects-6.1: This bug affects the 6.1.x (LTS) versions.
affects-6.5: This bug affects the 6.5.x (LTS) versions.
affects-7.1: This bug affects the 7.1.x (LTS) versions.
affects-7.5: This bug affects the 7.5.x (LTS) versions.
affects-8.1: This bug affects the 8.1.x (LTS) versions.
affects-8.5: This bug affects the 8.5.x (LTS) versions.
component/lightning: This issue is related to Lightning of TiDB.
severity/major
type/bug: The issue is confirmed as a bug.
