Skip to content

Lightning: Performance Regression on 6.2.0 Compared with 5.3.3 on Parquet Data Source with Strings #38351

@dsdashun

Description

@dsdashun

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Import a moderate data set into different versions of Lightning: 1) 6.2.0; 2) 5.3.3
The reproduce conditions:

  • The data format of the source is parquet
  • The data set to import contains some string columns
  • The table schema should set the encoding to 'binary'

2. What did you expect to see? (Required)

  • It takes longer time to finish the import on 6.2.0 than in 5.3.3
  • From the log, we can see the encode KV time is much longer in 6.2.0, the import KV time is roughly the same as in 5.3.3

3. What did you see instead (Required)

The import time on newer version of Lightning should be roughly the same as the older version

4. What is your TiDB version? (Required)

6.2.0

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions