Enhance the region balance of the 1M tables imported by lightning #8424

@HuSharp

Description

Development Task

Background

  • PD ensures that regions are scattered within a single table, but it does not scatter regions across tables. Table-to-table balance relies on the balance scheduler.
  • Balance-Region will not schedule empty regions. There is a hardcoded check:
    // isEmptyRegionAllowBalance returns true if the region is not empty or the number of regions is too small.
    func isEmptyRegionAllowBalance(cluster sche.SharedCluster, region *core.RegionInfo) bool {
        return region.GetApproximateSize() > core.EmptyRegionApproximateSize || cluster.GetTotalRegionCount() < core.InitClusterRegionThreshold
    }
  • When Lightning imports tables, each table's region key is encoded from its table ID, and table IDs are allocated consecutively (basically next to each other). https://github.com/tikv/client-go/blob/6ba909c4ad2de65b5b36d0e5036d0a85f3154cc0/tikv/split_region.go#L241-L247

Problems faced

When Lightning imports 1 million tables (one table corresponds to one region), the consecutive region keys cause a large number of regions to pile up on the first 3 stores, even when the cluster has more than 3 stores. Since these empty regions are not scheduled by Balance-Region, those three stores have a high probability of OOM.
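
The effect of the hardcoded check on this workload can be sketched with stand-in types. The `region` struct, the constant values (1 and 100), and the assumption that an empty region reports an approximate size of 1 are all illustrative assumptions, not taken from PD's source. Only the shape of the predicate follows the snippet in the Background section.

```go
package main

import "fmt"

// Assumed values for illustration only; the real constants are
// core.EmptyRegionApproximateSize and core.InitClusterRegionThreshold.
const (
	emptyRegionApproximateSize = 1
	initClusterRegionThreshold = 100
)

// region is a stand-in for core.RegionInfo that keeps only the size.
type region struct {
	approximateSize int64
}

// isEmptyRegionAllowBalance mirrors the predicate from the issue:
// a region may be balanced if it is not empty or the cluster is small.
func isEmptyRegionAllowBalance(totalRegionCount int, r region) bool {
	return r.approximateSize > emptyRegionApproximateSize ||
		totalRegionCount < initClusterRegionThreshold
}

// newEmptyRegions builds n regions that report the assumed "empty" size.
func newEmptyRegions(n int) []region {
	regions := make([]region, n)
	for i := range regions {
		regions[i].approximateSize = emptyRegionApproximateSize
	}
	return regions
}

// countBalanceable returns how many regions pass the predicate.
func countBalanceable(regions []region) int {
	n := 0
	for _, r := range regions {
		if isEmptyRegionAllowBalance(len(regions), r) {
			n++
		}
	}
	return n
}

func main() {
	// One empty region per table, 1 million tables.
	fmt.Println("balanceable:", countBalanceable(newEmptyRegions(1000000)))
}
```

Under these assumptions, none of the 1 million empty regions qualifies for Balance-Region, so whichever stores receive them keep them.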
