Open
Labels
type/development: The issue belongs to a development task
Description
Development Task
Background
- PD ensures that regions are scattered at the table level, but it does not handle scattering across tables, which relies on balance-scheduler. Balance-Region will not schedule empty regions; this is hardcoded in pd/pkg/schedule/filter/region_filters.go (lines 150 to 153 in 7e18a69):

      // isEmptyRegionAllowBalance returns true if the region is not empty or the number of regions is too small.
      func isEmptyRegionAllowBalance(cluster sche.SharedCluster, region *core.RegionInfo) bool {
          return region.GetApproximateSize() > core.EmptyRegionApproximateSize || cluster.GetTotalRegionCount() < core.InitClusterRegionThreshold
      }

- When Lightning imports tables, each table's region key encodes the table ID, and table IDs are allocated consecutively (basically next to each other), so the resulting region keys are adjacent as well. https://github.com/tikv/client-go/blob/6ba909c4ad2de65b5b36d0e5036d0a85f3154cc0/tikv/split_region.go#L241-L247
Problems faced
When Lightning imports 1 million tables (one table corresponds to one region), the consecutive region keys cause a large number of regions to aggregate on the first 3 stores, even when the cluster has more than 3 stores. Because these regions are not scheduled, those three stores have a high probability of running out of memory (OOM).
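The effect of the hardcoded check on such an import can be sketched as follows. This is a simplified model, not PD code: `allowBalance` mirrors only the shape of isEmptyRegionAllowBalance using plain integers, and the two constants are assumed stand-ins for core.EmptyRegionApproximateSize and core.InitClusterRegionThreshold (check the PD source for the actual values).

```go
package main

import "fmt"

// Assumed stand-ins for the PD constants; the real values live in PD's core package.
const (
	emptyRegionApproximateSize int64 = 1   // assumption: size reported for an empty region
	initClusterRegionThreshold int   = 100 // assumption: "small cluster" region count
)

// allowBalance mirrors the shape of isEmptyRegionAllowBalance: a region may be
// balanced if it is not empty or if the cluster has very few regions.
func allowBalance(approximateSize int64, totalRegionCount int) bool {
	return approximateSize > emptyRegionApproximateSize ||
		totalRegionCount < initClusterRegionThreshold
}

// countBalanceable returns how many regions with the given sizes pass the check,
// treating the slice as the whole cluster.
func countBalanceable(sizes []int64) int {
	n := 0
	for _, s := range sizes {
		if allowBalance(s, len(sizes)) {
			n++
		}
	}
	return n
}

// emptyRegions builds n regions that all report the empty-region size.
func emptyRegions(n int) []int64 {
	sizes := make([]int64, n)
	for i := range sizes {
		sizes[i] = emptyRegionApproximateSize
	}
	return sizes
}

func main() {
	// One empty region per imported table: none of them is eligible for balancing.
	fmt.Println(countBalanceable(emptyRegions(1_000_000)))
	// Below the threshold, empty regions are still allowed to move.
	fmt.Println(countBalanceable(emptyRegions(50)))
}
```

Under these assumptions, once the cluster passes the region-count threshold every empty region is filtered out, so nothing moves the freshly created regions off the stores they first landed on.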