Skip to content

planner: a new risk-aware index selection heuristic rule for IndexLookup in skyline pruning #63336

@qw4990

Description

@qw4990

Enhancement

Many wrong index selections are the following case:

create table t (a int, b int, c int, d int, e int, f int, key abcd(a, b, c, d), key dc(d, c));

mysql> explain select * from t use index(dc) where a=1 and d=1 and c>1;
+-------------------------------+---------+-----------+-------------------------+----------------------------------------------------+
| id                            | estRows | task      | access object           | operator info                                      |
+-------------------------------+---------+-----------+-------------------------+----------------------------------------------------+
| IndexLookUp_9                 | 1.00    | root      |                         |                                                    |
| ├─IndexRangeScan_6(Build)     | 5.00    | cop[tikv] | table:t, index:dc(d, c) | range:(1 1,1 +inf], keep order:false, stats:pseudo |
| └─Selection_8(Probe)          | 1.00    | cop[tikv] |                         | eq(test.t.a, 1)                                    |
|   └─TableRowIDScan_7          | 5.00    | cop[tikv] | table:t                 | keep order:false, stats:pseudo                     |
+-------------------------------+---------+-----------+-------------------------+----------------------------------------------------+

mysql> explain select * from t where a=1 and d=1 and c>1;
+-------------------------------+---------+-----------+---------------------------------+---------------------------------------------+
| id                            | estRows | task      | access object                   | operator info                               |
+-------------------------------+---------+-----------+---------------------------------+---------------------------------------------+
| IndexLookUp_9                 | 1.00    | root      |                                 |                                             |
| ├─Selection_8(Build)          | 1.00    | cop[tikv] |                                 | eq(test.t.d, 1), gt(test.t.c, 1)            |
| │ └─IndexRangeScan_6          | 10.00   | cop[tikv] | table:t, index:abcd(a, b, c, d) | range:[1,1], keep order:false, stats:pseudo |
| └─TableRowIDScan_7(Probe)     | 1.00    | cop[tikv] | table:t                         | keep order:false, stats:pseudo              |
+-------------------------------+---------+-----------+---------------------------------+---------------------------------------------+

In the case above, the optimizer is prone to select dc since its estimated rows is slightly smaller than abcd.
But from the perspective of risk, especially avoiding large number of cop-tasks, the plan using abcd seems lower-risky, since it contains all columns of dc, which means at least after index accessing / filtering, abcd has less rows to trigger cop-tasks.

So maybe we can take advantage of this attribute to introduce a new risk-aware heuristic rule in Skyline Pruning, which is like:

  1. introduce 2 variables like _delta;
  2. for 2 indexes like a and b,
  3. if a contains all predicated applied on b AND idx_access_rows(a) - idx_access_rows(b) <= _delta
  4. a defeats b in our Skyline pruning.

Below is a concrete case from one of our customer:

  1. the index above has a smaller estRows than the below index
  2. the below index contains all columns of the above index
Image

Metadata

Metadata

Labels

report/customerCustomers have encountered this bug.sig/plannerSIG: Plannertype/enhancementThe issue or PR belongs to an enhancement.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions