Skip to content

Conversation

eurekaka
Copy link
Contributor

@eurekaka eurekaka commented Sep 7, 2018

Derive superset of DNF predicates and push it to children
plan nodes as much as possible.

What problem does this PR solve?

fix #7628

This PR should have performance improvement for TPC-H Q7 and Q19 at least.

before this PR:

drop table if exists t1;
drop table if exists t2;

create table t1(id int primary key, a int, b int, c int);
create table t2(id int primary key, a int, b int, c int);

mysql> explain select * from t1 join t2 on t1.b = t2.b where (t1.a=1 and t2.a=1) or (t1.a=2 and t2.a=2);
+-----------------------+----------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                    | count    | task | operator info                                                                                                                                                       |
+-----------------------+----------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| HashLeftJoin_7        | 12500.00 | root | inner join, inner:TableReader_12, equal:[eq(test.t1.b, test.t2.b)], other cond:or(and(eq(test.t1.a, 1), eq(test.t2.a, 1)), and(eq(test.t1.a, 2), eq(test.t2.a, 2))) |
| ├─TableReader_10      | 10000.00 | root | data:TableScan_9                                                                                                                                                    |
| │ └─TableScan_9       | 10000.00 | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo                                                                                                         |
| └─TableReader_12      | 10000.00 | root | data:TableScan_11                                                                                                                                                   |
|   └─TableScan_11      | 10000.00 | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo                                                                                                         |
+-----------------------+----------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+

no predicate is pushed down because the filter is in DNF and involves columns from different tables;

after this PR:

mysql> explain select * from t1 join t2 on t1.b = t2.b where (t1.a=1 and t2.a=1) or (t1.a=2 and t2.a=2);
+-------------------------+----------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                      | count    | task | operator info                                                                                                                                                       |
+-------------------------+----------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| HashLeftJoin_7          | 25.00    | root | inner join, inner:TableReader_14, equal:[eq(test.t1.b, test.t2.b)], other cond:or(and(eq(test.t1.a, 1), eq(test.t2.a, 1)), and(eq(test.t1.a, 2), eq(test.t2.a, 2))) |
| ├─TableReader_11        | 20.00    | root | data:Selection_10                                                                                                                                                   |
| │ └─Selection_10        | 20.00    | cop  | or(eq(test.t1.a, 1), eq(test.t1.a, 2))                                                                                                                              |
| │   └─TableScan_9       | 10000.00 | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo                                                                                                         |
| └─TableReader_14        | 20.00    | root | data:Selection_13                                                                                                                                                   |
|   └─Selection_13        | 20.00    | cop  | or(eq(test.t2.a, 1), eq(test.t2.a, 2))                                                                                                                              |
|     └─TableScan_12      | 10000.00 | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo                                                                                                         |
+-------------------------+----------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+

new filters in Selection_13 and Selection_10 is derived from the original filters and pushed down.

What is changed and how it works?

rewrite original filter (t1.a=1 and t2.a=1) or (t1.a=2 and t2.a=2) into ((t1.a=1 and t2.a=1) or (t1.a=2 and t2.a=2)) AND (t1.a=1 or t1.a=2) AND (t2.a=1 or t2.a=2), and the newly derived CNF items can be pushed down.

Outer join is supported as well.

Check List

Tests

  • Unit test
  • Integration test

Code changes

  • Has exported function/method change

Side effects

  • Increased code complexity

Related changes

  • Need to be included in the release note: enhance predicate pushdown over join operator

@eurekaka eurekaka added type/enhancement The issue or PR belongs to an enhancement. status/WIP sig/planner SIG: Planner labels Sep 7, 2018
@eurekaka
Copy link
Contributor Author

eurekaka commented Sep 9, 2018

/run-all-tests

@eurekaka
Copy link
Contributor Author

/run-all-tests

@eurekaka
Copy link
Contributor Author

/run-all-tests

@eurekaka
Copy link
Contributor Author

@zz-jason @winoros PTAL

otherCond = append(otherCond, expr)
}
}
return
}

// deriveOtherConditions given a LogicalJoin, check the OtherConditions to see if we can derive more
// conditions for left/right child pushdown.
func deriveOtherConditions(p *LogicalJoin, deriveLeft bool, deriveRight bool) (leftCond []expression.Expression,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this function to "plan/rule_predicate_push_down.go"?

if ExprFromSchema(cnfItem, schema) {
newCNFItems = append(newCNFItems, cnfItem)
}
// If simple cnfItem cannot be fully covered by schema, just drop this CNF item
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this comment together with line 400?

@winoros
Copy link
Member

winoros commented Sep 12, 2018

Have we tested this branch's performance on TPCH Q7 and Q8?

Copy link
Member

@zz-jason zz-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eurekaka
Copy link
Contributor Author

@winoros not yet, will provide result later.

Copy link
Member

@winoros winoros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

}
}

func (s *testPlanSuite) TestOuterWherePredicatePushDown(c *C) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use SimpleRewriter so don't need to build actual plan in test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we need to test the predicates are correctly derived and pushed to specific child in PredicatePushDown as well, so I prefer building logical plan.

@winoros
Copy link
Member

winoros commented Sep 13, 2018

I come up with the question what will there be performance regression when the pushed one's selectivity is quite big?
Since this transformation won't remove the original filter so the filter will be calculated multiple times.

Copy link
Contributor Author

@eurekaka eurekaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is possible. @zz-jason any thoughts regarding this concern?

}
}

func (s *testPlanSuite) TestOuterWherePredicatePushDown(c *C) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we need to test the predicates are correctly derived and pushed to specific child in PredicatePushDown as well, so I prefer building logical plan.

@zz-jason
Copy link
Member

An ideal solution is to calculate the selectivity of the to-be-pushed predicates, decide whether to push that predicate according to the calculated selectivity. But this selectivity might not be so accurate.

It seems that the adaptive execution plan adopted in oracle optimizer is more suitable. They just constructed two plans, one has the pushed predicates while the other doesn't. And using the statistics collector to record the number of rows produced by the child operator during the execution phase. Deciding whether to use the pushed predicates according to the number of rows observed by the statistics collector.

@zz-jason zz-jason added the status/LGT2 Indicates that a PR has LGTM 2. label Sep 14, 2018
@winoros winoros added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Sep 17, 2018
@winoros winoros merged commit 506dcce into pingcap:master Sep 17, 2018
@eurekaka eurekaka deleted the ppd_join_dnf branch September 17, 2018 08:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/planner SIG: Planner status/LGT2 Indicates that a PR has LGTM 2. type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

enhance the predicate push down on join operator
3 participants