-
Notifications
You must be signed in to change notification settings - Fork 6k
plan: enhance predicate pushdown over join #7645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Derive superset of DNF predicates and push it to children plan nodes as much as possible.
whether we should derive
refactor PredicatePushDown function
45ef80c
to
7e1f616
Compare
/run-all-tests |
/run-all-tests |
/run-all-tests |
plan/logical_plan_builder.go
Outdated
otherCond = append(otherCond, expr) | ||
} | ||
} | ||
return | ||
} | ||
|
||
// deriveOtherConditions given a LogicalJoin, check the OtherConditions to see if we can derive more | ||
// conditions for left/right child pushdown. | ||
func deriveOtherConditions(p *LogicalJoin, deriveLeft bool, deriveRight bool) (leftCond []expression.Expression, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this function to "plan/rule_predicate_push_down.go"?
expression/util.go
Outdated
if ExprFromSchema(cnfItem, schema) { | ||
newCNFItems = append(newCNFItems, cnfItem) | ||
} | ||
// If simple cnfItem cannot be fully covered by schema, just drop this CNF item |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this comment together with line 400?
Have we tested this branch's performance on TPCH Q7 and Q8? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@winoros not yet, will provide result later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
} | ||
} | ||
|
||
func (s *testPlanSuite) TestOuterWherePredicatePushDown(c *C) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use SimpleRewriter
so don't need to build actual plan in test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, we need to test the predicates are correctly derived and pushed to specific child in PredicatePushDown
as well, so I prefer building logical plan.
I come up with the question what will there be performance regression when the pushed one's selectivity is quite big? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is possible. @zz-jason any thoughts regarding this concern?
} | ||
} | ||
|
||
func (s *testPlanSuite) TestOuterWherePredicatePushDown(c *C) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, we need to test the predicates are correctly derived and pushed to specific child in PredicatePushDown
as well, so I prefer building logical plan.
An ideal solution is to calculate the selectivity of the to-be-pushed predicates, decide whether to push that predicate according to the calculated selectivity. But this selectivity might not be so accurate. It seems that the adaptive execution plan adopted in oracle optimizer is more suitable. They just constructed two plans, one has the pushed predicates while the other doesn't. And using the statistics collector to record the number of rows produced by the child operator during the execution phase. Deciding whether to use the pushed predicates according to the number of rows observed by the statistics collector. |
Derive superset of DNF predicates and push it to children
plan nodes as much as possible.
What problem does this PR solve?
fix #7628
This PR should have performance improvement for TPC-H Q7 and Q19 at least.
before this PR:
no predicate is pushed down because the filter is in DNF and involves columns from different tables;
after this PR:
new filters in
Selection_13
andSelection_10
is derived from the original filters and pushed down.What is changed and how it works?
rewrite original filter
(t1.a=1 and t2.a=1) or (t1.a=2 and t2.a=2)
into((t1.a=1 and t2.a=1) or (t1.a=2 and t2.a=2)) AND (t1.a=1 or t1.a=2) AND (t2.a=1 or t2.a=2)
, and the newly derived CNF items can be pushed down.Outer join is supported as well.
Check List
Tests
Code changes
Side effects
Related changes
enhance predicate pushdown over join operator