planner: avoid exceeding the configured concurrency limit (#61786) #61813
Signed-off-by: ti-chi-bot <[email protected]>
@hawkingrei This PR has conflicts, so I have put it on hold.
fix the conflict
Signed-off-by: Weizhen Wang <[email protected]>
/unhold
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: AilinKid, hawkingrei
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## release-7.5 #61813 +/- ##
================================================
Coverage ? 72.2023%
================================================
Files ? 1417
Lines ? 414294
Branches ? 0
================================================
Hits ? 299130
Misses ? 95172
Partials ? 19992
/retest
This is an automated cherry-pick of #61786
What problem does this PR solve?
Issue Number: close #61785
Problem Summary:
Customers have observed higher I/O consumption when the analyze operation reaches indexes than when it analyzes regular tables. (The analyze status contains sensitive information, so it is not included here.)
The root cause is nested concurrency in the code. The analyze operation creates multiple concurrent tasks, but the goroutines spawned for those tasks create further concurrency of their own. As a result, the actual level of parallelism is significantly higher than anticipated.
In the analyze status, you will see that two tasks are created for `analyze ndv for index`. This is where the problem lies.
The first creation of concurrency
tidb/pkg/executor/analyze.go
Lines 121 to 126 in 8fc1430
The second creation of concurrency
AnalyzeExec.analyzeWorker -> analyzeColumnsPushDownEntry -> analyzeColumnsPushDownV2
https://github.com/pingcap/tidb/blob/master/pkg/executor/analyze_col_v2.go#L105-L107
The third creation of concurrency
tidb/pkg/executor/analyze_col_v2.go
Lines 461 to 466 in 8fc1430
This third level is the most dangerous. It lets the concurrency of `handleNDVForSpecialIndexes` and the concurrency of column collection run at the same time, which increases the risk to business workloads.
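To illustrate why nested concurrency overshoots the configured limit, here is a minimal, self-contained sketch. It is not TiDB code: `peakNested` and its parameters are hypothetical names, and the leaf goroutines simply block until all of them have started, standing in for workers that are all waiting on I/O at once.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// peakNested starts `outer` workers, each of which starts `inner` sub-workers,
// and returns the highest number of sub-workers that were alive at once.
// Hypothetical illustration only; this does not mirror the TiDB implementation.
func peakNested(outer, inner int) int64 {
	var active, peak int64
	release := make(chan struct{})
	var started, done sync.WaitGroup
	started.Add(outer * inner)
	done.Add(outer * inner)

	for i := 0; i < outer; i++ {
		go func() {
			for j := 0; j < inner; j++ {
				go func() {
					defer done.Done()
					n := atomic.AddInt64(&active, 1)
					// Record a new maximum if this goroutine raised it.
					for {
						p := atomic.LoadInt64(&peak)
						if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
							break
						}
					}
					started.Done()
					<-release // hold the slot, like a worker blocked on I/O
					atomic.AddInt64(&active, -1)
				}()
			}
		}()
	}

	started.Wait() // every leaf goroutine is now running
	close(release)
	done.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	// A "concurrency of 4" at two nested levels yields 16 workers in flight.
	fmt.Println(peakNested(4, 4)) // prints 16
}
```

With two nested levels the peak is the product of the two settings, and each additional nested level multiplies it again.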
What changed and how does it work?
1. Wait until `handleNDVForSpecialIndexes` has completed before proceeding with the statistics collection for columns.
2. To prevent changes to the build stats concurrency from multiplying the actual number of concurrent tasks, set the concurrency here to be the same as the build sampling concurrency.
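The ordering change can be sketched as follows. This is a simplified model under stated assumptions, not the code in this PR: `gauge`, `runPhase`, and `analyzeSequential` are hypothetical names, the first phase stands in for the special-index NDV work, and the second for column collection. Both phases share one limit and the second starts only after the first has fully drained, so the peak never exceeds that limit.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// gauge tracks how many tasks are running now and the highest value seen.
type gauge struct{ active, peak int64 }

func (g *gauge) enter() {
	n := atomic.AddInt64(&g.active, 1)
	for {
		p := atomic.LoadInt64(&g.peak)
		if n <= p || atomic.CompareAndSwapInt64(&g.peak, p, n) {
			return
		}
	}
}

func (g *gauge) leave() { atomic.AddInt64(&g.active, -1) }

// runPhase executes `tasks` units of work on `limit` workers (limit must be
// positive) and returns only after every unit has finished.
func runPhase(limit, tasks int, g *gauge) {
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < limit; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range jobs {
				g.enter()
				// Real work would happen here.
				g.leave()
			}
		}()
	}
	for t := 0; t < tasks; t++ {
		jobs <- t
	}
	close(jobs)
	wg.Wait()
}

// analyzeSequential runs the two phases one after the other with the same
// limit and reports the peak number of simultaneously running tasks.
func analyzeSequential(limit, indexTasks, columnTasks int) int64 {
	g := &gauge{}
	runPhase(limit, indexTasks, g)  // stand-in for the special-index NDV phase
	runPhase(limit, columnTasks, g) // starts only after the first phase returns
	return atomic.LoadInt64(&g.peak)
}

func main() {
	peak := analyzeSequential(4, 10, 20)
	fmt.Println(peak <= 4) // prints true
}
```

If the two `runPhase` calls were launched in separate goroutines instead, up to twice the limit could be in flight at once, which is the overlap this change removes.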
Check List
Tests
Side effects
Documentation
Release note