planner: avoid exceeding the configured concurrency limit (#61786) #61815

ti-chi-bot · 2025-06-18T15:07:38Z

This is an automated cherry-pick of #61786

What problem does this PR solve?

Issue Number: close #61785

Problem Summary:

Customers have observed higher I/O consumption when the analyze operation reaches the index phase than when it analyzes regular table data. (The analyze status contains sensitive information, so it is not included here.)

The root cause is nested concurrency. When we perform the analyze operation, we start multiple concurrent workers to execute its tasks. However, inside those worker goroutines we start further concurrent workers. Because of this nesting, the actual level of parallelism is significantly higher than the configured limit.
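To make the effect of nesting concrete, here is a minimal, self-contained sketch. It is not TiDB code: `peakLeafWorkers`, `outer`, and `inner` are hypothetical names, and the barrier exists only so that the peak can be observed deterministically. It shows that when each of `outer` workers starts `inner` sub-workers, nothing prevents `outer * inner` goroutines from being active at once.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// peakLeafWorkers starts `outer` workers, each of which starts `inner`
// sub-workers, and returns the largest number of sub-workers that were
// active at the same time. All names here are illustrative only.
func peakLeafWorkers(outer, inner int) int64 {
	var active, peak atomic.Int64

	// The barrier holds every sub-worker until all of them have started,
	// which makes the observed peak deterministic for this demonstration.
	var barrier sync.WaitGroup
	barrier.Add(outer * inner)

	var outerWg sync.WaitGroup
	for i := 0; i < outer; i++ {
		outerWg.Add(1)
		go func() {
			defer outerWg.Done()
			var innerWg sync.WaitGroup
			for j := 0; j < inner; j++ {
				innerWg.Add(1)
				go func() {
					defer innerWg.Done()
					n := active.Add(1)
					// Record the highest number of active sub-workers seen.
					for {
						old := peak.Load()
						if n <= old || peak.CompareAndSwap(old, n) {
							break
						}
					}
					barrier.Done()
					barrier.Wait()
					active.Add(-1)
				}()
			}
			innerWg.Wait()
		}()
	}
	outerWg.Wait()
	return peak.Load()
}

func main() {
	// A "limit" of 4 applied at two nested levels allows 16 active goroutines.
	fmt.Println(peakLeafWorkers(4, 4)) // prints 16
}
```

With a configured value of 4 at both levels, the sketch reports 16 simultaneously active sub-workers, which is the kind of gap between the configured and the actual parallelism described above.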

CREATE TABLE `test` (
  `c1` binary(16) NOT NULL,
  `c2` tinyint(1) NOT NULL DEFAULT '0',
  `c3` int NOT NULL,
  `c4` varchar(48) COLLATE utf8mb4_general_ci NOT NULL,
  `c5` varchar(512) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `c6` enum('A','B','C') COLLATE utf8mb4_general_ci DEFAULT NULL,
  `c7` int unsigned NOT NULL DEFAULT '0',
  `c8` int unsigned NOT NULL DEFAULT '0',
  `c9` tinyint(1) GENERATED ALWAYS AS (`c7` > 0) VIRTUAL NOT NULL,
  `c10` int DEFAULT NULL,
  `c11` datetime(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
  `c12` datetime(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
  PRIMARY KEY (`c1`) /*T![clustered_index] CLUSTERED */,
  KEY `idx_c4_c2_c9_c3_c12_c5_c6` (`c4`,`c2`,`c9`,`c3`,`c12`,`c5`,`c6`),
  KEY `idx_c4_c2_c9_c12_c5_c6` (`c4`,`c2`,`c9`,`c12`,`c5`,`c6`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

analyze table test all columns;

show analyze status

+--------------+------------+----------------+-----------------------------------------------------------------------------------------------------------------+----------------+---------------------+---------------------+----------+-------------+----------------+------------+-------------------+----------+----------------------+
| Table_schema | Table_name | Partition_name | Job_info                                                                                                        | Processed_rows | Start_time          | End_time            | State    | Fail_reason | Instance       | Process_ID | Remaining_seconds | Progress | Estimated_total_rows |
+--------------+------------+----------------+-----------------------------------------------------------------------------------------------------------------+----------------+---------------------+---------------------+----------+-------------+----------------+------------+-------------------+----------+----------------------+
| test         | test       |                | analyze ndv for index idx_c4_c2_c9_c12_c5_c6                                                                    | 0              | 2025-06-18 14:48:05 | 2025-06-18 14:48:05 | finished | <null>      | 127.0.0.1:4000 | <null>     | <null>            | <null>   | <null>               |
| test         | test       |                | analyze ndv for index idx_c4_c2_c9_c3_c12_c5_c6                                                                 | 0              | 2025-06-18 14:48:05 | 2025-06-18 14:48:05 | finished | <null>      | 127.0.0.1:4000 | <null>     | <null>            | <null>   | <null>               |
| test         | test       |                | analyze table all indexes, columns c1, c2, c3, c4, c5, c6, c7, c9, c12 with 256 buckets, 100 topn, 1 samplerate | 0              | 2025-06-18 14:48:05 | 2025-06-18 14:48:05 | finished | <null>      | 127.0.0.1:4000 | <null>     | <null>            | <null>   | <null>               |
+--------------+------------+----------------+-----------------------------------------------------------------------------------------------------------------+----------------+---------------------+---------------------+----------+-------------+----------------+------------+-------------------+----------+----------------------+

The output shows that two "analyze ndv for index" tasks are created, one for each index.

The problem comes from the following layers of concurrency.

The first creation of concurrency

tidb/pkg/executor/analyze.go

Lines 121 to 126 in 8fc1430

    
	// Start workers with channel to collect results.
	taskCh := make(chan *analyzeTask, buildStatsConcurrency)
	resultsCh := make(chan *statistics.AnalyzeResults, 1)
	for range buildStatsConcurrency {
		e.wg.Run(func() { e.analyzeWorker(taskCh, resultsCh) })
	}
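For readers unfamiliar with this pattern, the following is a minimal standalone sketch of a fixed-size worker pool fed through a task channel, with results collected from a results channel. It only mirrors the shape of the snippet above: `squareAll` and its parameters are hypothetical, and plain `sync.WaitGroup` stands in for TiDB's wait-group wrapper.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll processes `inputs` with exactly `concurrency` worker goroutines
// and returns the sum of the squared values. Names are illustrative only.
func squareAll(concurrency int, inputs []int) int {
	taskCh := make(chan int, concurrency)
	resultsCh := make(chan int, 1)

	// Start a fixed number of workers; this is the only place where
	// goroutines that do work are created.
	var wg sync.WaitGroup
	for i := 0; i < concurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for v := range taskCh {
				resultsCh <- v * v
			}
		}()
	}

	// Feed the tasks, then close the channel so the workers can exit.
	go func() {
		for _, v := range inputs {
			taskCh <- v
		}
		close(taskCh)
	}()

	// Close the results channel once every worker has finished.
	go func() {
		wg.Wait()
		close(resultsCh)
	}()

	sum := 0
	for r := range resultsCh {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println(squareAll(3, []int{1, 2, 3, 4})) // prints 30
}
```

On its own this layer respects the limit: at most `concurrency` workers run. The limit is exceeded only when a worker body starts further goroutines.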

The second creation of concurrency

AnalyzeExec.analyzeWorker -> analyzeColumnsPushDownEntry -> analyzeColumnsPushDownV2

https://github.com/pingcap/tidb/blob/master/pkg/executor/analyze_col_v2.go#L105-L107

The third creation of concurrency

tidb/pkg/executor/analyze_col_v2.go

Lines 461 to 466 in 8fc1430

    
	var subIndexWorkerWg = NewAnalyzeResultsNotifyWaitGroupWrapper(resultsCh)
	subIndexWorkerWg.Add(statsConcurrncy)
	for range statsConcurrncy {
		subIndexWorkerWg.Run(func() { e.subIndexWorkerForNDV(taskCh, resultsCh) })
	}
	for _, task := range tasks {

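This part is the most dangerous. It lets the workers started by handleNDVForSpecialIndexes run at the same time as the workers that collect column statistics. The two sets of goroutines coexist, so the total load is higher than either setting alone implies, which increases the risk to the business workload.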

What changed and how does it work?

1. Wait until handleNDVForSpecialIndexes has completed before proceeding with the statistics collection for columns.

2. To prevent a change to the build stats concurrency from multiplying the actual number of concurrent tasks, the concurrency here is set to the same value as the build sampling concurrency.
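The effect of the first change can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `tracker`, `startWorkers`, `peakOverlapped`, and `peakSequential` are hypothetical names, the "NDV" and "column" phases are just two groups of goroutines, and the barriers exist only to make the measured peak deterministic.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// tracker records how many workers are active and the highest value seen.
type tracker struct {
	active, peak atomic.Int64
}

func (t *tracker) enter() {
	n := t.active.Add(1)
	for {
		old := t.peak.Load()
		if n <= old || t.peak.CompareAndSwap(old, n) {
			return
		}
	}
}

func (t *tracker) leave() { t.active.Add(-1) }

// startWorkers launches n workers. Each one registers with the tracker and
// then holds at the barrier until every worker counted by the barrier has
// registered. The returned WaitGroup is done when all n workers have exited.
func startWorkers(n int, t *tracker, barrier *sync.WaitGroup) *sync.WaitGroup {
	wg := &sync.WaitGroup{}
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			t.enter()
			barrier.Done()
			barrier.Wait()
			t.leave()
		}()
	}
	return wg
}

// peakOverlapped lets both phases run at the same time.
func peakOverlapped(ndv, col int) int64 {
	t := &tracker{}
	var barrier sync.WaitGroup
	barrier.Add(ndv + col)
	w1 := startWorkers(ndv, t, &barrier)
	w2 := startWorkers(col, t, &barrier)
	w1.Wait()
	w2.Wait()
	return t.peak.Load()
}

// peakSequential waits for the first phase to finish before starting the second.
func peakSequential(ndv, col int) int64 {
	t := &tracker{}
	var b1 sync.WaitGroup
	b1.Add(ndv)
	startWorkers(ndv, t, &b1).Wait() // first phase must complete here
	var b2 sync.WaitGroup
	b2.Add(col)
	startWorkers(col, t, &b2).Wait()
	return t.peak.Load()
}

func main() {
	fmt.Println(peakOverlapped(4, 4)) // prints 8
	fmt.Println(peakSequential(4, 4)) // prints 4
}
```

When the phases overlap, the peak is the sum of both worker counts. When the second phase starts only after the first has completed, the peak is the larger of the two counts, so it stays within a single configured value.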

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No need to test
- I checked and no code files have been changed.

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Signed-off-by: ti-chi-bot <[email protected]>

ti-chi-bot · 2025-06-18T15:07:42Z

@hawkingrei This PR has conflicts, so I have put it on hold.
Please resolve them or ask others to resolve them, then comment /unhold to remove the hold label.

hawkingrei · 2025-06-19T15:15:39Z

/unhold

Signed-off-by: Weizhen Wang <[email protected]>

codecov · 2025-06-19T15:31:57Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (release-8.5@e601725).

Additional details and impacted files

@@               Coverage Diff                @@
##             release-8.5     #61815   +/-   ##
================================================
  Coverage               ?   57.1718%           
================================================
  Files                  ?       1771           
  Lines                  ?     629443           
  Branches               ?          0           
================================================
  Hits                   ?     359864           
  Misses                 ?     245537           
  Partials               ?      24042

Flag	Coverage Δ
integration	`37.1152% <100.0000%> (?)`
unit	`72.8333% <100.0000%> (?)`

Flags with carried forward coverage won't be shown.

Components	Coverage Δ
dumpling	`52.9278% <0.0000%> (?)`
parser	`∅ <0.0000%> (?)`
br	`52.5103% <0.0000%> (?)`

hawkingrei · 2025-06-20T03:34:55Z

/retest

hawkingrei · 2025-06-23T07:17:15Z

/retest

hawkingrei · 2025-06-23T15:01:43Z

/retest

ti-chi-bot · 2025-06-24T16:34:39Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AilinKid, hawkingrei

Needs approval from an approver in each of these files:

~~OWNERS~~ [AilinKid,hawkingrei]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot · 2025-06-24T16:34:42Z

[LGTM Timeline notifier]

Timeline:

2025-06-19 07:59:46.959952311 +0000 UTC m=+345039.683131292: ☑️ agreed by hawkingrei.
2025-06-24 16:34:41.917270008 +0000 UTC m=+807934.640448990: ☑️ agreed by AilinKid.

hawkingrei · 2025-06-25T02:58:49Z

/retest

hawkingrei · 2025-06-25T04:12:29Z

/retest

hawkingrei · 2025-06-25T07:19:11Z

/retest

hawkingrei · 2025-06-25T09:48:24Z

/retest

This is an automated cherry-pick of pingcap#61786

1c1b7cc

Signed-off-by: ti-chi-bot <[email protected]>

ti-chi-bot mentioned this pull request Jun 18, 2025

planner: avoid exceeding the configured concurrency limit #61786

Merged

13 tasks

ti-chi-bot bot added the do-not-merge/cherry-pick-not-approved label Jun 18, 2025

ti-chi-bot assigned hawkingrei Jun 18, 2025

hawkingrei approved these changes Jun 19, 2025

ti-chi-bot bot added the approved, needs-1-more-lgtm, and cherry-pick-approved labels and removed the do-not-merge/cherry-pick-not-approved label Jun 19, 2025

ti-chi-bot bot removed the do-not-merge/hold label Jun 19, 2025

update

70d9360

Signed-off-by: Weizhen Wang <[email protected]>

AilinKid approved these changes Jun 24, 2025

ti-chi-bot bot added the lgtm label and removed the needs-1-more-lgtm label Jun 24, 2025

ti-chi-bot bot merged commit 32bcbe8 into pingcap:release-8.5 Jun 25, 2025
18 checks passed
