Skip to content

Conversation

ti-chi-bot
Copy link
Member

This is an automated cherry-pick of #61786

What problem does this PR solve?

Issue Number: close #61785

Problem Summary:

The issue is that customers have observed higher I/O consumption when the analyze operation reaches the index, compared to when it analyzes regular tables. (The analyze status contains sensitive information, so it will not be included here.)

Image

The root cause of the issue lies in improper coding practices. When we perform the analyze operation, we create multiple concurrent tasks to execute it. However, within these concurrently spawned goroutines, we further create additional concurrency. This nested concurrency results in an actual level of parallelism that is significantly higher than we anticipated.

CREATE TABLE `test` (
  `c1` binary(16) NOT NULL,
  `c2` tinyint(1) NOT NULL DEFAULT '0',
  `c3` int NOT NULL,
  `c4` varchar(48) COLLATE utf8mb4_general_ci NOT NULL,
  `c5` varchar(512) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `c6` enum('A','B','C') COLLATE utf8mb4_general_ci DEFAULT NULL,
  `c7` int unsigned NOT NULL DEFAULT '0',
  `c8` int unsigned NOT NULL DEFAULT '0',
  `c9` tinyint(1) GENERATED ALWAYS AS (`c7` > 0) VIRTUAL NOT NULL,
  `c10` int DEFAULT NULL,
  `c11` datetime(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
  `c12` datetime(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
  PRIMARY KEY (`c1`) /*T![clustered_index] CLUSTERED */,
  KEY `idx_c4_c2_c9_c3_c12_c5_c6` (`c4`,`c2`,`c9`,`c3`,`c12`,`c5`,`c6`),
  KEY `idx_c4_c2_c9_c12_c5_c6` (`c4`,`c2`,`c9`,`c12`,`c5`,`c6`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

analyze table chat_session all columns ;

show analyze status

+--------------+------------+----------------+-----------------------------------------------------------------------------------------------------------------+----------------+---------------------+---------------------+----------+-------------+----------------+------------+-------------------+----------+----------------------+
| Table_schema | Table_name | Partition_name | Job_info                                                                                                        | Processed_rows | Start_time          | End_time            | State    | Fail_reason | Instance       | Process_ID | Remaining_seconds | Progress | Estimated_total_rows |
+--------------+------------+----------------+-----------------------------------------------------------------------------------------------------------------+----------------+---------------------+---------------------+----------+-------------+----------------+------------+-------------------+----------+----------------------+
| test         | test       |                | analyze ndv for index idx_c4_c2_c9_c12_c5_c6                                                                    | 0              | 2025-06-18 14:48:05 | 2025-06-18 14:48:05 | finished | <null>      | 127.0.0.1:4000 | <null>     | <null>            | <null>   | <null>               |
| test         | test       |                | analyze ndv for index idx_c4_c2_c9_c3_c12_c5_c6                                                                 | 0              | 2025-06-18 14:48:05 | 2025-06-18 14:48:05 | finished | <null>      | 127.0.0.1:4000 | <null>     | <null>            | <null>   | <null>               |
| test         | test       |                | analyze table all indexes, columns c1, c2, c3, c4, c5, c6, c7, c9, c12 with 256 buckets, 100 topn, 1 samplerate | 0              | 2025-06-18 14:48:05 | 2025-06-18 14:48:05 | finished | <null>      | 127.0.0.1:4000 | <null>     | <null>            | <null>   | <null>               |
+--------------+------------+----------------+-----------------------------------------------------------------------------------------------------------------+----------------+---------------------+---------------------+----------+-------------+----------------+------------+-------------------+----------+----------------------+

You will see that it will create two task about analyze ndv for index.

the problem is here.

The first creation of concurrency

// Start workers with channel to collect results.
taskCh := make(chan *analyzeTask, buildStatsConcurrency)
resultsCh := make(chan *statistics.AnalyzeResults, 1)
for range buildStatsConcurrency {
e.wg.Run(func() { e.analyzeWorker(taskCh, resultsCh) })
}

The second creation of concurrency

AnalyzeExec.analyzeWorker -> analyzeColumnsPushDownEntry -> analyzeColumnsPushDownV2

https://github.com/pingcap/tidb/blob/master/pkg/executor/analyze_col_v2.go#L105-L107

The third creation of concurrency

var subIndexWorkerWg = NewAnalyzeResultsNotifyWaitGroupWrapper(resultsCh)
subIndexWorkerWg.Add(statsConcurrncy)
for range statsConcurrncy {
subIndexWorkerWg.Run(func() { e.subIndexWorkerForNDV(taskCh, resultsCh) })
}
for _, task := range tasks {

This part is actually the most dangerous. It allows the concurrency of handleNDVForSpecialIndexes and the concurrency of column collection to coexist, which increases the business risk.

What changed and how does it work?

1、Wait untilhandleNDVForSpecialIndexesis completed before proceeding with the statistics collection for columns.

2、To prevent modifying the build stats concurrency, which could result in an exponential relationship in the actual number of concurrent tasks, we set the concurrency here to be the same as the build sampling concurrency.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. type/cherry-pick-for-release-8.5 This PR is cherry-picked to release-8.5 from a source PR. labels Jun 18, 2025
@ti-chi-bot
Copy link
Member Author

@hawkingrei This PR has conflicts, I have hold it.
Please resolve them or ask others to resolve them, then comment /unhold to remove the hold label.

@ti-chi-bot ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. cherry-pick-approved Cherry pick PR approved by release team. and removed do-not-merge/cherry-pick-not-approved labels Jun 19, 2025
@hawkingrei
Copy link
Member

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 19, 2025
Signed-off-by: Weizhen Wang <[email protected]>
Copy link

codecov bot commented Jun 19, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (release-8.5@e601725). Learn more about missing BASE report.

Additional details and impacted files
@@               Coverage Diff                @@
##             release-8.5     #61815   +/-   ##
================================================
  Coverage               ?   57.1718%           
================================================
  Files                  ?       1771           
  Lines                  ?     629443           
  Branches               ?          0           
================================================
  Hits                   ?     359864           
  Misses                 ?     245537           
  Partials               ?      24042           
Flag Coverage Δ
integration 37.1152% <100.0000%> (?)
unit 72.8333% <100.0000%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 52.9278% <0.0000%> (?)
parser ∅ <0.0000%> (?)
br 52.5103% <0.0000%> (?)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@hawkingrei
Copy link
Member

/retest

2 similar comments
@hawkingrei
Copy link
Member

/retest

@hawkingrei
Copy link
Member

/retest

Copy link

ti-chi-bot bot commented Jun 24, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AilinKid, hawkingrei

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [AilinKid,hawkingrei]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jun 24, 2025
Copy link

ti-chi-bot bot commented Jun 24, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-06-19 07:59:46.959952311 +0000 UTC m=+345039.683131292: ☑️ agreed by hawkingrei.
  • 2025-06-24 16:34:41.917270008 +0000 UTC m=+807934.640448990: ☑️ agreed by AilinKid.

@hawkingrei
Copy link
Member

/retest

3 similar comments
@hawkingrei
Copy link
Member

/retest

@hawkingrei
Copy link
Member

/retest

@hawkingrei
Copy link
Member

/retest

@ti-chi-bot ti-chi-bot bot merged commit 32bcbe8 into pingcap:release-8.5 Jun 25, 2025
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved cherry-pick-approved Cherry pick PR approved by release team. lgtm release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. type/cherry-pick-for-release-8.5 This PR is cherry-picked to release-8.5 from a source PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants