-
Notifications
You must be signed in to change notification settings - Fork 6k
Description
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
The issue is that customers have observed higher I/O consumption when the analyze operation reaches the index, compared to when it analyzes regular tables. (The analyze status contains sensitive information, so it will not be included here.)
The root cause of the issue lies in improper coding practices. When we perform the analyze operation, we create multiple concurrent tasks to execute it. However, within these concurrently spawned goroutines, we further create additional concurrency. This nested concurrency results in an actual level of parallelism that is significantly higher than we anticipated.
CREATE TABLE `test` (
`c1` binary(16) NOT NULL,
`c2` tinyint(1) NOT NULL DEFAULT '0',
`c3` int NOT NULL,
`c4` varchar(48) COLLATE utf8mb4_general_ci NOT NULL,
`c5` varchar(512) COLLATE utf8mb4_general_ci DEFAULT NULL,
`c6` enum('A','B','C') COLLATE utf8mb4_general_ci DEFAULT NULL,
`c7` int unsigned NOT NULL DEFAULT '0',
`c8` int unsigned NOT NULL DEFAULT '0',
`c9` tinyint(1) GENERATED ALWAYS AS (`c7` > 0) VIRTUAL NOT NULL,
`c10` int DEFAULT NULL,
`c11` datetime(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
`c12` datetime(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
PRIMARY KEY (`c1`) /*T![clustered_index] CLUSTERED */,
KEY `idx_c4_c2_c9_c3_c12_c5_c6` (`c4`,`c2`,`c9`,`c3`,`c12`,`c5`,`c6`),
KEY `idx_c4_c2_c9_c12_c5_c6` (`c4`,`c2`,`c9`,`c12`,`c5`,`c6`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
analyze table chat_session all columns ;
show analyze status
+--------------+------------+----------------+-----------------------------------------------------------------------------------------------------------------+----------------+---------------------+---------------------+----------+-------------+----------------+------------+-------------------+----------+----------------------+
| Table_schema | Table_name | Partition_name | Job_info | Processed_rows | Start_time | End_time | State | Fail_reason | Instance | Process_ID | Remaining_seconds | Progress | Estimated_total_rows |
+--------------+------------+----------------+-----------------------------------------------------------------------------------------------------------------+----------------+---------------------+---------------------+----------+-------------+----------------+------------+-------------------+----------+----------------------+
| test | test | | analyze ndv for index idx_c4_c2_c9_c12_c5_c6 | 0 | 2025-06-18 14:48:05 | 2025-06-18 14:48:05 | finished | <null> | 127.0.0.1:4000 | <null> | <null> | <null> | <null> |
| test | test | | analyze ndv for index idx_c4_c2_c9_c3_c12_c5_c6 | 0 | 2025-06-18 14:48:05 | 2025-06-18 14:48:05 | finished | <null> | 127.0.0.1:4000 | <null> | <null> | <null> | <null> |
| test | test | | analyze table all indexes, columns c1, c2, c3, c4, c5, c6, c7, c9, c12 with 256 buckets, 100 topn, 1 samplerate | 0 | 2025-06-18 14:48:05 | 2025-06-18 14:48:05 | finished | <null> | 127.0.0.1:4000 | <null> | <null> | <null> | <null> |
+--------------+------------+----------------+-----------------------------------------------------------------------------------------------------------------+----------------+---------------------+---------------------+----------+-------------+----------------+------------+-------------------+----------+----------------------+
You will see that it will create two task about analyze ndv for index
.
the problem is here.
The first creation of concurrency
Lines 121 to 126 in 8fc1430
// Start workers with channel to collect results. | |
taskCh := make(chan *analyzeTask, buildStatsConcurrency) | |
resultsCh := make(chan *statistics.AnalyzeResults, 1) | |
for range buildStatsConcurrency { | |
e.wg.Run(func() { e.analyzeWorker(taskCh, resultsCh) }) | |
} |
The second creation of concurrency
AnalyzeExec.analyzeWorker -> analyzeColumnsPushDownEntry -> analyzeColumnsPushDownV2
https://github.com/pingcap/tidb/blob/master/pkg/executor/analyze_col_v2.go#L105-L107
The third creation of concurrency
tidb/pkg/executor/analyze_col_v2.go
Lines 461 to 466 in 8fc1430
var subIndexWorkerWg = NewAnalyzeResultsNotifyWaitGroupWrapper(resultsCh) | |
subIndexWorkerWg.Add(statsConcurrncy) | |
for range statsConcurrncy { | |
subIndexWorkerWg.Run(func() { e.subIndexWorkerForNDV(taskCh, resultsCh) }) | |
} | |
for _, task := range tasks { |