-
Notifications
You must be signed in to change notification settings - Fork 6k
Description
Feature Request
Is your feature request related to a problem? Please describe:
After DDL on large 800GB table, statistics were cleared, leading to degraded performance cause by bad query plans (eg: full table scans).
The root cause is that after the Modify-Col DDL, the old statistics related to this modified column will no longer be used, and below is the 2 reason of this behavior:
The old statistics is collected based on the column's prior type so the prior stats-data is not compatible with the new column type, for example, after changing the column type from INT to VARCHAR(32), its prior stats-data based on INT can't be used to estimate for predicates like "col = 'xxx'", so we have to collect the stats again;
When modifying the column type, TiDB will create a new column (or an index) with a new colID (or idxID) to replace the old one, and TiDB uses the colID (or idxID) to access its statistics, so after the DDL, TiDB will not be able to find its prior statistics.
Describe the feature you'd like:
TiDB will collect new statistics just before it publishes the final schema version, so that after new schema goes live, stats are already collected and up to date. Something like this could work:
- DDL is started and goes through normal stages
- While DDL is in progress, application still sees original schema version and uses original statistics
- DDL is finished and new schema version is ready to be published
- Before publishing, collect statistics against the new version
- Now publish the new schema version, discard old stats and use new stats collected in 4.
- This means that there would be no bad query plans as we don’t need to wait for new stats, they were preemptively collected.
Describe alternatives you've considered:
It was suggested to use plan binding feature to prevent bad queries. Such workaround is very management heavy and not really feasible to find every query hitting the table and try to make a Plan Binding preemptively.
Teachability, Documentation, Adoption, Migration Strategy:
Currently large DDLs with MODIFY COLUMN are somewhat of a trap, as they have potential to cause cluster performance degradation.
This feature would prevent bad query plans with outdated stats being executed after large DDLs. Currently, such bad plans are be used until ANALYZE TABLE is run manually or it is picked up by auto analyze.