Open
Labels
type/enhancement: The issue or PR belongs to an enhancement.
Description
Enhancement
We have listed many items that we want to refactor, to make the structure clearer and the code easier to maintain and extend, plus some optimization tasks to make general DDL run faster. In the last 8.2 sprint we optimized general DDL, did some refactoring of job scheduling, and left some TODOs during that refactor; see #53246.
In the 8.3 sprint we have the same goal as in 8.2, and we also want to improve the testability of the DDL component by decoupling the different parts of DDL (TODO) and replacing ddl.Hook with simpler failpoints.
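To illustrate why a failpoint can be simpler than a hook, here is a minimal sketch. It assumes a tiny hand-rolled registry (`EnableFailpoint`, `DisableFailpoint`, `InjectCall`), a trimmed-down `Job` struct, and a `runOneJobStep` function; all of these names are hypothetical and are not the API of any real failpoint library or of the DDL package. The point is only that production code needs no hook field or interface: it names an injection point, and a test attaches a callback to that name.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical, minimal failpoint registry used only for this sketch.
var (
	fpMu       sync.RWMutex
	failpoints = map[string]func(args ...any){}
)

// EnableFailpoint registers a callback under name; tests call this.
func EnableFailpoint(name string, fn func(args ...any)) {
	fpMu.Lock()
	defer fpMu.Unlock()
	failpoints[name] = fn
}

// DisableFailpoint removes the callback registered under name, if any.
func DisableFailpoint(name string) {
	fpMu.Lock()
	defer fpMu.Unlock()
	delete(failpoints, name)
}

// InjectCall runs the callback registered under name and is a no-op otherwise.
func InjectCall(name string, args ...any) {
	fpMu.RLock()
	fn := failpoints[name]
	fpMu.RUnlock()
	if fn != nil {
		fn(args...)
	}
}

// Job is a hypothetical, trimmed-down DDL job.
type Job struct {
	ID    int64
	State string
}

// runOneJobStep stands in for production code. It carries no hook object;
// it only names the points where a test may inject behavior.
func runOneJobStep(job *Job) {
	InjectCall("beforeRunOneJobStep", job)
	job.State = "done"
	InjectCall("afterRunOneJobStep", job)
}

func main() {
	var seen []string
	EnableFailpoint("beforeRunOneJobStep", func(args ...any) {
		// Record the state the job had before the step ran.
		seen = append(seen, args[0].(*Job).State)
	})
	defer DisableFailpoint("beforeRunOneJobStep")

	job := &Job{ID: 1, State: "queueing"}
	runOneJobStep(job)
	fmt.Println(seen, job.State)
}
```

With a hook, the callback object has to be stored somewhere (here, inside ddlCtx) and threaded through to the code under test; with a named injection point, that field can disappear.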
Decouple components of DDL
Currently, ddl/ddlCtx is quite large and has many responsibilities, including:
- Handles job submission coming from the SQL executor; this is the biggest part.
  - This part shares the same interface as schema-tracker; maybe extract this interface as something called DDLExecutor.
- Handles local DDL job execution for fast table creation. It runs on every node.
- Handles owner election and starts the job scheduler.
- Enables/disables some functions, such as whether this node can take part in owner election, and fast-create.
- Contains a lot of methods related to job execution, such as writePhysicalTableRecord.
- Manages fields which are shared with other components, such as those below. Each side uses only part of their functionality, and the parts do not overlap. Maybe we can create those fields on each side or separate the functionality.
  - SchemaSyncer (which should arguably be called schema-version-syncer), shared with domain
  - StateSyncer, shared with SyncUpgrade API handling
- Manages fields that should be part of job scheduler: reorg-ctx/job-ctx/ddlSeqNum/waitSchemaSyncedController
- Some fields are used for unit tests: hook/interceptor inside ddlCtx. We can replace them with failpoints.
- Some util methods which should be moved out: GetTableMaxHandle
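The interface extraction suggested above could look roughly like the sketch below. It is a sketch under assumptions, not the actual TiDB definitions: `Executor`, `Lifecycle`, `Stmt`, `jobSubmitter`, `schemaTracker`, and `replay` are hypothetical names, and the method set is reduced to two statements. It shows a statement-facing interface that both a job-submitting implementation and an in-memory schema tracker can satisfy, kept apart from lifecycle concerns such as start/stop and owner election.

```go
package main

import "fmt"

// Stmt is a hypothetical stand-in for a parsed DDL statement.
type Stmt struct {
	Table string
}

// Executor is a hypothetical narrow interface with only the
// statement-facing methods that the SQL executor needs.
type Executor interface {
	CreateTable(s Stmt) error
	DropTable(s Stmt) error
}

// Lifecycle is a hypothetical interface for what stays on the DDL side:
// starting and stopping, which would cover owner election and the scheduler.
type Lifecycle interface {
	Start() error
	Stop() error
}

// jobSubmitter turns statements into queued jobs (here just strings).
type jobSubmitter struct {
	queue []string
}

func (j *jobSubmitter) CreateTable(s Stmt) error {
	j.queue = append(j.queue, "create table "+s.Table)
	return nil
}

func (j *jobSubmitter) DropTable(s Stmt) error {
	j.queue = append(j.queue, "drop table "+s.Table)
	return nil
}

// schemaTracker applies statements directly to an in-memory schema.
type schemaTracker struct {
	tables map[string]struct{}
}

func newSchemaTracker() *schemaTracker {
	return &schemaTracker{tables: map[string]struct{}{}}
}

func (t *schemaTracker) CreateTable(s Stmt) error {
	if _, ok := t.tables[s.Table]; ok {
		return fmt.Errorf("table %q already exists", s.Table)
	}
	t.tables[s.Table] = struct{}{}
	return nil
}

func (t *schemaTracker) DropTable(s Stmt) error {
	if _, ok := t.tables[s.Table]; !ok {
		return fmt.Errorf("table %q does not exist", s.Table)
	}
	delete(t.tables, s.Table)
	return nil
}

// Compile-time checks that both implementations satisfy Executor.
var (
	_ Executor = (*jobSubmitter)(nil)
	_ Executor = (*schemaTracker)(nil)
)

// replay depends only on Executor, so it works with either implementation.
func replay(e Executor, creates []string) error {
	for _, name := range creates {
		if err := e.CreateTable(Stmt{Table: name}); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	tracker := newSchemaTracker()
	submitter := &jobSubmitter{}
	for _, e := range []Executor{tracker, submitter} {
		if err := replay(e, []string{"t1", "t2"}); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println(len(tracker.tables), submitter.queue)
}
```

Callers that only issue statements depend on `Executor` alone, so they can be tested against the tracker without starting owner election or a scheduler.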
Tasks
- Optimize job execution
- Remove markJobProcessing completely: we have moved job dependency calculation into memory, and as long as a job resides in the table we should try to run it, so this field is useless now. See ddl: schedule as many jobs as possible in one round, simplify sql to query jobs #54438
- Merge the per-job-type loadDDLJobAndRun calls into one (done in ddl: consider paused job when check runnable #54419), remove the sub-query in getJobSQL, and handle the case described in ddl: support referring objects in runningJobs #54110 (comment). Also schedule as many jobs as possible in one round; see ddl: schedule as many jobs as possible in one round, simplify sql to query jobs #54438
- infoschema: use maps.Clone to replace iterating #54581
- Make job ID allocation and job insertion run in one transaction, and test its QPS when run in parallel. If it outperforms job execution, we can re-add the reverted optimization that queries from the min job ID.
- ddl: integrate fast create table into normal general DDL workflow #55025
- [later] Use RPC to notify job done when a DDL job is submitted on a non-owner node.
- Optimize global schema version allocation
- Decouple components of DDL
- Move waitSchemaSyncedController / reorgCtx / jobCtx / ddlSeqNum / schemaVersionManager to job scheduler from ddlCtx
- ddl: decouple job seq number from job history & reset its allocator on owner change #54774
- ddl: move structs related to scheduler out from ddlCtx & add job context #55411
- ddl: abstract job submitter & move some fields from ddlCtx to jobContext #55461
- meta: separate reader and mutator #56376
- ddl: make meta mutator part of jobContext #56399
- ddl: move insert job to table to job submitter #56542
- Separate DDLExecutor interface out of DDL interface
- Separate local job execution out: we have integrated fast-create into general DDL, see ddl: integrate fast create table into normal general DDL workflow #55025
- Move methods related to job execution out: ddl: decouple executor part out from ddl #54858
- Avoid exposing SchemaSyncer and StateSyncer from DDL
- Replace fields only used for tests with failpoints, such as hook/interceptor inside ddlCtx
- Move GetTableMaxHandle out ddl: move structs related to scheduler out from ddlCtx & add job context #55411
- ddl/domain: disallow set schema lease to 0 #55312
- ddl: restructure schema version and server state syncer #55368
- ddl: alloc global id together with job id #55552
- Refactor for reorg jobs (TODO, only list a few)
- Make all global variables like ingest.LitBackCtxMgr local to the job scheduler
- ReorgCtx is stored in ddlCtx, and we wait for the reorg routine by repeatedly entering and exiting the DDL worker. If the owner changes after a check returns due to waitTimeout, we might have no chance to clean it up. It would be better to wait for the routine inside runReorgJob.
- Remove the dependency on lightning config; we should use local.BackendConfig directly. ddl: directly use BackendConfig rather than use lightning config #55433
- ddl: replace local ingest impl with backfill operators #54149
- ddl/ingest: move backend ctx out of checkpoint manager #54292
- ddl: load table ranges from PD instead of region cache #54598
- ddl/ingest: refactor checkpoint manager #54747
- ddl: remove unused copReqSenderPool and related structures #55302
- improve the usability of DDL Job.Args #53930
- pass CtxVars through job args
- refine the context and context-like struct usage in DDL #56398
- others
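
As an illustration of the "schedule as many jobs as possible in one round" task under "Optimize job execution", the sketch below computes job dependencies in memory and picks every runnable job in a single pass. It is not the scheduler's actual logic: `Job`, its `Objects` field, and `pickRunnable` are hypothetical, and the rule that a job must wait for any earlier job touching the same object is an assumption made for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// Job is a hypothetical pending DDL job. Objects lists the names of the
// schemas or tables it touches.
type Job struct {
	ID      int64
	Objects []string
}

// pickRunnable returns, in ID order, every pending job whose objects do not
// overlap with objects used by running jobs or by any earlier pending job.
// Assumption: jobs on the same object must run in ID order.
func pickRunnable(pending []Job, running map[string]struct{}) []Job {
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]Job(nil), pending...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ID < sorted[j].ID })

	// busy starts as the objects held by running jobs and grows as we scan.
	busy := make(map[string]struct{}, len(running))
	for obj := range running {
		busy[obj] = struct{}{}
	}

	var picked []Job
	for _, job := range sorted {
		conflict := false
		for _, obj := range job.Objects {
			if _, ok := busy[obj]; ok {
				conflict = true
				break
			}
		}
		if !conflict {
			picked = append(picked, job)
		}
		// Picked or blocked, this job claims its objects so that later jobs
		// on the same objects cannot overtake it.
		for _, obj := range job.Objects {
			busy[obj] = struct{}{}
		}
	}
	return picked
}

func main() {
	running := map[string]struct{}{"t1": {}}
	pending := []Job{
		{ID: 1, Objects: []string{"t1"}},       // blocked by a running job
		{ID: 2, Objects: []string{"t2"}},       // runnable
		{ID: 3, Objects: []string{"t1", "t3"}}, // blocked via t1
		{ID: 4, Objects: []string{"t3"}},       // blocked behind job 3
		{ID: 5, Objects: []string{"t4"}},       // runnable
	}
	for _, job := range pickRunnable(pending, running) {
		fmt.Println("run job", job.ID)
	}
}
```

Because the conflict check happens in memory over the whole pending list, one round can start jobs 2 and 5 together rather than one job per query.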