scheduler: fix bugs to make scheduler more robust #1941

hongyunyan · 2025-09-05T03:35:59Z

What problem does this PR solve?

Issue Number: close #xxx

What is changed and how it works?

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Questions

Will it cause performance regression or break compatibility?

Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

gemini-code-assist

Summary of Changes

Hello @hongyunyan, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the scheduler's robustness by refining its operational logic and incorporating stability checks, particularly for span merging and balancing. It also introduces a new, comprehensive integration test suite to validate these scheduler improvements under various DML scenarios, alongside minor logging and test utility enhancements.

Highlights

Scheduler Robustness Improvements: Introduced a configurable maxLagThreshold for span merging logic and added a stability check in the balanceSplitsScheduler to skip balancing operations when the system is not in a stable state (e.g., during active operations or with absent spans).
Logging Enhancement: Improved the clarity of logging for new task group additions in the replication scheduler by explicitly logging the groupID as an int64.
Test Utility Enhancement: Modified the query_dispatcher_count utility script to support a new comparisonMode parameter, enabling more flexible assertions (e.g., 'less than or equal') in integration tests.
New Integration Test Suite: Added a comprehensive integration test (ddl_for_split_tables_random_schedule) to thoroughly validate the scheduler's behavior with both split and non-split tables under random DML workloads across multiple TiCDC servers.
Test Failpoint Refactoring: Removed the StopSplitScheduler failpoint from several existing integration tests, streamlining test configurations and reflecting potential improvements in scheduler stability.
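The `comparisonMode` switch described in the highlights can be sketched as a small bash function. This is a hypothetical sketch, not the script's actual code. The function name `count_matches`, the argument order, and the `eq` default are assumptions. Only the `le` mode and the original `==` equality check appear in the diff. The sketch also includes the integer guard that the review proposes, so the numeric branch never receives a non-integer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count_matches VALUE TARGET [MODE]
# Returns 0 when VALUE satisfies the assertion against TARGET, 1 otherwise.
count_matches() {
	local value="$1" target="$2" mode="${3:-eq}"
	case "$mode" in
	le)
		# "Less than or equal" is numeric, so reject non-integers
		# (e.g. the literal "null" printed by jq) before comparing.
		[[ "$value" =~ ^[0-9]+$ ]] && [ "$value" -le "$target" ]
		;;
	*)
		# Default: exact match, the string comparison the script used before.
		[ "$value" == "$target" ]
		;;
	esac
}
```

With this shape, `count_matches 2 3 le` succeeds, `count_matches 2 3` fails because the default mode requires equality, and `count_matches null 3 le` fails without an arithmetic error.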


gemini-code-assist

Code Review

This pull request improves the robustness of the scheduler by adding checks to prevent scheduling during unstable states and by replacing magic numbers with constants. It also enhances logging for better observability. A significant part of this PR is the addition of a new integration test for random scheduling scenarios, which improves test coverage. The changes in existing tests to remove failpoints suggest that the scheduler is now more robust and can be tested in a more realistic environment. I've found a potential issue in the updated test utility script that could cause it to fail if it receives a non-integer value.

gemini-code-assist · 2025-09-05T03:37:48Z

tests/integration_tests/_utils/query_dispatcher_count

```diff
-		if [ "$value" == "$target" ]; then
-			exit 0
+		if [ "$comparisonMode" == "le" ]; then
+			if [ "$value" -le "$target" ]; then
```

The arithmetic comparison [ "$value" -le "$target" ] will cause the script to fail if $value is not an integer (e.g., if it's null from the jq command). The previous string comparison == was safer in this regard. To make this more robust, you should check if $value is an integer before performing the arithmetic comparison.

Suggested change

```diff
-			if [ "$value" -le "$target" ]; then
+			if [[ "$value" =~ ^[0-9]+$ ]] && [ "$value" -le "$target" ]; then
```
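The difference between the two checks can be seen in isolation with a short bash sketch. The function names `unguarded_le` and `guarded_le` are hypothetical, and `null` stands in for what `jq` prints when a field is missing. In bash, `[ "null" -le 3 ]` writes an "integer expression expected" error to stderr and returns status 2, which an `if` treats as false. The regex guard short-circuits first and returns a plain status 1.

```shell
#!/usr/bin/env bash
# Compare the unguarded and guarded "<=" checks on a non-integer input.

unguarded_le() {
	# Non-integer input: [ prints "integer expression expected", returns 2.
	[ "$1" -le "$2" ]
}

guarded_le() {
	# Regex guard from the review: only all-digit values reach the arithmetic test.
	[[ "$1" =~ ^[0-9]+$ ]] && [ "$1" -le "$2" ]
}

# "|| status=$?" captures the exit code without tripping "set -e".
status=0; unguarded_le null 3 2>/dev/null || status=$?
echo "unguarded null<=3: status $status"   # status 2 (test error)

status=0; guarded_le null 3 || status=$?
echo "guarded null<=3: status $status"     # status 1 (plain false)

status=0; guarded_le 2 3 || status=$?
echo "guarded 2<=3: status $status"        # status 0 (true)
```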

ti-chi-bot · 2025-09-05T03:41:39Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lidezhu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [lidezhu]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot · 2025-09-05T03:41:40Z

[LGTM Timeline notifier]

Timeline:

2025-09-05 03:41:39.97845629 +0000 UTC m=+151243.961553738: ☑️ agreed by lidezhu.

ti-chi-bot · 2025-09-05T03:54:09Z

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.
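The rule in this notice amounts to a line-anchored pattern match on the PR body. The bash sketch below is a hypothetical approximation: the bot's real pattern is not shown in this PR, and it may accept forms not covered here. The function name `has_linked_issue` is also an assumption. Under this pattern, the template placeholder `Issue Number: close #xxx` does not qualify. That is consistent with the bot adding the `do-not-merge/needs-linked-issue` label to this PR.

```shell
#!/usr/bin/env bash
# Hypothetical approximation of the format checker's rule: the PR body must
# contain a line such as "Issue Number: close #123" or "Issue Number: ref #456".
has_linked_issue() {
	# Reads the PR body on stdin; succeeds if any line matches.
	grep -Eq '^Issue Number: (close|ref) #[0-9]+'
}

for body in 'Issue Number: close #xxx' 'Issue Number: ref #456'; do
	if printf '%s\n' "$body" | has_linked_issue; then
		echo "linked:     $body"
	else
		echo "not linked: $body"
	fi
done
```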

hongyunyan · 2025-09-05T05:39:43Z

/retest

hongyunyan added 2 commits September 5, 2025 11:07

add test

6d0691e

update

b00f848

ti-chi-bot bot added the do-not-merge/needs-linked-issue, release-note (denotes a PR that will be considered when it comes time to generate release notes), and size/L (denotes a PR that changes 100-499 lines, ignoring generated files) labels Sep 5, 2025

gemini-code-assist bot reviewed Sep 5, 2025

update

51cdc9e

lidezhu approved these changes Sep 5, 2025

ti-chi-bot bot added the lgtm label Sep 5, 2025

ti-chi-bot bot added the approved label Sep 5, 2025

hongyunyan removed the do-not-merge/needs-linked-issue label Sep 5, 2025

fix fmt

285f95f

ti-chi-bot bot added the do-not-merge/needs-linked-issue label Sep 5, 2025

hongyunyan removed the do-not-merge/needs-linked-issue label Sep 5, 2025

ti-chi-bot bot merged commit 233b851 into pingcap:master Sep 5, 2025
16 checks passed
