zurakutsia
Contributor

@zurakutsia zurakutsia commented Aug 8, 2025

What problem does this PR solve?

Issue Number: close #12277

Fixes TiCDC cloud storage sink flapping (start/stop spam) on Azure Blob caused by premature context cancellation during reads.

Symptoms included:

  • “Failed to read data from azure blob, data info: pos='0', count='1': context canceled”
  • “failed to generate data file path”
  • “dead dmlSink”

The regression appears to have been introduced by 0e6782b71. After the switch to GetExternalStorageWithDefaultTimeout, Open was wrapped in a timeout context that was canceled as soon as Open returned, breaking subsequent Read() calls that reuse the Open() context.

What is changed and how it works?

Before: Open wrapped the passed context with context.WithTimeout and deferred cancel(). Since many backends bind the reader’s lifetime to the Open context, the deferred cancel immediately invalidated the reader’s future Read() calls, causing “context canceled” errors.

Now: Open no longer cancels the context when it returns. Instead, extStorageWithTimeout.Open returns a readerWithCancel wrapper struct that calls cancel() on Close(), so the context stays valid for the reader's whole lifetime. This prevents premature cancellation while keeping the existing timeouts for other APIs.

Tests

  • Unit tests

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

@ti-chi-bot ti-chi-bot bot added the following labels Aug 8, 2025:

  • do-not-merge/needs-linked-issue
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • contribution: This PR is from a community contributor.
  • first-time-contributor: Indicates that the PR was contributed by an external member and is a first-time contributor.
  • needs-ok-to-test: Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing.

Contributor

ti-chi-bot bot commented Aug 8, 2025

Hi @zurakutsia. Thanks for your PR.

I'm waiting for a pingcap member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Contributor

ti-chi-bot bot commented Aug 8, 2025

Welcome @zurakutsia!

It looks like this is your first PR to pingcap/tiflow 🎉.

I'm the bot to help you request reviewers, add labels, and more. See available commands.

We want to make sure your contribution gets all the attention it needs!



Thank you, and welcome to pingcap/tiflow. 😃


pingcap-cla-assistant bot commented Aug 8, 2025

CLA assistant check
All committers have signed the CLA.

@ti-chi-bot ti-chi-bot bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 8, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @zurakutsia, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

I've addressed a critical bug in the TiCDC cloud storage sink, specifically impacting Azure Blob storage, which was causing frequent "context cancelled" errors and leading to a "stuck sink" state. The core issue stemmed from an incorrect context cancellation mechanism within the Open method of extStorageWithTimeout, where the context was prematurely canceled, invalidating subsequent read operations. My fix refactors this by ensuring the context associated with the reader is only canceled when the reader itself is explicitly closed, preventing these disruptive errors and stabilizing the sink.

Highlights

  • Bug Fix for Cloud Storage Sink: Resolved an issue where the TiCDC cloud storage sink, particularly on Azure Blob, experienced "context cancelled" errors and "stuck sink" behavior due to premature context cancellation during file Open operations.
  • Context Handling Refinement: Modified the extStorageWithTimeout.Open method to no longer immediately cancel the context. Instead, a new readerWithCancel wrapper is introduced, which defers the context cancellation until the reader's Close() method is invoked, ensuring the reader's lifetime is respected.
  • Enhanced Test Coverage: Added new unit tests (TestExtStorageOpenDoesNotCancelReaderContext and TestExtStorageOpenReaderRespectsCallerCancel) to specifically validate the corrected context behavior, ensuring the reader's context remains valid for its duration and properly responds to external cancellations.
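
The two properties those tests target can be illustrated with a standalone sketch. The actual test bodies are not shown in this conversation, so everything below is an assumption: ctxReader, open, and the two check functions are hypothetical, and a plain program is used instead of the testing package to keep the example runnable on its own.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"strings"
)

// ctxReader is a hypothetical reader that fails once its context is done.
type ctxReader struct {
	ctx context.Context
	src io.Reader
}

func (r *ctxReader) Read(p []byte) (int, error) {
	if err := r.ctx.Err(); err != nil {
		return 0, err
	}
	return r.src.Read(p)
}

// open is a hypothetical Open that returns the cancel function to the caller
// instead of invoking it before returning.
func open(ctx context.Context) (io.Reader, context.CancelFunc) {
	ctx, cancel := context.WithCancel(ctx)
	return &ctxReader{ctx: ctx, src: strings.NewReader("ok")}, cancel
}

// checkOpenDoesNotCancel: a Read issued after open returns must succeed.
func checkOpenDoesNotCancel() error {
	r, cancel := open(context.Background())
	defer cancel()
	_, err := r.Read(make([]byte, 2))
	return err
}

// checkRespectsCallerCancel: canceling the caller's context must surface as
// context.Canceled on the next Read.
func checkRespectsCallerCancel() error {
	parent, stop := context.WithCancel(context.Background())
	r, cancel := open(parent)
	defer cancel()
	stop()
	_, err := r.Read(make([]byte, 2))
	if !errors.Is(err, context.Canceled) {
		return fmt.Errorf("want context.Canceled, got %v", err)
	}
	return nil
}

func main() {
	fmt.Println(checkOpenDoesNotCancel(), checkRespectsCallerCancel())
}
```

Both checks return nil in this sketch: the reader stays usable after open returns, and it still stops when the caller cancels.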
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly addresses a critical bug causing context cancellation errors by ensuring the context for a storage reader remains valid throughout its lifecycle. The approach of wrapping the reader to manage the context's cancellation in the Close method is sound. The accompanying tests are well-designed and validate the fix effectively. I have one suggestion to enhance the robustness of the new readerWithCancel.Close() method.

@wk989898
Collaborator

/ok-to-test

@ti-chi-bot ti-chi-bot bot added ok-to-test Indicates a PR is ready to be tested. and removed needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. labels Aug 11, 2025
@wk989898
Collaborator

Can you create an issue to describe what you encountered and how to reproduce this bug? @zurakutsia

@hongyunyan
Collaborator

/retest

@hongyunyan
Collaborator

/retest

@OliverS929
Contributor

/retest

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Aug 12, 2025
Contributor

ti-chi-bot bot commented Aug 12, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: asddongmen, hongyunyan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [asddongmen,hongyunyan]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Aug 12, 2025
Contributor

ti-chi-bot bot commented Aug 12, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-08-12 10:00:17.033807939 +0000 UTC m=+348016.068391167: ☑️ agreed by hongyunyan.
  • 2025-08-12 10:16:22.005500243 +0000 UTC m=+348981.040083470: ☑️ agreed by asddongmen.

@OliverS929
Contributor

/retest

@OliverS929
Contributor

The DM_DropAddColumn case is a bit flaky. The dm-master restart took slightly longer, so the retry loop did not catch the expected conflict.

@OliverS929
Contributor

/retest

@OliverS929
Contributor

/retest

@ti-chi-bot ti-chi-bot bot added the following labels Aug 13, 2025:

  • needs-cherry-pick-release-8.1: Should cherry pick this PR to release-8.1 branch.
  • needs-cherry-pick-release-7.1: Should cherry pick this PR to release-7.1 branch.
  • needs-cherry-pick-release-6.5: Should cherry pick this PR to release-6.5 branch.
  • needs-cherry-pick-release-7.5: Should cherry pick this PR to release-7.5 branch.

@hongyunyan
Collaborator

/merge

Contributor

ti-chi-bot bot commented Aug 13, 2025

@hongyunyan: We have migrated to builtin LGTM and approve plugins for reviewing.

👉 Please use /approve when you want to approve this pull request.

The changes announcement: LGTM plugin changes

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@hongyunyan
Collaborator

/run-check-issue-triage-complete

@purelind
Collaborator

/check-issue-triage-complete

@ti-chi-bot ti-chi-bot bot merged commit 06161f5 into pingcap:master Aug 13, 2025
26 checks passed
ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Aug 13, 2025
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-6.5: #12280.
But this PR has conflicts, please resolve them!

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.1: #12281.
But this PR has conflicts, please resolve them!

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.5: #12282.

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.1: #12283.

@hongyunyan
Collaborator

/cherrypick release-7.5-20250617-v7.5.6

@ti-chi-bot
Member

@hongyunyan: new pull request created to branch release-7.5-20250617-v7.5.6: #12285.

In response to this:

/cherrypick release-7.5-20250617-v7.5.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Labels
  • approved
  • contribution: This PR is from a community contributor.
  • first-time-contributor: Indicates that the PR was contributed by an external member and is a first-time contributor.
  • lgtm
  • needs-cherry-pick-release-6.5: Should cherry pick this PR to release-6.5 branch.
  • needs-cherry-pick-release-7.1: Should cherry pick this PR to release-7.1 branch.
  • needs-cherry-pick-release-7.5: Should cherry pick this PR to release-7.5 branch.
  • needs-cherry-pick-release-8.1: Should cherry pick this PR to release-8.1 branch.
  • ok-to-test: Indicates a PR is ready to be tested.
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Changefeed meet errors and stuck when sink is azure
7 participants