kafka: Consider reintroducing retry (or at least make it configurable) #1920

@kennytm

Is your feature request related to a problem?

Currently CDC does not retry when it hits a failure, because of pingcap/tiflow#11935 (= IBM/sarama#2619): retrying would cause messages to be sent in the wrong order. (ONCALL-7528 = TICKET-5339) There is also a data-loss bug related to retrying. (TCOC-3167 = ST-1473)
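
To make the ordering problem concrete, here is a toy model in Go. It is not sarama's actual internals; `deliver` and its parameters are hypothetical names used only for illustration. It assumes the producer keeps sending later messages while a failed one waits to be retried, which is enough to reproduce the reordering.

```go
package main

import "fmt"

// deliver simulates a broker log receiving messages from a producer.
// failOnce lists messages whose first send attempt fails.
// With retry enabled, a failed message is re-queued behind the messages
// that are already waiting; with retry disabled, the first failure aborts
// the send (the current "fail the sink immediately" behavior).
func deliver(msgs []int, failOnce map[int]bool, retry bool) (delivered []int, err error) {
	queue := append([]int(nil), msgs...)
	alreadyFailed := map[int]bool{}
	for len(queue) > 0 {
		m := queue[0]
		queue = queue[1:]
		if failOnce[m] && !alreadyFailed[m] {
			alreadyFailed[m] = true
			if !retry {
				return delivered, fmt.Errorf("send of message %d failed", m)
			}
			queue = append(queue, m) // the retry lands after later messages
			continue
		}
		delivered = append(delivered, m)
	}
	return delivered, nil
}

func main() {
	withRetry, _ := deliver([]int{1, 2, 3}, map[int]bool{1: true}, true)
	fmt.Println(withRetry) // [2 3 1]: message 1 arrives after 2 and 3

	_, err := deliver([]int{1, 2, 3}, map[int]bool{1: true}, false)
	fmt.Println(err) // send of message 1 failed
}
```

In this model, disabling retry trades reordering for a hard failure, which is the trade-off this issue is about.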

Retrying has been disabled since v9.0.0-beta (pingcap/tiflow#11870), v8.5.1, v8.1.3, v7.5.6, v7.1.7 and v6.5.12.

However, the current behavior (failing the sink immediately on error) has bad side effects of its own that also hurt the user experience:

  1. Sink errors are recorded in metrics and lead to alerts. (GTOC-7542 = APID-11189, ST-1772, TICKET-5957)
  2. A hard reset via sink error is known to trigger another sarama bug, IBM/sarama#2681 ("Kafka client panic: assignment to entry in nil map"). (TCOC-4074 = ST-2129, ST-1482)

Describe the feature you'd like

In order of preference:

  1. Actually fix IBM/sarama#2619 ("AsyncProducer produces messages in out-of-order when retries happen") and the related data-loss bug, then re-enable retrying.
  2. Add a sink-URI option to unsafely enable retry.
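
For option 2, a minimal sketch of how such an opt-in could be read from the sink URI. The parameter name `unsafe-enable-retry` and the helper `retryEnabled` are hypothetical; no such option exists today. The sketch assumes retry must stay off unless the user explicitly asks for it.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// retryOption is a hypothetical query parameter name, not an existing option.
const retryOption = "unsafe-enable-retry"

// retryEnabled reports whether the sink URI explicitly opts in to retrying.
// A missing option means retry stays disabled (the current safe default).
func retryEnabled(sinkURI string) (bool, error) {
	u, err := url.Parse(sinkURI)
	if err != nil {
		return false, err
	}
	raw := u.Query().Get(retryOption)
	if raw == "" {
		return false, nil
	}
	enabled, err := strconv.ParseBool(raw)
	if err != nil {
		return false, fmt.Errorf("invalid %s value %q: %w", retryOption, raw, err)
	}
	return enabled, nil
}

func main() {
	uri := "kafka://127.0.0.1:9092/my-topic?protocol=canal-json&unsafe-enable-retry=true"
	enabled, err := retryEnabled(uri)
	fmt.Println(enabled, err) // true <nil>
}
```

Putting "unsafe" in the name keeps the known risks (out-of-order messages and the data-loss bug) visible to whoever enables it.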

Describe alternatives you've considered

No response

Teachability, Documentation, Adoption, Migration Strategy

No response
