Open
Description
Is your feature request related to a problem?
Currently, CDC does not retry when it encounters a failure, because of pingcap/tiflow#11935 = IBM/sarama#2619: retrying could cause messages to be sent in the wrong order. (ONCALL-7528 = TICKET-5339) There is also a data-loss bug related to retrying. (TCOC-3167 = ST-1473)
Retrying has been disabled since v9.0.0-beta (pingcap/tiflow#11870), v8.5.1, v8.1.3, v7.5.6, v7.1.7 and v6.5.12.
However, the current behavior (failing the sink immediately on error) has several bad side effects that also hurt the user experience:
- Sink errors are recorded in metrics and lead to alerts. (GTOC-7542 = APID-11189, ST-1772, TICKET-5957)
- A hard reset via sink error is known to trigger another sarama bug, IBM/sarama#2681 ("Kafka client panic: assignment to entry in nil map"). (TCOC-4074 = ST-2129, ST-1482)
Describe the feature you'd like
In order of preference:
- Actually fix IBM/sarama#2619 ("AsyncProducer produces messages in out-of-order when retries happen") and the related data-loss bug, then re-enable retrying.
- Add a sink-URI option to unsafely enable retry.
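For the second option, a minimal sketch of how such an opt-in flag might be read from the sink URI. The parameter name `unsafe-enable-retry` and the helper `parseUnsafeRetry` are hypothetical and do not exist in tiflow today; the sketch only uses the Go standard library and defaults to the current behavior (no retry) when the parameter is absent.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// retryParam is a hypothetical query-parameter name; this issue does not
// propose a concrete name.
const retryParam = "unsafe-enable-retry"

// parseUnsafeRetry reports whether the sink URI explicitly opts in to
// retrying. It returns false when the parameter is absent, so the current
// no-retry behavior stays the default.
func parseUnsafeRetry(sinkURI string) (bool, error) {
	u, err := url.Parse(sinkURI)
	if err != nil {
		return false, fmt.Errorf("parse sink URI: %w", err)
	}
	raw := u.Query().Get(retryParam)
	if raw == "" {
		return false, nil
	}
	enabled, err := strconv.ParseBool(raw)
	if err != nil {
		return false, fmt.Errorf("invalid %s value %q: %w", retryParam, raw, err)
	}
	return enabled, nil
}

func main() {
	// Example sink URI; the host and topic are placeholders.
	enabled, err := parseUnsafeRetry("kafka://127.0.0.1:9092/topic-name?unsafe-enable-retry=true")
	fmt.Println(enabled, err)
}
```

Rejecting unparsable values instead of silently treating them as false would make a mistyped flag fail at changefeed creation rather than leave retrying off without notice.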
Describe alternatives you've considered
No response
Teachability, Documentation, Adoption, Migration Strategy
No response