Description
Dear Team,
I hope this email finds you well.
I am experiencing an ongoing issue with PeerDB mirror replication in our PostgreSQL cluster, which uses automatic failover. We run keepalive services and a proxy that automatically redirects connections to the new primary node, but PeerDB still fails to resume replication after a failover event.
The UI mirror logs are displaying the following errors:
- ERROR: ReceiveMessage failed: receive message failed: unexpected EOF
- ERROR: failed to push records: failed to write records to S3: failed to upload file: operation error S3: PutObject, context canceled
Additionally, the flow-worker logs contain the following critical errors:
- flow-worker | {"time": "level":"ERROR","msg":"failed to sync records","Namespace":"default","TaskQueue":"peer-flow-task-queue","error":"replication slot peerflow_slot_mirror does not exist, restarting workflow (type: disconnect, retryable: false)"}
- flow-worker | {"time":"level":"ERROR","msg":"Activity error.","Namespace":"default","TaskQueue":"peer-flow-task-queue","Error":"replication slot peerflow_slot_mirror does not exist, restarting workflow (type: disconnect, retryable: false)"}
The error logs indicate that the replication slot "peerflow_slot_mirror" does not exist on the new primary node after failover, which prevents the workflow from restarting properly. It appears that PeerDB is not automatically recreating the replication slot to continue synchronization from the previous point.
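To make the diagnosis above easier to reproduce, here is a minimal sketch of how the missing slot could be confirmed on the node that became primary. It assumes the standard `pg_replication_slots` catalog view. The helper names (`missing_slot_from_log`, `describe_slot`) are hypothetical and not part of PeerDB. The query would be run with any PostgreSQL client, and the resulting rows passed to `describe_slot`. As far as I understand, a logical slot exists only on the node where it was created unless slot synchronization is configured, so an empty result on the promoted node would match the errors shown.

```python
import re

# Query against the standard pg_replication_slots view. Run it on the node
# that is primary after the failover, passing the slot name as the parameter.
SLOT_QUERY = (
    "SELECT slot_name, slot_type, active "
    "FROM pg_replication_slots WHERE slot_name = %s"
)

# Matches the error text seen in the flow-worker logs.
MISSING_SLOT_RE = re.compile(r"replication slot (\S+) does not exist")


def missing_slot_from_log(line):
    """Return the slot name from a 'does not exist' error line, or None."""
    match = MISSING_SLOT_RE.search(line)
    return match.group(1) if match else None


def describe_slot(rows, slot_name):
    """Summarise the rows returned by SLOT_QUERY for one slot.

    `rows` is a list of (slot_name, slot_type, active) tuples, as a typical
    PostgreSQL driver would return them.
    """
    if not rows:
        return f"slot {slot_name} is missing on this node"
    _, slot_type, active = rows[0]
    state = "active" if active else "inactive"
    return f"slot {slot_name} exists ({slot_type}, {state})"
```

For example, feeding one of the flow-worker lines to `missing_slot_from_log` yields `peerflow_slot_mirror`, and calling `describe_slot([], "peerflow_slot_mirror")` after an empty query result reports the slot as missing on that node.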
Could you please provide guidance on resolving this replication issue and ensuring proper failover handling with automatic replication slot management?
Best regards,
Mohamed Shahin