@@ -8,7 +8,6 @@ redirects:
  - /pgd/latest/backup/ # generated for DOCS-1247-PGD-6.0-Docs
---

-
PGD is designed to be a distributed, highly available system. If
one or more nodes of a cluster are lost, the best way to replace them
is to clone new nodes directly from the remaining nodes.
@@ -21,12 +20,73 @@ recovery (DR), such as in the following situations:
as a result of data corruption, application error, or
security breach

- ## Backup
-
- ### pg_dump
+ ## Logical backup and restore

You can use pg_dump, sometimes referred to as *logical backup*,
- normally with PGD.
+ normally with PGD. However, to reduce the risk of global lock
+ timeouts, we recommend dumping the pre-data, data, and post-data
+ sections separately. For example:
+
+ ```console
+ pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=pre-data -Fc -f pgd-pre-data.dump
+ pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=data -Fc -f pgd-data.dump
+ pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=post-data -Fc -f pgd-post-data.dump
+ ```
+
+ Then restore each section with pg_restore, using
+ `bdr.wait_slot_confirm_lsn()` to wait until all nodes have confirmed
+ the preceding changes before moving on:
+
+ ```console
+ pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=pre-data pgd-pre-data.dump
+ pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=data pgd-data.dump
+ psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)'
+ pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=post-data pgd-post-data.dump
+ psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)'
+ ```
+
+ Once these commands complete, the dump has been restored on all nodes
+ in the cluster.
+
+ In contrast, a naive pg_dump and pg_restore that doesn't split out the
+ sections is likely to fail with a global lock timeout.
+
+ You should also temporarily set the following settings in `postgresql.conf`:
+
+ ```
+ # Increase from the default of `1GB` to something large, but still a
+ # fraction of your disk space, since the non-WAL data must also fit.
+ # This decreases the frequency of checkpoints.
+ max_wal_size = 100GB
+
+ # Increase the number of writers to make better use of parallel
+ # apply. The default is 2. Make sure this isn't overridden with a
+ # lower value by the node group's num_writers setting.
+ bdr.writers_per_subscription = 5
+
+ # Increase the amount of memory for building indexes. The default is
+ # 64MB. For example, use 1GB if you have 128GB of total RAM.
+ maintenance_work_mem = 1GB
+
+ # Increase the receiver and sender timeouts from 1 minute to 1 hour
+ # to allow large transactions through.
+ wal_receiver_timeout = 1h
+ wal_sender_timeout = 1h
+ ```
+
+ Additionally:
+
+ - Make sure the default `bdr.streaming_mode = 'auto'` isn't overridden, so that transactions are streamed.
+ - Make sure none of the session or `postgresql.conf` settings listed above are overridden by node group-level settings. You can check the effective values as shown below.
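+
+ For example, you can verify the values in effect for a session with the
+ core `pg_settings` view (a minimal sketch; the names are the settings
+ discussed above):
+
+ ```
+ SELECT name, setting
+ FROM pg_settings
+ WHERE name IN ('bdr.streaming_mode', 'bdr.writers_per_subscription',
+                'maintenance_work_mem', 'max_wal_size');
+ ```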
+
+ If you still get global lock timeouts during the initial load,
+ temporarily set `bdr.ddl_locking = off` for its duration.
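+
+ One way to do that at the instance level, so it also applies to the
+ sessions pg_restore opens, is a sketch like the following (assuming
+ superuser access; remember to reset the setting afterward):
+
+ ```console
+ psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'ALTER SYSTEM SET bdr.ddl_locking = off' -c 'SELECT pg_reload_conf()'
+ # ... perform the initial load ...
+ psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'ALTER SYSTEM RESET bdr.ddl_locking' -c 'SELECT pg_reload_conf()'
+ ```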
+
+ ### Prefer restoring to a single node
+
+ Especially when initially setting up a cluster from a Postgres dump,
+ we recommend restoring to a cluster with a single PGD node. Then run
+ `pgd node setup` for each additional node you want in the cluster. This
+ performs a physical join that uses `bdr_init_physical` under the hood.
+
+ ### Sequences

pg_dump dumps both local and global sequences as if
they were local sequences. This behavior is intentional, to allow a PGD
@@ -51,7 +111,7 @@ dump only with `bdr.crdt_raw_value = on`.
Technical Support recommends the use of physical backup techniques for
backup and recovery of PGD.

- ### Physical backup
+ ## Physical backup and restore

You can take physical backups of a node in an EDB Postgres Distributed cluster using
standard PostgreSQL software, such as
@@ -82,7 +142,92 @@ PostgreSQL backup techniques to PGD:
local data and a backup of at least one node that subscribes to each
replication set.

- ### Eventual consistency
+ ### Restore
+
+ While you can take a physical backup with the same procedure as a
+ standard PostgreSQL node, it's slightly more complex to
+ restore the physical backup of a PGD node.
+
+ #### EDB Postgres Distributed cluster failure or seeding a new cluster from a backup
+
+ The most common use case for restoring a physical backup involves the failure
+ or replacement of all the PGD nodes in a cluster, for instance in the event of
+ a data center failure.
+
+ You might also want to perform this procedure to clone the current contents of an
+ EDB Postgres Distributed cluster to seed a QA or development instance.
+
+ In that case, you can restore PGD capabilities based on a physical backup
+ of a single PGD node, optionally plus WAL archives:
+
+ - If you still have some PGD nodes live and running, fence off the host you
+   restored the PGD node to, so it can't connect to any surviving PGD nodes.
+   This practice ensures that the new node doesn't confuse the existing cluster.
+ - Restore a single PostgreSQL node from a physical backup of one of
+   the PGD nodes.
+ - If you have WAL archives associated with the backup, create a suitable
+   `postgresql.conf`, and start PostgreSQL in recovery to replay up to the latest
+   state. You can specify an alternative `recovery_target` here if needed. (See
+   the sketch after this list.)
+ - Start the restored node, or promote it to read/write if it was in standby
+   recovery. Keep it fenced from any surviving nodes!
+ - Clean up any leftover PGD metadata that was included in the physical backup.
+ - Fully stop and restart the PostgreSQL instance.
+ - Add further PGD nodes with the standard procedure based on the
+   `bdr.join_node_group()` function call.
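+
+ For instance, on PostgreSQL 12 and later, you can trigger recovery by
+ creating a `recovery.signal` file in the data directory and setting
+ something like the following in `postgresql.conf` (a minimal sketch; the
+ archive path and target time are placeholder assumptions):
+
+ ```
+ restore_command = 'cp /path/to/wal_archive/%f "%p"'
+ # Optionally stop replay at a given point instead of the end of the archives:
+ # recovery_target_time = '2025-01-01 00:00:00+00'
+ ```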
177
+
178
+ #### Cleanup of PGD metadata
179
+
180
+ To clean up leftover PGD metadata:
181
+
182
+ 1 . Drop the PGD node using [ ` bdr.drop_node ` ] ( /pgd/6.1/reference/tables-views-functions/functions-internal#bdrdrop_node ) .
183
+ 2 . Fully stop and restart PostgreSQL (important!).
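+
+ For example, assuming the node in the restored backup was named `node-a`
+ (a sketch; see the `bdr.drop_node` reference for the exact arguments in
+ your PGD version):
+
+ ```
+ SELECT bdr.drop_node(node_name := 'node-a', cascade := true);
+ ```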
184
+
185
+ #### Cleanup of replication origins
186
+
187
+ You must explicitly remove replication origins with a separate step
188
+ because they're recorded persistently in a system catalog. They're
189
+ therefore included in the backup and in the restored instance. They
190
+ aren't removed automatically when dropping the BDR extension because
191
+ they aren't explicitly recorded as its dependencies.
192
+
193
+ To track progress of incoming replication in a crash-safe way,
194
+ PGD creates one replication origin for each remote master node. Therefore,
195
+ for each node in the previous cluster run this once:
196
+
197
+ ```
198
+ SELECT pg_replication_origin_drop('bdr_dbname_grpname_nodename');
199
+ ```
200
+
201
+ You can list replication origins as follows:
202
+
203
+ ```
204
+ SELECT * FROM pg_replication_origin;
205
+ ```
206
+
207
+ Those created by PGD are easily recognized by their name.
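+
+ If the previous cluster had many nodes, you can instead drop every
+ PGD-created origin in one statement, relying only on the naming prefix:
+
+ ```
+ SELECT pg_replication_origin_drop(roname)
+ FROM pg_replication_origin
+ WHERE roname LIKE 'bdr_%';
+ ```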
208
+
209
+ #### Cleanup of replication slots
210
+
211
+ If a physical backup was created with ` pg_basebackup ` , replication slots
212
+ are omitted from the backup.
213
+
214
+ Some other backup methods might preserve replications slots, likely in
215
+ outdated or invalid states. Once you restore the backup, use these commands to drop all replication slots:
216
+
217
+ ```
218
+ SELECT pg_drop_replication_slot(slot_name)
219
+ FROM pg_replication_slots;
220
+ ```
221
+
222
+ If you have a reason to preserve some slots,
223
+ you can add a ` WHERE slot_name LIKE 'bdr%' ` clause, but this is rarely
224
+ useful.
+
+ !!! Warning
+     Never use these commands to drop replication slots on a live PGD node.
+
+ ## Eventual consistency

The nodes in an EDB Postgres Distributed cluster are *eventually consistent* but not
*entirely consistent*. A physical backup of a given node provides
@@ -199,89 +344,4 @@ of changes arriving from a single master in COMMIT order.

!!! Note
    This feature is available only with EDB Postgres Extended.
-     Barman doesn't create a `multi_recovery.conf` file.
-
- ## Restore
-
- While you can take a physical backup with the same procedure as a
- standard PostgreSQL node, it's slightly more complex to
- restore the physical backup of a PGD node.
-
- ### EDB Postgres Distributed cluster failure or seeding a new cluster from a backup
-
- The most common use case for restoring a physical backup involves the failure
- or replacement of all the PGD nodes in a cluster, for instance in the event of
- a data center failure.
-
- You might also want to perform this procedure to clone the current contents of a
- EDB Postgres Distributed cluster to seed a QA or development instance.
-
- In that case, you can restore PGD capabilities based on a physical backup
- of a single PGD node, optionally plus WAL archives:
-
- - If you still have some PGD nodes live and running, fence off the host you
-   restored the PGD node to, so it can't connect to any surviving PGD nodes.
-   This practice ensures that the new node doesn't confuse the existing cluster.
- - Restore a single PostgreSQL node from a physical backup of one of
-   the PGD nodes.
- - If you have WAL archives associated with the backup, create a suitable
-   `postgresql.conf`, and start PostgreSQL in recovery to replay up to the latest
-   state. You can specify an alternative `recovery_target` here if needed.
- - Start the restored node, or promote it to read/write if it was in standby
-   recovery. Keep it fenced from any surviving nodes!
- - Clean up any leftover PGD metadata that was included in the physical backup.
- - Fully stop and restart the PostgreSQL instance.
- - Add further PGD nodes with the standard procedure based on the
-   `bdr.join_node_group()` function call.
-
- #### Cleanup of PGD metadata
-
- To clean up leftover PGD metadata:
-
- 1. Drop the PGD node using [`bdr.drop_node`](/pgd/latest/reference/tables-views-functions/functions-internal#bdrdrop_node).
- 2. Fully stop and restart PostgreSQL (important!).
-
- #### Cleanup of replication origins
-
- You must explicitly remove replication origins with a separate step
- because they're recorded persistently in a system catalog. They're
- therefore included in the backup and in the restored instance. They
- aren't removed automatically when dropping the BDR extension because
- they aren't explicitly recorded as its dependencies.
-
- To track progress of incoming replication in a crash-safe way,
- PGD creates one replication origin for each remote master node. Therefore,
- for each node in the previous cluster run this once:
-
- ```
- SELECT pg_replication_origin_drop('bdr_dbname_grpname_nodename');
- ```
-
- You can list replication origins as follows:
-
- ```
- SELECT * FROM pg_replication_origin;
- ```
-
- Those created by PGD are easily recognized by their name.
-
- #### Cleanup of replication slots
-
- If a physical backup was created with `pg_basebackup`, replication slots
- are omitted from the backup.
-
- Some other backup methods might preserve replications slots, likely in
- outdated or invalid states. Once you restore the backup, use these commands to drop all replication slots:
-
- ```
- SELECT pg_drop_replication_slot(slot_name)
- FROM pg_replication_slots;
- ```
-
- If you have a reason to preserve some slots,
- you can add a `WHERE slot_name LIKE 'bdr%'` clause, but this is rarely
- useful.
-
- !!! Warning
-     Never use these commands to drop replication slots on a live PGD node
-
+     Barman doesn't create a `multi_recovery.conf` file.