-
Notifications
You must be signed in to change notification settings - Fork 6k
br: enable parallel restore #58724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
br: enable parallel restore #58724
Conversation
Skipping CI for Draft Pull Request. |
Hi @Tristan1900. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #58724 +/- ##
================================================
+ Coverage 73.1938% 74.9514% +1.7576%
================================================
Files 1726 1751 +25
Lines 479561 490105 +10544
================================================
+ Hits 351009 367341 +16332
+ Misses 107030 99725 -7305
- Partials 21522 23039 +1517
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
c089404
to
13e1419
Compare
d6875ff
to
d33d597
Compare
/retest |
@Tristan1900: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
973c93a
to
2854159
Compare
/retest |
@Tristan1900: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 14 out of 21 changed files in this pull request and generated no comments.
Files not reviewed (7)
- br/pkg/registry/BUILD.bazel: Language not supported
- br/pkg/restore/BUILD.bazel: Language not supported
- br/pkg/task/BUILD.bazel: Language not supported
- br/tests/br_parallel_restore/run.sh: Language not supported
- br/tests/br_pitr/run.sh: Language not supported
- br/tests/br_restore_checkpoint/run.sh: Language not supported
- br/tests/run_group_br_tests.sh: Language not supported
Comments suppressed due to low confidence (1)
br/pkg/utils/schema.go:34
- The comment contains a typo ('wheterh'); it should be corrected to 'whether'.
// IsTemplateSysDB checks wheterh the dbname is temporary system database(__TiDB_BR_Temporary_mysql or __TiDB_BR_Temporary_sys).
br/pkg/registry/registration.go
Outdated
filter_strings(255), | ||
start_ts, | ||
restored_ts, | ||
upstream_cluster_id, | ||
with_sys_table, | ||
cmd(255) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for cmd
column, maybe it is better to use enum type?
for filter_strings
, I think 255 is not long enough for type Text.
ERROR 1062 (23000): Duplicate entry '1234567890123456789012345678901234567890123456789012345678901234' for key 't1.unique_registration_params'
Do you actually want KEY
instead of UNIQUE KEY
?
br/pkg/registry/registration.go
Outdated
AND upstream_cluster_id = %%? | ||
AND with_sys_table = %%? | ||
AND cmd = %%? | ||
ORDER BY id DESC LIMIT 1` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need limit 1
because there must be only one row in the final result.
br/pkg/registry/registration.go
Outdated
} | ||
// task exists but was either created by another process or has been running | ||
// check if it's in running state | ||
runningRows, _, err := execCtx.ExecRestrictedSQL( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to select again? runningRows[0]
must be the same as rows[0]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes you are right
if hasCheckpointPersisted(c, cfg) { | ||
log.Info("pausing restore task from registry", | ||
zap.Uint64("restoreId", cfg.RestoreID), zap.Error(restoreError)) | ||
if err := restoreRegistry.PauseTask(c, cfg.RestoreID); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if BR is killed because OOM? BR might not have chance to pause the task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call, let me think about that
br/pkg/registry/registration.go
Outdated
return 0, errors.Annotatef(err, "failed to create new registration") | ||
} | ||
|
||
// check if a row with our parameters exists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe after introduce a txn (begin ... commit), BR does not need to check again.
cfed383
to
9d64c1d
Compare
Signed-off-by: Wenqi Mou <[email protected]>
Signed-off-by: Wenqi Mou <[email protected]>
Signed-off-by: Wenqi Mou <[email protected]>
Signed-off-by: Wenqi Mou <[email protected]>
Signed-off-by: Wenqi Mou <[email protected]>
Signed-off-by: Wenqi Mou <[email protected]>
/retest |
@Tristan1900: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
pkg/session/bootstrap.go
Outdated
with_sys_table BOOLEAN NOT NULL DEFAULT TRUE, | ||
status VARCHAR(20) NOT NULL DEFAULT 'running', | ||
cmd TEXT, | ||
start_timestamp BIGINT UNSIGNED NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a user perspective I find the column naming confusing.
First, it is not clear how this is different from start_ts
referenced above. Can you clarifying the naming here? If they are used for different things, is the a better name that can be used to make it more obvious? Or is this just intended to offer different levels of precision (if so why the same data type)?
Second, the suffix of the column name _timestamp
implies this is aTIMESTAMP
while the type is specified as BIGINT
. This is confusing and inconsistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for taking a look!
yes you are absolutely right the name is confusing, let me rename the latter one.
d189487
to
b27fd04
Compare
Signed-off-by: Wenqi Mou <[email protected]>
b27fd04
to
026aa8c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expression/executor part lgtm
Signed-off-by: Wenqi Mou <[email protected]>
[LGTM Timeline notifier]Timeline:
|
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: benmeadowcroft, D3Hunter, Leavrth, windtalker, winoros The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
for planner part |
What problem does this PR solve?
Issue Number: close #58725
Problem Summary:
What changed and how does it work?
tide_restore_registry
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.