-
Notifications
You must be signed in to change notification settings - Fork 6k
tables: always append temp index values for unique index #60340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi @tangenta. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #60340 +/- ##
================================================
+ Coverage 73.1301% 75.3565% +2.2264%
================================================
Files 1710 1761 +51
Lines 472671 497731 +25060
================================================
+ Hits 345665 375073 +29408
+ Misses 105751 100039 -5712
- Partials 21255 22619 +1364
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
pkg/table/tables/index.go
Outdated
// For temp index keys, we can't get the temp value from memory buffer, even if the lazy check is enabled. | ||
// Otherwise, it may cause the temp index value to be overwritten, leading to data inconsistency. | ||
value, err = txn.Get(ctx, key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the 'key' here tempKey or not?
If it is, txn, get (ctx, tempKey) is a duplicated operation.
It may be confusing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand it, if keyIsTempIdxKey
is true, then key is actually the temporary index key (and tempKey
is nil), otherwise key
is the original new index and tempKey
the temporary index key (for the same entry).
So we need to check both the original (currently backfilling/merging) key as well as the temporary index key, to see if the entry/key already exists.
@tangenta could we do these checks lazily/GetLocal()
, and adding kv.SetPresumeKeyNotExists
to key
/tempKey
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I think we cannot utilize "needPresumeKeyExists" because using this constraint means that we already know where the previous record is. Without geting both original index and temp index, we don't know if there is a duplicate one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm reviewing this mostly so I can learn something myself :)
So I have some questions, also about surrounding code.
I guess that my only concern is if we are adding txn.Get()
calls, which may cause duplicate calls with the same key in some cases? Or will this only touch the local membuffer, so impact is neglectible?
@@ -292,6 +292,7 @@ func (c *index) create(sctx table.MutateContext, txn kv.Transaction, indexedValu | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to understand the use of temporary index, I assume it is only used during ingest/fast_ddl/lightning way of creating indexes, so the original new index id would be written by the backfill and the original new index id + TempIndexPrefix
would be used by concurrent DML, until the the backfill is done and change the indexInfo.State
from model.BackfillStateRunning
to model.BackfillStateReadyToMerge
or even model.BackfillStateMerging
.
The first question would then be at line 206, if the index state is in StateDeleteOnly
/keyVer == tablecodec.TempIndexKeyTypeDelete
, why would we even continue with the create()
and not just skip it, since I assume it should not Set any index entries? (That would happen in (*index) Delete()
right?
Second question at line 258, can a keyIsTmpIdxKey
/tmpKey
be untouched? Should we not always create/write keys to temp index or are they actually checked before the call?
I'm curious if this could lead to missing an index entry from concurrent DML during backfill, when the backfill already read the row.
And at line 273, during merge, the key needs to be double written, both to the original new index, as well as the temporary index, since we cannot know if that specific key has already been merged yet or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would we even continue with the create() and not just skip it
I remember there is a case could cause correctness issue but I forget the details. Let me file another PR to see if CI can pass.
can a keyIsTmpIdxKey/tmpKey be untouched?
No, untouched key-values only exist in membuffer, they are not persistent. Untouched key-values are only used for read request like union scan, but temp index is not public.
pkg/table/tables/index.go
Outdated
@@ -366,7 +376,7 @@ func (c *index) create(sctx table.MutateContext, txn kv.Transaction, indexedValu | |||
} | |||
if len(tempKey) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also add a new comment on line 357, that needPresumeKeyNotExistsFlag
will check either the temp index key (if used) or the key, by a Get()
?
Also why is it really needing the c.tblInfo.ID
, which is already encoded in both key
and tempKey
?
@tangenta does this mean that we may do the Get()
twice now for the same key with the addition of Get()
on line 326 and 329?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I remove needPresumeKeyNotExistsFlag
. For temp keys, we only call Get()
once.
pkg/table/tables/index.go
Outdated
// For temp index keys, we can't get the temp value from memory buffer, even if the lazy check is enabled. | ||
// Otherwise, it may cause the temp index value to be overwritten, leading to data inconsistency. | ||
value, err = txn.Get(ctx, key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand it, if keyIsTempIdxKey
is true, then key is actually the temporary index key (and tempKey
is nil), otherwise key
is the original new index and tempKey
the temporary index key (for the same entry).
So we need to check both the original (currently backfilling/merging) key as well as the temporary index key, to see if the entry/key already exists.
@tangenta could we do these checks lazily/GetLocal()
, and adding kv.SetPresumeKeyNotExists
to key
/tempKey
?
/retest |
/retest |
@tangenta: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
assert.NoError(t, err) | ||
} | ||
|
||
require.NoError(t, failpoint.Enable("github.com/pingcap/tidb/pkg/ddl/skipReorgWorkForTempIndex", "return(false)")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might be able to use testfailpoint
pkg instead of the error checks + cleanup for failpoint
pkg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to make it easier to cherry-pick :)
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mjonss, wjhuang2016 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
/retest |
@mjonss: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/retest |
@tangenta: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/retest |
@tangenta: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/test pull-lightning-integration-test |
@joccau: The specified target(s) for
Use In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Signed-off-by: ti-chi-bot <[email protected]>
In response to a cherrypick label: new pull request created to branch |
Signed-off-by: ti-chi-bot <[email protected]>
In response to a cherrypick label: new pull request created to branch |
Signed-off-by: ti-chi-bot <[email protected]>
In response to a cherrypick label: new pull request created to branch |
Signed-off-by: tangenta <[email protected]>
What problem does this PR solve?
Issue Number: close #60339
Problem Summary: See #60339 (comment)
What changed and how does it work?
Always append temp index values during DML execution for unique indexes.
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.