Description
For certain unexpected cases where data issues have been confirmed, more on-site information should be collected and printed in the logs to facilitate further investigation.
For example, once this log is printed:
"2PC failed commit key after primary key committed"
Other situations include:
- The commit is returned successfully by the async commit protocol, but the async commit process encounters errors like `TxnLockNotFound`
- The primary key is committed and the commit is returned successfully, but the async secondary key commit process encounters errors like `TxnLockNotFound`
- In GC batch resolve locks, the expired locks cannot be resolved properly, encountering errors like `unexpected resolve err: commit_ts_expired:<start_ts:446066598787153941 attempted_commit_ts:446066599141310467`, where the attempted `commit_ts` is smaller than the `min_commit_ts` of the key
The "2PC failed commit key after primary key committed" log means that the transaction coordinator encountered an unexpected failure when committing the secondary keys AFTER the primary key had been successfully committed. At that point the transaction is already considered successfully committed, so its atomicity can no longer be guaranteed.
In such cases, more detailed MVCC information about the key that failed to commit should be printed to help diagnosis, so that it can be determined whether the key was rolled back by some conflicting transaction (with its `start_ts`). If so, there is a critical bug in the transaction protocol. Otherwise, data has been lost and the key is missing.
Another task is to refine the logs printed, including:
- Print error-level logs if an unexpected situation happens; otherwise use the `warn` log level.
- Print more information when a concurrent `rollback` happens that could roll back data of other transactions, including rollbacks from the lock resolver and GC.
  - The `resolve lock rollback` path
  - The GC `batch resolve lock` path
    - Resolve by GC batch resolve, https://github.com/tikv/client-go/blob/master/txnkv/txnlock/lock_resolver.go#L324
- The