txn: collect more information when inconsistency or rollback happens #1631

@cfzjywxk

Description

For certain unexpected cases in which data issues have been confirmed, more context about the failure should be collected at the point where it occurs and printed in the logs to facilitate further investigation.

For example, when this log is printed:

"2PC failed commit key after primary key committed"

Other situations include:

  • The commit is reported as successful under the async commit protocol, but the asynchronous commit process then encounters errors such as TxnLockNotFound.
  • The primary key is committed and success is returned to the client, but the asynchronous commit of the secondary keys encounters errors such as TxnLockNotFound.
  • In GC batch resolve locks, expired locks cannot be resolved properly and errors such as "unexpected resolve err: commit_ts_expired:<start_ts:446066598787153941 attempted_commit_ts:446066599141310467" are returned, meaning the attempted commit_ts is smaller than the min_commit_ts of the key.

The "2PC failed commit key after primary key committed" log means that the transaction coordinator encountered an unexpected failure while committing the secondary keys AFTER the primary key had been successfully committed. Because the transaction is already considered committed at that point, its atomicity can no longer be guaranteed.

In such cases, more detailed MVCC information about the key that failed to commit should be printed to help diagnosis. That shows whether the key was rolled back by a conflicting transaction (and by which start_ts), which would point to a critical bug in the transaction protocol, or whether the data was lost and the key itself is missing.
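
The diagnosis described above can be sketched as a small classifier over the key's MVCC state. The types here (MvccInfo, MvccWrite, WriteType, Diagnosis) are hypothetical simplifications of the lock and write records TiKV keeps per key, and the sketch ignores the finer details of how TiKV actually encodes rollbacks.

```go
package main

import "fmt"

// WriteType is a simplified stand-in for the kinds of write records TiKV
// keeps per key (hypothetical type, not the kvproto definition).
type WriteType int

const (
	WritePut WriteType = iota
	WriteDelete
	WriteLock
	WriteRollback
)

// MvccWrite is a simplified write record; StartTS identifies the transaction
// that produced it.
type MvccWrite struct {
	Type     WriteType
	StartTS  uint64
	CommitTS uint64
}

// MvccInfo is a hypothetical summary of one key's MVCC state.
type MvccInfo struct {
	HasLock     bool
	LockStartTS uint64
	Writes      []MvccWrite
}

type Diagnosis string

const (
	Committed   Diagnosis = "committed by this transaction"
	RolledBack  Diagnosis = "rolled back for this start_ts: possible transaction protocol bug"
	LockPresent Diagnosis = "lock of this transaction still present"
	NoTrace     Diagnosis = "no lock or write record for this start_ts: possible data loss"
)

// diagnose classifies the state of a secondary key whose commit failed after
// the primary key had been committed by the transaction with startTS.
func diagnose(startTS uint64, info MvccInfo) Diagnosis {
	for _, w := range info.Writes {
		if w.StartTS != startTS {
			continue // written by some other transaction
		}
		if w.Type == WriteRollback {
			return RolledBack
		}
		return Committed
	}
	if info.HasLock && info.LockStartTS == startTS {
		return LockPresent
	}
	return NoTrace
}

func main() {
	const startTS = 100 // illustrative value
	info := MvccInfo{Writes: []MvccWrite{
		{Type: WritePut, StartTS: 90, CommitTS: 95},
		{Type: WriteRollback, StartTS: startTS, CommitTS: startTS},
	}}
	fmt.Println(diagnose(startTS, info))
}
```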


Another task is to refine the printed logs, including:

Metadata

Labels

enhancement (New feature or request)