Skip to content

Conversation

kostasrim
Copy link
Contributor

@kostasrim kostasrim commented Apr 3, 2025

We allowed running heartbeat during global transactions. That's not good. For example, during replication and while transitioning from full to stable sync we unregister and register journal callbacks with a preemption inbetween. If under preemption heartbeat runs, then we loose those journal entries triggering an lsn check failure.

  • early return from HeartBeat if shard is under global lock

Fixes another corner case in #4048

@kostasrim kostasrim self-assigned this Apr 3, 2025
@kostasrim kostasrim requested a review from adiholden April 3, 2025 12:15
if (db_slice.WillBlockOnJournalWrite()) {
// We acquire exlusive locks during global transactions
// If we can't acquire this it means the shard is under global lock.
const bool can_acquire_global_lock = shard_lock()->Check(IntentLock::Mode::EXCLUSIVE);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note, we don't acquire here, namely we don't increment Cx and Cs (exlusive and shared intent locks). We merely test them to understand if we are under global lock or not.

@kostasrim
Copy link
Contributor Author

I will follow with another PR, to fix preemption during heartbeat. This is the last corner case.

@@ -735,11 +735,14 @@ void EngineShard::Heartbeat() {
DbSlice& db_slice = namespaces->GetDefaultNamespace().GetDbSlice(shard_id());
// Skip heartbeat if we are serializing a big value
static auto start = std::chrono::system_clock::now();
if (db_slice.WillBlockOnJournalWrite()) {
// We acquire exlusive locks during global transactions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets update the comment,
How about: Skip heartbeat if global transaction is in process. This is determined by attempting to check if shard lock can be acquired.

@kostasrim kostasrim requested a review from adiholden April 7, 2025 06:56
@kostasrim kostasrim enabled auto-merge (squash) April 7, 2025 07:45
@kostasrim kostasrim merged commit 751ce58 into main Apr 7, 2025
10 checks passed
@kostasrim kostasrim deleted the kpr3 branch April 7, 2025 08:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants