Skip to content

Conversation

Tapanito
Copy link
Collaborator

@Tapanito Tapanito commented Sep 2, 2025

This PR refactors the peer shutdown and failure handling mechanism to be more robust, consistent, and communicative.

The previous implementation used raw strings to represent error reasons and did not communicate these reasons to peers when shutting down a connection.

With this change disconnections are now explicitly communicated via a TMClose protocol message with strongly-typed reasons. This new approach provides better diagnostics and makes the peer disconnection process more stable and predictable.

A similar feature was implemented in #3839 by @gregtatcam, which introduced startandshutdown` messages. This is a simplified version of that change, introducing only a shutdown message.

This feature is backwards compatible, as older rippled versions ignore unknown p2p protocol messages.

Example log message:

2025-Sep-02 13:04:12.832643222 UTC Protocol:DBG [015] onMessage: TMClose peer closed the connection: Shutdown

High Level Overview of Change

Context of Change

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (non-breaking change that only restructures code)
  • Performance (increase or change in throughput and/or latency)
  • Tests (you added tests for code that already exists, or your new feature included in this PR)
  • Documentation update
  • Chore (no impact to binary, e.g. .gitignore, formatting, dropping support for older tooling)
  • Release

API Impact

  • Public API: New feature (new methods and/or new fields)
  • Public API: Breaking change (in general, breaking changes should only impact the next api_version)
  • libxrpl change (any change that may affect libxrpl or dependents of libxrpl)
  • Peer protocol change (must be backward compatible or bump the peer protocol version)

This commit refactors the peer shutdown and failure handling
mechanism to be more robust, consistent, and communicative.

The previous implementation used raw strings to represent error reasons
and did not communicate these reasons to peers when shutting down a connection.

With this change disconnections are now explicitly communicated via a
`TMClose` protocol message with strongly-typed reasons. This new
approach provides better diagnostics and makes the peer disconnection
process more stable and predictable.
@Tapanito Tapanito changed the title refactor(overlay): Overhaul peer disconnection logic feat: Adds peer disconnection message Sep 2, 2025
@Tapanito Tapanito marked this pull request as ready for review September 2, 2025 13:05
Copy link

codecov bot commented Sep 2, 2025

Codecov Report

❌ Patch coverage is 0% with 65 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.7%. Comparing base (e0b9812) to head (a15abd4).

Files with missing lines Patch % Lines
src/xrpld/overlay/detail/PeerImp.cpp 0.0% 59 Missing ⚠️
src/xrpld/overlay/detail/ProtocolMessage.h 0.0% 5 Missing ⚠️
src/xrpld/overlay/detail/TrafficCount.h 0.0% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff            @@
##           develop   #5755     +/-   ##
=========================================
- Coverage     78.7%   78.7%   -0.0%     
=========================================
  Files          816     816             
  Lines        71749   71757      +8     
  Branches      8475    8479      +4     
=========================================
- Hits         56456   56440     -16     
- Misses       15293   15317     +24     
Files with missing lines Coverage Δ
src/xrpld/overlay/detail/PeerImp.h 11.5% <ø> (ø)
src/xrpld/overlay/detail/TrafficCount.cpp 27.9% <ø> (ø)
src/xrpld/overlay/detail/TrafficCount.h 32.9% <0.0%> (-0.4%) ⬇️
src/xrpld/overlay/detail/ProtocolMessage.h 14.2% <0.0%> (-0.2%) ⬇️
src/xrpld/overlay/detail/PeerImp.cpp 3.9% <0.0%> (-<0.1%) ⬇️

... and 5 files with indirect coverage changes

Impacted file tree graph

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

}

void
PeerImp::gracefulClose()
PeerImp::sendAndClose(protocol::TMCloseReason reason)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really sendAndClose since it doesn't close the socket. It indicates an intention to close.

@@ -231,7 +259,8 @@ PeerImp::stop()
JLOG(journal_.info()) << "Stop";
}
}
close();

sendAndClose(protocol::TMCloseReason::crSHUTDOWN);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think close() should be called here as in the original implementation. The overlay received stop() request. IMO it should close right away.

Also I'm not sure if sendAndClose() would actually close the socket. send() has this code at the start of the function:

    auto validator = m->getValidatorKey();
    if (validator && !squelch_.expireSquelch(*validator))
    {
        overlay_.reportOutboundTraffic(
            TrafficCount::category::squelch_suppressed,
            static_cast<int>(m->getBuffer(compressionEnabled_).size()));
        return;
    }

If that condition above is true then send() is not going to send anything. And if the send queue is empty then onWriteMessage() is not going to be called and consequently close() is not going to be called.

JLOG(journal_.info()) << "EOF";
return gracefulClose();
return close();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think gracefulClose() should still be called. gracefulClose() calls async_shutdown, which is graceful and secure connection shutdown, especially with SSL/TLS streams. Calling close() might be an ungraceful and often insecure way to terminate a connection.

@Tapanito Tapanito marked this pull request as draft September 5, 2025 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants