Collaborator

@Bronek Bronek commented Sep 3, 2025

NOTE: this is for the release branch only.

High Level Overview of Change

This fixes a problem where rippled could crash due to a regression bug in Boost 1.86.

Context of Change

This is the result of an internal investigation into rippled crashes in testnet. We found that the crashes correspond to a documented regression in Boost 1.86 and confirmed that this bug is not present in the older Boost 1.83, which was used before #5264. We do not fully revert #5264 since there is no need.
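
As an illustration only (this is not part of the change, which simply pins the dependency version), a build could also guard against the affected release at compile time. The sketch below assumes the usual `BOOST_VERSION` encoding from `<boost/version.hpp>` (major * 100000 + minor * 100 + patch). The helper names are hypothetical, and the check covers only 1.86 because that is the only version identified as affected here.

```cpp
// Hypothetical helpers for decoding a BOOST_VERSION-style integer.
// Encoding assumed: major * 100000 + minor * 100 + patch (e.g. 108600 for 1.86.0).
constexpr int boost_major(int version) { return version / 100000; }
constexpr int boost_minor(int version) { return version / 100 % 1000; }

// True only for the 1.86.x series, the release named in this PR.
constexpr bool is_boost_1_86(int version)
{
    return boost_major(version) == 1 && boost_minor(version) == 86;
}

// Compile-time checks of the decoding itself.
static_assert(is_boost_1_86(108600), "1.86.0 must be detected");
static_assert(!is_boost_1_86(108300), "1.83.0 must not be flagged");

// In a real build the guard could look like this (left commented out so the
// sketch compiles without Boost installed):
//
//   #include <boost/version.hpp>
//   static_assert(!is_boost_1_86(BOOST_VERSION),
//                 "Boost 1.86 has a boost::beast regression; use 1.83");
```

A guard like this fails the build early instead of relying on the package manager configuration alone.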

Type of Change

  • Bug fix (non-breaking change which fixes an issue)

@Bronek Bronek changed the title [RELEASE] For release branch only, downgrade to boost 1.83 [RELEASE] Downgrade version 2.6 to boost 1.83 Sep 3, 2025
@Bronek Bronek requested a review from a team as a code owner September 3, 2025 14:40

codecov bot commented Sep 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.8%. Comparing base (2df7dcf) to head (8d01f35).
⚠️ Report is 2 commits behind head on release.

@@            Coverage Diff            @@
##           release   #5759     +/-   ##
=========================================
- Coverage     78.8%   78.8%   -0.0%     
=========================================
  Files          814     814             
  Lines        71345   71310     -35     
  Branches      8357    8345     -12     
=========================================
- Hits         56234   56192     -42     
- Misses       15111   15118      +7     
Files with missing lines Coverage Δ
src/libxrpl/protocol/BuildInfo.cpp 98.2% <ø> (ø)

... and 14 files with indirect coverage changes

Collaborator Author

Bronek commented Sep 5, 2025

Additional details:

Version 1.86 of Boost introduced a regression bug in boost::beast, demonstrated in boostorg/beast#2941, where a timeout on a network packet could cause the process to terminate. The low-level details of the bug are explained in boostorg/beast#2925, and the fix is in boostorg/beast#2926 (also documented in https://github.com/boostorg/beast/blob/164db4bc57707b02550a53902cb1c138da99789f/CHANGELOG.md?plain=1#L32).

This crash might produce a stacktrace from rippled with the following frames:

#2  0x0000000001963be6 in __gnu_cxx::__verbose_terminate_handler() [clone .cold] () at /usr/local/include/c++/12.5.0/bits/stl_construct.h:162
No symbol table info available.
#3  0x00000000046f5aba in __cxxabiv1::__terminate(void (*)()) ()
No symbol table info available.
#4  0x00000000046f5b25 in std::terminate() ()
No symbol table info available.
#5  0x00000000046f5c77 in __cxa_throw ()
No symbol table info available.
#6  0x0000000001859370 in boost::throw_exception<boost::asio::bad_executor> (e=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /root/.conan2/p/b/boost9dffe029038e8/p/include/boost/throw_exception.hpp:86
No locals.
#7  0x000000000185949f in boost::asio::executor::get_impl (this=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /root/.conan2/p/b/boost9dffe029038e8/p/include/boost/asio/executor.hpp:329
        ex = {<std::exception> = {<No data fields>}, <No data fields>}
.
.
.
#16 boost::asio::dispatch<boost::asio::detail::default_immediate_executor<boost::asio::executor, void, void>::type, boost::asio::append_t<boost::beast::basic_stream<boost::asio::ip::tcp, boost::asio::executor, boost::beast::unlimited_rate_policy>::ops::transfer_op<false, boost::asio::const_buffers_1, boost::asio::detail::write_op<boost::beast::basic_stream<boost::asio::ip::tcp, boost::asio::executor, boost::beast::unlimited_rate_policy>, boost::asio::mutable_buffer, boost::asio::mutable_buffer const*, boost::asio::detail::transfer_all_t, boost::asio::ssl::detail::io_op<boost::beast::basic_stream<boost::asio::ip::tcp, boost::asio::executor, boost::beast::unlimited_rate_policy>, boost::asio::ssl::detail::buffered_handshake_op<boost::asio::const_buffers_1>, boost::asio::detail::spawn_handler<boost::asio::executor, void (boost::system::error_code, unsigned long)> > > >, boost::system::error_code, int> >(boost::asio::detail::default_immediate_executor<boost::asio::executor, void, void>::type const&, boost::asio::append_t<boost::beast::basic_stream<boost::asio::ip::tcp, boost::asio::executor, boost::beast::unlimited_rate_policy>::ops::transfer_op<false, boost::asio::const_buffers_1, boost::asio::detail::write_op<boost::beast::basic_stream<boost::asio::ip::tcp, boost::asio::executor, boost::beast::unlimited_rate_policy>, boost::asio::mutable_buffer, boost::asio::mutable_buffer const*, boost::asio::detail::transfer_all_t, boost::asio::ssl::detail::io_op<boost::beast::basic_stream<boost::asio::ip::tcp, boost::asio::executor, boost::beast::unlimited_rate_policy>, boost::asio::ssl::detail::buffered_handshake_op<boost::asio::const_buffers_1>, boost::asio::detail::spawn_handler<boost::asio::executor, void (boost::system::error_code, unsigned long)> > > >, boost::system::error_code, int>&&, boost::asio::constraint<boost::asio::execution::is_executor<boost::asio::detail::default_immediate_executor<boost::asio::executor, void, 
void>::type>::value||boost::asio::is_executor<boost::asio::detail::default_immediate_executor<boost::asio::executor, void, void>::type>::value, int>::type) (ex=..., token=...) at /root/.conan2/p/b/boost9dffe029038e8/p/include/boost/asio/dispatch.hpp:156
No locals.
#17 0x00000000021bee9d in boost::beast::basic_stream<boost::asio::ip::tcp, boost::asio::executor, boost::beast::unlimited_rate_policy>::ops::transfer_op<false, boost::asio::const_buffers_1, boost::asio::detail::write_op<boost::beast::basic_stream<boost::asio::ip::tcp, boost::asio::executor, boost::beast::unlimited_rate_policy>, boost::asio::mutable_buffer, boost::asio::mutable_buffer const*, boost::asio::detail::transfer_all_t, boost::asio::ssl::detail::io_op<boost::beast::basic_stream<boost::asio::ip::tcp, boost::asio::executor, boost::beast::unlimited_rate_policy>, boost::asio::ssl::detail::buffered_handshake_op<boost::asio::const_buffers_1>, boost::asio::detail::spawn_handler<boost::asio::executor, void (boost::system::error_code, unsigned long)> > > >::operator()(boost::system::error_code, unsigned long) (this=0x7f2c175afb00, ec=..., bytes_transferred=<optimized out>) at /usr/local/include/c++/12.5.0/tuple:199
        loc_333 = {file_ = 0x47d8e48 "/root/.conan2/p/b/boost9dffe029038e8/p/include/boost/beast/core/impl/basic_stream.hpp", function_ = 0x47d8ea0 "void boost::beast::basic_stream< <template-parameter-1-1>, <template-parameter-1-2>, <template-parameter-1-3> >::ops::transfer_op<isRead, Buffers, Handler>::operator()(boost::beast::error_code, std::s"..., line_ = 333, column_ = 17}
        amount = <error reading variable amount (dwarf2_find_location_expression: Corrupted DWARF expression.)>
        _coro_value = {value_ = @0x7f2c175afbc4, modified_ = true}

Frame 7 refers to an exception thrown in boost::asio when an event is processed without an executor. Because the exception is thrown in a noexcept context, the process terminates. The throw site is here: https://github.com/boostorg/asio/blob/c28d453674dd2071fdc8cce5ffabcb54c910f466/include/boost/asio/executor.hpp#L329

The reason why there is no executor is explained in boostorg/beast#2925, and frame 17 shows the buggy function being called: https://github.com/boostorg/beast/blob/fee9be0be10c9c9a22ac1505a710d1d8ed5a3dfb/include/boost/beast/core/impl/basic_stream.hpp#L329 (the call is wrapped in BOOST_ASIO_CORO_YIELD, which may defer execution, hence line 333, from BOOST_BEAST_ASSIGN_EC, in the stack trace above).
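
To illustrate why frames 3 to 7 end in process termination rather than a catchable error, here is a minimal sketch using only the standard library. `executor_sketch` and `bad_executor_sketch` are hypothetical stand-ins, not the real `boost::asio::executor` or `boost::asio::bad_executor`, and the code only mirrors the general mechanism: an empty type-erased executor throws when used, and an exception that tries to leave a `noexcept` function makes the runtime call `std::terminate()`.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for boost::asio::bad_executor (not the Boost type).
struct bad_executor_sketch : std::runtime_error
{
    bad_executor_sketch() : std::runtime_error("bad executor") {}
};

// Hypothetical stand-in for a type-erased executor that may be empty.
class executor_sketch
{
public:
    executor_sketch() = default;  // empty: there is nothing to run work on

    explicit executor_sketch(std::function<void(std::function<void()>)> impl)
        : impl_(std::move(impl))
    {
    }

    // Throws when the executor is empty, mirroring what frames 6 and 7 show.
    void dispatch(std::function<void()> work) const
    {
        if (!impl_)
            throw bad_executor_sketch{};
        impl_(std::move(work));
    }

private:
    std::function<void(std::function<void()>)> impl_;
};

// Safe with an empty executor: the exception reaches a handler.
inline bool try_dispatch(executor_sketch const& ex, std::function<void()> work)
{
    try
    {
        ex.dispatch(std::move(work));
        return true;
    }
    catch (bad_executor_sketch const&)
    {
        return false;
    }
}

// NOT safe with an empty executor: the exception cannot leave a noexcept
// function, so the runtime calls std::terminate() and the process dies.
inline void dispatch_noexcept(executor_sketch const& ex,
                              std::function<void()> work) noexcept
{
    ex.dispatch(std::move(work));
}
```

Calling `try_dispatch` with a default-constructed `executor_sketch` returns `false`, whereas calling `dispatch_noexcept` with the same object would abort the program, which is the shape of the crash in the stack trace above.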


xVet commented Sep 5, 2025

"packet timeout" so this is not triggered by specific transactions but potentially by random network activities?

Collaborator Author

Bronek commented Sep 5, 2025

> "packet timeout" so this is not triggered by specific transactions but potentially by random network activities?

Roughly, yes. There must be pending TCP activity that times out, and then the next network operation will result in a crash. We did not spend much time trying to reproduce this bug, so I cannot tell you much more.

@ximinez ximinez force-pushed the Bronek/boost_1_83_downgrade branch from bba7b4b to 8d01f35 Compare September 16, 2025 19:36
@ximinez ximinez merged commit 8d01f35 into release Sep 16, 2025
29 checks passed
@ximinez ximinez deleted the Bronek/boost_1_83_downgrade branch September 16, 2025 20:07