PeterChen13579
Collaborator

Support AWS Keyspace queries

codecov bot commented Aug 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.97%. Comparing base (3f2ada3) to head (497721e).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #2454   +/-   ##
========================================
  Coverage    79.96%   79.97%           
========================================
  Files          384      384           
  Lines        15935    15936    +1     
  Branches      8340     8341    +1     
========================================
+ Hits         12743    12745    +2     
+ Misses        1912     1910    -2     
- Partials      1280     1281    +1     

Collaborator

@kuznetsss kuznetsss left a comment

LGTM, but let's wait for an approval from @godexsoft

Collaborator

@godexsoft godexsoft left a comment

I have not thoroughly reviewed the code because I don't yet understand why we need so many changes if Keyspaces is supposed to be compatible with Cassandra.
In general, I'm not a fan of adding custom stuff to the Cassandra backend. If we need custom behaviour, maybe we should create a separate backend and inherit where we can, modify where we need.

if (!range_) {
    executor_.writeSync(schema_->updateLedgerRange, ledgerSequence_, false, ledgerSequence_);
    executor_.writeSync(schema_->insertLedgerRange, false, ledgerSequence_);
    executor_.writeSync(schema_->insertLedgerRange, true, ledgerSequence_);
Collaborator

Why was this not needed before, do you know?

Collaborator Author

Previously, updateLedgerRange inserted the first range it found. For example, if rippled started at ledger 256, it would insert 256, with false meaning the row is not the latest ledger in our ledger_range table.
Since updateLedgerRange can't be used in Keyspace, I simplified both Keyspace and Scylla to use the insertLedgerRange statement, as I feel the logic is easier to follow (i.e., the first time Clio starts without a ledger range table, it inserts both the latest and the non-latest range into the table).
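
To make the flow in this reply concrete, here is a minimal in-memory sketch. It is not the code from this PR: `LedgerRangeTable`, `fetchRange`, and `writeLedgerRange` are hypothetical names, the table is modelled as a plain map, and the behaviour of later writes (only advancing the latest row) is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical stand-in for the ledger_range table: is_latest -> sequence.
using LedgerRangeTable = std::map<bool, std::uint32_t>;

struct LedgerRange {
    std::uint32_t minSequence;
    std::uint32_t maxSequence;
};

// Reads the range back; empty until both rows exist.
std::optional<LedgerRange> fetchRange(LedgerRangeTable const& table)
{
    auto const oldest = table.find(false);
    auto const latest = table.find(true);
    if (oldest == table.end() || latest == table.end())
        return std::nullopt;
    return LedgerRange{oldest->second, latest->second};
}

// First write: seed both rows with plain inserts (no conditional update).
// Later writes (assumed here): only move the is_latest = true row forward.
void writeLedgerRange(LedgerRangeTable& table, std::uint32_t ledgerSequence)
{
    if (!fetchRange(table)) {
        table[false] = ledgerSequence;  // insertLedgerRange, is_latest = false
        table[true] = ledgerSequence;   // insertLedgerRange, is_latest = true
        return;
    }
    assert(ledgerSequence >= table.at(true));
    table[true] = ledgerSequence;
}
```

Starting from an empty table, writing ledger 256 produces the range 256 to 256, and a following write of 257 moves only the upper bound.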

- **Required**: True
- **Type**: string
- **Default value**: `cassandra`
- **Constraints**: The value must be one of the following: `cassandra`, `aws_keyspace`.
Collaborator

May as well add scylladb to the mix then, if we are already using the same backend for multiple DBs.

Collaborator Author

I think the reason we didn't add scylladb is that scylladb and cassandra behave the same here, since they use exactly the same schemas.

);
}();
// AWS_keyspace supported queries
} else if (settingsProvider_.get().getSettings().provider == "aws_keyspace") {
Collaborator

Please can you explain why we need so many changes and what exactly is not supported so we had to do this?

Collaborator Author

Explained in my other comment 👍

PreparedStatement updateLedgerRange = [this]() {
    return handle_.get().prepare(
        fmt::format(
            R"(
            UPDATE {}
               SET sequence = ?
             WHERE is_latest = ?
                IF sequence IN (?, null)
Collaborator

IIRC, this change was suggested by the Scylla team earlier, and it gave us a good speedup. Let's not go back to the slower old version.

Collaborator Author

I can revert the logic so that scylladb uses this 👍

Collaborator Author

PeterChen13579 commented Aug 27, 2025

@godexsoft For some context, Keyspace is compatible with Cassandra, but there are a few limitations. Specifically, it does not support the following, which we use in several places:

- IF ... IN clause
- PER PARTITION LIMIT 1
- Tuple comparisons, e.g. (taxon, token_id) > ?

To tackle this, I split each query that contains any of the above into 2 queries, i.e. the 2 new queries are equivalent to the 1 from before.

The goal of this PR is to get Clio running with Keyspace, while those running cassandra/scylladb see exactly the same behaviour as before.
Also, Keyspace is working on supporting the statements above. Once they are supported, I'm happy to revert this PR.
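
As an illustration of how one of these splits can work, the sketch below emulates the tuple comparison `(taxon, token_id) > (?, ?)` with two simpler filters over an in-memory table. It is a hedged example, not the queries from this PR: `Row`, `Table`, `selectAfterTuple`, and `selectAfterSplit` are hypothetical names, `token_id` is shortened to a 32-bit integer, and the rows are assumed to be sorted by `(taxon, token_id)` the way a clustering order would return them.

```cpp
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical row of a table clustered by (taxon, token_id).
struct Row {
    std::uint32_t taxon;
    std::uint32_t tokenId;
};

// Rows are assumed to be sorted by (taxon, tokenId).
using Table = std::vector<Row>;

// Single-query form: WHERE (taxon, token_id) > (?, ?) LIMIT ?
std::vector<Row> selectAfterTuple(Table const& table, Row const& cursor, std::size_t limit)
{
    std::vector<Row> out;
    for (auto const& row : table) {
        if (out.size() == limit)
            break;
        if (std::tie(row.taxon, row.tokenId) > std::tie(cursor.taxon, cursor.tokenId))
            out.push_back(row);
    }
    return out;
}

// Two-query form without tuple comparison:
//   1) WHERE taxon = ? AND token_id > ? LIMIT ?
//   2) WHERE taxon > ? LIMIT (whatever is still missing)
std::vector<Row> selectAfterSplit(Table const& table, Row const& cursor, std::size_t limit)
{
    std::vector<Row> out;
    // Query 1: the rest of the cursor's own taxon.
    for (auto const& row : table) {
        if (out.size() == limit)
            return out;
        if (row.taxon == cursor.taxon && row.tokenId > cursor.tokenId)
            out.push_back(row);
    }
    // Query 2: every later taxon, only for the rows still needed.
    for (auto const& row : table) {
        if (out.size() == limit)
            break;
        if (row.taxon > cursor.taxon)
            out.push_back(row);
    }
    return out;
}
```

Because the first query returns rows that sort before everything the second query returns, concatenating the two results preserves the order of the single-query form. The cost is a second round trip whenever the first query does not fill the page.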

Collaborator

@mathbunnyru mathbunnyru left a comment

Left some comments, mostly to make the code more readable and less error-prone

) const
{
    std::vector<ripple::uint256> nftIDs;
    if (settingsProvider_.getSettings().provider == "aws_keyspace") {
Collaborator

Let's have 2 private methods, and here just call one or the other under the if

Collaborator

Why was this marked as resolved?

Collaborator Author

After discussing with @kuznetsss and @godexsoft, we are completely changing the structure of the backend (i.e., adding a cassandraFamily level that CassandraBackend and KeyspaceBackend inherit from). There will no longer be any provider checks in the backend, so I marked it as resolved.

Collaborator

Ok, please, don't just "resolve" the issue, without any comment 😂

Collaborator

Keep it open until it's actually resolved, even if it's not resolved the way the reviewer (in this case, me) suggested
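
For readers following this thread, the restructuring mentioned above (a shared family level with CassandraBackend and KeyspaceBackend inheriting from it) could look roughly like the sketch below. Only the idea of the hierarchy comes from the discussion. The base class name `CassandraFamilyBackend`, the method names, and the returned strings are hypothetical placeholders, not the planned implementation.

```cpp
#include <string>

// Hypothetical base: shared logic lives here and calls provider-specific hooks,
// so no code path needs to compare provider strings at runtime.
class CassandraFamilyBackend {
public:
    virtual ~CassandraFamilyBackend() = default;

    // Shared entry point; the provider-specific part is delegated to a hook.
    std::string describeNftQueryPlan() const
    {
        return "shared prelude; " + nftsByIssuerPlan();
    }

private:
    // Each provider overrides only what it cannot share.
    virtual std::string nftsByIssuerPlan() const = 0;
};

class CassandraBackend final : public CassandraFamilyBackend {
    std::string nftsByIssuerPlan() const override
    {
        return "one query with tuple comparison";
    }
};

class KeyspaceBackend final : public CassandraFamilyBackend {
    std::string nftsByIssuerPlan() const override
    {
        return "two queries without tuple comparison";
    }
};
```

With this shape, the choice of provider is made once, when the backend object is constructed, and the two private methods requested earlier in this thread become overrides of the same hook.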

@@ -347,6 +348,86 @@ class Schema {
Statements(SettingsProviderType const& settingsProvider, Handle const& handle)
    : settingsProvider_{settingsProvider}, handle_{std::cref(handle)}
{
    // initialize scylladb supported queries
    if (settingsProvider_.get().getSettings().provider == "scylladb") {
Collaborator

Let's also have 2 private methods here

Collaborator

Why was this marked as resolved?

Comment on lines +1008 to +1015
// For ScyllaDB / Cassandra ONLY
std::optional<PreparedStatement> selectAccountFromBeginningScylla;
std::optional<PreparedStatement> selectAccountFromTokenScylla;
std::optional<PreparedStatement> selectNFTsByIssuerScylla;

// For AWS Keyspaces ONLY
// NOTE: AWS keyspace is not able to load cache with accounts
std::optional<PreparedStatement> selectNFTsAfterTaxonKeyspaces;
Collaborator

This should be std::variant<CassandraStatements, AWSKeyspacesStatements>, where each alternative is a struct of the statements that provider needs. You will probably be able to get rid of std::optional as well.

Collaborator

Why was this marked resolved?
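
A minimal sketch of the `std::variant` suggestion in this thread, assuming C++17. The two struct names come from the review comment and the field names follow the members quoted above with the provider suffix dropped. `PreparedStatement` here is a stand-in that only carries CQL text, and `ProviderStatements` and `nftQueryFor` are hypothetical names.

```cpp
#include <string>
#include <type_traits>
#include <variant>

// Stand-in for the real PreparedStatement type; it only carries CQL text here.
struct PreparedStatement {
    std::string cql;
};

// Statements that exist only for ScyllaDB / Cassandra.
struct CassandraStatements {
    PreparedStatement selectAccountFromBeginning;
    PreparedStatement selectAccountFromToken;
    PreparedStatement selectNFTsByIssuer;
};

// Statements that exist only for AWS Keyspaces.
struct AWSKeyspacesStatements {
    PreparedStatement selectNFTsAfterTaxon;
};

// Exactly one alternative is held, so no member needs std::optional.
using ProviderStatements = std::variant<CassandraStatements, AWSKeyspacesStatements>;

// Picks the NFT query of whichever provider is active.
std::string nftQueryFor(ProviderStatements const& statements)
{
    return std::visit(
        [](auto const& s) -> std::string {
            using T = std::decay_t<decltype(s)>;
            if constexpr (std::is_same_v<T, CassandraStatements>) {
                return s.selectNFTsByIssuer.cql;
            } else {
                return s.selectNFTsAfterTaxon.cql;
            }
        },
        statements
    );
}
```

Compared with a set of `std::optional` members, the variant makes it impossible to touch a Keyspaces-only statement while the Cassandra alternative is active, because that member simply does not exist on the held struct.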

@@ -88,6 +89,9 @@ struct Settings {
/** @brief Size of batches when writing */
std::size_t writeBatchSize = kDEFAULT_BATCH_SIZE;

/** @brief Provider to know if we are using scylladb or keyspace */
std::string provider = kDEFAULT_PROVIDER;
Collaborator

Let's create an enum for the provider.
I know that the config definition doesn't support the enumeration, but I think we shouldn't use raw strings and raw strings comparison in the code itself, because it's error-prone (see my comment above for incorrect kDEFAULT_PROVIDER value)
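
One possible shape for the enum suggested here, as a sketch only: the config value stays a string, but it is converted once at the boundary and the rest of the code compares enum values. `Provider` and `parseProvider` are hypothetical names, and the accepted strings are the provider values that appear elsewhere in this PR.

```cpp
#include <optional>
#include <string_view>

// Hypothetical enum; the enumerators mirror the provider strings seen in this PR.
enum class Provider { Cassandra, ScyllaDB, AWSKeyspace };

// Converts the raw config string once; unknown values are rejected instead of
// silently failing a later string comparison.
constexpr std::optional<Provider> parseProvider(std::string_view value)
{
    if (value == "cassandra")
        return Provider::Cassandra;
    if (value == "scylladb")
        return Provider::ScyllaDB;
    if (value == "aws_keyspace")
        return Provider::AWSKeyspace;
    return std::nullopt;
}
```

A misspelled value such as "AWS_keyspace" then yields an empty optional at startup, where it can be reported, rather than quietly selecting the wrong branch.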


4 participants