Contributor

rescrv commented Aug 28, 2025

Description of changes

This PR changes ScoutLogs to consult the manifest cache. If the
manifest was recently in the cache, wal3/rls performs a HEAD
operation to confirm that the cached copy is still current, rather
than fetching the whole object again.

This PR contains tests written by Claude.

Test plan

CI

Migration plan

N/A

Observability plan

N/A

Documentation Changes

N/A

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

Contributor

propel-code-bot bot commented Aug 28, 2025

ScoutLogs Uses Cache With HEAD Optimization for Manifest Verification

This PR introduces a cache validation mechanism for ScoutLogs, allowing the log service to consult a cached manifest and verify its freshness via an S3 HEAD operation (etag check) instead of loading the entire manifest. The change extends S3, admission-controlled S3, and storage abstraction layers with a new confirm_same interface, which provides an etag consistency check without downloading the full object. When a cached manifest+etag is found, the server calls HEAD to verify its validity before using it, falling back to a full manifest fetch if verification fails. Corresponding test coverage is added for S3, manifest, log-reader, and service endpoints to ensure correct integration and observability.
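
For readers skimming the thread, the flow described above can be sketched roughly as follows. This is a synchronous toy model, not the PR's code: `FakeStore`, `load_manifest`, and the simplified `Manifest` and `ETag` types are hypothetical stand-ins, and the real implementation is async and goes through the storage and wal3 layers.

```rust
use std::collections::HashMap;

/// Hypothetical stand-ins for the PR's types; names are illustrative only.
#[derive(Clone, Debug, PartialEq)]
pub struct ETag(pub String);

#[derive(Clone, Debug, PartialEq)]
pub struct Manifest {
    pub contents: String,
}

/// A toy object store that counts how often each kind of request is made.
#[derive(Default)]
pub struct FakeStore {
    objects: HashMap<String, (Manifest, ETag)>,
    pub heads: usize,
    pub gets: usize,
}

impl FakeStore {
    pub fn put(&mut self, key: &str, manifest: Manifest, e_tag: ETag) {
        self.objects.insert(key.to_string(), (manifest, e_tag));
    }

    /// HEAD-style check: compares e-tags without returning the object body.
    pub fn confirm_same(&mut self, key: &str, e_tag: &ETag) -> bool {
        self.heads += 1;
        matches!(self.objects.get(key), Some((_, current)) if current == e_tag)
    }

    /// Full fetch: returns the manifest together with its current e-tag.
    pub fn get(&mut self, key: &str) -> Option<(Manifest, ETag)> {
        self.gets += 1;
        self.objects.get(key).cloned()
    }
}

/// Prefer a verified cache entry; otherwise fall back to a full fetch and
/// refresh the cache with whatever the store currently holds.
pub fn load_manifest(
    store: &mut FakeStore,
    cache: &mut HashMap<String, (Manifest, ETag)>,
    key: &str,
) -> Option<Manifest> {
    if let Some((manifest, e_tag)) = cache.get(key).cloned() {
        if store.confirm_same(key, &e_tag) {
            return Some(manifest);
        }
        // The cached entry is stale (or the object is gone): drop it.
        cache.remove(key);
    }
    let (manifest, e_tag) = store.get(key)?;
    cache.insert(key.to_string(), (manifest.clone(), e_tag));
    Some(manifest)
}
```

In this sketch a warm, still-valid cache entry costs one HEAD-style call and no full fetch, while a stale entry costs one HEAD-style call plus one full fetch.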

Key Changes

• Added the Storage::confirm_same() method (with backend implementations) to verify that a provided etag matches the current file (manifest) in storage without fetching the whole file
• Updated ScoutLogs logic in rust/log-service/src/lib.rs to prefer using cached manifests, verifying them with HEAD/etag checks, and falling back to a fresh manifest fetch if verification or cache miss occurs
• Introduced Manifest::head() and LogReader::verify() for lightweight manifest freshness checks via etag
• Wired through S3 (S3Storage), admission-controlled S3, storage abstraction, and implemented test stubs (and appropriate NotImplemented for local/object_store backends)
• Added comprehensive k8s-integration and unit tests for HEAD/etag behavior, including edge cases (e.g., stale cache, missing files, error handling)
• Updated Cargo.lock and resolved minor package version drifts

Affected Areas

• rust/log-service/src/lib.rs: logic for manifest fetching and caching in ScoutLogs
• rust/wal3/src/reader.rs, manifest.rs: new methods for etag verification and manifest loading
• rust/storage/src/lib.rs, s3.rs, admissioncontrolleds3.rs, local.rs, object_store.rs: storage backend implementations for confirm_same/etag logic and passthrough
• Tests: integration and unit tests in storage, manifest, log-reader, and log-service modules
• Cargo.lock: dependency tree maintenance

This summary was automatically generated by @propel-code-bot

Comment on lines +70 to +72
    pub async fn confirm_same(&self, _: &str, _: &ETag) -> Result<bool, StorageError> {
        Err(StorageError::NotImplemented)
    }
Contributor

[TestCoverage]

This NotImplemented error will cause cache verification to always fail for local storage, preventing tests from exercising the cache-hit path. You could implement this using the existing etag_for_bytes helper to make local storage tests more realistic.

Suggested change
-    pub async fn confirm_same(&self, _: &str, _: &ETag) -> Result<bool, StorageError> {
-        Err(StorageError::NotImplemented)
-    }
+    pub async fn confirm_same(&self, key: &str, e_tag: &ETag) -> Result<bool, StorageError> {
+        match self.get(key).await {
+            Ok(bytes) => {
+                let current_etag = Self::etag_for_bytes(&bytes);
+                Ok(&current_etag == e_tag)
+            }
+            Err(StorageError::NotFound { .. }) => Ok(false),
+            Err(e) => Err(e),
+        }
+    }

Committable suggestion

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
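
To make the intended semantics concrete, here is a minimal standalone sketch of a content-derived e-tag check for an in-memory backend. Everything in it is an assumption for illustration: `InMemoryStorage` is hypothetical, the error enum is reduced to one variant, and this `etag_for_bytes` uses the standard library's `DefaultHasher`, which may well differ from the scheme the real helper uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

#[derive(Clone, Debug, PartialEq)]
pub struct ETag(pub String);

/// Reduced error type for the sketch; the real StorageError has more variants.
#[derive(Debug, PartialEq)]
pub enum StorageError {
    NotFound,
}

/// Hypothetical in-memory backend used only to illustrate the check.
#[derive(Default)]
pub struct InMemoryStorage {
    objects: HashMap<String, Vec<u8>>,
}

impl InMemoryStorage {
    /// Derive an e-tag from the object's content.
    /// Assumption: the hashing scheme here is illustrative, not the real helper's.
    pub fn etag_for_bytes(bytes: &[u8]) -> ETag {
        let mut hasher = DefaultHasher::new();
        bytes.hash(&mut hasher);
        ETag(format!("{:016x}", hasher.finish()))
    }

    /// Store the bytes and return the e-tag a caller would cache alongside them.
    pub fn put(&mut self, key: &str, bytes: &[u8]) -> ETag {
        self.objects.insert(key.to_string(), bytes.to_vec());
        Self::etag_for_bytes(bytes)
    }

    pub fn get(&self, key: &str) -> Result<&[u8], StorageError> {
        self.objects
            .get(key)
            .map(|bytes| bytes.as_slice())
            .ok_or(StorageError::NotFound)
    }

    /// True only if the object exists and its current e-tag equals `e_tag`.
    /// A missing object is reported as "not the same" rather than as an error.
    pub fn confirm_same(&self, key: &str, e_tag: &ETag) -> Result<bool, StorageError> {
        match self.get(key) {
            Ok(bytes) => Ok(&Self::etag_for_bytes(bytes) == e_tag),
            Err(StorageError::NotFound) => Ok(false),
        }
    }
}
```

Note that a content-derived check like this reads the whole object, so it would only make the cache-hit path testable on local storage; it does not save any I/O the way a HEAD request does on S3.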

Comment on lines +156 to +158
    pub async fn confirm_same(&self, _: &str, _: &ETag) -> Result<bool, StorageError> {
        Err(StorageError::NotImplemented)
    }
Contributor

[BestPractice]

Similar to the LocalStorage implementation, returning NotImplemented here prevents testing the cache verification path for this storage backend. The underlying object_store crate supports head requests which return an ETag, so this could be implemented.

Suggested change
-    pub async fn confirm_same(&self, _: &str, _: &ETag) -> Result<bool, StorageError> {
-        Err(StorageError::NotImplemented)
-    }
+    pub async fn confirm_same(&self, key: &str, e_tag: &ETag) -> Result<bool, StorageError> {
+        match self.object_store.head(&object_store::path::Path::from(key)).await {
+            Ok(meta) => Ok(meta.e_tag == Some(e_tag.0.clone())),
+            Err(object_store::Error::NotFound { .. }) => Ok(false),
+            Err(e) => Err(e.into()),
+        }
+    }

Collaborator

HammadB commented Aug 28, 2025

Do we want to consider racing reads here? It seems like that could be useful

rescrv force-pushed the rescrv/scout-logs-uses-head branch from e76ace7 to 3aa12eb on September 8, 2025 16:54
Comment on lines +1149 to 1151
            if !log_reader.verify(cached).await.unwrap_or_default() {
                cached_manifest_and_e_tag.take();
            }
Contributor

[BestPractice]

The use of unwrap_or_default() here effectively treats any error during manifest verification as a cache miss. While this is a safe fallback, it hides potentially important errors (e.g., network issues, S3 permissions) that could indicate a deeper problem. Logging these errors would improve observability and help diagnose issues that might cause frequent cache misses.

Suggested change
-            if !log_reader.verify(cached).await.unwrap_or_default() {
-                cached_manifest_and_e_tag.take();
-            }
+            match log_reader.verify(cached).await {
+                Ok(true) => (), // All good, manifest is fresh.
+                Ok(false) => {
+                    // Stale manifest, invalidate.
+                    cached_manifest_and_e_tag.take();
+                }
+                Err(err) => {
+                    tracing::warn!(
+                        "Failed to verify cached manifest for collection {}: {}. Falling back to full fetch.",
+                        collection_id,
+                        err
+                    );
+                    cached_manifest_and_e_tag.take();
+                }
+            }

File: rust/log-service/src/lib.rs
Line: 1151
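
The difference between the two variants can be shown in isolation. The sketch below is a hedged illustration, not the service code: `CacheDecision`, `decide_silently`, and `decide_with_logging` are hypothetical names, the error type is simplified to `String`, and the "logged" message is returned as a value so the behavior can be checked without a tracing subscriber.

```rust
/// Hypothetical outcome of handling one verification result.
#[derive(Debug, PartialEq)]
pub enum CacheDecision {
    /// Keep using the cached manifest.
    UseCached,
    /// Drop the cached manifest and refetch; `logged` holds any warning emitted.
    Refetch { logged: Option<String> },
}

/// Mirrors the one-liner under review: `unwrap_or_default()` turns any `Err`
/// into `false`, so an error is indistinguishable from a stale manifest.
pub fn decide_silently(verified: Result<bool, String>) -> CacheDecision {
    if verified.unwrap_or_default() {
        CacheDecision::UseCached
    } else {
        CacheDecision::Refetch { logged: None }
    }
}

/// Mirrors the suggested match: the fallback is the same, but the error is
/// surfaced (here as a returned message standing in for a log line).
pub fn decide_with_logging(verified: Result<bool, String>) -> CacheDecision {
    match verified {
        Ok(true) => CacheDecision::UseCached,
        Ok(false) => CacheDecision::Refetch { logged: None },
        Err(err) => CacheDecision::Refetch {
            logged: Some(format!("failed to verify cached manifest: {}", err)),
        },
    }
}
```

Both variants fall back to a full fetch on error; only the second leaves a trace that distinguishes a failed verification from an ordinary stale cache entry.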

rescrv force-pushed the rescrv/scout-logs-uses-head branch from 3aa12eb to 568b89c on September 9, 2025 21:35
Contributor Author

rescrv commented Sep 9, 2025

Offline discussion documented here: Racing reads will only help in the case that both ops race with a write that invalidates the cache.

blacksmith-sh bot deleted a comment from rescrv on Sep 9, 2025
rescrv requested a review from Sicheng-Pan on September 9, 2025 22:52
rescrv force-pushed the rescrv/scout-logs-uses-head branch from c6171dd to bbc6610 on September 9, 2025 23:57
blacksmith-sh bot deleted a comment from rescrv on Sep 10, 2025
@@ -4758,7 +4758,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "fc2f4eb4bc735547cfed7c0a4922cbd04a4655978c09b54f1f7b228750664c34"
 dependencies = [
  "cfg-if",
- "windows-targets 0.52.6",
+ "windows-targets 0.48.5",
Contributor

Is the Cargo.lock change here intentional?

Unrelated, but maybe we should consider bumping our dependencies in the future.

Contributor Author

This was from a rebase. Will fix.

rescrv requested a review from Sicheng-Pan on September 11, 2025 21:44
rescrv merged commit 214864d into main on Sep 15, 2025
58 checks passed
rescrv deleted the rescrv/scout-logs-uses-head branch on September 15, 2025 18:23