-
Notifications
You must be signed in to change notification settings - Fork 637
feat: introduce foyer layer #6366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: MrCroxx <[email protected]>
Signed-off-by: MrCroxx <[email protected]>
Signed-off-by: MrCroxx <[email protected]>
Signed-off-by: MrCroxx <[email protected]>
Signed-off-by: MrCroxx <[email protected]>
Signed-off-by: MrCroxx <[email protected]>
Please hold the merge progress after approval. I'll release a new foyer version after all problems here are fixed, then switch the foyer version in this PR to it. 🙏 |
Signed-off-by: MrCroxx <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @MrCroxx for working on this, really great. Only some comments about the details.
Buffer, Error, ErrorKind, Metadata, Result, | ||
}; | ||
|
||
fn extract_err(e: FoyerError) -> Error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to keep it. It is used in both extracting opendal error from foyer fetch API, and in unit test. I think it is important to make sure the behaviour is correct and consistent between the read function and the unit test.
Signed-off-by: MrCroxx <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks cool! I'm going to tag along to see the progress.
core/src/layers/foyer.rs
Outdated
let range = BytesContentRange::default() | ||
.with_range(start, end - 1) | ||
.with_size(entry.len() as _); | ||
let buffer = entry.slice(start as usize..end as usize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment as a reminder? Up to you.
Hi. Just back from a vacation. I'll keep working on this PR tomorrow. 🥰 |
core/src/layers/foyer.rs
Outdated
let entry = self | ||
.inner | ||
.cache | ||
.fetch(path.clone(), || { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpRead
contains a version
field that I think we should include in the cache key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice idea!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just found that this requirement is a bit tricky for foyer. As a cache, foyer does not support versioning (and it is also difficult to support, as caches allow only partial data retention, and supporting versioning requires a lot of additional overhead). If users want to read the latest version without a version tag, it may lead to reading incorrect objects or result in cache misses.
I think a better approach might be to bypass the cache when there is a versioning requirement, or to treat objects without a version and those with a clear version as two separate objects in the cache without fallback.
Any ideas? cc @Xuanwo @jorgehermo9 for help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of something like this https://github.com/jorgehermo9/opendal/blob/0b5520869ca53afd522fbc3faa345fba9f812d3a/core/src/layers/foyer.rs#L204 (taking it from my draft PR of the foyer cache layer)
Just adding it to the cache key so paths with different version are treated as separate cache keys
Example
reading path=/path/to/blob, version=None -> Storing entry with key "/path/to/blob-"
reading path=/path/to/blob, version= "1.0" -> Storing entry with key "/path/to/blob-1.0"
reading path=/path/t/blob, version = "2.0" -> Storing entry with key "/path/to/blob-2.0"
This way, you would have one entry per combination of (path,version). I think this should be the better approach, right?
or to treat objects without a version and those with a clear version as two separate objects in the cache without fallback.
With that, you meant this? Or did I understand it wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example
reading path=/path/to/blob, version=None -> Storing entry with key "/path/to/blob-"
reading path=/path/to/blob, version= "1.0" -> Storing entry with key "/path/to/blob-1.0"
reading path=/path/t/blob, version = "2.0" -> Storing entry with key "/path/to/blob-2.0"
Yeah, this example is what I mean. But it treats the read operation without version (fallback to latest version) and read operation with version in different ways.
e.g.
obj-1: v1, v2, v3 (latest)
Expected:
read obj-1, v3 => cache read obj-1+v3 (as expected)
read obj-1 => cache read obj-1+ (shoule be: cache read obj-1+v3)
Although this will not affect the correctness, I still want to confirm whether this behavior is expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmmm yeah, I understand what you mean. It is a performance improvement, but I'm not sure if this layer should assume that the backend would treat None version == latest one. It is the most common approach, but for example, think of a backend that requires the version to be set, and fails with version=None (the backend does not fallback None to latest), the backend response would be different in those two cases (error with version=None and the blob bytes with version=latest)
Doing that fallback to latest would also be hard to know for this cache layer. How can the cache layer know which one is the latest version? Maybe the remote backend has version v4
, but that version did never reach the cache yet and it only recorded version v3
, should the fallback with version=None be done to version=v3
which we have on cache or go to the remote backend and check the latest version (v4
)?
I think I prefer to not optimize this case in order to not make assumptions about the underlying backend in favor of correctness
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the version
field is a free string, how can you know that v3
is the latest? We can only interpret the version and induce the ordering if it complies with semver or something like that, but with the current implementation, we don't really know which tag would be the latest based only on its name
Maybe for some backend, we have another kind of version system that does v1 (latest), v2, v3. And when a new version reaches, we move each blob +1 version ahead and we would have v1 (latest)(new blob) , v2 (old v1), v3 (old v2), v4 (old v3). This is weird, but users could do this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense. Let me implement it.
Signed-off-by: MrCroxx <[email protected]>
Signed-off-by: MrCroxx <[email protected]>
writer.write_all(&version_len.to_le_bytes())?; | ||
writer.write_all(version.as_bytes())?; | ||
} else { | ||
writer.write_all(&0u64.to_le_bytes())?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This way, the version ""
(empty string) would have the same serialization as None
and its decoding would always be None
(per the condition of L72, where if version_len == 0
, then None
is returned. I think it is wrong to assume None==""
in the version. It is a corner case but it is weird to me
Maybe we could use some "magic" len value such as len = -1 (and storing that magic value in a constant?) to represente the None
value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe we should do the equivalent of #[serde(skip_serializing_if = "Option::is_none")]
here and do not write the len at all, and then in the decode part, we check before the reader.read_exact(&mut u64_buf)?;
if the buffer is empty at that point. If the buffer is empty, then the version is None
because there is no more bytes describing the lenght
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I only considered the case of S3. In practice, I think these two should be the same for almost all storage backends. However, if there are counterexamples, this is not appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe we should do the equivalent of
#[serde(skip_serializing_if = "Option::is_none")]
here and do not write the len at all, and then in the decode part, we check before thereader.read_exact(&mut u64_buf)?;
if the buffer is empty at that point. If the buffer is empty, then the version isNone
because there is no more bytes describing the lenght
Good idea! I like this solution. Let me fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take a look into how bincode does that!
https://github.com/bincode-org/bincode/blob/55fd02934cff567ce1b2ff9d007608818ea6481b/src/enc/mod.rs#L89
https://github.com/bincode-org/bincode/blob/trunk/src/de/mod.rs#L312
They encode an u8
at the start to flag whether the value is Some(_)
or None
, and if it is some, the value 1
is stored, and next to that, the String encoding. If None
, then only 0
is stored and nothing more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid that we are kind of reimplementing the functionality of bincode... Should we really reimplement it to try avoid having a dependency to it? I think it would be a lot easier to just forward the struct to bincode (this is how the default serializer of foyer
works, right? if I remember correctly, it is behind a feature flag..
The issue of reimplementing is the danger of having such bugs..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be a lot easier to just forward the struct to bincode (this is how the default serializer of foyer works, right?
Yep. Foyer supports serde and bincode.
Should we really reimplement it to try avoid having a dependency to it?
I'm open to both options because the key and value struct is simple enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it would be nice to add a test to assert that the serialization of a key with version=None and version="" is different.
And also, a test (fuzz test if possible but at least a simple test with a few cases) where we test that decode(encode(key)) == key
so we ensure that decode is the inverse of encode
Which issue does this PR close?
Closes #6370 #6372.
FYI: #5906
Rationale for this change
Introduce
FoyerLayer
as foyer hybrid cache integration.What changes are included in this PR?
Are there any user-facing changes?