Experiment with reusing blobs in WAL after flush

Currently, blobs written to the WAL are rewritten to a blob file when the memtable is flushed. That gives us a write amplification of slightly more than 2 (the extra comes from the blob pointers stored in the SSTs). That is pretty good compared to something like 7-10 if the blobs were compacted over and over again.
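
To make the "2 and a bit" concrete, here is a rough back-of-the-envelope sketch. The function and its parameters are hypothetical and only for illustration; the 16-byte pointer size and the number of pointer rewrites used in the example below are assumptions, not measured values.

```rust
/// Bytes written to disk per byte of user data, for a single blob.
///
/// - `value_len`: size of the blob in bytes (must be > 0)
/// - `pointer_len`: size of the blob pointer stored in the SSTs (assumed)
/// - `blob_copies`: how many times the blob itself is written (WAL + blob file = 2)
/// - `pointer_rewrites`: how often the small pointer is rewritten by compactions (assumed)
fn write_amp(value_len: u64, pointer_len: u64, blob_copies: u64, pointer_rewrites: u64) -> f64 {
    let written = value_len * blob_copies + pointer_len * pointer_rewrites;
    written as f64 / value_len as f64
}
```

For a 1 MiB value with a 16-byte pointer rewritten 10 times, `write_amp(1 << 20, 16, 2, 10)` comes out just above 2. If the WAL copy could be reused as the blob file (`blob_copies = 1`), the same call would come out just above 1.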

However, something like LMDB has a write amplification close to 1 for very large values, because it does not write to a WAL.

The question is: can we reuse the blobs in the WAL?
This technique is used in https://github.com/topling/toplingdb

Basically:

1. Write blob frames (the format defined in https://github.com/fjall-rs/value-log) directly into WAL
2. Reference those blobs in the memtable
3. On rotation, somehow register the WAL as a blob file - this will take a bit of care so that it all works correctly with `value-log`'s recovery and everything
4. At that point, the WAL file would not be added to the Journal Manager's GC list because it is governed by `value-log`
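
A minimal in-memory sketch of the steps above, just to illustrate the data flow. All types here (`Wal`, `BlobPointer`, `BlobFiles`) are hypothetical, and the length-prefixed framing is made up; it is not the real `value-log` frame format or API.

```rust
use std::collections::BTreeMap;

/// Location of a blob inside a log file (hypothetical layout).
#[derive(Clone, Copy, Debug, PartialEq)]
struct BlobPointer {
    file_id: u64,
    offset: u64,
    len: u32,
}

/// Toy WAL: an in-memory byte buffer standing in for the journal file.
struct Wal {
    file_id: u64,
    bytes: Vec<u8>,
}

impl Wal {
    /// Step 1: append a length-prefixed frame and return where the value landed.
    /// NOTE: this framing is an assumption, not the value-log frame format.
    fn append_blob(&mut self, value: &[u8]) -> BlobPointer {
        let len = value.len() as u32;
        self.bytes.extend_from_slice(&len.to_le_bytes());
        let offset = self.bytes.len() as u64;
        self.bytes.extend_from_slice(value);
        BlobPointer { file_id: self.file_id, offset, len }
    }
}

/// Toy stand-in for the set of blob files governed by the value log.
#[derive(Default)]
struct BlobFiles {
    files: BTreeMap<u64, Vec<u8>>,
}

impl BlobFiles {
    /// Step 3: take ownership of a sealed WAL instead of copying its blobs.
    /// Step 4 falls out of the move: the journal side no longer holds the file.
    fn adopt(&mut self, wal: Wal) {
        self.files.insert(wal.file_id, wal.bytes);
    }

    /// Resolve a pointer; returns `None` if the file or byte range is unknown.
    fn read(&self, ptr: BlobPointer) -> Option<&[u8]> {
        let file = self.files.get(&ptr.file_id)?;
        let start = ptr.offset as usize;
        file.get(start..start + ptr.len as usize)
    }
}
```

Step 2 would be the memtable storing the returned `BlobPointer` under the user key, for example in a `BTreeMap<Vec<u8>, BlobPointer>`. Because the pointer already names the WAL's file ID, it stays valid after `adopt` without being rewritten. Recovery, checksums and non-blob WAL entries are left out entirely.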

The disadvantage is that newly written blobs are stored out of order, in a log+index kind of fashion, so range reads may suffer. When performing garbage collection, we can create a new blob file that is in key order. It is questionable, though, how we would sort a hijacked WAL file like that without too much IO or memory usage.
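
One possible direction, sketched under assumptions: walk the index in key order (an LSM-tree scan already yields keys sorted) and copy each live blob out of the unordered log. The function below is hypothetical and works on in-memory slices only. It keeps memory usage low but pays one random read per live blob, so it illustrates the trade-off rather than settling it.

```rust
/// Rewrite live blobs from an unordered log into a new buffer in key order.
///
/// `index` must yield `(key, offset, len)` entries already sorted by key,
/// and every range must lie inside `log` (otherwise slicing panics).
/// Returns the new buffer and the index entries updated to the new offsets.
fn compact_in_key_order(
    log: &[u8],
    index: &[(&str, usize, usize)],
) -> (Vec<u8>, Vec<(String, usize, usize)>) {
    let mut out = Vec::new();
    let mut new_index = Vec::with_capacity(index.len());
    for &(key, offset, len) in index {
        let new_offset = out.len();
        // One random read into the old log per live blob.
        out.extend_from_slice(&log[offset..offset + len]);
        new_index.push((key.to_string(), new_offset, len));
    }
    (out, new_index)
}
```

Blobs that are no longer referenced by the index are simply never copied, so the same pass also drops garbage.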

Experiment with reusing blobs in WAL after flush #173
