Description
Currently, blobs that are written to the WAL are rewritten to a blob file when the memtable is flushed. That gives us a write amplification of slightly more than 2 (the extra comes from the pointers stored in the SSTs). That is pretty good compared to something like 7-10 if the blobs were compacted over and over again.
However, something like LMDB actually has a write amp of close to 1 for very large values, because it does not write to a WAL.
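To make those numbers concrete, here is a rough back-of-the-envelope sketch. The function name and the 32-byte pointer size are assumptions for illustration only, and the model ignores any later rewrites of the pointers during compaction.

```rust
/// Approximate write amplification for a single large value.
///
/// `full_copies` is how many times the whole value hits the disk
/// (2 today: once in the WAL, once in the blob file).
/// `pointer_len` is a hypothetical size for the pointer stored in the SST.
fn write_amplification(value_len: u64, pointer_len: u64, full_copies: u64) -> f64 {
    // Total bytes written = the full copies of the value + one small pointer.
    let bytes_written = full_copies * value_len + pointer_len;
    bytes_written as f64 / value_len as f64
}
```

Under these assumptions, a 1 MB value with two full copies comes out at roughly 2.00003, and with a single full copy at roughly 1.00003.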
The question is: can we reuse the blobs in the WAL?
This approach is used in https://github.com/topling/toplingdb
Basically:
- Write blob frames (the format defined in https://github.com/fjall-rs/value-log) directly into WAL
- Reference those blobs in the memtable
- On rotation, somehow register the WAL as a blob file. This will take a bit of care so that it all works correctly with value-log's recovery and everything else
- At that point, the WAL file would not be added to the Journal Manager's GC list, because it is governed by value-log
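The steps above could be modeled roughly as follows. This is only an in-memory sketch: `Wal`, `Store`, `BlobHandle` and the length-prefixed frame are hypothetical stand-ins, not the actual fjall or value-log types or frame format, and recovery is not covered at all.

```rust
use std::collections::{BTreeMap, HashMap};

/// Location of a blob payload inside a WAL or blob file (hypothetical layout).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct BlobHandle {
    file_id: u64,
    offset: u64,
    len: u32,
}

/// In-memory stand-in for the active WAL file.
struct Wal {
    id: u64,
    bytes: Vec<u8>,
}

impl Wal {
    /// Appends a simplified frame (u32 length prefix + payload)
    /// and returns a handle pointing at the payload.
    fn append_blob(&mut self, value: &[u8]) -> BlobHandle {
        self.bytes.extend_from_slice(&(value.len() as u32).to_le_bytes());
        let offset = self.bytes.len() as u64;
        self.bytes.extend_from_slice(value);
        BlobHandle { file_id: self.id, offset, len: value.len() as u32 }
    }
}

struct Store {
    wal: Wal,
    /// The memtable only holds references to blobs in the WAL.
    memtable: BTreeMap<Vec<u8>, BlobHandle>,
    /// Files owned by the value log (assumed registry).
    blob_files: HashMap<u64, Vec<u8>>,
    /// WAL files the journal manager may delete (assumed list).
    journal_gc_list: Vec<u64>,
}

impl Store {
    fn new() -> Self {
        Store {
            wal: Wal { id: 0, bytes: Vec::new() },
            memtable: BTreeMap::new(),
            blob_files: HashMap::new(),
            journal_gc_list: Vec::new(),
        }
    }

    /// The blob is written exactly once, into the WAL.
    fn insert(&mut self, key: &[u8], value: &[u8]) {
        let handle = self.wal.append_blob(value);
        self.memtable.insert(key.to_vec(), handle);
    }

    /// On rotation the sealed WAL is registered as a blob file instead of
    /// being pushed onto `journal_gc_list`. Returns the key -> handle pairs
    /// that a flush would write into the SST.
    fn rotate(&mut self) -> BTreeMap<Vec<u8>, BlobHandle> {
        let next_id = self.wal.id + 1;
        let sealed = std::mem::replace(&mut self.wal, Wal { id: next_id, bytes: Vec::new() });
        self.blob_files.insert(sealed.id, sealed.bytes);
        std::mem::take(&mut self.memtable)
    }

    /// Resolves a handle against the active WAL or a registered blob file.
    fn read(&self, handle: BlobHandle) -> Option<&[u8]> {
        let start = handle.offset as usize;
        let end = start + handle.len as usize;
        if handle.file_id == self.wal.id {
            self.wal.bytes.get(start..end)
        } else {
            self.blob_files.get(&handle.file_id)?.get(start..end)
        }
    }
}
```

The point of the sketch is that handles created before rotation stay valid afterwards, because the file is re-registered rather than rewritten.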
The disadvantage is that newly written blobs are stored out of order, in a log+index kind of fashion, so range reads may suffer. When performing garbage collection, however, we can write a new blob file in key order. It's questionable, though, how we will be able to sort a hijacked WAL file like that without too much IO or memory usage.
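As a naive illustration of the reordering step, the sketch below copies blobs out of an unordered log into a new buffer in key order. The function name and the `key -> (offset, len)` index shape are assumptions. It keeps the whole index in memory and does one random read per blob, which is exactly the cost in question.

```rust
use std::collections::BTreeMap;

/// Copies the blobs referenced by `index` from `old_file` into a new
/// buffer, in ascending key order, and returns the new buffer together
/// with an updated index. `index` maps key -> (offset, len) in `old_file`.
fn rewrite_in_key_order(
    old_file: &[u8],
    index: &BTreeMap<Vec<u8>, (usize, usize)>,
) -> (Vec<u8>, BTreeMap<Vec<u8>, (usize, usize)>) {
    let mut new_file = Vec::new();
    let mut new_index = BTreeMap::new();

    // A BTreeMap iterates in key order, so blobs land in the new file sorted by key.
    for (key, &(offset, len)) in index {
        let new_offset = new_file.len();
        // Random read into the old, unordered file.
        new_file.extend_from_slice(&old_file[offset..offset + len]);
        new_index.insert(key.clone(), (new_offset, len));
    }

    (new_file, new_index)
}
```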