-
Notifications
You must be signed in to change notification settings - Fork 511
db: SeekPrefixGE lazy positioning #5256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
db: SeekPrefixGE lazy positioning #5256
Conversation
7882e7b
to
e660b00
Compare
This commit implements the lazy positioning of SeekPrefixGE. During SeekPrefixGE, previously we had to perform the seek in each level. We loaded the heap with all the seek results we got and returned the next entry, which would be at the top of the heap. This optimization implements the idea of deferring an actual seek unless needed. We use the Block Interval Property to extract the latest suffix from the table, and create a fake or a synthetic key instead of actually loading the block, we do this if the following requirements are met 1. The latest suffix we extracted must be less than the seeked suffix. 2. Perform the synthetic key optimization if the prefix is wholly contained within the sstable's bounds If the conditions are met we create a synthetic key, mark it as one and return it. We will keep it as a synthetic key unless we rise to the top of the heap, if we are at the top of the heap we must perform the actual seek. The following are the realized benefits of this optimization. 1. Suppose we made three synthetic keys, and two of them never reached the top of the heap, we potentially saved two block reads. 2. During these saves, we also saved bloom filter reads. Here are some disadvantages of this optimization 1. We will have to disable the trySeekUsingNext optimization during SeekPrefixGE calls if the key earlier we had was a synthetic one, since we never actually seeked we don't want this optimization to be used. The problem is that we also lose the opportunity to use this even when applicable. 2. There can be times where we create synthetic keys all over the place and we end up actually seeking each one of them, in which case this optimization is basically useless. Implements cockroachdb#2002
e660b00
to
7a62afd
Compare
7a62afd
to
b3e4587
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking good!
@jbowens reviewed 3 of 8 files at r1.
Reviewable status: 1 of 11 files reviewed, 8 unresolved discussions (waiting on @xinhaoz)
sstable/block_property_test_utils.go
line 149 at r2 (raw file):
func (testprop MaxTestKeysSuffixProperty) Extract( encodedProperty []byte,
I think we should extend this func (in the interface) to take a dst []byte
for the first parameter. Then down below we'll append the @
byte and AppendUint
the value to dst
, and return it. This allows the caller to reuse a buffer, reducing allocations
sstable/block_property_test_utils.go
line 161 at r2 (raw file):
return nil, false, nil } ret := strconv.AppendUint([]byte("@"), (interval.Upper - 1), 10)
What motivated the - 1? Just trying to get a tighter bound since Upper is exclusive, or was there a correctness issue you ran into?
file_cache.go
line 735 at r1 (raw file):
References: blobReferences, }, MaximumSuffixProperty: sstable.MaxTestKeysSuffixProperty{},
this should be opts.MaximumSuffixProperty
sstable/reader_iter_two_lvl.go
line 861 at r2 (raw file):
func (i *twoLevelIterator[I, PI, D, PD]) Next() *base.InternalKV { if i.secondLevel.synthetic.atSyntheticKey {
I'm confused by this; why can't this usenseekPrefixGE
?
sstable/reader.go
line 111 at r2 (raw file):
type MaximumSuffixProperty interface { Name() string Extract(encodedProperty []byte) (suffix []byte, ok bool, err error)
let's document MaximumSuffixProperty
and Extract
, and include in Extract
's comment that the implementation must not mutate encodedProperty
sstable/reader.go
line 114 at r2 (raw file):
} type SyntheticKey struct {
nit: this doesn't need to be exported (capitalized), right? I don't think we're accessing it from outside of this package.
internal/base/internal.go
line 99 at r2 (raw file):
//InternalKeyKindNoop InternalKeyKind = 13 //InternalKeyKindColumnFamilyRangeDelete InternalKeyKind = 14 InternalKeyKindSyntheticKey InternalKeyKind = 14
what was the motivation for using InternalKeyKind=14
?
sstable/reader_iter_single_lvl.go
line 837 at r2 (raw file):
} } else { maxSuffix, ok, err = i.maximumSuffixProperty.Extract([]byte(prop))
let's do an "unsafe" byte-slice conversion here:
maxSuffix, ok, err = i.maximumSuffixProperty.Extract(unsafe.Slice(unsafe.StringData(prop), len(prop)))
This will prevent an allocation. And it's safe because Extract
does not mutate property.
b3e4587
to
d6cf16c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 13 files reviewed, 7 unresolved discussions (waiting on @12, @13, @jbowens, and @xinhaoz)
file_cache.go
line 735 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
this should be
opts.MaximumSuffixProperty
Done.
internal/base/internal.go
line 99 at r2 (raw file):
Previously, jbowens (Jackson Owens) wrote…
what was the motivation for using
InternalKeyKind=14
?
The reason was that TestInvalidInternalKey FAILED since x19 && 0xff returned 25. I fixed it by changing the test case to x20
Code snippet (i):
// Kind returns the key kind component of the trailer.
func (t InternalKeyTrailer) Kind() InternalKeyKind {
return InternalKeyKind(t & 0xff)
}
Code snippet (ii):
"foo\x08\x07\x06\x05\x04\x03\x02",
"foo\x20\x07\x06\x05\x04\x03\x02\x01",
sstable/block_property_test_utils.go
line 149 at r2 (raw file):
Previously, jbowens (Jackson Owens) wrote…
I think we should extend this func (in the interface) to take a
dst []byte
for the first parameter. Then down below we'll append the@
byte andAppendUint
the value todst
, and return it. This allows the caller to reuse a buffer, reducing allocations
Good point. Done.
sstable/block_property_test_utils.go
line 161 at r2 (raw file):
Previously, jbowens (Jackson Owens) wrote…
What motivated the - 1? Just trying to get a tighter bound since Upper is exclusive, or was there a correctness issue you ran into?
Its trying to get a tighter bound, I am imaging a case where we seek @13 and the latest suffix is @12, if we did not
do -1 there we would not perform the optimization even though we should ?
sstable/reader.go
line 111 at r2 (raw file):
Previously, jbowens (Jackson Owens) wrote…
let's document
MaximumSuffixProperty
andExtract
, and include inExtract
's comment that the implementation must not mutateencodedProperty
Done.
sstable/reader.go
line 114 at r2 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: this doesn't need to be exported (capitalized), right? I don't think we're accessing it from outside of this package.
Done.
sstable/reader_iter_single_lvl.go
line 837 at r2 (raw file):
Previously, jbowens (Jackson Owens) wrote…
let's do an "unsafe" byte-slice conversion here:
maxSuffix, ok, err = i.maximumSuffixProperty.Extract(unsafe.Slice(unsafe.StringData(prop), len(prop)))
This will prevent an allocation. And it's safe because
Extract
does not mutate property.
Good point. Done.
sstable/reader_iter_two_lvl.go
line 861 at r2 (raw file):
Previously, jbowens (Jackson Owens) wrote…
I'm confused by this; why can't this usen
seekPrefixGE
?
During the two level SeekPrefixGE, we never loaded its index block. So during next we must load the index block before calling single level iterator.
If you were talking about replacing secondLevel.Next with seekPrefixGE. We can certainly do that and get some results.
Code snippet:
if !dontSeekWithinSingleLevelIter {
kv := i.secondLevel.Next()
return kv
}
54babe7
to
992d055
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbowens reviewed 3 of 12 files at r7.
Reviewable status: 4 of 13 files reviewed, 9 unresolved discussions (waiting on @12, @13, and @xinhaoz)
sstable/block_property_test_utils.go
line 161 at r2 (raw file):
Previously, Sachuman (Sachin) wrote…
Its trying to get a tighter bound, I am imaging a case where we seek @13 and the latest suffix is @12, if we did not
do -1 there we would not perform the optimization even though we should ?
Makes sense
sstable/reader_iter_single_lvl.go
line 801 at r7 (raw file):
if i.synthetic.atSyntheticKey { // TODO : We have to disable the optimization to avoid false data invalidation if there back to back SeekPrefixGE calls // so currently we cant take advantage of trySeekUsingNext in case of synthetic reseeks.
nit: wrap the comment lines at ~80-100 chars, and format the TODO prefix as TODO(sachin): ...
sstable/reader_iter_single_lvl.go
line 819 at r7 (raw file):
// a suffix (len(key) > len(prefix)), we might be able to defer actually // performing the seek and potentially loading additional blocks. // However, perform the synthetic key optimization if the prefix is wholly contained within the sstable's bounds
nit: I think this should say something like "However, we can only perform ..."
(also wrap at 80-100 chars)
sstable/reader_iter_single_lvl.go
line 822 at r7 (raw file):
// We do so because there might be cases which result in generating a synthetic key which was not supposed to be generated // But due to the Bounds we might end up creating the synthetic key, which will result in a seek failure. A case that can // be considered is when we have a RangeDel key at the upper bound of the sstable, which possibly can result in fake synthetic key.
nit: how about instead, something like:
"If the sought prefix is NOT wholly contained within the sstable, it might be split across multiple sstables. If the first sstable with bounds that overlap the prefix does not actually contain a point key with the prefix, the sstable's properties' max suffix doesn't reflect the max of the prefix."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 13 files reviewed, 16 unresolved discussions (waiting on @12, @13, @Sachuman, and @xinhaoz)
sstable/reader_iter_two_lvl.go
line 861 at r2 (raw file):
Previously, Sachuman (Sachin) wrote…
During the two level SeekPrefixGE, we never loaded its index block. So during next we must load the index block before calling single level iterator.
If you were talking about replacing secondLevel.Next with seekPrefixGE. We can certainly do that and get some results.
Ah I think I was confusing the code structure of twoLevelIterator and singleLevelIterator.
Can we extract the body of the seek prefix GE and share the code between twoLevelIterator.SeekPrefixGE
and twoLevelIterator.Next
?
sstable/reader_iter_single_lvl.go
line 1299 at r7 (raw file):
func (i *singleLevelIterator[I, PI, D, PD]) Next() *base.InternalKV { if i.synthetic.atSyntheticKey {
nit: let's put a comment above this if statement explaining the optimization and that SeekPrefixGE may have returned a synthetic key with the "smallest" (smallest according to base.Compare) suffix contained in the sstable, avoiding performing the actual seek. If the caller is calling Next, they want to move past the synthetic key and Next is responsible for performing the actual seek.
internal/itertest/testdata/probes
line 26 at r7 (raw file):
# Create probes to exercise the UserKey predicate. On Next/Prev return a KV # with user key "bar"; on other ops return a KV with user key "foo". Then if # the returned KV's user key is "bar", override the result with a specific KV.
I'm a bit confused as to what this test case is seeking to test?
iterator_histories_test.go
line 358 at r7 (raw file):
o.UseL6Filters = true case "probe-points":
nit: remove empty line
iterator_histories_test.go
line 367 at r7 (raw file):
pointProbes[base.TableNum(i)] = itertest.MustParseProbes(parser, arg.Vals[1:]...) case "probe-rangedels":
nit: remove empty line
sstable/reader_iter_two_lvl.go
line 429 at r7 (raw file):
// We do so because there might be cases which result in generating a synthetic key which was not supposed to be generated // But due to the Bounds we might end up creating the synthetic key, which will result in a seek failure. A case that can // be considered is when we have a RangeDel key at the upper bound of the sstable, which possibly can result in fake synthetic key.
nit: ditto the comment on the singleLevelIterator
sstable/reader_iter_two_lvl.go
line 436 at r7 (raw file):
largest = i.secondLevel.reader.Comparer.Split.Prefix(largest) if i.secondLevel.cmp(prefix, smallest) > 0 && i.secondLevel.cmp(prefix, largest) < 0 {
nit: let's move checking this condition to be the innermost. the vast majority of the time this condition will be met, whereas the other conditions (eg, the seek key's suffix being < the max suffix) are less likely to be met.
testdata/iter_histories/probe_prefix
line 178 at r7 (raw file):
---- combined-iter probe-points=(4,(Log "OpSeekPrefixGE")) probe-rangedels=(4,(Log "OpSeekPrefixGE"))
these "OpSeekPrefixGE"
prefixes are a bit confusing. how about
combined-iter probe-points=(4,(Log "000004.")) probe-rangedels=(4,(Log "000004.RangeDels."))
(and analogous changes throughout the test file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 13 files reviewed, 18 unresolved discussions (waiting on @12, @13, @Sachuman, and @xinhaoz)
sstable/reader_iter_two_lvl.go
line 448 at r7 (raw file):
} } else { maxSuffix, ok, err = i.secondLevel.maximumSuffixProperty.Extract(maxSuffix, unsafe.Slice(unsafe.StringData(prop), len(prop)))
in order to use the dst
parameter to reduce allocations, we need to retain a buffer between calls to SeekPrefixGE
. We can add a maxSuffixBuf []byte
to i.synthetic
.
Then we would do something like:
maxSuffix, ok, err = i.secondLevel.maximumSuffixProperty.Extract(i.synthetic.maxSuffixBuf[:0], unsafe.Slice(unsafe.StringData(prop), len(prop)))
if err != nil {
i.secondLevel.err = err
return nil
} else if ok {
i.synthetic.maxSuffixBuf = maxSuffix
}
sstable/reader_iter_two_lvl.go
line 471 at r7 (raw file):
// to still be open by the time singleLevelIterator.Next is called // and we use the seek key to actually perform the seek. i.secondLevel.synthetic.seekKey = append(i.secondLevel.synthetic.seekKey[:0], key...)
We should retain seekKey
and the new maxSuffix
field on i.synthetic
across Closes. To do that, I think we should move i.synthetic
below i.clearForResetBoundary
in the struct definition (so i.synthetic
no longer has its memory zeroed automatically in resetForReuse
), and then explicitly update resetForReuse
to do:
clear(synthetic.seekKey[:cap(synthetic.seekKey)])
clear(synthetic.maxSuffix[:cap(synthetic.maxSuffix)])
synthetic.seekKey = synthetic.seekKey[:0]
synthetic.maxSuffix = synthetic.maxSuffix[:0]
and explicitly clear out the other fields of i.synthetic
2469966
to
b6f987a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 13 files reviewed, 11 unresolved discussions (waiting on @12, @13, @jbowens, and @xinhaoz)
sstable/reader_iter_single_lvl.go
line 801 at r7 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: wrap the comment lines at ~80-100 chars, and format the TODO prefix as
TODO(sachin): ...
Done.
sstable/reader_iter_single_lvl.go
line 819 at r7 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: I think this should say something like "However, we can only perform ..."
(also wrap at 80-100 chars)
Done.
sstable/reader_iter_single_lvl.go
line 1299 at r7 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: let's put a comment above this if statement explaining the optimization and that SeekPrefixGE may have returned a synthetic key with the "smallest" (smallest according to base.Compare) suffix contained in the sstable, avoiding performing the actual seek. If the caller is calling Next, they want to move past the synthetic key and Next is responsible for performing the actual seek.
Done.
sstable/reader_iter_two_lvl.go
line 861 at r2 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Ah I think I was confusing the code structure of twoLevelIterator and singleLevelIterator.
Can we extract the body of the seek prefix GE and share the code between
twoLevelIterator.SeekPrefixGE
andtwoLevelIterator.Next
?
Done.
sstable/reader_iter_two_lvl.go
line 429 at r7 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: ditto the comment on the singleLevelIterator
Done.
sstable/reader_iter_two_lvl.go
line 436 at r7 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: let's move checking this condition to be the innermost. the vast majority of the time this condition will be met, whereas the other conditions (eg, the seek key's suffix being < the max suffix) are less likely to be met.
Done.
sstable/reader_iter_two_lvl.go
line 448 at r7 (raw file):
Previously, jbowens (Jackson Owens) wrote…
in order to use the
dst
parameter to reduce allocations, we need to retain a buffer between calls toSeekPrefixGE
. We can add amaxSuffixBuf []byte
toi.synthetic
.Then we would do something like:
maxSuffix, ok, err = i.secondLevel.maximumSuffixProperty.Extract(i.synthetic.maxSuffixBuf[:0], unsafe.Slice(unsafe.StringData(prop), len(prop))) if err != nil { i.secondLevel.err = err return nil } else if ok { i.synthetic.maxSuffixBuf = maxSuffix }
I completely missed the point of reducing allocations with buffer. Thanks for this, made changes as suggested.
testdata/iter_histories/probe_prefix
line 178 at r7 (raw file):
Previously, jbowens (Jackson Owens) wrote…
these
"OpSeekPrefixGE"
prefixes are a bit confusing. how aboutcombined-iter probe-points=(4,(Log "000004.")) probe-rangedels=(4,(Log "000004.RangeDels."))
(and analogous changes throughout the test file)
Agreed, Done.
internal/itertest/testdata/probes
line 26 at r7 (raw file):
Previously, jbowens (Jackson Owens) wrote…
I'm a bit confused as to what this test case is seeking to test?
I added a test case for testing dsl function 'UserKey' since there wasn't one I believe. Made some change to the test case to make it more clear.
iterator_histories_test.go
line 358 at r7 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: remove empty line
Done.
iterator_histories_test.go
line 367 at r7 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: remove empty line
Done.
This commit adds probe tests to iterator_histories, also adds unit tests for the new optimization.
b6f987a
to
1f2d281
Compare
db: SeekPrefixGE lazy positioning
This commit implements the lazy positioning of SeekPrefixGE. During SeekPrefixGE, previously we had to perform the seek in each level.
We loaded the heap with all the seek results we got and returned the next entry, which would be at the top of the heap.
This optimization implements the idea of deferring an actual seek unless needed. We use the Block Interval Property to extract the latest suffix from the table, and create a fake or a synthetic key instead of actually loading the block, we do this if the following requirements are met
If the conditions are met we create a synthetic key, mark it as one and return it. We will keep it as a synthetic key unless we rise to the top of the heap, if we are at the top of the heap we must perform the actual seek. The following are the realized benefits of this optimization.
Here are some disadvantages of this optimization
Implements #2002
db: add probe tests to TestIterHistories
This commit adds probe tests to iterator_histories, also adds unit tests for the new optimization.