

cbi42 (Member) commented Sep 3, 2025

Summary: Add a new option, MultiScanArgs::max_prefetch_size, that limits the memory used by per-file pinning of prefetched blocks. Note that this only accounts for compressed block size. This is intended as a stopgap until we implement some kind of global prefetch manager that limits overall multiscan memory usage.

Test plan: new unit test ./block_based_table_reader_test --gtest_filter="*MultiScanPrefetchSizeLimit/*"
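A minimal sketch of the per-file accounting the summary describes, assuming the table reader walks the data blocks of one file in order and stops pinning once the next block would push the running total past the budget. `DataBlockRef`, `PrefetchResult`, `PlanFilePrefetch`, the "0 means no limit" convention, and the exact boundary check are hypothetical illustrations, not RocksDB's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins; these are NOT RocksDB's real types.
struct DataBlockRef {
  uint64_t offset;
  uint64_t size;  // compressed on-disk size of the data block
};

enum class PrefetchResult { kOk, kPrefetchLimitReached };

struct PrefetchPlan {
  std::vector<DataBlockRef> blocks_to_pin;
  PrefetchResult result = PrefetchResult::kOk;
};

// Assumption: max_prefetch_size == 0 means "no limit".
// Walks the data blocks overlapping the scan ranges of ONE file, in order,
// and pins blocks until adding the next one would exceed the per-file budget.
PrefetchPlan PlanFilePrefetch(const std::vector<DataBlockRef>& blocks,
                              uint64_t max_prefetch_size) {
  PrefetchPlan plan;
  uint64_t total_prefetch_size = 0;
  for (const DataBlockRef& block : blocks) {
    if (max_prefetch_size > 0 &&
        total_prefetch_size + block.size > max_prefetch_size) {
      // Analogous to the iterator surfacing Status::PrefetchLimitReached().
      plan.result = PrefetchResult::kPrefetchLimitReached;
      break;
    }
    total_prefetch_size += block.size;
    plan.blocks_to_pin.push_back(block);
  }
  return plan;
}
```

For example, with three 4000-byte blocks and a budget of 10000 bytes, the first two blocks are pinned and the third triggers the limit. Because the budget is tracked per file, a scan spanning several files could still pin up to the limit in each of them, which is why a global prefetch manager is mentioned as the follow-up.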

meta-cla bot added the CLA Signed label Sep 3, 2025
facebook-github-bot (Contributor)

@cbi42 has imported this pull request. If you are a Meta employee, you can view this in D81630629.

cbi42 requested review from anand1976 and krhancoc on September 4, 2025 03:02
Comment on lines 985 to 986
Status s;
s.PermitUncheckedError();


All uses of Status here seem to be local. I would recommend declaring each status where it is used, instead of declaring one at the top of the function.

Member Author

Updated.


xingbowang left a comment


One minor piece of feedback; the rest looks good.


s = table_->LookupAndPinBlocksInCache<Block_kData>(

// Check if we would exceed the prefetch size limit with this block
uint64_t block_size = data_block_handle.size();
Contributor

Nit: This doesn't account for the trailer. Maybe use BlockSizeWithTrailer()?
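To illustrate the nit: in RocksDB's block-based table format each block on disk is followed by a 5-byte trailer (1-byte compression type plus 4-byte checksum), so `data_block_handle.size()` undercounts the bytes read from the file for each block. The sketch below is a standalone model; `SizeWithTrailer` and `BlocksThatFit` are hypothetical helpers in the spirit of `BlockSizeWithTrailer()`, not its real signature.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors the block-based table trailer: 1-byte compression type
// + 4-byte checksum stored after every block.
constexpr uint64_t kBlockTrailerSize = 5;

// Hypothetical helper: bytes read from the file for a block whose
// handle reports `block_size` bytes of payload.
constexpr uint64_t SizeWithTrailer(uint64_t block_size) {
  return block_size + kBlockTrailerSize;
}

// Counts how many leading blocks fit in `budget` bytes, optionally
// charging the trailer for each block.
std::size_t BlocksThatFit(const std::vector<uint64_t>& block_sizes,
                          uint64_t budget, bool include_trailer) {
  uint64_t total = 0;
  std::size_t count = 0;
  for (uint64_t size : block_sizes) {
    uint64_t charged = include_trailer ? SizeWithTrailer(size) : size;
    if (total + charged > budget) break;  // next block would exceed budget
    total += charged;
    ++count;
  }
  return count;
}
```

With four 4096-byte blocks and a 16384-byte budget, ignoring the trailer admits all four blocks, while charging it admits only three, so the difference matters right at the limit.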

// prefetch size. When the limit is exceeded, iterator will return
// Status::PrefetchLimitReached().
//
// Note that this limit is per file and is on compressed block size.
Contributor

Clarify in the comment that the prefetch happens only once, to distinguish it from ReadOptions::readahead_size, which applies whenever the iterator does I/O.

Member Author

Done.


// Check if we would exceed the prefetch size limit with this block
uint64_t block_size = data_block_handle.size();
total_prefetch_size += block_size;
Contributor

Shouldn't this be done only if the block doesn't exist in the block cache?

Member Author

As discussed, we want to limit memory usage of pinned blocks too.
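The reasoning behind this reply: pinning a block-cache entry holds a reference that prevents its eviction, so a cache hit occupies memory for the lifetime of the scan just like a block read from disk. A hypothetical model of the two charging policies (`CandidateBlock` and `ChargedBytes` are illustrative names, not RocksDB code):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one data block considered for a multi-scan prefetch.
struct CandidateBlock {
  uint64_t size;
  bool in_block_cache;  // true if the lookup would hit the block cache
};

// Sums the bytes charged against the prefetch budget.
//  - charge_cache_hits == true: every pinned block counts, whether it came
//    from the block cache or from disk (the policy described in the reply).
//  - charge_cache_hits == false: only blocks that must be read from disk count.
uint64_t ChargedBytes(const std::vector<CandidateBlock>& blocks,
                      bool charge_cache_hits) {
  uint64_t total = 0;
  for (const CandidateBlock& b : blocks) {
    if (charge_cache_hits || !b.in_block_cache) {
      total += b.size;
    }
  }
  return total;
}
```

Charging only cache misses would let a scan over a mostly cached file pin far more memory than max_prefetch_size, which is the memory the limit is meant to bound.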


facebook-github-bot (Contributor)

@cbi42 merged this pull request in a805c9b.
