BagritsevichStepan (Contributor) commented Aug 29, 2025

In the Split method we had a rather inefficient algorithm.

First, we used Insert, which performs a binary search to find the block and then inserts a new entry. However, during splitting we already have the document IDs sorted, meaning every new entry is simply appended to the end of the block list.

The second issue was that Insert always triggered block splitting inside BlockList, which is a linear operation. Together, these introduced significant overhead in the Split method.

To address this, I added a PushBack method to BlockList that bypasses binary search and avoids block splitting.
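
A minimal sketch of the idea, not the actual Dragonfly BlockList (block layout, the capacity value, and member names are assumptions for illustration): PushBack appends to the tail block, skipping both the binary search and the split work that Insert performs.

```cpp
#include <cstdint>
#include <vector>

using DocId = uint64_t;

// Simplified stand-in for a block list: a sequence of sorted blocks of doc ids.
struct BlockList {
  std::vector<std::vector<DocId>> blocks;
  size_t block_capacity = 128;  // illustrative capacity

  // Insert would binary-search for the target block, insert in order, and
  // split the block when it overflows (linear in the block size). Omitted here.

  // PushBack: the ids produced during Split are already sorted, so we can
  // append to the last block and avoid both the search and the split.
  void PushBack(DocId id) {
    if (blocks.empty() || blocks.back().size() >= block_capacity) {
      blocks.emplace_back();
      blocks.back().reserve(block_capacity);
    }
    blocks.back().push_back(id);
  }
};
```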

BagritsevichStepan (Contributor Author) commented Aug 29, 2025

Before:

| Benchmark | Time (ns) | CPU (ns) | Iterations |
|---|---|---|---|
| BM_SearchRangeTreeSplits/block_size:100000 | 58,746,631 | 58,736,587 | 10 |
| BM_SearchRangeTreeSplits/block_size:1000000 | 569,836,676 | 569,791,624 | 1 |

After:

| Benchmark | Time (ns) | CPU (ns) | Iterations |
|---|---|---|---|
| BM_SearchRangeTreeSplits/block_size:100000 | 18,841,322 | 18,837,811 | 39 |
| BM_SearchRangeTreeSplits/block_size:1000000 | 193,917,459 | 193,877,924 | 4 |


```cpp
std::nth_element(all_entries.begin(), all_entries.begin() + initial_size / 2, all_entries.end(),
                 [](const Entry& l, const Entry& r) { return l.second < r.second; });
std::nth_element(entries_indexes.begin(), entries_indexes.begin() + elements_count / 2,
```
Collaborator commented:
I am confused. Are all elements already sorted by score? Is that why you use PushBack?
In that case, why do you need to partially sort entries_indexes here? Wouldn't it be enough to just advance the block_list iterator by elements_count / 2 steps?

BagritsevichStepan (Contributor Author) replied:

The elements are std::pair<DocId, double>, meaning we store a numeric value associated with each doc id. In the BlockList, elements are naturally sorted by DocId. Therefore, to find the median, we need to "sort" them by their values instead.
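
For illustration, a self-contained sketch of that median selection. The Entry type and the comparator mirror the quoted diff; the function name MedianByScore, the DocId alias, and the surrounding scaffolding are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using DocId = uint64_t;
using Entry = std::pair<DocId, double>;  // (doc id, score)

// Entries come out of the BlockList ordered by DocId, not by score, so
// nth_element places the score-median at the midpoint in O(n) on average
// without fully sorting the range.
Entry MedianByScore(std::vector<Entry> entries) {
  assert(!entries.empty());
  auto mid = entries.begin() + entries.size() / 2;
  std::nth_element(entries.begin(), mid, entries.end(),
                   [](const Entry& l, const Entry& r) { return l.second < r.second; });
  return *mid;
}
```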

BagritsevichStepan merged commit 0a6fe04 into dragonflydb:main on Sep 11, 2025
10 checks passed
BagritsevichStepan deleted the search/speed-up-split-method branch on September 11, 2025 08:05