arrow_reader_row_filter benchmark doesn't capture page cache improvements #7460

@alamb

We are trying to improve the performance of row filter application and part of that is a benchmark that we can use to guide optimization efforts.

cargo bench --all-features --bench arrow_reader_row_filter

However, as shown in #7428, we have a case where the performance benefit shows up when running an end-to-end query in DataFusion, but the same improvement does not show up in the benchmark.

This ticket tracks figuring out why the benchmark doesn't show an improvement even when the end-to-end query does.

Interesting, the decoder cache doesn't seem to help much on my test machine (which is some crappy GCP VM). I couldn't reproduce the results listed on #7363 (comment) 🤔

Thank you @alamb, it seems there is no obvious improvement compared to main. Compared to the original better-decode, this branch only improves PointLookup for the large 1000000-line data set.

I agree, we need to find a way to reproduce the ClickBench result from the arrow-rs side.

Originally posted by @zhuqi-lucas in #7428 (comment)

Labels: arrow (changes to the arrow crate), documentation (improvements or additions to documentation), parquet (changes to the parquet crate)