Description
- Part of [EPIC] Faster performance for parquet predicate evaluation for non selective filters #7456
- Part of Parquet decoder / decoded predicate / page / results Cache #7363
We are trying to improve the performance of row filter application, and part of that effort is a benchmark we can use to guide optimization:
```shell
cargo bench --all-features --bench arrow_reader_row_filter
```
However, as shown in #7428, there is a case where an end-to-end query in DataFusion shows a performance benefit, but the benchmark shows no corresponding improvement.
This ticket tracks figuring out why the benchmark doesn't show an improvement even when the end-to-end query does.
Interesting: the decoder cache doesn't seem to help much on my test machine (which is some crappy GCP VM). I couldn't reproduce the results listed in #7363 (comment) 🤔
Thank you @alamb, there seems to be no obvious improvement compared to main. Compared to the original better-decode branch, this branch only improves PointLookup on the large 1,000,000-row data set.
I agree; we need to find a way to emulate the ClickBench results from the arrow-rs side.
Originally posted by @zhuqi-lucas in #7428 (comment)