[opt](arm)Remove negative optimizations of SSE2NEON on memcmp for ARM #38759
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
clang-tidy review says "All clean, LGTM! 👍"
TPC-H: Total hot run time: 41755 ms
TPC-DS: Total hot run time: 170049 ms
ClickBench: Total hot run time: 30.15 s
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
run buildall
TPC-H: Total hot run time: 38153 ms
run performance
TPC-H: Total hot run time: 38176 ms
TPC-DS: Total hot run time: 195793 ms
ClickBench: Total hot run time: 31.74 s
…#38759)
The main issue is that _mm_movemask_epi8 does not have a one-to-one corresponding instruction on ARM. Testing shows that it performs worse compared to using memcmp, which allows the compiler to generate the corresponding ARM instructions.
The following tests were conducted on ARM.
```
--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
BM_memequal16_sse         3.77 ns         3.77 ns    743238946
BM_memequal16_orgin       2.11 ns         2.11 ns   1000000000
```
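The main issue is that _mm_movemask_epi8 has no one-to-one corresponding instruction on ARM, so SSE2NEON has to emulate it with several NEON instructions. Testing shows that the SSE2NEON path performs worse than a plain memcmp, which lets the compiler generate the corresponding ARM instructions directly.
The tests were conducted on ARM: BM_memequal16_sse took 3.77 ns per iteration, against 2.11 ns for BM_memequal16_orgin.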
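To make the trade-off concrete, here is a minimal sketch of the two 16-byte comparison strategies the description contrasts. The function names `memequal16_memcmp`, `memequal16_sse`, and `memequal16` are hypothetical and are not taken from the Doris source. The SSE2 variant is compiled only where `__SSE2__` is defined; the assumption is that other targets (including ARM without SSE2NEON) fall back to `memcmp`.

```cpp
#include <cassert>
#include <cstring>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Portable variant: the length is a compile-time constant, so the compiler
// is free to inline the call and pick instructions for the target.
inline bool memequal16_memcmp(const void* a, const void* b) {
    return std::memcmp(a, b, 16) == 0;
}

#if defined(__SSE2__)
// SSE2 variant: compare byte-wise, then collapse the 16 lane results into a
// 16-bit mask with _mm_movemask_epi8. All lanes equal means mask == 0xFFFF.
inline bool memequal16_sse(const void* a, const void* b) {
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    return _mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)) == 0xFFFF;
}
#endif

// Hypothetical dispatch: use the intrinsic path only on native SSE2 targets.
inline bool memequal16(const void* a, const void* b) {
#if defined(__SSE2__)
    return memequal16_sse(a, b);
#else
    return memequal16_memcmp(a, b);
#endif
}
```

Both variants return the same result for any pair of 16-byte buffers; the difference is only in what the compiler emits. On x86, `_mm_movemask_epi8` corresponds to a single instruction, which is what an emulation layer cannot reproduce one-to-one on ARM.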