-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Support Decimal32/64 types #17501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Decimal32/64 types #17501
Conversation
e8cf5e2
to
4bad6d6
Compare
d2e7917
to
3bba269
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @AdamGS
This PR looks good to me. Even if we missed some places, I think it's okay to add them as follow-up PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes do look fine from what I see; I think we should add some slt tests with these decimal types. Can use #17560 as reference for some of the existing aggregations like regular avg, min/max, etc.
@Jefffrey thanks for pointing those out, I'll add some tests, should take too long. |
One issue that is potentially breaking for backwards compatibility and is also an issue for slt tests, is when parsing SQL we use the precision and scale to infer the decimal size, if we add new types there it might change the behavior of existing code (will parse SQL that used to create decimal128 into 64/32). What do you think? |
Another issue I've just run into - casting support in arrow-rs was only released in 56.1, so this PR should probably only merge once that's available here. |
This is a good pickup; frankly I'm not sure what is the best course of action either 😅 |
Assuming support for decimal32/64 is desirable, I think the best course of action is:
We can also decide to merge this PR (with the addition of the SQL support and SLT tests) closer to the 51 release, but personally I feel like its better to at least have the basic support merged so its actually used, which will help us find other missing functionality or potential bugs. |
This is a similar but more major change, we can have a transition period with a dedicated config. I'm not a fan but I'll yield to people with more experience here if they think that's required. |
Steps 1 & 2 sound good, if we can merge the majority of decimal 32/64 work in this PR and isolate that specific SQL parsing change into a separate issue/PR so it would be easier to review & discuss (a presumably smaller PR) |
3bba269
to
16f8e50
Compare
@xudong963 while working on the SQL part I've found some more parts I've missed, so I pushed them here in the last commit. |
Update is the fix for the arrow bug apache/arrow-rs#8363 will be in arrow 56.2.0. I hope to make an RC later today or tomorrow. |
I've merged #17560 to main so will need some adjustments to this PR 😅 |
always seemed like a reasonable order of merging to me! once I fix this one I think it should be ready to be merged, with the follow-up only merged after arrow is upgraded to 56.2 (I also plan on opening it tomorrow) |
4242ed0
to
b2b243b
Compare
|
||
use arrow::array::{new_empty_array, Array}; | ||
use arrow::compute::can_cast_types; | ||
use arrow::datatypes::{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completely missed this whole are on previous passes
| DataType::Decimal32(_, _) | ||
| DataType::Decimal64(_, _) | ||
| DataType::Decimal128(_, _) | ||
| DataType::Decimal256(_, _) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was just overlooked here, because the group accumulator does exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Which issue does this PR close?
Rationale for this change
Since Arrow 56 and #16690, support has been added to the two smaller decimal variants. This PR tries to bring support to them on-par with the support for the bigger types.
I've run into this issue when upgrading Vortex to use the upcoming release (50), one of our benchmarks fails with:
What changes are included in this PR?
Adding support for handling decimal32/64 wherever other decimal types are handled. I've tried to use multiple search methods to find places where they matter, but there might be some matches/if conditions that make them hard to find.
I've also implemented support for
AVG
,SUM
,FIRST_VALUE
/LAST_VALUE
andMIN
/MAX
.but I stopped pushingSeems like implementing these aggregations is required to get the test suite to ✅, so I'm working on adding everything.MEDIAN
andMIN_MAX
as I'm not sure if its desirable (at all) or as part of this PR.Are these changes tested?
The testing strategy to some of these parts is unclear to me, both around SQL but also for aggregate functions like
SUM
andAVG
.Are there any user-facing changes?
Code that used to fail while reading some decimals should now work. Depending on the result of the discussion here, might change the arrow-type of decimal columns created through the SQL interface.