
@AdamGS (Contributor) commented Sep 10, 2025

Which issue does this PR close?

Rationale for this change

Since Arrow 56 and #16690, support has been added for the two smaller decimal variants (Decimal32 and Decimal64). This PR aims to bring their support on par with that of the larger types (Decimal128 and Decimal256).

I ran into this issue while upgrading Vortex to the upcoming release (50); one of our benchmarks fails with:

Error during planning: Execution error: Function 'sum' user-defined coercion failed with "Execution error: Sum not supported for Decimal32(7, 2)" No function matches the given name and argument types 'sum(Decimal32(7, 2))'. You might need to add explicit type casts.
	Candidate functions:
	sum(UserDefined)

What changes are included in this PR?

This PR adds handling for Decimal32/64 wherever the other decimal types are handled. I used several search methods to find the places where they matter, but some match arms or if conditions may be structured in ways that make them hard to find, so a few could still be missing.
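As a rough illustration of what "handling Decimal32/64 wherever the other decimal types are handled" looks like in practice, here is a minimal, self-contained sketch. It uses a simplified, hypothetical stand-in for arrow's `DataType` (only a few variants, each decimal carrying a `(precision, scale)` pair) so it compiles without the arrow crate; the helper names are made up for illustration. The maximum precisions (9, 18, 38, 76) are the documented limits of the four Arrow decimal widths.

```rust
// Simplified, hypothetical stand-in for arrow's `DataType`; only the
// variants needed for this illustration are included.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Float64,
    Decimal32(u8, i8),
    Decimal64(u8, i8),
    Decimal128(u8, i8),
    Decimal256(u8, i8),
}

/// Returns `(precision, scale)` for any decimal type, or `None` otherwise.
/// The Decimal32/Decimal64 arms are the kind of addition this PR makes to
/// match statements that previously listed only Decimal128/Decimal256.
fn decimal_precision_scale(dt: &DataType) -> Option<(u8, i8)> {
    match dt {
        DataType::Decimal32(p, s)
        | DataType::Decimal64(p, s)
        | DataType::Decimal128(p, s)
        | DataType::Decimal256(p, s) => Some((*p, *s)),
        _ => None,
    }
}

/// Maximum number of significant digits each decimal width can hold.
fn max_precision(dt: &DataType) -> Option<u8> {
    match dt {
        DataType::Decimal32(..) => Some(9),
        DataType::Decimal64(..) => Some(18),
        DataType::Decimal128(..) => Some(38),
        DataType::Decimal256(..) => Some(76),
        _ => None,
    }
}
```

Writing the arms out explicitly, instead of using a catch-all `_`, means the compiler flags any match in this style that was not updated when a new variant is added.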

I've also implemented support for AVG, SUM, FIRST_VALUE/LAST_VALUE, and MIN/MAX, but I held off on pushing MEDIAN and MIN_MAX because I'm not sure whether they are desirable at all, or whether they belong in this PR. It seems that implementing these aggregations is required to get the test suite to ✅, so I'm working on adding everything.
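To make the aggregation point concrete: a decimal column stores unscaled integers plus a fixed scale, so SUM is integer addition whose result is read back at the same scale. The sketch below is a hypothetical, std-only illustration, not DataFusion's `Sum` accumulator. It assumes Decimal32 values arrive as `i32` unscaled integers, accumulates into `i64` with checked addition so overflow is reported rather than wrapped, and simplifies SQL semantics by skipping nulls and returning 0 for an all-null input.

```rust
/// Hypothetical helper: sums unscaled Decimal32 values.
/// Nulls are skipped; unlike SQL, an all-null input yields Some(0) here.
/// Returns None if the i64 accumulator would overflow.
fn sum_decimal32(values: &[Option<i32>]) -> Option<i64> {
    let mut acc: i64 = 0;
    for v in values.iter().flatten() {
        acc = acc.checked_add(i64::from(*v))?;
    }
    Some(acc)
}

/// Hypothetical helper: renders an unscaled value with a non-negative scale,
/// e.g. (12400, 2) -> "124.00" and (-45, 2) -> "-0.45".
fn format_decimal(unscaled: i64, scale: u32) -> String {
    let sign = if unscaled < 0 { "-" } else { "" };
    let digits = unscaled.unsigned_abs().to_string();
    if scale == 0 {
        return format!("{sign}{digits}");
    }
    let scale = scale as usize;
    // Left-pad with zeros so at least one digit precedes the decimal point.
    let padded = format!("{:0>width$}", digits, width = scale + 1);
    let (int_part, frac_part) = padded.split_at(padded.len() - scale);
    format!("{sign}{int_part}.{frac_part}")
}
```

For example, summing the Decimal32(7, 2) values 123.45, NULL, -0.45 and 1.00 means adding the unscaled integers 12345, -45 and 100, which gives 12400, i.e. 124.00.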

Are these changes tested?

The testing strategy for some of these parts is unclear to me, both for SQL and for aggregate functions like SUM and AVG.

Are there any user-facing changes?

Code that used to fail when reading some decimals should now work. Depending on the outcome of the discussion here, this PR might also change the Arrow type of decimal columns created through the SQL interface.

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions common Related to common crate proto Related to proto crate functions Changes to functions implementation labels Sep 10, 2025
@AdamGS AdamGS force-pushed the adamg/extended-decimal-support branch 3 times, most recently from e8cf5e2 to 4bad6d6 September 10, 2025 10:18
@github-actions github-actions bot added the core Core DataFusion crate label Sep 10, 2025
@AdamGS AdamGS force-pushed the adamg/extended-decimal-support branch 2 times, most recently from d2e7917 to 3bba269 September 10, 2025 12:40
@AdamGS AdamGS marked this pull request as ready for review September 10, 2025 12:40
@xudong963 (Member) left a comment

Thank you, @AdamGS

This PR looks good to me. Even if we missed some places, I think it's okay to add them as follow-up PRs.

@Jefffrey (Contributor) left a comment

The changes look fine from what I can see, but I think we should add some SLT tests with these decimal types. You can use #17560 as a reference for some of the existing aggregations, such as regular avg, min/max, etc.

@AdamGS (Contributor Author) commented Sep 15, 2025

@Jefffrey, thanks for pointing those out. I'll add some tests; it shouldn't take too long.

@AdamGS (Contributor Author) commented Sep 15, 2025

One issue that could break backwards compatibility, and that also affects the SLT tests: when parsing SQL, we use the precision and scale to infer the decimal width. If we add the new types there, it might change the behavior of existing code (SQL that used to create a Decimal128 would be parsed into a Decimal64 or Decimal32). What do you think?
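To make the compatibility concern concrete, here is a hedged sketch of two possible mappings from a SQL `DECIMAL(p, s)` declaration to an Arrow decimal width. It uses a plain enum and hypothetical function names rather than the real `datafusion-sql` code. The "legacy" rule assumes the current behavior of choosing Decimal128 up to precision 38 and Decimal256 above that; the "narrowest" rule is one hypothetical way the new types could be introduced, picking the smallest width whose maximum precision (9, 18, 38, 76) fits.

```rust
/// Simplified, hypothetical representation of the Arrow decimal widths.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DecimalType {
    Decimal32(u8, i8),
    Decimal64(u8, i8),
    Decimal128(u8, i8),
    Decimal256(u8, i8),
}

/// Assumed validation shared by both rules: 1 <= precision <= 76, |scale| <= precision.
fn validate(precision: u8, scale: i8) -> Result<(), String> {
    if precision == 0 || precision > 76 || scale.unsigned_abs() > precision {
        return Err(format!("invalid DECIMAL({precision}, {scale})"));
    }
    Ok(())
}

/// Assumed legacy rule: Decimal128 up to precision 38, Decimal256 above.
fn legacy_decimal_type(precision: u8, scale: i8) -> Result<DecimalType, String> {
    validate(precision, scale)?;
    Ok(if precision > 38 {
        DecimalType::Decimal256(precision, scale)
    } else {
        DecimalType::Decimal128(precision, scale)
    })
}

/// Hypothetical "narrowest fit" rule that would also produce Decimal32/64.
fn narrowest_decimal_type(precision: u8, scale: i8) -> Result<DecimalType, String> {
    validate(precision, scale)?;
    Ok(match precision {
        1..=9 => DecimalType::Decimal32(precision, scale),
        10..=18 => DecimalType::Decimal64(precision, scale),
        19..=38 => DecimalType::Decimal128(precision, scale),
        _ => DecimalType::Decimal256(precision, scale),
    })
}
```

Under these assumptions a `DECIMAL(7, 2)` column maps to Decimal128(7, 2) with the legacy rule but to Decimal32(7, 2) with the narrowest rule, which is exactly the kind of silent type change existing queries and SLT expectations would observe.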

@AdamGS (Contributor Author) commented Sep 15, 2025

Another issue I've just run into: casting support in arrow-rs was only released in 56.1, so this PR should probably only be merged once that version is available here. The dependabot PR seems to have some failures; I'll take that on, and hopefully it's nothing too hard. Here is a PR making that upgrade.
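For context on why casting support matters here: converting between decimal widths at the same scale is a range-checked integer conversion, because a narrower width can hold fewer digits. The following std-only sketch is a hypothetical helper, not the arrow-rs `cast` kernel. It narrows a Decimal128 unscaled `i128` to a Decimal32 `i32` with the same scale and returns `None` when the value does not fit the target precision.

```rust
/// Hypothetical helper: casts an unscaled Decimal128 value to Decimal32 with
/// the same scale, returning None if it exceeds `target_precision` digits.
fn narrow_decimal128_to_32(value: i128, target_precision: u32) -> Option<i32> {
    // Decimal32 can hold at most 9 significant digits.
    if !(1..=9).contains(&target_precision) {
        return None;
    }
    // Largest magnitude representable with `target_precision` digits.
    let max = 10_i128.pow(target_precision) - 1;
    if !(-max..=max).contains(&value) {
        return None;
    }
    // At most 9 digits always fits in i32 (999_999_999 < 2^31 - 1).
    i32::try_from(value).ok()
}
```

A real cast also has to handle scale changes (multiplying or dividing by a power of ten, with rounding) and a choice between returning an error or NULL on overflow; this sketch deliberately covers only the same-scale case.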

@Jefffrey (Contributor) commented Sep 15, 2025

> One issue that could break backwards compatibility, and that also affects the SLT tests: when parsing SQL, we use the precision and scale to infer the decimal width. If we add the new types there, it might change the behavior of existing code (SQL that used to create a Decimal128 would be parsed into a Decimal64 or Decimal32). What do you think?

This is a good catch; frankly, I'm not sure what the best course of action is either 😅

@AdamGS (Contributor Author) commented Sep 15, 2025

Assuming support for decimal32/64 is desirable, I think the best course of action is:

  1. Initially, the SQL interface simply won't support creating decimal32/64 columns. Merge this PR as-is (once we have arrow 56.1.0 for casting).
  2. Open a separate PR to make the breaking change to the SQL interface, but only merge it closer to the next major release (51, expected in late November).

We could also decide to merge this PR (with the SQL support and SLT tests added) closer to the 51 release, but personally I think it's better to get at least the basic support merged now so that it actually gets used, which will help us find other missing functionality or potential bugs.

@AdamGS (Contributor Author) commented Sep 15, 2025

This is a similar but larger change. We could have a transition period with a dedicated config option. I'm not a fan, but I'll defer to people with more experience here if they think that's required.

@Jefffrey (Contributor) commented

> Assuming support for decimal32/64 is desirable, I think the best course of action is:
>
> 1. Initially, the SQL interface simply won't support creating decimal32/64 columns. Merge this PR as-is (once we have arrow 56.1.0 for casting).
>
> 2. Open a separate PR to make the breaking change to the SQL interface, but only merge it closer to the next major release (51, expected in late November).
>
> We could also decide to merge this PR (with the SQL support and SLT tests added) closer to the 51 release, but personally I think it's better to get at least the basic support merged now so that it actually gets used, which will help us find other missing functionality or potential bugs.

Steps 1 and 2 sound good: if we can merge the majority of the Decimal32/64 work in this PR and isolate that specific SQL parsing change into a separate issue/PR, it will be easier to review and discuss (presumably as a smaller PR).

@AdamGS (Contributor Author) commented Sep 15, 2025

SGTM. I'll open a follow-up PR with the changes to datafusion-sql. I think that for them to work well I'll still need #17275 (unsurprisingly, @alamb beat me to upgrading arrow and parquet), but it seems pretty close to being ready.

@AdamGS (Contributor Author) commented Sep 16, 2025

@xudong963, while working on the SQL part I found some more places I had missed, so I pushed them here in the last commit.
I also found a bug in Arrow that breaks some casts involving Decimal64 (specifically, when using SQL to insert a floating-point literal), so that part is now blocked on the bug fix in addition to the 56.1 release.
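For illustration only, here is roughly what casting a floating-point literal into a Decimal64 column involves: shift the value by 10^scale, round, and check that the result fits the target precision. This is a hypothetical std-only sketch, not the arrow-rs implementation, and it does not reproduce the bug mentioned above. The rounding mode (half away from zero, via `f64::round`) is an assumption.

```rust
/// Hypothetical helper: converts an f64 into an unscaled Decimal64 value with
/// precision 1..=18 and a non-negative scale. Returns None for NaN,
/// infinities, invalid precision/scale, or values that do not fit.
fn f64_to_decimal64(value: f64, precision: u32, scale: u32) -> Option<i64> {
    if !(1..=18).contains(&precision) || scale > precision || !value.is_finite() {
        return None;
    }
    // Move the decimal point `scale` places right; round half away from zero.
    let scaled = (value * 10_f64.powi(scale as i32)).round();
    // Reject anything near the i64 limits before converting.
    if scaled.abs() >= 9.0e18 {
        return None;
    }
    let unscaled = scaled as i64;
    // Exact integer check against the largest `precision`-digit value.
    let max = 10_i64.pow(precision) - 1;
    (unscaled.abs() <= max).then_some(unscaled)
}
```

Because binary floating point cannot represent most decimal fractions exactly, a value like 12.345 may land just below or above the rounding boundary, which is one reason float-to-decimal casts are easy to get subtly wrong.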

@alamb (Contributor) commented Sep 17, 2025

> @xudong963, while working on the SQL part I found some more places I had missed, so I pushed them here in the last commit. I also found a bug in Arrow that breaks some casts involving Decimal64 (specifically, when using SQL to insert a floating-point literal), so that part is now blocked on the bug fix in addition to the 56.1 release.

Update: the fix for the arrow bug (apache/arrow-rs#8363) will be in arrow 56.2.0. I hope to make an RC later today or tomorrow.
I have already prepared an upgrade here:

@Jefffrey (Contributor) commented

I've merged #17560 to main so will need some adjustments to this PR 😅

@AdamGS (Contributor Author) commented Sep 18, 2025

That always seemed like a reasonable merge order to me! Once I fix this one, I think it should be ready to merge, with the follow-up merged only after arrow is upgraded to 56.2 (I plan to open it tomorrow).

@AdamGS AdamGS force-pushed the adamg/extended-decimal-support branch from 4242ed0 to b2b243b September 18, 2025 12:14

use arrow::array::{new_empty_array, Array};
use arrow::compute::can_cast_types;
use arrow::datatypes::{
@AdamGS (Contributor Author) commented

I completely missed this whole area on previous passes.

| DataType::Decimal32(_, _)
| DataType::Decimal64(_, _)
| DataType::Decimal128(_, _)
| DataType::Decimal256(_, _)
@AdamGS (Contributor Author) commented

I think it was just overlooked here, because the group accumulator does exist.
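As background for this comment: a groups accumulator keeps one state slot per group and updates all of them in bulk from a batch of values plus their group indices, instead of creating one accumulator object per group. The sketch below is a hypothetical, std-only illustration of that idea for summing Decimal32 unscaled values. It is not DataFusion's `GroupsAccumulator` trait, whose real methods operate on Arrow arrays and also take an optional filter.

```rust
/// Hypothetical per-group SUM state for Decimal32 values stored as unscaled i32.
struct DecimalSumGroups {
    // One i64 running sum per group; the index is the group id.
    sums: Vec<i64>,
}

impl DecimalSumGroups {
    fn new() -> Self {
        Self { sums: Vec::new() }
    }

    /// Adds `values[i]` to group `group_indices[i]`, first growing the state
    /// to `total_num_groups` slots. Returns an error if a sum overflows i64.
    fn update_batch(
        &mut self,
        values: &[i32],
        group_indices: &[usize],
        total_num_groups: usize,
    ) -> Result<(), String> {
        assert_eq!(values.len(), group_indices.len());
        if self.sums.len() < total_num_groups {
            self.sums.resize(total_num_groups, 0);
        }
        for (&v, &g) in values.iter().zip(group_indices) {
            self.sums[g] = self.sums[g]
                .checked_add(i64::from(v))
                .ok_or_else(|| format!("overflow in group {g}"))?;
        }
        Ok(())
    }

    /// Returns the final per-group sums, leaving the accumulator empty.
    fn evaluate(&mut self) -> Vec<i64> {
        std::mem::take(&mut self.sums)
    }
}
```

Keeping the state in one flat vector is what makes this path fast for high-cardinality GROUP BY queries, which is why a missing match arm that silently falls back to the per-group path is worth fixing.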

@AdamGS AdamGS requested a review from Jefffrey September 18, 2025 13:03
@AdamGS (Contributor Author) commented Sep 18, 2025

I think this PR is now ready for review (assuming all tests pass). I found a few more spots I had missed (see the last commit) and rebased on top of @Jefffrey's #17560.
I hope to have the follow-up, which will include more SLT-style tests, ready later today or tomorrow.

@Jefffrey (Contributor) left a comment

LGTM

@alamb alamb added this pull request to the merge queue Sep 19, 2025
@alamb (Contributor) commented Sep 19, 2025

Thanks again @AdamGS and @Jefffrey

Merged via the queue into apache:main with commit 44cd972 Sep 19, 2025
28 checks passed
Labels
common Related to common crate core Core DataFusion crate functions Changes to functions implementation logical-expr Logical plan and expressions proto Related to proto crate sql SQL Planner
Development

Successfully merging this pull request may close these issues.

Decimal32/64 aren't as well supported as as the 128 and 256 bit variants
4 participants