feat: support partition by for iceberg table engine #21594

chenzl25 · 2025-04-27T10:06:12Z

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR reuses the partition_by option the same way the Iceberg sink uses it, so users can define their partition-by columns.
We require the partition-by columns to be a prefix of the primary key.
Also, if creating the internal Iceberg sink or Iceberg source fails, we try to drop the table.
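A minimal Rust sketch of the prefix rule in isolation. It assumes the partition fields have already been reduced to their source columns (c1 for bucket(4, c1)) and that column order matters. The function name is hypothetical and is not taken from the PR.

```rust
/// Hypothetical helper: returns true if the partition source columns,
/// in declaration order, form a prefix of the primary key columns.
/// Assumes each partition field has exactly one source column.
fn is_primary_key_prefix(partition_columns: &[&str], pk_columns: &[&str]) -> bool {
    // A prefix cannot be longer than the key itself.
    partition_columns.len() <= pk_columns.len()
        // Every partition column must equal the key column at the same position.
        && partition_columns
            .iter()
            .zip(pk_columns)
            .all(|(p, k)| p == k)
}
```

With primary key (c1, c2, c3), this accepts c1,c2 and rejects c2,c3, which matches the t_partition2 and t_partition5 cases in the e2e test further down.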

Checklist

I have written necessary rustdoc comments.
I have added necessary unit tests and integration tests.
I have added test labels as necessary.
I have added fuzzing tests or opened an issue to track them.
My PR contains breaking changes.
My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
I have checked the Release Timeline and Currently Supported Versions to determine which release branches I need to cherry-pick this PR into.

Documentation

My PR needs documentation updates.

Release note


github-actions · 2025-04-27T10:06:49Z

Hi, there.

📝 Telemetry Reminder:
If you're implementing this feature, please consider adding telemetry metrics to track its usage. This helps us understand how the feature is being used and improve it further.
You can find the telemetry reporting function report_event in the following files. Feel free to ask questions if you need any guidance!

src/frontend/src/telemetry.rs
src/meta/src/telemetry.rs
src/stream/src/telemetry.rs
src/storage/compactor/src/telemetry.rs
Or call report_event_common (src/common/telemetry_event/src/lib.rs) if you find it hard to implement.
✨ Thank you for your contribution to RisingWave! ✨

This is an automated comment created by the peaceiris/actions-label-commenter. Responding to the bot or mentioning it won't have any effect.


chenzl25 · 2025-05-21T02:19:14Z

Could you please take a look? cc @Li0k @xxchan @xxhZs

Copilot

Pull Request Overview

This PR enhances the Iceberg table engine support by allowing users to specify partition-by columns and ensuring that these columns form a prefix of the primary key. Key changes include:

Parsing and validating the partition_by option using a regex to support various formats (e.g., column, transform(column), transform(n,column)).
Dropping the partially created table when either sink or source creation fails, to work around the lack of DDL atomicity.
Adding end-to-end tests for various partition_by scenarios.
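The drop-on-failure behaviour amounts to a compensating action for a multi-step DDL. A hedged Rust sketch: the Ddl trait and create_iceberg_engine_table are hypothetical stand-ins for the frontend's catalog calls, and drop_table is assumed to clean up whatever the failed steps left behind.

```rust
/// Hypothetical catalog operations; the real handler issues DDL through the
/// RisingWave frontend, which is not modelled here.
trait Ddl {
    fn create_table(&mut self, name: &str) -> Result<(), String>;
    fn create_sink(&mut self, table: &str) -> Result<(), String>;
    fn create_source(&mut self, table: &str) -> Result<(), String>;
    fn drop_table(&mut self, name: &str) -> Result<(), String>;
}

/// Creates the table, then its internal sink and source. If either later
/// step fails, tries to drop the table and returns the original error.
fn create_iceberg_engine_table<D: Ddl>(ddl: &mut D, name: &str) -> Result<(), String> {
    // Step 1: if the table itself cannot be created, there is nothing to undo.
    ddl.create_table(name)?;
    // Steps 2 and 3: the internal sink, then the internal source.
    let rest = ddl.create_sink(name).and_then(|()| ddl.create_source(name));
    if let Err(e) = rest {
        // Compensate: DDL is not atomic, so try to remove the half-created table.
        if let Err(drop_err) = ddl.drop_table(name) {
            return Err(format!("{e}; also failed to drop table {name}: {drop_err}"));
        }
        return Err(e);
    }
    Ok(())
}
```

Returning the original error, and only appending the drop error when cleanup also fails, keeps the user-facing message about the real cause; that reporting choice belongs to this sketch, not necessarily to the PR.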

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
src/frontend/src/handler/create_table.rs	Implements partition_by option parsing, validation (including prefix check on primary key), and error handling via table drop on failure.
e2e_test/iceberg/test_case/pure_slt/iceberg_engine.slt	Updates and extends tests to validate partition_by behavior and error cases.

Comments suppressed due to low confidence (1)

src/frontend/src/handler/create_table.rs:1724

[nitpick] The regex pattern used for parsing partition fields is quite complex; consider adding inline comments or refactoring it for improved clarity and maintainability.

let re = Regex::new(r"(?<transform>\w+)(\(((?<n>\d+)?(?:,|(,\s)))?(?<field>\w+)\))?").unwrap();
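One way to address this nitpick without changing behaviour is to assemble the pattern from commented pieces. The Rust sketch below uses only std and produces exactly the pattern string above, which the real code would still pass to Regex::new; the function name is hypothetical.

```rust
/// Hypothetical helper: builds the partition-field pattern from commented
/// pieces. The concatenation is byte-for-byte the pattern in create_table.rs.
fn partition_field_pattern() -> String {
    [
        r"(?<transform>\w+)", // transform name, or the bare column for `column`
        r"(",                 // start of the optional "(...)" suffix
        r"\(",                // literal "("
        r"(",                 // start of the optional "n," argument
        r"(?<n>\d+)?",        // optional numeric argument, e.g. 4 in bucket(4, c1)
        r"(?:,|(,\s))",       // separator after the argument: "," or ", "
        r")?",                // end of the optional argument
        r"(?<field>\w+)",     // source column inside the parentheses
        r"\)",                // literal ")"
        r")?",                // end of the optional suffix
    ]
    .concat()
}
```

The regex crate's verbose flag, (?x), is another option: it allows whitespace and # comments inside the pattern itself.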

xxchan · 2025-05-21T02:39:01Z

src/frontend/src/handler/create_table.rs

+    if let Some(partition_by) = &partition_by {
+        // captures column, transform(column), transform(n,column), transform(n, column)
+        let re =
+            Regex::new(r"(?<transform>\w+)(\(((?<n>\d+)?(?:,|(,\s)))?(?<field>\w+)\))?").unwrap();
+        if !re.is_match(partition_by) {
+            bail!(format!(
+                "Invalid partition fields: {}\nHINT: Supported formats are column, transform(column), transform(n,column), transform(n, column)",
+                partition_by
+            ))
+        }


Is this code copy-pasted from the sink? 🤔

Not sure whether we need to keep both copies. If we do, it may be better to have a common fn(String) -> Vec<(String, Transform)> instead.
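A sketch of the shared helper suggested here, in Rust. It is a hand-written parser rather than a regex, splitting on commas outside parentheses so that bucket(4, c1),c2 yields two fields. Transform is a local stand-in enum, not the Iceberg library's type, and the accepted transform names (identity, bucket, truncate, year, month, day, hour, void) are assumed from the Iceberg spec rather than taken from the sink code.

```rust
/// Hypothetical stand-in for an Iceberg partition transform.
#[derive(Debug, Clone, PartialEq)]
pub enum Transform { Identity, Bucket(u32), Truncate(u32), Year, Month, Day, Hour, Void }

/// Splits on commas that are not inside parentheses.
fn split_top_level(s: &str) -> Vec<&str> {
    let (mut parts, mut depth, mut start) = (Vec::new(), 0usize, 0);
    for (i, ch) in s.char_indices() {
        match ch {
            '(' => depth += 1,
            ')' => depth = depth.saturating_sub(1),
            ',' if depth == 0 => {
                parts.push(s[start..i].trim());
                start = i + 1;
            }
            _ => {}
        }
    }
    parts.push(s[start..].trim());
    parts
}

fn is_ident(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_alphanumeric() || c == '_')
}

/// Parses column, transform(column), transform(n,column) or transform(n, column).
fn parse_field(field: &str) -> Result<(String, Transform), String> {
    let err = || format!("invalid partition field: {field:?}");
    let Some(open) = field.find('(') else {
        // No parentheses: the whole field is a bare column (identity transform).
        return if is_ident(field) { Ok((field.to_string(), Transform::Identity)) } else { Err(err()) };
    };
    let name = &field[..open];
    let inner = field[open + 1..].strip_suffix(')').ok_or_else(err)?;
    // An optional numeric argument precedes the column, separated by a comma.
    let (n, column) = match inner.split_once(',') {
        Some((n, col)) => (Some(n.trim().parse::<u32>().map_err(|_| err())?), col.trim()),
        None => (None, inner.trim()),
    };
    if !is_ident(name) || !is_ident(column) {
        return Err(err());
    }
    let transform = match (name.to_ascii_lowercase().as_str(), n) {
        ("identity", None) => Transform::Identity,
        ("bucket", Some(n)) => Transform::Bucket(n),
        ("truncate", Some(n)) => Transform::Truncate(n),
        ("year", None) => Transform::Year,
        ("month", None) => Transform::Month,
        ("day", None) => Transform::Day,
        ("hour", None) => Transform::Hour,
        ("void", None) => Transform::Void,
        _ => return Err(err()),
    };
    Ok((column.to_string(), transform))
}

/// Hypothetical shared entry point for both the sink and the table engine.
pub fn parse_partition_by(partition_by: &str) -> Result<Vec<(String, Transform)>, String> {
    split_top_level(partition_by).into_iter().map(parse_field).collect()
}
```

Unlike the unanchored is_match check in the diff, which succeeds as soon as any substring matches, this parser rejects input such as bucket(4, c1 outright.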

xxchan · 2025-05-21T08:57:18Z

e2e_test/iceberg/test_case/pure_slt/iceberg_engine.slt

+statement ok
+create table t_partition2(c1 int, c2 int, c3 int, primary key(c1, c2, c3)) with(commit_checkpoint_interval = 1, partition_by='c1,c2') engine = iceberg;
+
+statement ok
+create table t_partition3(c1 int, c2 int, c3 int, primary key(c1, c2, c3)) with(commit_checkpoint_interval = 1, partition_by='bucket(4, c1),c2') engine = iceberg;
+
+statement ok
+create table t_partition4(c1 int, c2 int, c3 int, primary key(c1, c2, c3)) with(commit_checkpoint_interval = 1, partition_by='c1,truncate(8, c2)') engine = iceberg;
+
+# the partition key should be the prefix of the primary key
+statement error
+create table t_partition5(c1 int, c2 int, c3 int, primary key(c1, c2, c3)) with(commit_checkpoint_interval = 1, partition_by='c2,c3') engine = iceberg;


Ideally we would validate the partitions here, but that may have to wait until something like https://iceberg.apache.org/docs/nightly/spark-queries/#partitions is supported (apache/iceberg-rust#823).

chenzl25 added 2 commits April 27, 2025 16:53

support parititon by

40fb0b7

Merge remote-tracking branch 'origin/main' into dylan/support_partition_by_for_iceberg_table_engine

0d90bda

chenzl25 requested review from wcy-fdu and xxhZs April 27, 2025 10:06

github-actions bot added type/feature Type: New feature. ci/run-e2e-iceberg-tests labels Apr 27, 2025

chenzl25 added the user-facing-changes Contains changes that are visible to users label Apr 27, 2025

fmt

1513263

chenzl25 requested review from Li0k and xxchan April 29, 2025 03:16

chenzl25 and others added 3 commits May 16, 2025 18:20

resolve conflicts

b2be69c

fix

628bd0d

Merge branch 'main' into dylan/support_partition_by_for_iceberg_table_engine

79e8564

xxchan requested a review from Copilot May 21, 2025 02:32

Copilot AI reviewed May 21, 2025

View reviewed changes

xxchan reviewed May 21, 2025

View reviewed changes

refactor

2737fe6

chenzl25 requested a review from xxchan May 21, 2025 08:00

graphite-app bot requested a review from a team May 21, 2025 08:16

xxchan reviewed May 21, 2025

View reviewed changes

xxchan approved these changes May 21, 2025

View reviewed changes

chenzl25 added 2 commits May 21, 2025 17:11

fmt

f3a06dc

fmt

17b0a78

chenzl25 added this pull request to the merge queue May 21, 2025

Merged via the queue into main with commit ca9474b May 21, 2025
30 of 31 checks passed

chenzl25 deleted the dylan/support_partition_by_for_iceberg_table_engine branch May 21, 2025 10:25

neverchanje mentioned this pull request May 21, 2025

Document: feat: support partition by for iceberg table engine risingwavelabs/risingwave-docs#461

Closed

kwannoel pushed a commit that referenced this pull request May 21, 2025

feat: support partition by for iceberg table engine (#21594)

282c0f1

WanYixian mentioned this pull request Jun 25, 2025

Document: partition by for iceberg table engine risingwavelabs/risingwave-docs#531

Closed

3 tasks

WanYixian mentioned this pull request Jul 18, 2025

Document: partition_by for iceberg table engine risingwavelabs/risingwave-docs#571

Merged

3 tasks
