chenzl25
Contributor

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

  • This PR reuses the partition_by option in the same way the iceberg sink uses it, so users can define their partition-by columns.
  • We require the partition-by columns to be a prefix of the primary key.
  • We also try to drop the table if creating the internal iceberg sink or iceberg source fails.
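The prefix restriction above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `is_primary_key_prefix` is hypothetical, and it assumes that transforms such as `bucket(4, c1)` have already been reduced to their underlying column names and that column order matters.

```rust
/// Hypothetical helper (not a RisingWave function): returns true when the
/// partition columns are exactly the leading columns of the primary key,
/// in the same order.
fn is_primary_key_prefix(partition_columns: &[&str], primary_key: &[&str]) -> bool {
    // `starts_with` on slices compares element by element from the front.
    primary_key.starts_with(partition_columns)
}
```

Under these assumptions, `partition_by='c1,c2'` with `primary key(c1, c2, c3)` passes the check, while `partition_by='c2,c3'` does not, which matches the accepted and rejected cases in the end-to-end tests.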

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • I have checked the Release Timeline and Currently Supported Versions to determine which release branches I need to cherry-pick this PR into.

Documentation

  • My PR needs documentation updates.
Release note

@chenzl25 chenzl25 requested review from wcy-fdu and xxhZs April 27, 2025 10:06
@chenzl25 chenzl25 added the user-facing-changes Contains changes that are visible to users label Apr 27, 2025
Contributor

Hi, there.

📝 Telemetry Reminder:
If you're implementing this feature, please consider adding telemetry metrics to track its usage. This helps us understand how the feature is being used and improve it further.
You can find the function report_event of telemetry reporting in the following files. Feel free to ask questions if you need any guidance!

  • src/frontend/src/telemetry.rs
  • src/meta/src/telemetry.rs
  • src/stream/src/telemetry.rs
  • src/storage/compactor/src/telemetry.rs

Alternatively, call report_event_common (src/common/telemetry_event/src/lib.rs) if you find it hard to implement.
✨ Thank you for your contribution to RisingWave! ✨

This is an automated comment created by the peaceiris/actions-label-commenter. Responding to the bot or mentioning it won't have any effect.

@chenzl25 chenzl25 requested review from Li0k and xxchan April 29, 2025 03:16
@chenzl25
Contributor Author

Could you please take a look? cc @Li0k @xxchan @xxhZs

@xxchan xxchan requested a review from Copilot May 21, 2025 02:32
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR enhances the Iceberg table engine support by allowing users to specify partition-by columns and ensuring that these columns are prefixes of the primary key. Key changes include:

  • Parsing and validating the partition_by option using a regex to support various formats (e.g., column, transform(column), transform(n,column)).
  • Dropping the partially created table when either sink or source creation fails to handle DDL atomicity limitations.
  • Adding end-to-end tests for various partition_by scenarios.
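The drop-on-failure behavior described above can be sketched as a small cleanup pattern. This is only an illustration under stated assumptions: `create_with_cleanup` and its closure parameters are hypothetical stand-ins for the real DDL steps, and the actual handler in src/frontend/src/handler/create_table.rs may be structured differently.

```rust
/// Hypothetical sketch: run the dependent creation steps in order and, if the
/// sink or source step fails, drop the already-created table before returning
/// the original error.
fn create_with_cleanup<E>(
    create_table: impl FnOnce() -> Result<(), E>,
    create_sink: impl FnOnce() -> Result<(), E>,
    create_source: impl FnOnce() -> Result<(), E>,
    drop_table: impl FnOnce(),
) -> Result<(), E> {
    // If the table itself cannot be created there is nothing to clean up.
    create_table()?;
    // The source is only attempted when the sink was created successfully.
    if let Err(e) = create_sink().and_then(|()| create_source()) {
        // Best-effort cleanup: the DDL is not atomic, so undo the table manually.
        drop_table();
        return Err(e);
    }
    Ok(())
}
```

The cleanup is deliberately best-effort here: the caller still sees the error from the failed step rather than any error from the drop.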

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/frontend/src/handler/create_table.rs | Implements partition_by option parsing, validation (including prefix check on primary key), and error handling via table drop on failure. |
| e2e_test/iceberg/test_case/pure_slt/iceberg_engine.slt | Updates and extends tests to validate partition_by behavior and error cases. |
Comments suppressed due to low confidence (1)

src/frontend/src/handler/create_table.rs:1724

  • [nitpick] The regex pattern used for parsing partition fields is quite complex; consider adding inline comments or refactoring it for improved clarity and maintainability.
let re = Regex::new(r"(?<transform>\w+)(\(((?<n>\d+)?(?:,|(,\s)))?(?<field>\w+)\))?").unwrap();

Comment on lines 1722 to 1731
if let Some(partition_by) = &partition_by {
    // captures column, transform(column), transform(n,column), transform(n, column)
    let re =
        Regex::new(r"(?<transform>\w+)(\(((?<n>\d+)?(?:,|(,\s)))?(?<field>\w+)\))?").unwrap();
    if !re.is_match(partition_by) {
        bail!(format!(
            "Invalid partition fields: {}\nHINT: Supported formats are column, transform(column), transform(n,column), transform(n, column)",
            partition_by
        ))
    }
Collaborator

Is this code copy-pasted from the sink? 🤔

Collaborator

@xxchan xxchan May 21, 2025

Not sure whether we need to keep both of them. If we do, it may be better to have a common fn(String) -> Vec<(String, Transform)> instead.
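A shared helper along the lines suggested here might look like the sketch below. Everything in it is an assumption for illustration: `PartitionField`, `split_top_level`, `parse_field`, and `parse_partition_by` are hypothetical names, the transform is kept as a plain string rather than an Iceberg `Transform` value, and the parser is hand-written with the standard library instead of the regex used in the PR.

```rust
/// Hypothetical representation of one partition field; not a RisingWave type.
#[derive(Debug, PartialEq)]
struct PartitionField {
    column: String,
    /// `None` for the bare `column` form.
    transform: Option<String>,
    /// The numeric argument in `transform(n, column)`, if present.
    n: Option<u32>,
}

/// Split on commas that are not nested inside parentheses,
/// so that `bucket(4, c1),c2` yields `bucket(4, c1)` and `c2`.
fn split_top_level(s: &str) -> Vec<&str> {
    let mut parts = Vec::new();
    let (mut depth, mut start) = (0usize, 0usize);
    for (i, ch) in s.char_indices() {
        match ch {
            '(' => depth += 1,
            ')' => depth = depth.saturating_sub(1),
            ',' if depth == 0 => {
                parts.push(&s[start..i]);
                start = i + 1;
            }
            _ => {}
        }
    }
    parts.push(&s[start..]);
    parts
}

fn is_ident(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_alphanumeric() || c == '_')
}

/// Parse one of: `column`, `transform(column)`, `transform(n, column)`.
fn parse_field(item: &str) -> Result<PartitionField, String> {
    let invalid = || format!("Invalid partition field: {item}");
    let Some(open) = item.find('(') else {
        // Bare column form.
        if !is_ident(item) {
            return Err(invalid());
        }
        return Ok(PartitionField { column: item.to_string(), transform: None, n: None });
    };
    let transform = &item[..open];
    let inner = item[open + 1..].strip_suffix(')').ok_or_else(invalid)?;
    let (n, column) = match inner.split_once(',') {
        Some((n, col)) => (Some(n.trim().parse::<u32>().map_err(|_| invalid())?), col.trim()),
        None => (None, inner.trim()),
    };
    if !is_ident(transform) || !is_ident(column) {
        return Err(invalid());
    }
    Ok(PartitionField { column: column.to_string(), transform: Some(transform.to_string()), n })
}

/// Parse a whole `partition_by` value into its fields.
fn parse_partition_by(s: &str) -> Result<Vec<PartitionField>, String> {
    split_top_level(s).into_iter().map(|raw| parse_field(raw.trim())).collect()
}
```

One design difference worth noting: this sketch validates each field in full, whereas `Regex::is_match` succeeds if the pattern matches anywhere in the input, so an unanchored check accepts more strings than the supported formats listed in the hint.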

@chenzl25 chenzl25 requested a review from xxchan May 21, 2025 08:00
@graphite-app graphite-app bot requested a review from a team May 21, 2025 08:16
Comment on lines +279 to +290
statement ok
create table t_partition2(c1 int, c2 int, c3 int, primary key(c1, c2, c3)) with(commit_checkpoint_interval = 1, partition_by='c1,c2') engine = iceberg;

statement ok
create table t_partition3(c1 int, c2 int, c3 int, primary key(c1, c2, c3)) with(commit_checkpoint_interval = 1, partition_by='bucket(4, c1),c2') engine = iceberg;

statement ok
create table t_partition4(c1 int, c2 int, c3 int, primary key(c1, c2, c3)) with(commit_checkpoint_interval = 1, partition_by='c1,truncate(8, c2)') engine = iceberg;

# the partition key should be the prefix of the primary key
statement error
create table t_partition5(c1 int, c2 int, c3 int, primary key(c1, c2, c3)) with(commit_checkpoint_interval = 1, partition_by='c2,c3') engine = iceberg;
Collaborator

Ideally we would validate the resulting partitions here, but that may have to wait until something like https://iceberg.apache.org/docs/nightly/spark-queries/#partitions is supported (apache/iceberg-rust#823).

@chenzl25 chenzl25 added this pull request to the merge queue May 21, 2025
Merged via the queue into main with commit ca9474b May 21, 2025
30 of 31 checks passed
@chenzl25 chenzl25 deleted the dylan/support_partition_by_for_iceberg_table_engine branch May 21, 2025 10:25
Labels
ci/run-e2e-iceberg-tests type/feature Type: New feature. user-facing-changes Contains changes that are visible to users