hqbhoho
Contributor

@hqbhoho hqbhoho commented Sep 5, 2025

Description

Reopening #24620 since it went stale.
Fixes #16946

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix missing column-level lineage when using `UNNEST`. ({issue}`16946`)

@cla-bot cla-bot bot added the cla-signed label Sep 5, 2025
@hqbhoho
Contributor Author

hqbhoho commented Sep 5, 2025

@Praveen2112 Could you help review this? Thanks.

@hqbhoho hqbhoho requested a review from Praveen2112 September 5, 2025 06:05
@hqbhoho hqbhoho force-pushed the feature/add_unnest_lineage branch from e3311e1 to f6b8286 Compare September 5, 2025 06:24
@hqbhoho hqbhoho force-pushed the feature/add_unnest_lineage branch 2 times, most recently from 24a364a to c01ecc0 Compare September 8, 2025 03:56
@hqbhoho hqbhoho requested a review from chenjian2664 September 8, 2025 05:09
@hqbhoho hqbhoho force-pushed the feature/add_unnest_lineage branch from c01ecc0 to 34a0097 Compare September 8, 2025 14:59
@hqbhoho hqbhoho force-pushed the feature/add_unnest_lineage branch from 34a0097 to 34b5b3c Compare September 18, 2025 01:14
@chenjian2664
Contributor

Overall this looks good to me, but I am not familiar enough with the lineage code to be confident. It still needs @Praveen2112 to take a look.

new OutputColumnMetadata("test_varchar", VARCHAR_TYPE, ImmutableSet.of(new ColumnDetail("mock", "default", "tests_table_unnest", "test_varchar_array"))),
new OutputColumnMetadata("test_bigint", BIGINT_TYPE, ImmutableSet.of(new ColumnDetail("mock", "default", "tests_table_unnest", "test_bigint"))));
assertLineage(
"SELECT test_varchar_unnest AS test_varchar, test_bigint_unnest AS test_bigint FROM mock.default.tests_table_unnest CROSS JOIN UNNEST(test_varchar_array) WITH ORDINALITY AS t(test_varchar_unnest, test_bigint_unnest)",
Member

What if we provide t(test_varchar_unnest, test_bigint_unnest, row_number) and try to assert the lineage of the row number?

Contributor Author

If the row number (ordinality) column is used, its lineage is empty.

Comment on lines 1672 to 1680
for (Field field : outputFields) {
    for (Map.Entry<NodeRef<Expression>, List<Field>> entry : mappings.entrySet()) {
        Expression expression = entry.getKey().getNode();
        List<Field> fields = entry.getValue();
        if (fields.contains(field)) {
            analysis.addSourceColumns(field, analysis.getExpressionSourceColumns(expression));
        }
    }
}
Member

Is there a specific reason for this for loop? Instead, we could add them as we process each expression, right? Like

                expressionOutputs.forEach(field -> analysis.addSourceColumns(field, analysis.getExpressionSourceColumns(expression)));

And similarly for cardinality as well.

Contributor Author

You are right. The source columns are now added as each expression is processed. Thank you for your feedback!
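
For readers following this thread, here is a minimal, self-contained sketch of the two shapes discussed above. It is not the Trino analyzer code: `UnnestLineageSketch`, `nestedLoop`, `perExpression`, and the string-keyed maps are hypothetical stand-ins for `Analysis`, `Field`, and `NodeRef<Expression>`, used only to illustrate that recording source columns while each expression is processed should give the same result as scanning all mappings for every output field. An ordinality field that no expression maps to ends up with no source columns in either shape, which matches the empty lineage described earlier in the conversation.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the analyzer state; none of this is Trino API.
final class UnnestLineageSketch
{
    // Output field name -> names of the source columns it derives from.
    private final Map<String, Set<String>> sourceColumns = new LinkedHashMap<>();

    void addSourceColumns(String field, Set<String> columns)
    {
        sourceColumns.computeIfAbsent(field, key -> new LinkedHashSet<>()).addAll(columns);
    }

    Set<String> getSourceColumns(String field)
    {
        // Fields that were never registered (for example ordinality) have empty lineage.
        return sourceColumns.getOrDefault(field, Set.of());
    }

    // Shape from the original diff: for every output field, scan every mapping.
    static UnnestLineageSketch nestedLoop(
            List<String> outputFields,
            Map<String, List<String>> mappings,
            Map<String, Set<String>> expressionSources)
    {
        UnnestLineageSketch analysis = new UnnestLineageSketch();
        for (String field : outputFields) {
            for (Map.Entry<String, List<String>> entry : mappings.entrySet()) {
                if (entry.getValue().contains(field)) {
                    analysis.addSourceColumns(field, expressionSources.getOrDefault(entry.getKey(), Set.of()));
                }
            }
        }
        return analysis;
    }

    // Shape suggested in review: record lineage while each expression is processed.
    static UnnestLineageSketch perExpression(
            Map<String, List<String>> mappings,
            Map<String, Set<String>> expressionSources)
    {
        UnnestLineageSketch analysis = new UnnestLineageSketch();
        mappings.forEach((expression, expressionOutputs) ->
                expressionOutputs.forEach(field ->
                        analysis.addSourceColumns(field, expressionSources.getOrDefault(expression, Set.of()))));
        return analysis;
    }

    public static void main(String[] args)
    {
        // One unnested expression that produces one output field; ordinality is not mapped.
        Map<String, List<String>> mappings = Map.of("test_varchar_array", List.of("test_varchar_unnest"));
        Map<String, Set<String>> expressionSources =
                Map.of("test_varchar_array", Set.of("tests_table_unnest.test_varchar_array"));
        List<String> outputFields = List.of("test_varchar_unnest", "ordinality");

        UnnestLineageSketch looped = nestedLoop(outputFields, mappings, expressionSources);
        UnnestLineageSketch direct = perExpression(mappings, expressionSources);
        for (String field : outputFields) {
            System.out.println(field + ": " + looped.getSourceColumns(field) + " / " + direct.getSourceColumns(field));
        }
    }
}
```

The per-expression shape avoids the `contains` scan over every mapping for every field. In this sketch both variants accumulate into a set, so a field produced by more than one expression would still collect all of its sources.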

@hqbhoho hqbhoho force-pushed the feature/add_unnest_lineage branch from 34b5b3c to b6013e0 Compare September 23, 2025 01:13
@hqbhoho hqbhoho requested a review from Praveen2112 September 23, 2025 01:36
@hqbhoho hqbhoho force-pushed the feature/add_unnest_lineage branch from b6013e0 to 36f52e1 Compare September 23, 2025 02:07
@hqbhoho
Contributor Author

hqbhoho commented Sep 23, 2025

@Praveen2112 PTAL, thanks!

Development

Successfully merging this pull request may close these issues.

Column level Lineage will missing when use Unnest
3 participants