
JiaqiWang18 (Contributor) commented Sep 15, 2025

What changes were proposed in this pull request?

In SDP, the recommended scaffolding is to put pipeline definition files in the `transformations` directory and any of its subdirectories.

Currently, if users have both SQL and Python pipeline definition files, they need to list each file type separately in the pipeline spec, for example:

```yaml
libraries:
  - glob:
      include: transformations/**/*.py
  - glob:
      include: transformations/**/*.sql
```

This is cumbersome and requires extra work from the user. The `transformations` directory should only contain pipeline source files ending in `.py` or `.sql`, so ideally users shouldn't need to specify file extensions at all.
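For illustration, here is a minimal Python sketch of how a single extension-free glob such as `transformations/**` could still resolve to just the pipeline source files. It assumes the pattern is expanded with Python's standard `glob` module and that files other than `.py`/`.sql` are simply skipped; the helper name `resolve_pipeline_sources` is hypothetical, and this is not the actual SDP CLI implementation (the description does not say how other file types are handled).

```python
import glob
import os

# Extensions treated as pipeline source files, per the PR description.
SOURCE_EXTENSIONS = (".py", ".sql")


def resolve_pipeline_sources(root_dir: str, include: str = "transformations/**") -> list[str]:
    """Expand a recursive include glob under root_dir, keeping only .py/.sql files.

    Hypothetical helper for illustration only; not the real SDP code.
    """
    pattern = os.path.join(root_dir, include)
    # With recursive=True, a trailing "**" matches the directory itself plus
    # every file and subdirectory beneath it, at any depth.
    matches = glob.glob(pattern, recursive=True)
    # Drop directories and any non-source files, then sort for stable output.
    return sorted(
        path
        for path in matches
        if os.path.isfile(path) and path.endswith(SOURCE_EXTENSIONS)
    )
```

With this approach, a `transformations/notes.md` file would be ignored, while `transformations/a.py` and `transformations/sub/b.sql` would both be picked up without the spec naming either extension.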

This PR adds support for the pattern below for source-file matching, and throws an exception if users keep using the extension-specific pattern above, since other file types should not be placed in this directory:

```yaml
libraries:
  - glob:
      include: transformations/** # matches recursively
```
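The exception for extension-specific patterns could be sketched as below. This assumes the rule is "reject any include glob whose last path segment ends in `.py` or `.sql`"; the function `validate_include_pattern` and its error message are hypothetical, and the actual check added by this PR may use a different rule or message.

```python
# Source-file extensions that an include pattern should no longer spell out
# (assumption based on the .py/.sql files named in the PR description).
SOURCE_EXTENSIONS = (".py", ".sql")


def validate_include_pattern(pattern: str) -> str:
    """Reject include globs that name a source-file extension.

    Hypothetical check for illustration; the real SDP rule may differ.
    """
    # Look only at the final path segment, e.g. "*.py" in "transformations/**/*.py".
    last_segment = pattern.rstrip("/").rsplit("/", 1)[-1]
    if last_segment.endswith(SOURCE_EXTENSIONS):
        raise ValueError(
            f"Include pattern {pattern!r} names a file extension; "
            "use a directory glob such as 'transformations/**' instead."
        )
    return pattern
```

Under this sketch, `transformations/**` passes unchanged, while `transformations/**/*.py` and `transformations/**/*.sql` raise `ValueError`.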


Why are the changes needed?

To simplify the user experience: users no longer need to manually supply globs with file extensions.

Does this PR introduce any user-facing change?

Yes, but SDP has not been released yet.

How was this patch tested?

New and existing tests, plus running the CLI manually.

Was this patch authored or co-authored using generative AI tooling?

No.

JiaqiWang18 force-pushed the SPARK-53591-restrict-sdp-glob-matching branch 3 times, most recently from 8d8ffe1 to d1abe2f, September 16, 2025 04:10
JiaqiWang18 force-pushed the SPARK-53591-restrict-sdp-glob-matching branch from d1abe2f to bb5a58f, September 17, 2025 21:43
JiaqiWang18 marked this pull request as ready for review September 18, 2025 01:06
JiaqiWang18 (Contributor Author)

@AnishMahto @cloud-fan

cloud-fan (Contributor)

cc @sryza
