feat(dbt): add database and schema pattern filtering support #14689
base: master
- Add database_pattern and schema_pattern config fields with AllowDenyPattern support
- Enhance _is_allowed_node() to filter nodes by database and schema in addition to node names
- Add comprehensive integration tests for new filtering capabilities
- Support combined filtering patterns for fine-grained dbt ingestion control
Bundle Report: Changes will decrease total bundle size by 645 bytes (-0.0%) ⬇️. This is within the configured threshold ✅
Affected bundle: datahub-react-web-esm
Codecov Report: ✅ All modified and coverable lines are covered by tests.
metadata-ingestion/src/datahub/ingestion/source/dbt/dbt_common.py
Hello @abdullahtariqq, thank you for sharing this contribution. I think the feature is a bit complex, due to the details of
There is an open question whether it should be done against nodes of type The
Is this aligned with the use case you had in mind when creating it? Of course, golden files need to be aligned properly.
Overview
This PR enhances the dbt ingestion source with fine-grained filtering capabilities by adding database_pattern and schema_pattern configuration options. Users can now filter dbt nodes not only by node names but also by their database and schema attributes.
Changes Made
🔧 Core Implementation:
- Enhanced _is_allowed_node() method to evaluate database and schema patterns alongside existing node name filtering
- allow_all()
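The combined check described above might look roughly like the following sketch. It is not the code from dbt_common.py: `PatternSketch` is a hypothetical stand-in for DataHub's AllowDenyPattern, `NodeSketch` and the `is_allowed_node` signature are illustrative, and the choice to let nodes without a database or schema pass through is an assumption.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PatternSketch:
    """Hypothetical stand-in for an allow/deny regex pattern (not the DataHub class)."""

    allow: List[str] = field(default_factory=lambda: [".*"])  # default: allow everything
    deny: List[str] = field(default_factory=list)

    def allowed(self, value: str) -> bool:
        # Deny rules win; otherwise at least one allow rule must match.
        if any(re.match(p, value, re.IGNORECASE) for p in self.deny):
            return False
        return any(re.match(p, value, re.IGNORECASE) for p in self.allow)


@dataclass
class NodeSketch:
    """Illustrative dbt node carrying only the attributes used for filtering."""

    name: str
    database: Optional[str]
    schema: Optional[str]


def is_allowed_node(
    node: NodeSketch,
    node_name_pattern: PatternSketch,
    database_pattern: PatternSketch,
    schema_pattern: PatternSketch,
) -> bool:
    # Existing behaviour: filter on the node name first.
    if not node_name_pattern.allowed(node.name):
        return False
    # Assumption: a node with no database or schema is not filtered on that attribute.
    if node.database is not None and not database_pattern.allowed(node.database):
        return False
    if node.schema is not None and not schema_pattern.allowed(node.schema):
        return False
    return True
```

A node is kept only when all three patterns accept it, so leaving a pattern at its allow-everything default preserves the previous name-only behaviour.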
📝 Configuration Examples:
Filter by schema (e.g., only production schemas)
Filter by database (e.g., specific BigQuery project)
Combined filtering
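The three cases listed above can be sketched as plain Python dictionaries that mirror the shape of the config. The keys database_pattern and schema_pattern come from this PR, and allow/deny are the usual AllowDenyPattern fields; the regex values (`^prod_.*`, `^my-bigquery-project$`, `.*_tmp$`) and the helper functions `matches` and `keeps` are purely illustrative assumptions. In an actual recipe these would presumably appear as YAML keys under the dbt source config.

```python
import re

# Filter by schema (e.g., only production schemas). Values are illustrative.
schema_only = {"schema_pattern": {"allow": ["^prod_.*"]}}

# Filter by database (e.g., a specific BigQuery project). Values are illustrative.
database_only = {"database_pattern": {"allow": ["^my-bigquery-project$"]}}

# Combined filtering: both patterns must accept the node.
combined = {
    "database_pattern": {"allow": ["^my-bigquery-project$"]},
    "schema_pattern": {"allow": ["^prod_.*"], "deny": [".*_tmp$"]},
}


def matches(pattern_cfg: dict, value: str) -> bool:
    """Hypothetical evaluator: deny wins, a missing allow list means allow all."""
    allow = pattern_cfg.get("allow", [".*"])
    deny = pattern_cfg.get("deny", [])
    if any(re.match(p, value) for p in deny):
        return False
    return any(re.match(p, value) for p in allow)


def keeps(config: dict, database: str, schema: str) -> bool:
    """Return True when the node's database and schema both pass the config."""
    return matches(config.get("database_pattern", {}), database) and matches(
        config.get("schema_pattern", {}), schema
    )
```

For example, `keeps(combined, "my-bigquery-project", "prod_sales")` accepts the node, while the same call with schema `"prod_sales_tmp"` rejects it because the deny rule takes precedence.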
Use Cases