Skip to content

Conversation

askumar27
Copy link
Contributor

refactor(ingestion): looker source migration to use SDKv2 entities

  • Looker ingestion source from the legacy MCE/MCP approach to the modern SDKv2 architecture, the refactor is done in best effort as there are entities which SDK v2 does not support yet like tags. In those cases changes have been made to emit MCPs instead of MCEs.
  • Functionality of the source remains largely unchanged

Changes

  • Migrated from ChartSnapshot/DashboardSnapshot to SDKv2 Chart/Dashboard classes
  • Replaced MCP-only emissions with direct Entity objects
  • Added @DataClass for DashboardProcessingResult

Test Files Updated

  • 15 golden test files updated to reflect new SDKv2 output format
  • test_looker.py - Minor test adjustments

Key Test Updates

  • Removed deprecated charts field from dashboard entities
  • Updated container aspects for charts and dashboards
  • Added data platform instance aspects
  • Removed legacy browse paths in favor of SDKv2 approach

Breaking Changes

  • Output Format Changes
  • Chart entities now include container aspects
  • Browse path handling updated to SDKv2 format
  • Embed URLs now properly formatted for SDKv2

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Sep 5, 2025
@askumar27 askumar27 requested a review from asikowitz September 5, 2025 22:07
@github-actions github-actions bot added the community-contribution PR or Issue raised by member(s) of DataHub Community label Sep 5, 2025
Copy link

codecov bot commented Sep 5, 2025

Codecov Report

❌ Patch coverage is 87.83069% with 23 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
...c/datahub/ingestion/source/looker/looker_source.py 88.09% 20 Missing ⚠️
...c/datahub/ingestion/source/looker/looker_common.py 85.71% 3 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copy link

alwaysmeticulous bot commented Sep 5, 2025

✅ Meticulous spotted 0 visual differences across 1472 screens tested: view results.

Meticulous evaluated ~8 hours of user flows against your PR.

Expected differences? Click here. Last updated for commit de36060. This comment will update as new commits are pushed.

Copy link

codecov bot commented Sep 5, 2025

Bundle Report

Bundle size has no change ✅

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit

actionlint

📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:5:134: Double quote to prevent globbing and word splitting [shellcheck]


📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:3:117: Double quote to prevent globbing and word splitting [shellcheck]


📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:2:73: Double quote to prevent globbing and word splitting [shellcheck]


📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:3:50: Double quote to prevent globbing and word splitting [shellcheck]


📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:6:27: Double quote to prevent globbing and word splitting [shellcheck]


🚫 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2001:style:2:7: See if you can use ${variable//search/replace} instead [shellcheck]


📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:2:12: Double quote to prevent globbing and word splitting [shellcheck]


📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:3:20: Double quote to prevent globbing and word splitting [shellcheck]


📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:1:17: Double quote to prevent globbing and word splitting [shellcheck]


📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:1:17: Double quote to prevent globbing and word splitting [shellcheck]


📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:1:17: Double quote to prevent globbing and word splitting [shellcheck]


📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:1:17: Double quote to prevent globbing and word splitting [shellcheck]


📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:1:17: Double quote to prevent globbing and word splitting [shellcheck]


📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:1:17: Double quote to prevent globbing and word splitting [shellcheck]


📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:1:17: Double quote to prevent globbing and word splitting [shellcheck]


📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:1:17: Double quote to prevent globbing and word splitting [shellcheck]


📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:1:17: Double quote to prevent globbing and word splitting [shellcheck]


📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:1:17: Double quote to prevent globbing and word splitting [shellcheck]


📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:1:17: Double quote to prevent globbing and word splitting [shellcheck]


📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:1:17: Double quote to prevent globbing and word splitting [shellcheck]

- name: Download build Metadata for latest head build
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:3:51: Double quote to prevent globbing and word splitting [shellcheck]

- name: Collect image:tag from build log
id: collect-images
run: |
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2116:style:4:22: Useless echo? Instead of 'cmd $(echo foo)', just use 'cmd foo' [shellcheck]

- name: Collect image:tag from build log
id: collect-images
run: |
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:4:29: Double quote to prevent globbing and word splitting [shellcheck]

- name: Download build Metadata for latest head build
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:3:51: Double quote to prevent globbing and word splitting [shellcheck]

- name: Collect image:tag from build log
id: collect-images
run: |
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2116:style:4:22: Useless echo? Instead of 'cmd $(echo foo)', just use 'cmd foo' [shellcheck]

- name: Download build Metadata for latest head build
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:3:51: Double quote to prevent globbing and word splitting [shellcheck]

- name: Collect image:tag from build log
id: collect-images
run: |
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2116:style:4:22: Useless echo? Instead of 'cmd $(echo foo)', just use 'cmd foo' [shellcheck]

- name: Collect image:tag from build log
id: collect-images
run: |
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:4:29: Double quote to prevent globbing and word splitting [shellcheck]

@@ -608,7 +610,7 @@ jobs:
path: ${{ github.workspace }}/build

- name: Push images from depot builder
if: ${{ steps.tests_passed.outputs.tests_passed == 'true' && needs.setup.outputs.use_depot_cache == 'true' && (needs.setup.outputs.publish == 'true' || needs.setup.outputs.pr-publish == 'true' ) }}
if: ${{ steps.tests_passed.outputs.tests_passed == 'true' && needs.setup.outputs.use_depot_cache == 'true' && needs.setup.outputs.publish == 'true' }}
run: |
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:5:102: Double quote to prevent globbing and word splitting [shellcheck]

@@ -608,7 +610,7 @@ jobs:
path: ${{ github.workspace }}/build

- name: Push images from depot builder
if: ${{ steps.tests_passed.outputs.tests_passed == 'true' && needs.setup.outputs.use_depot_cache == 'true' && (needs.setup.outputs.publish == 'true' || needs.setup.outputs.pr-publish == 'true' ) }}
if: ${{ steps.tests_passed.outputs.tests_passed == 'true' && needs.setup.outputs.use_depot_cache == 'true' && needs.setup.outputs.publish == 'true' }}
run: |
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:5:116: Double quote to prevent globbing and word splitting [shellcheck]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [actionlint] reported by reviewdog 🐶
property "extra_pip_extras" is not defined in object type {extra_pip_constraints: string; extra_pip_requirements: string; python-version: number} [expression]

run: ./gradlew -Pextra_pip_requirements='${{ matrix.extra_pip_requirements }}' -Pextra_pip_constraints='${{ matrix.extra_pip_constraints }}' -Pextra_pip_extras='${{ matrix.extra_pip_extras }}' :metadata-ingestion-modules:airflow-plugin:build

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2006:style:1:21: Use $(...) notation instead of legacy backticks ... [shellcheck]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:1:360: Double quote to prevent globbing and word splitting [shellcheck]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2006:style:2:22: Use $(...) notation instead of legacy backticks ... [shellcheck]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:2:155: Double quote to prevent globbing and word splitting [shellcheck]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2086:info:1:63: Double quote to prevent globbing and word splitting [shellcheck]

run: echo "NAME_TZ=$(echo ${{ matrix.timezone }} | tr '/' '-')" >> $GITHUB_ENV

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2129:style:2:3: Consider using { cmd1; cmd2; } >> file instead of individual redirects [shellcheck]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2129:style:6:3: Consider using { cmd1; cmd2; } >> file instead of individual redirects [shellcheck]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2129:style:2:3: Consider using { cmd1; cmd2; } >> file instead of individual redirects [shellcheck]

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [actionlint] reported by reviewdog 🐶
shellcheck reported issue in this script: SC2129:style:7:3: Consider using { cmd1; cmd2; } >> file instead of individual redirects [shellcheck]

Copy link
Collaborator

@asikowitz asikowitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks pretty reasonable, nice job! My main concern is around the conversion from MCE -> MCP, that we could in the future drop other events we don't mean to

message="Dropped due to being a personal folder",
context=f"Dashboard ID: {dashboard_object.id}",
)
assert dashboard_object.id is not None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here. We may have done this in the past but in general want to make our ingestion connectors more robust

self, folder: LookerFolder, include_current_folder: bool = True
) -> Iterable[str]:
for ancestor in self.looker_api.folder_ancestors(folder_id=folder.id):
assert ancestor.id # to make the linter happy as `Folder` has id field marked optional - which is always returned by the API
Copy link

@aikido-pr-checks aikido-pr-checks bot Sep 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dangerous use of assert - low severity
When running Python in production in optimized mode, assert calls are not executed. This mode is enabled by setting the PYTHONOPTIMIZE command line flag. Optimized mode is usually ON in production. Any safety check done using assert will not be executed.

Remediation: Raise an exception instead of using assert.
View details in Aikido Security

message="Dropped due to being a personal folder",
context=f"Dashboard ID: {dashboard_object.id}",
)
assert dashboard_object.id is not None

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dangerous use of assert - low severity
When running Python in production in optimized mode, assert calls are not executed. This mode is enabled by setting the PYTHONOPTIMIZE command line flag. Optimized mode is usually ON in production. Any safety check done using assert will not be executed.

Remediation: Raise an exception instead of using assert.
View details in Aikido Security

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community-contribution PR or Issue raised by member(s) of DataHub Community ingestion PR or Issue related to the ingestion of metadata merge-pending-ci A PR that has passed review and should be merged once CI is green.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants