23 changes: 0 additions & 23 deletions metadata-ingestion/docs/sources/looker/looker.md

This file was deleted.

62 changes: 62 additions & 0 deletions metadata-ingestion/docs/sources/looker/looker_pre.md
@@ -0,0 +1,62 @@
### Pre-Requisites

#### Set up the right permissions
You need to grant the following permissions to the ingesting user for ingestion to work correctly.
```
access_data
explore
manage_models
see_datagroups
see_lookml
see_lookml_dashboards
see_looks
see_pdts
see_queries
see_schedules
see_sql
see_system_activity
see_user_dashboards
see_users
```
Here is an example permission set after configuration.
![Looker DataHub Permission Set](./looker_datahub_permission_set.png)

#### Get an API key

You need to get an API key for the account with the above privileges to perform ingestion. See the [Looker authentication docs](https://docs.looker.com/reference/api-and-integration/api-auth#authentication_with_an_sdk) for the steps to create a client ID and secret.
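
Once created, the client ID and secret go straight into the recipe. Here is a minimal sketch of the relevant recipe fragment (the base URL is a placeholder for your own Looker instance, and the credentials are assumed to be supplied as environment variables):

```
source:
  type: looker
  config:
    base_url: https://company.cloud.looker.com  # placeholder - use your instance's URL
    client_id: ${LOOKER_CLIENT_ID}
    client_secret: ${LOOKER_CLIENT_SECRET}
```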


### Ingestion through UI

The following video shows you how to get started with ingesting Looker metadata through the UI.

:::note

You will need to run `lookml` ingestion through the CLI after you have ingested Looker metadata through the UI. Otherwise, you will not be able to see Looker Views and their lineage to your warehouse tables.

:::
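
For reference, that CLI step boils down to installing the plugin and running your recipe with the **datahub** CLI. A minimal sketch, where `lookml_recipe.yml` is a hypothetical name for your recipe file:

```
pip install 'acryl-datahub[lookml]'
datahub ingest -c lookml_recipe.yml
```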

<div
  style={{
    position: "relative",
    paddingBottom: "57.692307692307686%",
    height: 0
  }}
>
  <iframe
    src="https://www.loom.com/embed/b8b9654e02714d20a44122cc1bffc1bb"
    frameBorder={0}
    webkitallowfullscreen=""
    mozallowfullscreen=""
    allowFullScreen=""
    style={{
      position: "absolute",
      top: 0,
      left: 0,
      width: "100%",
      height: "100%"
    }}
  />
</div>


13 changes: 0 additions & 13 deletions metadata-ingestion/docs/sources/looker/lookml.md

This file was deleted.

11 changes: 11 additions & 0 deletions metadata-ingestion/docs/sources/looker/lookml_post.md
@@ -0,0 +1,11 @@
#### Configuration Notes

:::note

The integration can use a SQL parser to try to determine the tables that the views depend on.

:::

This parsing is disabled by default, but it can be enabled by setting `parse_table_names_from_sql: True`. The default parser is based on the [`sqllineage`](https://pypi.org/project/sqllineage/) package.
Because this package doesn't officially support all the SQL dialects that Looker supports, the results might not be correct. You can, however, implement a custom parser and use it by setting the `sql_parser` configuration value. A custom SQL parser must inherit from `datahub.utilities.sql_parser.SQLParser`
and must be made available to DataHub, for example by installing it as a package. The configuration value then needs to be set to the `module_name.ClassName` of the parser.
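
For example, a recipe fragment that enables parsing with a custom parser might look like the following sketch, where `my_company.parsing.CustomSQLParser` is a hypothetical class you would implement and install yourself:

```
source:
  type: lookml
  config:
    base_folder: /path/to/lookml/repo
    parse_table_names_from_sql: true
    # module_name.ClassName of a class inheriting from
    # datahub.utilities.sql_parser.SQLParser (hypothetical example)
    sql_parser: my_company.parsing.CustomSQLParser
```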
84 changes: 84 additions & 0 deletions metadata-ingestion/docs/sources/looker/lookml_pre.md
@@ -0,0 +1,84 @@
### Pre-requisites

#### [Optional] Create an API key

See the [Looker authentication docs](https://docs.looker.com/reference/api-and-integration/api-auth#authentication_with_an_sdk) for the steps to create a client ID and secret.
You need to ensure that the API key is attached to a user that has Admin privileges.

If that is not possible, read the configuration section and provide an offline specification of the `connection_to_platform_map` and the `project_name`.
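
Such an offline specification might look like the following recipe fragment, a sketch in which the folder path, project name, and connection name are placeholders; the `connection-name: platform` mapping mirrors the commented example in the GitHub Action below:

```
source:
  type: lookml
  config:
    base_folder: /path/to/lookml/repo
    project_name: my_project
    connection_to_platform_map:
      # Map each Looker connection name to the warehouse platform it points at
      my_snowflake_connection: snowflake
```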

### Ingestion through UI

Ingestion using the `lookml` connector is not supported through the UI.
However, you can set up ingestion using a GitHub Action that pushes metadata whenever your main LookML repo changes.

#### Sample GitHub Action

Drop this file into the `.github/workflows` directory of your Looker GitHub repo.

```
name: lookml metadata upload
on:
  push:
    branches:
      - main
    paths-ignore:
      - "docs/**"
      - "**.md"
  pull_request:
    branches:
      - main
    paths-ignore:
      - "docs/**"
      - "**.md"
  release:
    types: [published, edited]
  workflow_dispatch:

jobs:
  lookml-metadata-upload:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Run LookML ingestion
        run: |
          pip install 'acryl-datahub[lookml,datahub-rest]'
          cat << EOF > lookml_ingestion.yml
          # LookML ingestion configuration
          source:
            type: "lookml"
            config:
              base_folder: ${{ github.workspace }}
              parse_table_names_from_sql: true
              github_info:
                repo: ${{ github.repository }}
                branch: ${{ github.ref }}
              # Options
              #connection_to_platform_map:
              #  acryl-snow: snowflake
              #platform: snowflake
              #default_db: DEMO_PIPELINE
              api:
                client_id: ${LOOKER_CLIENT_ID}
                client_secret: ${LOOKER_CLIENT_SECRET}
                base_url: ${LOOKER_BASE_URL}
          sink:
            type: datahub-rest
            config:
              server: ${DATAHUB_GMS_HOST}
              token: ${DATAHUB_TOKEN}
          EOF
          datahub ingest -c lookml_ingestion.yml
        env:
          DATAHUB_GMS_HOST: ${{ secrets.DATAHUB_GMS_HOST }}
          DATAHUB_TOKEN: ${{ secrets.DATAHUB_TOKEN }}
          LOOKER_BASE_URL: https://acryl.cloud.looker.com # <--- replace with your Looker base URL
          LOOKER_CLIENT_ID: ${{ secrets.LOOKER_CLIENT_ID }}
          LOOKER_CLIENT_SECRET: ${{ secrets.LOOKER_CLIENT_SECRET }}
```

If you want to ingest LookML using the **datahub** CLI directly, read on for instructions and configuration details.
5 changes: 3 additions & 2 deletions metadata-ingestion/docs/sources/looker/lookml_recipe.yml
@@ -31,6 +31,7 @@ source:

# Optional additional github information. Used to add github links on the dataset's entity page.
github_info:
  repo: org/repo-name

# Default sink is datahub-rest and doesn't need to be configured
# See https://datahubproject.io/docs/metadata-ingestion/sink_docs/datahub for customization options
29 changes: 27 additions & 2 deletions metadata-ingestion/docs/sources/snowflake/README.md
@@ -1,4 +1,29 @@
Ingesting metadata from Snowflake requires either the **snowflake-beta** module with a single recipe (recommended) or the two separate modules **snowflake** and **snowflake-usage** (soon to be deprecated), each with its own recipe.

All three modules are described on this page.

We encourage you to try out the new **snowflake-beta** plugin as an alternative to running both the **snowflake** and **snowflake-usage** plugins, and to share your feedback. `snowflake-beta` is much faster than `snowflake` for extracting metadata.
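
If you want to try it from the CLI, install the plugin first. A sketch, assuming the extra's name matches the plugin name, as it does for the other plugins:

```
pip install 'acryl-datahub[snowflake-beta]'
```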

## Snowflake Ingestion through the UI

The following video shows you how to ingest Snowflake metadata through the UI.

<div style={{ position: "relative", paddingBottom: "56.25%", height: 0 }}>
  <iframe
    src="https://www.loom.com/embed/15d0401caa1c4aa483afef1d351760db"
    frameBorder={0}
    webkitallowfullscreen=""
    mozallowfullscreen=""
    allowFullScreen=""
    style={{
      position: "absolute",
      top: 0,
      left: 0,
      width: "100%",
      height: "100%"
    }}
  />
</div>


Read on if you are interested in ingesting Snowflake metadata using the **datahub** CLI, or want to learn about all the configuration parameters supported by the connectors.
@@ -1,12 +1,11 @@
source:
  type: snowflake-beta
  config:
    # This option is recommended for the first run, to ingest all historical lineage
    ignore_start_time_lineage: true
    # This is an alternative option to specify the start_time for lineage
    # if you don't want to look back to the beginning
    start_time: "2022-03-01T00:00:00Z"

    # Coordinates
    account_id: "abc48144"
@@ -35,9 +34,7 @@ source:
    profile_table_level_only: true
    profile_pattern:
      allow:
        - "ACCOUNTING_DB.*.*"
        - "MARKETING_DB.*.*"

# Default sink is datahub-rest and doesn't need to be configured
# See https://datahubproject.io/docs/metadata-ingestion/sink_docs/datahub for customization options