Commit fe73ab9

docs(ingest): improve doc gen, docs for snowflake, looker (datahub-project#5867)
1 parent 62699a1 commit fe73ab9

File tree

13 files changed: +276 -65 lines changed

metadata-ingestion/docs/sources/looker/looker.md

Lines changed: 0 additions & 23 deletions
This file was deleted.

New file: Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
### Pre-Requisites

#### Set up the right permissions

You need to provide the following permissions for ingestion to work correctly.
```
access_data
explore
manage_models
see_datagroups
see_lookml
see_lookml_dashboards
see_looks
see_pdts
see_queries
see_schedules
see_sql
see_system_activity
see_user_dashboards
see_users
```
Here is an example permission set after configuration.

![Looker DataHub Permission Set](./looker_datahub_permission_set.png)

#### Get an API key

You need to get an API key for the account with the above privileges to perform ingestion. See the [Looker authentication docs](https://docs.looker.com/reference/api-and-integration/api-auth#authentication_with_an_sdk) for the steps to create a client ID and secret.
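For orientation, here is a minimal sketch of where those credentials typically go if you later run the `looker` source from a recipe; the base URL and credential references below are placeholders, and the full set of options is covered in the configuration reference.

```
source:
  type: looker
  config:
    # Placeholder Looker instance URL - replace with your own
    base_url: https://company.cloud.looker.com
    # API credentials created above (shown here as environment-variable placeholders)
    client_id: ${LOOKER_CLIENT_ID}
    client_secret: ${LOOKER_CLIENT_SECRET}

sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
```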
### Ingestion through UI

The following video shows you how to get started with ingesting Looker metadata through the UI.

:::note

You will need to run `lookml` ingestion through the CLI after you have ingested Looker metadata through the UI. Otherwise you will not be able to see Looker Views and their lineage to your warehouse tables.

:::

<div
  style={{
    position: "relative",
    paddingBottom: "57.692307692307686%",
    height: 0
  }}
>
  <iframe
    src="https://www.loom.com/embed/b8b9654e02714d20a44122cc1bffc1bb"
    frameBorder={0}
    webkitallowfullscreen=""
    mozallowfullscreen=""
    allowFullScreen=""
    style={{
      position: "absolute",
      top: 0,
      left: 0,
      width: "100%",
      height: "100%"
    }}
  />
</div>

metadata-ingestion/docs/sources/looker/lookml.md

Lines changed: 0 additions & 13 deletions
This file was deleted.

New file: Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
#### Configuration Notes

:::note

The integration can use an SQL parser to try to parse the tables that the views depend on.

:::

This parsing is disabled by default, but can be enabled by setting `parse_table_names_from_sql: True`. The default parser is based on the [`sqllineage`](https://pypi.org/project/sqllineage/) package.
As this package doesn't officially support all the SQL dialects that Looker supports, the result might not be correct. You can, however, implement a custom parser and use it by setting the `sql_parser` configuration value. A custom SQL parser must inherit from `datahub.utilities.sql_parser.SQLParser`
and must be made available to DataHub by, for example, installing it. The configuration then needs to be set to the `module_name.ClassName` of the parser.
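As a rough sketch, the relevant snippet of a `lookml` recipe might look like the following; the `base_folder` value and the `my_company.parsing.MyLookerSQLParser` class name are hypothetical placeholders used only for illustration.

```
source:
  type: lookml
  config:
    base_folder: ./lookml  # placeholder path to your checked-out LookML
    # Enable SQL parsing of derived-table SQL to extract upstream tables
    parse_table_names_from_sql: True
    # Optionally point at a custom parser instead of the default sqllineage-based one.
    # "my_company.parsing.MyLookerSQLParser" is a hypothetical module_name.ClassName.
    # sql_parser: my_company.parsing.MyLookerSQLParser
```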
New file: Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
### Pre-requisites

#### [Optional] Create an API key

See the [Looker authentication docs](https://docs.looker.com/reference/api-and-integration/api-auth#authentication_with_an_sdk) for the steps to create a client ID and secret.
You need to ensure that the API key is attached to a user that has Admin privileges.

If that is not possible, read the configuration section and provide an offline specification of the `connection_to_platform_map` and the `project_name`.
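As a rough sketch of what such an offline specification might look like in a `lookml` recipe (the project, connection, and platform values below are illustrative placeholders that mirror the commented-out options in the GitHub Action further down):

```
source:
  type: lookml
  config:
    base_folder: ./lookml
    # Hypothetical Looker project name
    project_name: my_looker_project
    # Map each LookML connection name to the platform it points at,
    # mirroring the commented example in the GitHub Action below.
    connection_to_platform_map:
      acryl-snow: snowflake
```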
### Ingestion through UI

Ingestion using the lookml connector is not supported through the UI.
However, you can set up ingestion using a GitHub Action to push metadata whenever your main lookml repo changes.

#### Sample GitHub Action

Drop this file into your `.github/workflows` directory inside your Looker GitHub repo.

```
name: lookml metadata upload
on:
  push:
    branches:
      - main
    paths-ignore:
      - "docs/**"
      - "**.md"
  pull_request:
    branches:
      - main
    paths-ignore:
      - "docs/**"
      - "**.md"
  release:
    types: [published, edited]
  workflow_dispatch:

jobs:
  lookml-metadata-upload:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Run LookML ingestion
        run: |
          pip install 'acryl-datahub[lookml,datahub-rest]'
          cat << EOF > lookml_ingestion.yml
          # LookML ingestion configuration
          source:
            type: "lookml"
            config:
              base_folder: ${{ github.workspace }}
              parse_table_names_from_sql: true
              github_info:
                repo: ${{ github.repository }}
                branch: ${{ github.ref }}
              # Options
              #connection_to_platform_map:
              #  acryl-snow: snowflake
              #platform: snowflake
              #default_db: DEMO_PIPELINE
              api:
                client_id: ${LOOKER_CLIENT_ID}
                client_secret: ${LOOKER_CLIENT_SECRET}
                base_url: ${LOOKER_BASE_URL}
          sink:
            type: datahub-rest
            config:
              server: ${DATAHUB_GMS_HOST}
              token: ${DATAHUB_TOKEN}
          EOF
          datahub ingest -c lookml_ingestion.yml
        env:
          DATAHUB_GMS_HOST: ${{ secrets.DATAHUB_GMS_HOST }}
          DATAHUB_TOKEN: ${{ secrets.DATAHUB_TOKEN }}
          LOOKER_BASE_URL: https://acryl.cloud.looker.com # <--- replace with your Looker base URL
          LOOKER_CLIENT_ID: ${{ secrets.LOOKER_CLIENT_ID }}
          LOOKER_CLIENT_SECRET: ${{ secrets.LOOKER_CLIENT_SECRET }}
```

If you want to ingest lookml using the **datahub** cli directly, read on for instructions and configuration details.

metadata-ingestion/docs/sources/looker/lookml_recipe.yml

Lines changed: 3 additions & 2 deletions
@@ -31,6 +31,7 @@ source:

     # Optional additional github information. Used to add github links on the dataset's entity page.
     github_info:
-      repo: org/repo-name
+      repo: org/repo-name
+# Default sink is datahub-rest and doesn't need to be configured
+# See https://datahubproject.io/docs/metadata-ingestion/sink_docs/datahub for customization options

-# sink configs
Lines changed: 27 additions & 2 deletions
@@ -1,4 +1,29 @@
-To get all metadata from Snowflake you need to use two plugins `snowflake` and `snowflake-usage`. Both of them are described in this page. These will require 2 separate recipes.
+Ingesting metadata from Snowflake requires either using the **snowflake-beta** module with just one recipe (recommended) or the two separate modules **snowflake** and **snowflake-usage** (soon to be deprecated) with two separate recipes.

+All three modules are described on this page.

-We encourage you to try out new `snowflake-beta` plugin as alternative to running both `snowflake` and `snowflake-usage` plugins and share feedback. `snowflake-beta` is much faster than `snowflake` for extracting metadata .
+We encourage you to try out the new **snowflake-beta** plugin as an alternative to running both **snowflake** and **snowflake-usage** plugins and share feedback. `snowflake-beta` is much faster than `snowflake` for extracting metadata.
+
+## Snowflake Ingestion through the UI
+
+The following video shows you how to ingest Snowflake metadata through the UI.
+
+<div style={{ position: "relative", paddingBottom: "56.25%", height: 0 }}>
+  <iframe
+    src="https://www.loom.com/embed/15d0401caa1c4aa483afef1d351760db"
+    frameBorder={0}
+    webkitallowfullscreen=""
+    mozallowfullscreen=""
+    allowFullScreen=""
+    style={{
+      position: "absolute",
+      top: 0,
+      left: 0,
+      width: "100%",
+      height: "100%"
+    }}
+  />
+</div>
+
+
+Read on if you are interested in ingesting Snowflake metadata using the **datahub** cli, or want to learn about all the configuration parameters that are supported by the connectors.
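To make the one-recipe point concrete, a minimal **snowflake-beta** recipe might look roughly like the sketch below; the account identifier, warehouse, role, and credential references are placeholders, and the full recipe further down shows the lineage and profiling options.

```
source:
  type: snowflake-beta
  config:
    # Placeholder coordinates and credentials - replace with your own
    account_id: "abc48144"
    warehouse: "COMPUTE_WH"
    username: ${SNOWFLAKE_USER}
    password: ${SNOWFLAKE_PASS}
    role: "datahub_role"

sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
```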
File renamed without changes.

metadata-ingestion/docs/sources/snowflake/snowflake-beta_recipe.yml

Lines changed: 5 additions & 8 deletions
@@ -1,12 +1,11 @@
 source:
   type: snowflake-beta
   config:
-
     # This option is recommended to be used for the first time to ingest all lineage
     ignore_start_time_lineage: true
     # This is an alternative option to specify the start_time for lineage
     # if you don't want to look back since beginning
-    start_time: '2022-03-01T00:00:00Z'
+    start_time: "2022-03-01T00:00:00Z"

     # Coordinates
     account_id: "abc48144"
@@ -35,9 +34,7 @@ source:
     profile_table_level_only: true
     profile_pattern:
       allow:
-        - 'ACCOUNTING_DB.*.*'
-        - 'MARKETING_DB.*.*'
-
-
-sink:
-  # sink configs
+        - "ACCOUNTING_DB.*.*"
+        - "MARKETING_DB.*.*"
+# Default sink is datahub-rest and doesn't need to be configured
+# See https://datahubproject.io/docs/metadata-ingestion/sink_docs/datahub for customization options
File renamed without changes.
