Supporting unsupported types, mapping to varchar #2395

tombaeyens · 2025-08-27T16:32:43Z

(description by Paul)

This is a monster PR that combines several branches of work that were developed in parallel and eventually ended up here. I believe the Athena support and the Snowflake auth addition have already been reviewed on a prior branch; the rest is new in this branch.
(Update: Athena was already merged, but for some reason it shows up in the diff as if it were new. Not sure why.)

Structural changes

- Use the new concept of DataSourceNamespaces to replace prefixes
- Refactored the metadata columns query: the entry point moved from SqlDialect to DataSourceImpl, with support for extracting precision & scale
- Refactored Soda data types
- Added contract interfaces*
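If a data source reports only a combined type string such as `decimal(10,2)`, precision and scale can be recovered by parsing the trailing parameter list. This is a hypothetical sketch, not the PR's implementation: `parse_type_str` and its return shape are invented for illustration, and it only handles parameters at the end of the type name, so something like Postgres' `timestamp(3) with time zone` is rejected with a `ValueError`.

```python
import re
from typing import Optional

# Hypothetical helper: split "decimal(10,2)" into (name, first parameter, scale).
# Only a trailing "(p)" or "(p,s)" is understood; anything else raises ValueError.
_TYPE_WITH_PARAMS = re.compile(
    r"^\s*([A-Za-z][\w ]*?)\s*(?:\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\))?\s*$"
)


def parse_type_str(type_str: str) -> tuple[str, Optional[int], Optional[int]]:
    match = _TYPE_WITH_PARAMS.match(type_str)
    if match is None:
        raise ValueError(f"Unrecognized type string: {type_str!r}")
    name, first, second = match.groups()
    # The first parameter is the precision for numerics and the length for character types.
    first_param = int(first) if first is not None else None
    scale = int(second) if second is not None else None
    return name.lower(), first_param, scale


print(parse_type_str("decimal(10,2)"))     # ('decimal', 10, 2)
print(parse_type_str("VARCHAR(255)"))      # ('varchar', 255, None)
print(parse_type_str("double precision"))  # ('double precision', None, None)
```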

New feature implementations

- Athena support
- Support for precision and scale in numeric and datetime types
- Added bidirectional data mappings for all data sources and listed supported/unsupported types
  - Mappings are a first pass, not final; we will review them in a future PR. Don't consider mapping correctness in this review.
- Added Snowflake OAuth login path
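The PR title says unsupported types are mapped to varchar. Assuming that means falling back to a varchar-like Soda type whenever a native column type has no mapping, a bidirectional mapping could be sketched as below. `SodaType`, the four Postgres entries and `to_soda_type` are hypothetical illustrations, not the PR's actual `SodaDataTypeName` mappings (which the description itself calls a first pass).

```python
from enum import Enum


class SodaType(Enum):
    """Hypothetical subset of Soda-side type names (not the real SodaDataTypeName)."""

    VARCHAR = "varchar"
    INTEGER = "integer"
    NUMERIC = "numeric"
    TIMESTAMP = "timestamp"


# Hypothetical Soda -> Postgres direction.
SODA_TO_POSTGRES: dict[SodaType, str] = {
    SodaType.VARCHAR: "character varying",
    SodaType.INTEGER: "integer",
    SodaType.NUMERIC: "numeric",
    SodaType.TIMESTAMP: "timestamp without time zone",
}

# Reverse direction is derived, so the two directions cannot drift apart.
POSTGRES_TO_SODA: dict[str, SodaType] = {native: soda for soda, native in SODA_TO_POSTGRES.items()}
# The mapping is only bidirectional if no two Soda types share a native name.
assert len(POSTGRES_TO_SODA) == len(SODA_TO_POSTGRES)


def to_soda_type(native_type: str) -> SodaType:
    # Unsupported native types fall back to VARCHAR instead of failing.
    return POSTGRES_TO_SODA.get(native_type.strip().lower(), SodaType.VARCHAR)


print(to_soda_type("NUMERIC"))   # SodaType.NUMERIC
print(to_soda_type("tsvector"))  # SodaType.VARCHAR
```

Deriving the reverse table keeps a single source of truth per data source; the trade-off is that native aliases (e.g. `int4` for `integer`) would need a separate alias table.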

Fixes

- Fixed typo: supports_data_type_character_maximun_length -> supports_data_type_character_maximum_length
- Fixed an issue with SQL Server regexes
- Other small fixes

Niels-b

PR looks good to me. Just left a few suggestions.
Now that tests are passing, I think it's best to get this merged as soon as possible, since this PR is already very large. If we need additional changes, we can always open a new branch and PR.

Niels-b · 2025-09-12T14:33:17Z

soda-core/src/soda_core/common/metadata_types.py

Remark in general (not to be fixed in this PR): I think the get_sql_data_type_str_with_parameters(self) function should be data source specific (i.e. live in the dialect) instead of being global. Right now it requires "hacky" workarounds in some data sources to make it work.
Not to be fixed immediately (certainly not in this PR). This doesn't seem like an easy fix.
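A minimal sketch of the suggestion, assuming the type parameters are available as plain fields: the base dialect provides a generic rendering and a specific dialect overrides it where the generic form is wrong. `ColumnType`, `BaseDialect` and `PostgresDialect` are hypothetical; only the method name comes from the comment. Postgres is a good example because it writes the precision of `timestamp with time zone` in the middle of the type name (`timestamp(3) with time zone`), which a single global function would have to special-case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnType:
    """Hypothetical column type with optional parameters."""

    name: str
    character_maximum_length: Optional[int] = None
    numeric_precision: Optional[int] = None
    numeric_scale: Optional[int] = None
    datetime_precision: Optional[int] = None


class BaseDialect:
    def get_sql_data_type_str_with_parameters(self, column_type: ColumnType) -> str:
        # Generic rendering: append the parameters after the type name.
        if column_type.character_maximum_length is not None:
            return f"{column_type.name}({column_type.character_maximum_length})"
        if column_type.numeric_precision is not None:
            if column_type.numeric_scale is not None:
                return f"{column_type.name}({column_type.numeric_precision},{column_type.numeric_scale})"
            return f"{column_type.name}({column_type.numeric_precision})"
        if column_type.datetime_precision is not None:
            return f"{column_type.name}({column_type.datetime_precision})"
        return column_type.name


class PostgresDialect(BaseDialect):
    _TZ_SUFFIXES = (" with time zone", " without time zone")

    def get_sql_data_type_str_with_parameters(self, column_type: ColumnType) -> str:
        # Postgres puts the datetime precision before the time zone clause.
        if column_type.datetime_precision is not None:
            for suffix in self._TZ_SUFFIXES:
                if column_type.name.endswith(suffix):
                    base = column_type.name[: -len(suffix)]
                    return f"{base}({column_type.datetime_precision}){suffix}"
        return super().get_sql_data_type_str_with_parameters(column_type)


tz = ColumnType("timestamp with time zone", datetime_precision=3)
print(BaseDialect().get_sql_data_type_str_with_parameters(tz))      # timestamp with time zone(3)
print(PostgresDialect().get_sql_data_type_str_with_parameters(tz))  # timestamp(3) with time zone
```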

Niels-b · 2025-09-12T14:39:16Z

soda-core/src/soda_core/common/sql_dialect.py

    def default_numeric_scale(self) -> Optional[int]:
        return None
+
+    def build_columns_metadata_query_str(self, table_namespace: DataSourceNamespace, table_name: str) -> str:


Seems like a large chunk of code to have in the dialect file.
I think the dialect is a nice place for this function, but maybe we can move the actual implementation to another file to keep it a bit cleaner?
No hard opinion, just a suggestion

I think this approach would be valid in a couple of places: replace a single file with a directory of multiple modules, or maybe even introduce packages? E.g. splitting up the current soda_cloud.py a bit as well.

No hard requirement for the time being.

FYI: we tried moving this, but we can only really move two functions (build_columns_metadata_query_str and build_column_metadatas_from_query_result). The other functions are overridden in the inheriting classes, so they need different logic. Right now it makes little sense to add the extra complexity of moving those two functions when the moved code would still point back to the dialect: you'd end up bouncing between files way too much.
Moving all of this is a "nice to have" for now: we might need an extra "dialect-like" class for each data source impl just for the metadata logic. But then the argument can be made to just keep it in the dialect 🙃.

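The trade-off above is essentially a template method: the two shared functions stay in the base class and call small hooks that each data source overrides. A rough sketch, with hypothetical names (`MetadataDialect`, `columns_metadata_table`, `SchemaScopedDialect`) and simplified to a `(database, schema, table)` signature instead of the PR's `DataSourceNamespace`. BigQuery, for instance, exposes `INFORMATION_SCHEMA.COLUMNS` per dataset, so in this sketch only the table reference changes.

```python
from typing import Any, Optional, Sequence


class MetadataDialect:
    """Hypothetical base: shared query/result logic plus overridable hooks."""

    def quote_literal(self, value: str) -> str:
        return "'" + value.replace("'", "''") + "'"

    def columns_metadata_table(self, database: Optional[str], schema: str) -> str:
        # Hook: where column metadata lives; ANSI-style information_schema by default.
        return f'"{database}".information_schema.columns' if database else "information_schema.columns"

    def build_columns_metadata_query_str(self, database: Optional[str], schema: str, table_name: str) -> str:
        # Shared: every data source builds the same query shape around the hook.
        return (
            "SELECT column_name, data_type\n"
            f"FROM {self.columns_metadata_table(database, schema)}\n"
            f"WHERE table_schema = {self.quote_literal(schema)}\n"
            f"  AND table_name = {self.quote_literal(table_name)}\n"
            "ORDER BY ordinal_position"
        )

    def build_column_metadatas_from_query_result(self, rows: Sequence[Sequence[Any]]) -> list[dict[str, Any]]:
        # Shared: turn (column_name, data_type) rows into plain records.
        return [{"column_name": name, "data_type": data_type} for name, data_type in rows]


class SchemaScopedDialect(MetadataDialect):
    def columns_metadata_table(self, database: Optional[str], schema: str) -> str:
        # Override: metadata view qualified by schema (dataset), as BigQuery does.
        prefix = f"`{database}`." if database else ""
        return f"{prefix}`{schema}`.INFORMATION_SCHEMA.COLUMNS"


print(SchemaScopedDialect().build_columns_metadata_query_str("my-project", "sales", "orders"))
```

The "dialect-like class per data source" idea from the comment would correspond to `MetadataDialect` living in its own module; the hooks stay the same either way.
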
nielsn · 2025-09-15T08:14:42Z

soda-core/src/soda_core/common/data_source_impl.py

+        return self.sql_dialect.build_column_metadatas_from_query_result(query_result)
+
+    def build_columns_metadata_query_str(self, dataset_prefixes: list[str], dataset_name: str) -> str:
+        database_index: int | None = self.sql_dialect.get_database_prefix_index()


Generic remark: I'm still not happy with the way we deal with the concepts of "database" and "schema"; we still need to work on proper terminology and/or an abstraction IMHO.

This snippet still seems pretty brittle, but that's work for another time.
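One way to make the index-based lookup less brittle is to resolve the prefixes into named parts once, at the boundary, and fail loudly on a bad index. The PR's `DataSourceNamespace` may look different; the `Namespace` class, its `database`/`schema` fields and `from_prefixes` below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Namespace:
    """Hypothetical namespace with named parts instead of positional prefixes."""

    database: Optional[str] = None
    schema: Optional[str] = None

    @classmethod
    def from_prefixes(
        cls,
        prefixes: list[str],
        database_index: Optional[int],
        schema_index: Optional[int],
    ) -> "Namespace":
        # Validate positions once, so downstream code never indexes into prefixes.
        def part(index: Optional[int], label: str) -> Optional[str]:
            if index is None:
                return None
            if not 0 <= index < len(prefixes):
                raise ValueError(f"{label} index {index} out of range for prefixes {prefixes!r}")
            return prefixes[index]

        return cls(database=part(database_index, "database"), schema=part(schema_index, "schema"))

    def qualify(self, table_name: str) -> str:
        # Join only the parts this data source actually uses.
        return ".".join(p for p in (self.database, self.schema, table_name) if p)


print(Namespace.from_prefixes(["analytics", "public"], 0, 1).qualify("orders"))  # analytics.public.orders
print(Namespace.from_prefixes(["main"], None, 0).qualify("orders"))               # main.orders
```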

nielsn · 2025-09-15T08:20:14Z

soda-core/src/soda_core/common/sql_dialect.py

I think this approach would be valid in a couple of places: replace a single file with a directory of multiple modules, or maybe even introduce packages? E.g. splitting up the current soda_cloud.py a bit as well.

No hard requirement for the time being.

sonarqubecloud · 2025-09-16T08:59:07Z

Quality Gate passed

Issues
4 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Supporting unsupported types, mapping to varchar

f919c14

tombaeyens marked this pull request as draft August 27, 2025 16:32

pre-commit-ci bot and others added 28 commits August 27, 2025 16:33

[pre-commit.ci] auto fixes from pre-commit.com hooks

a4518f6

for more information, see https://pre-commit.ci

Initial unsupported types with test coverage for all postgres types

538d001

Merge branch 'v4' into dwh-mapping-unsupported-types

6e0d321

[pre-commit.ci] auto fixes from pre-commit.com hooks

03ed034

Added support for snowflake SodaDataTypeName's

f716805

Extracted DataSourceNamespace and added BigQuery support for getting …

4dd01bc

…columns metadata new style

Updated redshift made it pass test suite

610fac8

Supporting unsupported types, mapping to varchar

e3e60c2

[pre-commit.ci] auto fixes from pre-commit.com hooks

3d946d3

Initial unsupported types with test coverage for all postgres types

135b293

[pre-commit.ci] auto fixes from pre-commit.com hooks

7e6fb5d

Added support for snowflake SodaDataTypeName's

083ae36

Extracted DataSourceNamespace and added BigQuery support for getting …

ecb18f5

…columns metadata new style

Updated redshift made it pass test suite

bae1a41

Merge

7de6301

Added support for databricks

50183c6

Added support for databricks

b9f3411

Merge

6a0cba7

Merge

897aaa0

[pre-commit.ci] auto fixes from pre-commit.com hooks

6b450b6

Support SQLServer

51a533b

Formatting

867b9c4

Supporting unsupported types, mapping to varchar

dbe55aa

[pre-commit.ci] auto fixes from pre-commit.com hooks

a32f0ff

Initial unsupported types with test coverage for all postgres types

7303516

[pre-commit.ci] auto fixes from pre-commit.com hooks

94761e7

Added support for snowflake SodaDataTypeName's

a9f7b21

Extracted DataSourceNamespace and added BigQuery support for getting …

3aa8129

…columns metadata new style

tombaeyens and others added 12 commits September 4, 2025 12:48

Oracle fixes wip for branch dwh-mapping-unsupported-types

7471c8c

[pre-commit.ci] auto fixes from pre-commit.com hooks

57882d0

Metadata fixes for duckdb

3fceae6

[pre-commit.ci] auto fixes from pre-commit.com hooks

56518cd

sqlserver fixes

9d56012

[pre-commit.ci] auto fixes from pre-commit.com hooks

dfd20e4

Add post-create commands to set permissions

9c2c4d6

[pre-commit.ci] auto fixes from pre-commit.com hooks

98a1376

[pre-commit.ci] auto fixes from pre-commit.com hooks

5533a34

Merge

7df4449

Merge

868cb72

Formatting

20b25aa

paulteehan requested review from m1n0, mivds, Niels-b and nielsn September 4, 2025 12:22

Niels-b added 4 commits September 12, 2025 15:31

Merge remote-tracking branch 'origin/v4' into dwh-mapping-unsupported…

4b92e00

…-types

Fix for Athena MetadataColumns

cbac344

Code quality fixes (only low hanging fruit)

8fe7cf1

Removed rogue test.db

8de5305

Niels-b approved these changes Sep 12, 2025

nielsn approved these changes Sep 15, 2025

Niels-b added 4 commits September 15, 2025 17:40

Added mapping for snowflake

11ba5be

Fix for postgres type with length in create table

df69730

Code quality fix

2a8ca13

Added extra comment to separate metadata query logic a bit better in …

62b1bc0

…the dialect file

Niels-b marked this pull request as ready for review September 16, 2025 09:01

Niels-b merged commit 626a35b into v4 Sep 16, 2025
27 checks passed

Niels-b deleted the dwh-mapping-unsupported-types branch September 16, 2025 09:08
