Commit 626a35b

tombaeyens, pre-commit-ci[bot], paulteehan, and Niels-b authored
Supporting unsupported types, mapping to varchar (#2395)
* Supporting unsupported types, mapping to varchar
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Initial unsupported types with test coverage for all postgres types
* Added support for snowflake SodaDataTypeName's
* Extracted DataSourceNamespace and added BigQuery support for getting columns metadata new style
* Updated redshift made it pass test suite
* Merge
* Added support for databricks
* Support SQLServer
* Formatting
* Fix merge error
* Added databricks support
* precommit formatting and added a bit of docs in build_dwh_prefixes
* Fixes for Fabric
* Added DuckDB support
* Oracle fixes wip for branch dwh-mapping-unsupported-types
* Metadata fixes for Athena and Snowflake
* Metadata fixes for duckdb
* sqlserver fixes
* Add post-create commands to set permissions
* Fix for Athena MetadataColumns
* Code quality fixes (only low hanging fruit)
* Removed rogue test.db
* Added mapping for snowflake
* Fix for postgres type with length in create table
* Code quality fix
* Added extra comment to separate metadata query logic a bit better in the dialect file

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Paul Teehan <[email protected]>
Co-authored-by: Niels Bylois <[email protected]>
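The commit title ("Supporting unsupported types, mapping to varchar") suggests a lookup with a string fallback. The sketch below is only one plausible reading of that idea, not the code in this commit: `SodaType`, `KNOWN_TYPES`, and `to_soda_type` are hypothetical names invented for illustration, and the table of known types is deliberately tiny.

```python
from enum import Enum


class SodaType(Enum):
    """Hypothetical stand-in for the project's SodaDataTypeName."""

    VARCHAR = "varchar"
    INTEGER = "integer"
    DECIMAL = "decimal"
    TIMESTAMP = "timestamp"


# Hypothetical mapping from data source type names to the common type names.
KNOWN_TYPES = {
    "varchar": SodaType.VARCHAR,
    "character varying": SodaType.VARCHAR,
    "integer": SodaType.INTEGER,
    "decimal": SodaType.DECIMAL,
    "timestamp": SodaType.TIMESTAMP,
}


def to_soda_type(source_type: str) -> SodaType:
    """Map a data source type name; anything unrecognised falls back to VARCHAR."""
    return KNOWN_TYPES.get(source_type.strip().lower(), SodaType.VARCHAR)
```

With this sketch, a type such as `geometry` that has no entry in the table resolves to `SodaType.VARCHAR` instead of raising an error.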
1 parent 2527e49 commit 626a35b

File tree

27 files changed

+961
-599
lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@ scripts/antlr*.jar
 .idea/
 .postgres/
 .db2/
+*.db
 .sqlserver/
 .mysql/
 .vscode/

requirements.txt

Lines changed: 7 additions & 7 deletions
@@ -1,12 +1,12 @@
+soda-athena
+soda-bigquery
 soda-core
-soda-tests
+soda-databricks
+soda-duckdb
+soda-fabric
 soda-postgres
+soda-redshift
 soda-snowflake
-soda-duckdb
-soda-bigquery
 soda-sqlserver
 soda-synapse
-soda-redshift
-soda-fabric
-soda-databricks
-soda-athena
+soda-tests

soda-athena/src/soda_athena/common/data_sources/athena_data_source.py

Lines changed: 64 additions & 17 deletions
@@ -1,7 +1,7 @@
 import logging
 import re
 from datetime import datetime
-from typing import Optional
+from typing import Any, Optional, Tuple
 
 import boto3
 from soda_athena.common.data_sources.athena_data_source_connection import (
@@ -10,9 +10,6 @@
 from soda_athena.common.data_sources.athena_data_source_connection import (
     AthenaDataSourceConnection,
 )
-from soda_athena.common.statements.metadata_columns_query import (
-    AthenaMetadataColumnsQuery,
-)
 from soda_core.common.data_source_connection import DataSourceConnection
 from soda_core.common.data_source_impl import DataSourceImpl
 from soda_core.common.data_source_results import UpdateResult
@@ -25,7 +22,6 @@
     CREATE_TABLE_IF_NOT_EXISTS,
 )
 from soda_core.common.sql_dialect import SqlDialect
-from soda_core.common.statements.metadata_columns_query import MetadataColumnsQuery
 
 logger: logging.Logger = soda_logger
 
@@ -42,9 +38,6 @@ def _create_data_source_connection(self) -> DataSourceConnection:
             name=self.data_source_model.name, connection_properties=self.data_source_model.connection_properties
         )
 
-    def create_metadata_columns_query(self) -> MetadataColumnsQuery:
-        return AthenaMetadataColumnsQuery(sql_dialect=self.sql_dialect, data_source_connection=self.connection)
-
     def execute_update(self, sql: str) -> UpdateResult:
         result: UpdateResult = super().execute_update(sql)
         # Athena requires some extra cleanup for a drop table or drop schema statement.
@@ -141,16 +134,27 @@ def __init__(self, data_source_impl: AthenaDataSourceImpl):
         super().__init__()
         self.data_source_impl = data_source_impl
 
-    def get_sql_data_type_name_by_soda_data_type_names(self) -> dict:
+    def default_casify(self, identifier: str) -> str:
+        return identifier.lower()
+
+    def metadata_casify(self, identifier: str) -> str:
+        return identifier.lower()
+
+    def get_data_source_data_type_name_by_soda_data_type_names(self) -> dict:
         """
         Maps DBDataType names to data source type names.
         """
         return {
+            SodaDataTypeName.CHAR: "char",
             SodaDataTypeName.VARCHAR: "varchar",
             SodaDataTypeName.TEXT: "varchar",
+            SodaDataTypeName.SMALLINT: "smallint",
             SodaDataTypeName.INTEGER: "integer",
+            SodaDataTypeName.BIGINT: "bigint",
             SodaDataTypeName.DECIMAL: "decimal",
             SodaDataTypeName.NUMERIC: "decimal",
+            SodaDataTypeName.FLOAT: "float",
+            SodaDataTypeName.DOUBLE: "double",
             SodaDataTypeName.DATE: "date",
             SodaDataTypeName.TIME: "date",
             SodaDataTypeName.TIMESTAMP: "timestamp",
@@ -204,9 +208,6 @@ def sql_expr_timestamp_literal(self, datetime_in_iso8601: str) -> str:
     def sql_expr_timestamp_add_day(self, timestamp_literal: str) -> str:
         return f"{timestamp_literal} + interval '1' day"
 
-    def supports_case_sensitive_column_names(self) -> bool:
-        return False  # Athena does not support case sensitive names: everything is lowercase.
-
     def build_create_table_sql(
         self, create_table: CREATE_TABLE | CREATE_TABLE_IF_NOT_EXISTS, add_semicolon: bool = True
     ) -> str:
@@ -242,7 +243,7 @@ def _quote_column_for_create_table(self, column_name: str) -> str:
     def _is_not_null_ddl_supported(self) -> bool:
         return False
 
-    def supports_data_type_character_maximun_length(self) -> bool:
+    def supports_data_type_character_maximum_length(self) -> bool:
         return True
 
     def supports_data_type_numeric_precision(self) -> bool:
@@ -254,6 +255,18 @@ def supports_data_type_numeric_scale(self) -> bool:
     def supports_data_type_datetime_precision(self) -> bool:
         return False  # Technically it is supported, but we can't modify it in a CREATE TABLE statement (always defaults to 3)
 
+    def column_data_type_max_length(self) -> Optional[str]:
+        """Athena supports this but it's not in information schema."""
+        return None
+
+    def column_data_type_numeric_precision(self) -> Optional[str]:
+        """Athena supports this but it's not in information schema."""
+        return None
+
+    def column_data_type_numeric_scale(self) -> Optional[str]:
+        """Athena supports this but it's not in information schema."""
+        return None
+
     def format_metadata_data_type(self, data_type: str) -> str:
         """Athena sometimes modifies data types to include precision (e.g. TIMESTAMP as TIMESTAMP(3)) in column metadata
 
@@ -265,13 +278,47 @@ def format_metadata_data_type(self, data_type: str) -> str:
         return data_type
 
     def data_type_has_parameter_character_maximum_length(self, data_type_name) -> bool:
-        return data_type_name.lower() in ["varchar", "char"]
+        return self.format_metadata_data_type(data_type_name).lower() in ["varchar", "char"]
 
     def data_type_has_parameter_numeric_precision(self, data_type_name) -> bool:
-        return data_type_name.lower() in ["decimal"]
+        return self.format_metadata_data_type(data_type_name).lower() in ["decimal"]
 
     def data_type_has_parameter_numeric_scale(self, data_type_name) -> bool:
-        return data_type_name.lower() in ["decimal"]
+        return self.format_metadata_data_type(data_type_name).lower() in ["decimal"]
 
     def data_type_has_parameter_datetime_precision(self, data_type_name) -> bool:
-        return data_type_name.lower() in ["timestamp", "time"]
+        return self.format_metadata_data_type(data_type_name).lower() in ["timestamp", "time"]
+
+    def supports_case_sensitive_column_names(self) -> bool:
+        return False  # Athena does not support case sensitive names: everything is lowercase.
+
+    def extract_character_maximum_length(self, row: Tuple[Any, ...], columns: list[Tuple[Any, ...]]) -> Optional[int]:
+        # Varchars are a special case, they may contain a length parameter, but not always!
+        data_type_name: str = self.extract_data_type_name(row, columns)
+        formatted_data_type_name: str = self.format_metadata_data_type(data_type_name)
+        if not self.data_type_has_parameter_character_maximum_length(data_type_name):
+            return None
+        try:
+            # extract value from inside parentheses
+            data_type_tuple = data_type_name[len(formatted_data_type_name) + 1 : -1].split(",")
+            return int(data_type_tuple[0])
+        except ValueError:
+            return None
+
+    def extract_numeric_precision(self, row: Tuple[Any, ...], columns: list[Tuple[Any, ...]]) -> Optional[int]:
+        data_type_name: str = self.extract_data_type_name(row, columns)
+        if not self.data_type_has_parameter_numeric_precision(data_type_name):
+            return None
+
+        formatted_data_type_name: str = self.format_metadata_data_type(data_type_name)
+        data_type_tuple = data_type_name[len(formatted_data_type_name) + 1 : -1].split(",")
+        return int(data_type_tuple[0])
+
+    def extract_numeric_scale(self, row: Tuple[Any, ...], columns: list[Tuple[Any, ...]]) -> Optional[int]:
+        data_type_name: str = self.extract_data_type_name(row, columns)
+        formatted_data_type_name: str = self.format_metadata_data_type(data_type_name)
+        if not self.data_type_has_parameter_numeric_scale(data_type_name):
+            return None
+
+        data_type_tuple = data_type_name[len(formatted_data_type_name) + 1 : -1].split(",")
+        return int(data_type_tuple[1])
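
The new `extract_*` methods above recover length, precision, and scale from the type string itself (for example `varchar(255)` or `decimal(10,2)`), since the diff notes that Athena does not expose these in the information schema. Below is a minimal standalone sketch of that idea, assuming type strings of the form `name` or `name(p[,s])`. The helper names (`base_type_name`, `type_parameters`, `character_maximum_length`, `numeric_precision_and_scale`) are hypothetical and not part of soda-core, and the sketch uses a regex where the diff uses string slicing.

```python
import re
from typing import Optional


def base_type_name(data_type: str) -> str:
    """Hypothetical helper: drop a trailing '(...)' so 'varchar(255)' becomes 'varchar'."""
    return re.sub(r"\s*\(.*\)\s*$", "", data_type).strip().lower()


def type_parameters(data_type: str) -> list[int]:
    """Return the integers inside the trailing parentheses, or [] when there are none."""
    match = re.search(r"\(([^)]*)\)\s*$", data_type)
    if match is None:
        return []
    try:
        return [int(part) for part in match.group(1).split(",")]
    except ValueError:
        # Non-numeric content inside the parentheses: treat as "no parameters".
        return []


def character_maximum_length(data_type: str) -> Optional[int]:
    """Length for char/varchar types; None for other types or when no length is given."""
    if base_type_name(data_type) not in ("varchar", "char"):
        return None
    params = type_parameters(data_type)
    return params[0] if params else None


def numeric_precision_and_scale(data_type: str) -> tuple[Optional[int], Optional[int]]:
    """(precision, scale) for decimal types; (None, None) otherwise."""
    if base_type_name(data_type) != "decimal":
        return None, None
    params = type_parameters(data_type)
    precision = params[0] if len(params) > 0 else None
    scale = params[1] if len(params) > 1 else None
    return precision, scale
```

One difference from the diff is worth noting: in the diff only `extract_character_maximum_length` catches `ValueError`, so a bare `decimal` with no parentheses would raise in `extract_numeric_precision` and `extract_numeric_scale`. The sketch returns `None` in that case instead.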

soda-athena/src/soda_athena/common/statements/metadata_columns_query.py

Lines changed: 0 additions & 127 deletions
This file was deleted.
