-
Notifications
You must be signed in to change notification settings - Fork 242
Supporting unsupported types, mapping to varchar #2395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
…columns metadata new style
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
…columns metadata new style
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
…columns metadata new style
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR looks good to me. Just left a few suggestions.
Now that tests are passing, I think its best to get this merged as soon as possible since this PR is already very large. If we need to make additional changes, we can always open a new branch and PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remark in general (not to be fixed in this PR): I think the get_sql_data_type_str_with_parameters(self)
function should be something data source specific (i.e. put it in the dialect) instead of being a global one. Now it requires "hacky" workarounds in some data sources to get it to work.
Not to be fixed immediately (certainly not in this PR). This doesn't seem like an easy fix.
def default_numeric_scale(self) -> Optional[int]: | ||
return None | ||
|
||
def build_columns_metadata_query_str(self, table_namespace: DataSourceNamespace, table_name: str) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a large chunk of code to have in the dialect file.
I think the dialect is a nice place for this function, but maybe we can move the actual implementation to another file to keep it a bit cleaner?
No hard opinion, just a suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this approach would be valid in a couple of places: replace a single file by a directory with multiple modules, maybe even introduce packages? E.g. splitting up the current soda_cloud.py
a bit as well.
No hard requirement for the time being.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI; we tried moving this, but we can only really move two functions (build_columns_metadata_query_str
and build_column_metadatas_from_query_result
). The other functions are overwritten in the inheriting classes, so they require different logic. Currently makes little sense to add the extra complexity of moving those two functions, but still pointing towards the dialect. That makes it so you're bouncing between files way too much.
"Nice to have" at the moment to move all of this: we might need an extra "dialect-like" class for each datasource impl just to do the metadata stuff. But then the argument can be made to just keep it in the dialect 🙃 .
return self.sql_dialect.build_column_metadatas_from_query_result(query_result) | ||
|
||
def build_columns_metadata_query_str(self, dataset_prefixes: list[str], dataset_name: str) -> str: | ||
database_index: int | None = self.sql_dialect.get_database_prefix_index() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generic remark: I'm still not happy about the way we deal with the concepts of "database" and "schema", we still need to work on a proper terminology and/or abstraction IMHO.
This snippet still seems pretty brittle, but that's work for another time.
def default_numeric_scale(self) -> Optional[int]: | ||
return None | ||
|
||
def build_columns_metadata_query_str(self, table_namespace: DataSourceNamespace, table_name: str) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this approach would be valid in a couple of places: replace a single file by a directory with multiple modules, maybe even introduce packages? E.g. splitting up the current soda_cloud.py
a bit as well.
No hard requirement for the time being.
|
(description by Paul)
This is a monster PR that incorporates several branches of work which were happening in parallel and then eventually wound up here. I believe the Athena support and the Snowflake auth addition have already been reviewed from a prior branch. The rest is new in this branch.
(update -- Athena was already merged but for some reason it's showing up in the diff as if it was new, not sure why)
Structural changes
New feature implementations
Fixes