[PECOBLR-666] Added support for Variant datatype in SQLAlchemy #42
msrathore-db merged 5 commits into main
Conversation
jprakash-db
left a comment
Can you add the variant type to the sqlalchemy_example.py file, so that users have an example?
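For reference, a usage example in sqlalchemy_example.py could look roughly like the sketch below. This is only a sketch: the DatabricksVariant import path and the table/column names are assumptions based on this PR, not the shipped example.

    from sqlalchemy import Column, Integer, MetaData, Table

    from databricks.sqlalchemy import DatabricksVariant  # assumed import path

    metadata = MetaData()

    events = Table(
        "events",
        metadata,
        Column("id", Integer, primary_key=True),
        # VARIANT column that accepts arbitrary JSON-like Python values (dicts, lists, scalars)
        Column("payload", DatabricksVariant),
    )

    # Insert a nested dict; the type serializes it to JSON and wraps it in PARSE_JSON()
    insert_stmt = events.insert().values(id=1, payload={"user": "alice", "tags": ["a", "b"]})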
        return tuple(value)
    elif isinstance(value, dict):
-       return tuple(value.items())
+       return tuple(sorted(value.items()))
Is the sorting needed? The response from the server is in the same order that we insert, right?
This is needed because parse_json changes the order of the inserted keys, so we need to sort before verifying.
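To illustrate why the sorting matters, here is a minimal sketch with illustrative values: PARSE_JSON may reorder object keys, so an order-sensitive comparison of dict items can fail even when the data round-trips correctly.

    inserted = {"b": 2, "a": 1}
    returned = {"a": 1, "b": 2}  # the server may hand the keys back in a different order

    # An order-sensitive comparison of items() fails here:
    assert tuple(inserted.items()) != tuple(returned.items())

    # Sorting the items first makes the comparison robust to key reordering:
    assert tuple(sorted(inserted.items())) == tuple(sorted(returned.items()))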
for key in ['variant_simple_col', 'variant_nested_col', 'variant_array_col', 'variant_mixed_col']:
    if compare[key] is not None:
        compare[key] = json.loads(compare[key])
Is this part even needed?
Yes. The server returns the VARIANT value as a string, so it's better to convert it from JSON and then check whether the output matches.
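A small sketch of the round-trip check being described, with illustrative values: the VARIANT column comes back as a JSON string, so it has to be parsed before comparing it with the Python object that was inserted.

    import json

    inserted = {"name": "alice", "scores": [1, 2, 3]}
    returned = '{"name":"alice","scores":[1,2,3]}'  # VARIANT is returned as a JSON string

    # Comparing the raw string to the dict would never match, so parse it first.
    assert json.loads(returned) == inserted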
def literal_processor(self, dialect):
    """Process literal values for SQL generation.

    For VARIANT columns, use PARSE_JSON() to properly insert data.
    """
    def process(value):
        if value is None:
            return "NULL"
        try:
            json_str = self.pe.escape_string(json.dumps(value, ensure_ascii=False, separators=(',', ':')))
        except (TypeError, ValueError) as e:
            raise ValueError(f"Cannot serialize value {value} to JSON: {e}")
        return f"PARSE_JSON('{json_str}')"

    return process
When you have a bind processor, why do you need a literal processor? Both seem to be doing the same thing.
The literal processor is called when the literal_binds parameter is set to True for a SQLAlchemy query. We can remove this section since the default value for that parameter is False.
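For context, a minimal sketch of the two compilation paths; the table definition and the DatabricksVariant import path are illustrative assumptions.

    from sqlalchemy import Column, Integer, MetaData, Table, insert

    from databricks.sqlalchemy import DatabricksVariant  # assumed import path

    metadata = MetaData()
    t = Table("t", metadata, Column("id", Integer), Column("payload", DatabricksVariant))

    stmt = insert(t).values(id=1, payload={"a": 1})

    # Default compilation keeps bind placeholders, so bind_processor handles the value at execution time.
    print(stmt.compile())

    # Only with literal_binds=True are values rendered inline in the SQL string,
    # which is the case where literal_processor is invoked.
    print(stmt.compile(compile_kwargs={"literal_binds": True}))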
dtype_mapping = {
    "variant_simple_col": DatabricksVariant,
    "variant_nested_col": DatabricksVariant,
    "variant_array_col": DatabricksVariant,
    "variant_mixed_col": DatabricksVariant
What if there are other types apart from variant, such as int or array, etc.? Does this dtype mapping need to be provided only for the variant columns or for all columns?
It's better to provide the entire mapping. If we do not provide this mapping, the data is stored as a string for complex types. However, for general types like int, float, etc., we do not need to map them explicitly.
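A minimal sketch of what such a call could look like with pandas.to_sql(); the connection URL, table name, and the DatabricksVariant import path are placeholders/assumptions.

    import pandas as pd
    from sqlalchemy import create_engine

    from databricks.sqlalchemy import DatabricksVariant  # assumed import path

    engine = create_engine("databricks://token:<token>@<host>?http_path=<path>")  # fill in workspace details

    df = pd.DataFrame({
        "id": [1, 2],
        "variant_simple_col": [{"a": 1}, {"b": 2}],
    })

    # Only the variant columns need an explicit dtype entry; plain types like int are inferred.
    df.to_sql(
        "variant_table",
        engine,
        if_exists="append",
        index=False,
        dtype={"variant_simple_col": DatabricksVariant},
    )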
merged changes from main
…essor for variant
jprakash-db
left a comment
LGTM. Thanks for making the changes
Description
Added support for the Variant type. Users can now insert into and read from Variant type columns.
Since pandas is one of the most important use cases for SQLAlchemy, additional tests have been added for pandas-specific scenarios. Users need to explicitly provide the dtype parameter to pandas.to_sql() to define which columns are of Variant type.
Testing
Added unit and E2E tests for the DatabricksVariant type.
Related tickets and documents
PECOBLR-666