Relational Databases (MySQL, PostgreSQL) Integration #195
dsaad68 wants to merge 58 commits into `canimus:main`.
…_greater_or_equal_than`, `is_greater_than`
…ay`, `is_on_saturday`, `is_on_thursday`, `is_on_tuesday`, `is_on_wednesday`, `is_on_weekday` and `is_on_weekend`
…`, `is_in_billions`, `is_in_millions`, `is_less_or_equal_than`, `is_less_than`
…day`, `is_on_sunday`, `is_on_thursday`, `is_on_tuesday`, `is_on_wednesday`, `is_on_weekday`
⤴️ Updated Docstrings
⤴️ Updated `README.md`

@dsaad68 thanks for this great addition. I will conduct a review and proceed with the integration.

If there are any issues with this suggestion, please let me know.

Hi @dsaad68, there are no issues with the submission. We are in an ongoing review with the Journal of Open Source Software, and the reviewers have pointed out gaps in some of the documentation. Before adding more functionality, we would like to pass the JOSS review with consolidated docs, at least for the core classes and modules. Once we pass this hurdle, we will add the new functionality. Hope that explains the delays. Thank you in advance.

@canimus I totally understand, good luck. Let me know if I can help.

Hi @dsaad68, thanks for your patience with this PR. Now that the paper is out of the way and finally published, I would like to make sure your contribution gets fully merged. This branch has been stalled for some time, and several commits and merges have landed since then. Would you kindly check whether it can be brought up to date and resolve the conflicts highlighted above? Thanks in advance.

@canimus I will look at it this week.
cuallee
Relational Databases (MySQL, PostgreSQL) Integration
By using `Polars` with the `ConnectorX` engine to read from a database, `cuallee` can now check and validate tables in relational databases. `ConnectorX`, written in Rust, has native support for Apache Arrow, which lets it transfer data directly into a Polars `DataFrame` without copying the data (zero-copy). More information about the ConnectorX engine can be found here. This feature also paves the way for integrating other DBMSs into `cuallee`, such as `Redshift` (via the PostgreSQL protocol) and `ClickHouse` (via the MySQL protocol).

✨ Feature Enhancements:
- … `PostgreSQL` and `MySQL`, enabling more robust data integrity checks.
- … `pyproject.toml`.

🧪 Testing:
- … skip.)
- … `PyTest` fixtures for automated unit tests for `PostgreSQL` and `MySQL`.
- … `init-db-psql.sql` and `init-db-mysql.sql`.
- … `test_validate.py` to validate the type of the output DataFrame, ensuring the resulting DataFrame is of type `polars_dataframe`.
- … `test_validate.py` to validate the error raised when a column is not found in the table.

📖 Documentation:
- … `README.md` … `How-To` to set up the test environment.

⛑️ Known Issues:
- Most of `PostgreSQL`'s `Compute` methods are inherited from `DuckDB`'s `Compute` methods, so any change to `DuckDB`'s `Compute` methods will also affect `PostgreSQL`'s.
- The `has_std` check can fail due to floating-point precision. The approach suggested below could resolve it, but it requires introducing a precision (error tolerance) parameter:

🦺 Limitations:
PostgreSQL

Not all checks are currently available for `PostgreSQL`. Unavailable checks:
- `is_daily`
- `has_entropy`
- `has_workflow`

MySQL

Not all checks are currently available for `MySQL`. Unavailable checks:
- `is_daily`
- `has_entropy`
- `has_workflow`
- `has_percentile`
- `has_correlation`
- `is_inside_interquartile_range`

Improve the `Compute` methods to support complex queries:

Because the `Compute` methods are combined into a single unified query, it is currently not possible to build complex queries that compose multiple `Compute` methods. This part needs to be improved to support complex queries.

Inheritance Enhancement:
A significant portion of `PostgreSQL`'s `Compute` functionality is derived from `DuckDB`'s `Compute` methods. Therefore, any modification to the `Compute` methods in `DuckDB` will directly impact `PostgreSQL`. Consolidating these shared methods into a separate class is advisable for better maintainability.