Create Initial Snowflake Notebook (5 sections) #61
Open
cyclux wants to merge 8 commits into 55-add-data-preparation-orchestration-module
- Replaced specific version constraints with more flexible ones for better compatibility.
- Added 'getml' as a new dependency.
- Adjusted version specifications for existing dependencies to use compatible ranges.
…reate-initial-snowflake-notebook-5-sections
This pull request introduces a Snowflake data integration layer for the getML Feature Store, including configuration, infrastructure bootstrapping, and data ingestion utilities. It provides a modular, environment-variable-driven setup for connecting to Snowflake, ensures required infrastructure (warehouse/database) is present, and supplies SQL templates and scripts for automating data preparation and ingestion. Additionally, it adds tools for converting Jaffle Shop CSV data to Parquet format for efficient loading.
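The environment-variable-driven setup and context-managed session described above can be illustrated with a minimal, standard-library-only sketch. It is not the code from this pull request: the class name `SnowflakeSettingsSketch`, the `SNOWFLAKE_` prefix, the field names, and the `session_scope` helper are all assumptions made for illustration, and the session factory is injected so the example runs without a Snowflake account.

```python
import os
from contextlib import contextmanager
from dataclasses import dataclass

# Hypothetical field names; the PR's SnowflakeSettings may differ.
_FIELDS = ("account", "user", "password", "warehouse", "database")


@dataclass(frozen=True)
class SnowflakeSettingsSketch:
    """Illustrative stand-in for an environment-driven settings object."""

    account: str
    user: str
    password: str
    warehouse: str
    database: str

    @classmethod
    def from_env(cls, env=None, prefix="SNOWFLAKE_"):
        """Build settings from a mapping (defaults to os.environ)."""
        env = os.environ if env is None else env
        missing = [prefix + f.upper() for f in _FIELDS if not env.get(prefix + f.upper())]
        if missing:
            raise KeyError("missing environment variables: " + ", ".join(missing))
        return cls(**{f: env[prefix + f.upper()] for f in _FIELDS})

    def connection_parameters(self):
        """Return the settings as a plain dict for a session factory."""
        return {f: getattr(self, f) for f in _FIELDS}


@contextmanager
def session_scope(settings, session_factory):
    """Create a session, yield it, and always close it afterwards.

    session_factory is any callable that accepts a dict of connection
    parameters and returns an object with a close() method.
    """
    session = session_factory(settings.connection_parameters())
    try:
        yield session
    finally:
        session.close()
```

With Snowpark installed, the factory could plausibly be `lambda params: Session.builder.configs(params).create()`, so that `with session_scope(settings, factory) as session:` closes the session even if the body raises.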
Snowflake Data Integration Layer
Core data integration package:
- `data` package with modules for Snowflake settings, session management, infrastructure bootstrapping, SQL loading utilities, and top-level imports for streamlined usage. (integration/snowflake/data/__init__.py, integration/snowflake/data/_settings.py, integration/snowflake/data/_snowflake_session.py, integration/snowflake/data/_bootstrap.py, integration/snowflake/data/_sql_loader.py)
- `SnowflakeSettings` and context-managed Snowpark session creation.

SQL automation and templates:
- SQL templates and scripts for automating data preparation and ingestion. (integration/snowflake/data/sql/...)

Jaffle Shop Data Preparation
- Tools for converting Jaffle Shop CSV data to Parquet format for efficient loading. (integration/jaffle-shop-data/convert_jaffle_csv_to_parquet.py, integration/jaffle-shop-data/GENERATE_JAFFLE_SHOP_PARQUET.md)

Project Configuration
- `pyproject.toml` with dependencies for data engineering, Snowflake integration, development, and linting, ensuring reproducible environments and code quality. (integration/pyproject.toml)
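The infrastructure bootstrapping step listed above (ensuring the warehouse and database exist) might look roughly like the following sketch. The function names and the identifier check are hypothetical and are not taken from `_bootstrap.py`; only the `CREATE ... IF NOT EXISTS` statements are standard Snowflake SQL. The executor is passed in as a callable so the example stays runnable without a live connection.

```python
import re

# Unquoted Snowflake identifiers: start with a letter or underscore,
# followed by letters, digits, underscores, or dollar signs.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def checked_identifier(name):
    """Reject anything that is not a plain unquoted identifier."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"not a valid unquoted identifier: {name!r}")
    return name


def bootstrap_statements(warehouse, database):
    """Return idempotent statements that create missing infrastructure."""
    wh = checked_identifier(warehouse)
    db = checked_identifier(database)
    return [
        f"CREATE WAREHOUSE IF NOT EXISTS {wh}",
        f"CREATE DATABASE IF NOT EXISTS {db}",
    ]


def run_bootstrap(execute, warehouse, database):
    """Run each bootstrap statement through the supplied executor.

    execute is any callable taking one SQL string (hypothetical hook).
    """
    for statement in bootstrap_statements(warehouse, database):
        execute(statement)
```

Because both statements use `IF NOT EXISTS`, running the bootstrap repeatedly is harmless. With a Snowpark session, `execute` could be `lambda sql: session.sql(sql).collect()`.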