From 7a15cc8d9d117c34c17773faf120f464a7d803ff Mon Sep 17 00:00:00 2001
From: LanceCracker04 <151309205+LanceCracker04@users.noreply.github.com>
Date: Sat, 14 Mar 2026 16:47:04 +0800
Subject: [PATCH 1/7] docs: small README improvements

---
 README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 08d3304..f9440bd 100644
--- a/README.md
+++ b/README.md
@@ -2,11 +2,11 @@
 
 DataHelm is a data engineering framework focused on:
 
-- source ingestion orchestration
+- source ingestion and orchestration
 - dbt transformation workflows
 - notebook-based dashboard execution
 - reusable provider connectors (SharePoint, GCS, S3, BigQuery)
-- optional local-LLM analytics query scaffolding
+- optional local LLM analytics query scaffolding
 
 ## Core Capabilities
 
@@ -58,7 +58,7 @@ docs/
 ### Prerequisites
 
 - Python 3.12+
-- PostgreSQL (reachable from local environment)
+- PostgreSQL (accessible from the local environment)
 - Optional: Docker, local Ollama, dbt CLI
 
 ### Installation
@@ -147,7 +147,7 @@ Run all tests:
 .venv/bin/python -m pytest -q
 ```
 
-Current suite covers:
+The current test suite covers:
 
 - ingestion and handler behavior
 - analytics factory and runner logic

From 3fd8bc5f6954b4be73495d3b8142e279794da6cf Mon Sep 17 00:00:00 2001
From: usopenmarket-a11y
Date: Sat, 14 Mar 2026 16:33:15 +0200
Subject: [PATCH 2/7] small improvements in `README.md`

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 8ed99e2..7697989 100644
--- a/README.md
+++ b/README.md
@@ -6,9 +6,9 @@ DataHelm is a data engineering framework focused on:
 - dbt transformation workflows
 - notebook-based dashboard execution
 - reusable provider connectors (SharePoint, GCS, S3, BigQuery)
-- optional local-LLM analytics query scaffolding
+- optional local LLM analytics query scaffolding
 
-![alt text](https://github.com/DevStrikerTech/datahelm/blob/master/docs/architecture.png?raw=true)
+![DataHelm Architecture](https://github.com/DevStrikerTech/datahelm/blob/master/docs/architecture.png?raw=true)
 
 ## Core Capabilities
 

From 62c00be772aa6eb45f6938f5f35c0f168791c3e7 Mon Sep 17 00:00:00 2001
From: Habiba
Date: Sat, 14 Mar 2026 21:42:08 +0500
Subject: [PATCH 3/7] docs: small README improvements

---
 README.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 08d3304..6ef6b1b 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,12 @@
 # DataHelm
 
-DataHelm is a data engineering framework focused on:
+DataHelm is a data engineering framework focused on the following:
 
-- source ingestion orchestration
-- dbt transformation workflows
-- notebook-based dashboard execution
-- reusable provider connectors (SharePoint, GCS, S3, BigQuery)
-- optional local-LLM analytics query scaffolding
+- Source ingestion orchestration
+- DBT transformation workflows
+- Notebook-based dashboard execution
+- Reusable provider connectors (SharePoint, GCS, S3, BigQuery)
+- Optional local-LLM analytics query scaffolding
 
 ## Core Capabilities
 

From acbcfb091560f9dc81b34ec160e219f6258f1185 Mon Sep 17 00:00:00 2001
From: dx4iot
Date: Sat, 14 Mar 2026 22:32:13 +0530
Subject: [PATCH 4/7] docs: README wording, grammar, and formatting improvements

---
 README.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 3ee69bb..8d9509e 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@ DataHelm is a data engineering framework focused on:
 - source ingestion and orchestration
 - dbt transformation workflows
 - notebook-based dashboard execution
-- reusable provider connectors (SharePoint, GCS, S3, BigQuery)
+- reusable provider connectors (SharePoint, GCS, S3, and BigQuery)
 - optional local LLM analytics query scaffolding
 
 ![DataHelm Architecture](https://github.com/DevStrikerTech/datahelm/blob/master/docs/architecture.png?raw=true)
@@ -13,15 +13,15 @@ DataHelm is a data engineering framework focused on:
 ## Core Capabilities
 
 - **Config-driven ingestion** using YAML in `config/api/`
-- **Dagster orchestration** for jobs, schedules, and sensors
-- **dbt project execution** through `analytics/dbt_runner.py` and dbt configs
+- **Dagster orchestration** for managing jobs, schedules, and sensors
+- **dbt project execution** through `analytics/dbt_runner.py` and dbt configuration files
 - **Dashboard generation** with Dagstermill notebooks
 - **Reusable handlers/connectors** for multiple external providers
 - **Optional NL-to-SQL module** (`analytics/nl_query/`) for local Ollama-based analytics workflows
 
 ## High-Level Architecture
 
-The repository follows layered responsibilities:
+The repository follows a layered responsibility structure:
 
 - `handlers/`: provider-specific source connectors and API handlers
 - `ingestion/`: ingestion factory + native ingestion implementations
@@ -74,7 +74,7 @@ pip install -e .
 
 ### Environment Variables
 
-Create a `.env` file in repository root with required values, for example:
+Create a `.env` file in the repository root with the required values, for example:
 
 ```env
 DB_HOST=${DB_HOST}
@@ -149,7 +149,7 @@ Run all tests:
 .venv/bin/python -m pytest -q
 ```
 
-The current test suite covers:
+The current test suite includes coverage for:
 
 - ingestion and handler behavior
 - analytics factory and runner logic

From 4d66aa9c9eb374712ff24f6ba0acad06c93607cc Mon Sep 17 00:00:00 2001
From: dx4iot
Date: Sat, 14 Mar 2026 22:32:13 +0530
Subject: [PATCH 5/7] docs: README wording, grammar, and formatting improvements

---
 README.md | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index 12a6f13..0d8a266 100644
--- a/README.md
+++ b/README.md
@@ -2,24 +2,26 @@
 
 DataHelm is a data engineering framework focused on the following:
 
-- Source ingestion orchestration
-- DBT transformation workflows
-- Notebook-based dashboard execution
-- Reusable provider connectors (SharePoint, GCS, S3, BigQuery)
-- Optional local-LLM analytics query scaffolding
+- source ingestion and orchestration
+- dbt transformation workflows
+- notebook-based dashboard execution
+- reusable provider connectors (SharePoint, GCS, S3, and BigQuery)
+- optional local LLM analytics query scaffolding
+
+![DataHelm Architecture](https://github.com/DevStrikerTech/datahelm/blob/master/docs/architecture.png?raw=true)
 
 ## Core Capabilities
 
 - **Config-driven ingestion** using YAML in `config/api/`
-- **Dagster orchestration** for jobs, schedules, and sensors
-- **dbt project execution** through `analytics/dbt_runner.py` and dbt configs
+- **Dagster orchestration** for managing jobs, schedules, and sensors
+- **dbt project execution** through `analytics/dbt_runner.py` and dbt configuration files
 - **Dashboard generation** with Dagstermill notebooks
 - **Reusable handlers/connectors** for multiple external providers
 - **Optional NL-to-SQL module** (`analytics/nl_query/`) for local Ollama-based analytics workflows
 
 ## High-Level Architecture
 
-The repository follows layered responsibilities:
+The repository follows a layered responsibility structure:
 
 - `handlers/`: provider-specific source connectors and API handlers
 - `ingestion/`: ingestion factory + native ingestion implementations
@@ -72,7 +74,7 @@ pip install -e .
 
 ### Environment Variables
 
-Create a `.env` file in repository root with required values, for example:
+Create a `.env` file in the repository root with the required values, for example:
 
 ```env
 DB_HOST=${DB_HOST}
@@ -147,7 +149,7 @@ Run all tests:
 .venv/bin/python -m pytest -q
 ```
 
-The current test suite covers:
+The current test suite includes coverage for:
 
 - ingestion and handler behavior
 - analytics factory and runner logic

From f30d4eebc6969f3e85ffb6e4976a1a0d532d39fb Mon Sep 17 00:00:00 2001
From: DevStrikerTech
Date: Sat, 14 Mar 2026 17:34:41 +0000
Subject: [PATCH 6/7] docs: clarify README run option wording

Made-with: Cursor
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 0d8a266..ea1b175 100644
--- a/README.md
+++ b/README.md
@@ -91,7 +91,7 @@ CLASHOFCLANS_API_TOKEN=${CLASHOFCLANS_API_TOKEN}
 python scripts/run_dagster_dev.py
 ```
 
-Useful option:
+Useful option for quick verification:
 
 ```bash
 python scripts/run_dagster_dev.py --print-only

From 6ec4cb45b50094b9a5f7b2b3763f23548e3e5207 Mon Sep 17 00:00:00 2001
From: DevStrikerTech
Date: Sat, 14 Mar 2026 17:43:11 +0000
Subject: [PATCH 7/7] docs: clarify Dagster gRPC runtime wording

Made-with: Cursor
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index ea1b175..820cb00 100644
--- a/README.md
+++ b/README.md
@@ -172,7 +172,7 @@ Workflows:
 
 Container image is defined via `Dockerfile`.
 
-Default runtime command starts Dagster gRPC:
+Default runtime command starts the Dagster gRPC server:
 
 ```bash
 python -m dagster api grpc -m dagster_op.repository