DataHelm is a data engineering framework focused on the following:
- Source ingestion and orchestration
- dbt transformation workflows
- Notebook-based dashboard execution
- Reusable provider connectors (SharePoint, GCS, S3, and BigQuery)
- Optional local LLM analytics query scaffolding

## Table of Contents

- [Core Capabilities](#core-capabilities)
- [High-Level Architecture](#high-level-architecture)
- [Repository Structure](#repository-structure)
- [Local Setup](#local-setup)
- [Configuration Model](#configuration-model)
- [Reusable Connectors](#reusable-connectors)
- [Local LLM Analytics Module](#local-llm-analytics-module)
- [Testing](#testing)
- [CI/CD and Branching](#cicd-and-branching)
- [Containerization](#containerization)
- [Deployment](#deployment)
- [Contributing and Governance](#contributing-and-governance)
- [Detailed Technical Documentation](#detailed-technical-documentation)

![DataHelm Architecture](https://github.com/DevStrikerTech/datahelm/blob/master/docs/architecture.png?raw=true)

## Core Capabilities

- **Config-driven ingestion** using YAML in `config/api/`
- **Dagster orchestration** for managing jobs, schedules, and sensors
- **dbt project execution** through `analytics/dbt_runner.py` and dbt configuration files

## High-Level Architecture

The repository follows a layered responsibility structure:

- `handlers/`: provider-specific source connectors and API handlers
- `ingestion/`: ingestion factory and native ingestion implementations
- `analytics/`: dbt, dashboard, and optional NL-query modules
- `dagster_op/`: orchestration objects (jobs, schedules, repository)
- `config/`: all runtime configuration (API, dbt, dashboard, analytics metadata)
- `tests/`: unit tests for handlers, ingestion, analytics, and scripts


## Repository Structure

```text
dagster_op/
ingestion/
tests/
scripts/
docs/
config/
  api/
  dbt/
```

## Local Setup

### Prerequisites

- Python 3.12+
### Installation

Install the package in editable mode:

```shell
pip install -e .
```

### Environment Variables

Create a `.env` file in the repository root with the required values.

### Running Dagster

Preview the local Dagster launch command:

```shell
python scripts/run_dagster_dev.py --print-only
```

## Configuration Model

### Ingestion Config (`config/api/*.yaml`)

Defines source-level extraction, publish targets, schedules, and column mapping.
An example is included: `CLASHOFCLANS_PLAYER_STATS`.
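As a hypothetical sketch of what such a source definition could contain (field names here are illustrative, not the actual schema):

```yaml
CLASHOFCLANS_PLAYER_STATS:
  source:
    type: api
    endpoint: https://api.clashofclans.com/v1/players/{player_tag}
  publish:
    target: bigquery
    dataset: raw
    table: player_stats
  schedule: "0 6 * * *"       # cron expression for the Dagster schedule
  column_mapping:
    tag: player_tag           # source field -> published column
    trophies: trophy_count
```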

### dbt Config (`config/dbt/projects.yaml`)

Defines dbt units, selection/exclusion rules, variables, and schedules.
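To illustrate how such a unit definition might translate into a dbt invocation (the config keys below are hypothetical, not the actual schema; the dbt CLI flags themselves are standard):

```python
import json


def build_dbt_args(unit: dict) -> list[str]:
    """Translate a hypothetical dbt unit config into dbt CLI arguments."""
    args = ["dbt", "run", "--project-dir", unit["project_dir"]]
    if unit.get("select"):
        args += ["--select", " ".join(unit["select"])]
    if unit.get("exclude"):
        args += ["--exclude", " ".join(unit["exclude"])]
    if unit.get("vars"):
        # dbt expects --vars as a YAML/JSON string
        args += ["--vars", json.dumps(unit["vars"])]
    return args


unit = {
    "project_dir": "analytics/dbt",
    "select": ["staging+"],
    "vars": {"run_date": "2024-01-01"},
}
print(build_dbt_args(unit))
```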

### Dashboard Config (`config/dashboard/projects.yaml`)

Defines notebook-based dashboard execution settings.

### Analytics Metadata Config

Defines dataset metadata for the isolated NL-to-SQL module.

## Reusable Connectors

The repository includes reusable connector classes under `handlers/`:

- `handlers/sharepoint/sharepoint.py`
  - Microsoft Graph authentication and site/file access helpers
- `handlers/gcs/gcs.py`
  - Upload, download, list, delete, and signed URL helpers
- `handlers/s3/s3.py`
  - Upload, download, list, delete, and presigned URL helpers
- `handlers/bigquery/bigquery.py`
  - Query, row fetch, dataframe load, and schema helpers
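As an illustration of the pattern such connectors typically follow (the real class names and signatures in `handlers/s3/s3.py` may differ), a thin wrapper around an injected boto3-style client keeps the helpers easy to fake in tests:

```python
class S3Handler:
    """Sketch of a thin S3 helper; the client is injected so tests can stub it."""

    def __init__(self, client, bucket: str):
        self.client = client
        self.bucket = bucket

    def upload(self, local_path: str, key: str) -> None:
        # boto3's managed transfer handles multipart uploads transparently
        self.client.upload_file(local_path, self.bucket, key)

    def list_keys(self, prefix: str = "") -> list[str]:
        resp = self.client.list_objects_v2(Bucket=self.bucket, Prefix=prefix)
        return [obj["Key"] for obj in resp.get("Contents", [])]

    def presigned_url(self, key: str, expires: int = 3600) -> str:
        return self.client.generate_presigned_url(
            "get_object",
            Params={"Bucket": self.bucket, "Key": key},
            ExpiresIn=expires,
        )
```

In production the `client` would be `boto3.client("s3")`; in unit tests a small fake object with the same method names suffices.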

## Local LLM Analytics Module

`analytics/nl_query/` is an isolated module for natural-language-to-SQL generation using local Ollama:

- Semantic catalog loader
- SQL read-only safety guard
- Ollama client wrapper
- Orchestration service
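The safety guard's job is to reject anything but read-only statements before execution. A minimal sketch of the idea (the module's real implementation may differ):

```python
import re

# Keywords that mutate state; \b boundaries avoid matching column names
# like "created_at" or "updates".
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|merge)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Accept only a single SELECT/WITH statement with no mutating keywords."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # reject multi-statement input
        return False
    if not re.match(r"^(select|with)\b", stripped, re.IGNORECASE):
        return False
    return not FORBIDDEN.search(stripped)


print(is_read_only("SELECT * FROM stats"))   # True
print(is_read_only("DROP TABLE stats"))      # False
```

A keyword blocklist like this is a coarse first line of defense; a production guard would typically also rely on database-level read-only credentials.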

## Testing

Run the unit tests from the repository root (for example, with `pytest`).

The current test suite includes coverage for:

- Ingestion and handler behavior
- Analytics factory and runner logic
- Connector modules (SharePoint, GCS, S3, BigQuery)
- Script behavior
- NL-query safety and service paths

## CI/CD and Branching

- `dev`: integration branch
- `master`: release/production branch

## Containerization

The container image is defined via `Dockerfile`.

The default runtime command starts Dagster gRPC:
```shell
python -m dagster api grpc -m dagster_op.repository
```
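For reference, a minimal Dockerfile sketch consistent with that runtime command (the base image, working directory, and port are assumptions, not the repository's actual Dockerfile):

```dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install -e .
# Expose the gRPC port for the Dagster webserver/daemon to connect to
EXPOSE 4000
CMD ["python", "-m", "dagster", "api", "grpc", \
     "-m", "dagster_op.repository", "-h", "0.0.0.0", "-p", "4000"]
```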

## Deployment

Deployment flow is workflow-based:

- Production auto-path after successful Docker release
- Manual staging/production dispatch path
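The manual dispatch path could be sketched roughly as follows (workflow name, inputs, and the deploy script are assumptions for illustration, not the repository's actual workflow files):

```yaml
name: deploy-dispatch
on:
  workflow_dispatch:
    inputs:
      environment:
        type: choice
        options: [staging, production]
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: ${{ inputs.environment }}
    steps:
      - uses: actions/checkout@v4
      # hypothetical deploy entry point
      - run: ./scripts/deploy.sh ${{ inputs.environment }}
```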

## Contributing and Governance

- Contribution guide: [`CONTRIBUTING.md`](CONTRIBUTING.md)
- Code of conduct: [`CODE_OF_CONDUCT.md`](CODE_OF_CONDUCT.md)
- Security reporting: [`SECURITY.md`](SECURITY.md)

## Detailed Technical Documentation

For complete, long-form project documentation (operations, architecture, and runbook-style details), see:

- [`docs/document.md`](docs/document.md)