diff --git a/README.md b/README.md
index 8ed99e2..820cb00 100644
--- a/README.md
+++ b/README.md
@@ -1,27 +1,27 @@
 # DataHelm
 
-DataHelm is a data engineering framework focused on:
+DataHelm is a data engineering framework focused on the following:
 
-- source ingestion orchestration
+- source ingestion and orchestration
 - dbt transformation workflows
 - notebook-based dashboard execution
-- reusable provider connectors (SharePoint, GCS, S3, BigQuery)
-- optional local-LLM analytics query scaffolding
+- reusable provider connectors (SharePoint, GCS, S3, and BigQuery)
+- optional local LLM analytics query scaffolding
 
-![alt text](https://github.com/DevStrikerTech/datahelm/blob/master/docs/architecture.png?raw=true)
+![DataHelm Architecture](https://github.com/DevStrikerTech/datahelm/blob/master/docs/architecture.png?raw=true)
 
 ## Core Capabilities
 
 - **Config-driven ingestion** using YAML in `config/api/`
-- **Dagster orchestration** for jobs, schedules, and sensors
-- **dbt project execution** through `analytics/dbt_runner.py` and dbt configs
+- **Dagster orchestration** for managing jobs, schedules, and sensors
+- **dbt project execution** through `analytics/dbt_runner.py` and dbt configuration files
 - **Dashboard generation** with Dagstermill notebooks
 - **Reusable handlers/connectors** for multiple external providers
 - **Optional NL-to-SQL module** (`analytics/nl_query/`) for local Ollama-based analytics workflows
 
 ## High-Level Architecture
 
-The repository follows layered responsibilities:
+The repository follows a layered responsibility structure:
 
 - `handlers/`: provider-specific source connectors and API handlers
 - `ingestion/`: ingestion factory + native ingestion implementations
@@ -60,7 +60,7 @@ docs/
 ### Prerequisites
 
 - Python 3.12+
-- PostgreSQL (reachable from local environment)
+- PostgreSQL (accessible from the local environment)
 - Optional: Docker, local Ollama, dbt CLI
 
 ### Installation
@@ -74,7 +74,7 @@ pip install -e .
 
 ### Environment Variables
 
-Create a `.env` file in repository root with required values, for example:
+Create a `.env` file in the repository root with the required values, for example:
 
 ```env
 DB_HOST=${DB_HOST}
@@ -91,7 +91,7 @@ CLASHOFCLANS_API_TOKEN=${CLASHOFCLANS_API_TOKEN}
 python scripts/run_dagster_dev.py
 ```
 
-Useful option:
+Useful option for quick verification:
 
 ```bash
 python scripts/run_dagster_dev.py --print-only
@@ -149,7 +149,7 @@ Run all tests:
 .venv/bin/python -m pytest -q
 ```
 
-Current suite covers:
+The current test suite includes coverage for:
 
 - ingestion and handler behavior
 - analytics factory and runner logic
@@ -172,7 +172,7 @@ Workflows:
 
 Container image is defined via `Dockerfile`.
 
-Default runtime command starts Dagster gRPC:
+Default runtime command starts the Dagster gRPC server:
 
 ```bash
 python -m dagster api grpc -m dagster_op.repository