DataSage AI

Production-grade, local-first, multi-agent Data Intelligence system for deep EDA, data diagnostics, and model strategy recommendations.

What this repository contains

Product and architecture documentation
Agent-level specifications
Implementation roadmap broken into delivery phases

Core goals

Ingest CSV, Parquet, and SQL datasets
Run advanced EDA and data health diagnostics
Detect:
- Missing values
- Skewness
- Class imbalance
- Multicollinearity (VIF)
- Outliers (IQR + Z-score)
- Target leakage risk
- Feature drift (PSI + KS test)
Recommend:
- Feature engineering
- Encoding strategy
- Scaling approach
- Model family (regression / classification / time series)
Generate:
- Structured EDA report
- Modeling recommendation report
- Executive summary
- Data quality score
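
The outlier checks listed under Detect (IQR + Z-score) can be illustrated with a minimal standard-library sketch. The function names, the 1.5 IQR multiplier, and the 3.0 z-score threshold are illustrative assumptions, not DataSage AI's actual implementation or defaults.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumed default)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold (assumed default 3.0)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column has no spread, so nothing can be flagged.
        return []
    return [v for v in values if abs(v - mean) / std > threshold]
```

The two rules can disagree on small samples: a single extreme value inflates the standard deviation and can hide itself from the z-score rule, which is one reason to run both checks.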

Constraints

Local-first runtime
Ollama LLMs (llama3, mistral, phi3)
No paid APIs required
Local vector DB (FAISS or Chroma)
Multi-agent orchestration with LangGraph
State/memory management and reasoning logs
Design that can scale to the cloud in the future

Documentation Index

docs/01_product_scope.md
docs/02_system_architecture.md
docs/03_agent_contracts.md
docs/04_orchestration_memory_logging.md
docs/05_data_quality_scoring.md
docs/06_implementation_phases.md
docs/07_deployment_evolution.md
docs/08_cloud_api_worker_scaffold.md
docs/09_run_guide.md

Quick Start (Phase 0/1/2/3/4/5/6)

python -m venv .venv
.venv\Scripts\Activate.ps1
pip install -e .[dev,sql]

Run profiling for CSV:

datasage-ai --source-type csv --path .\data\sample.csv --target-column target

Run profiling for Parquet:

datasage-ai --source-type parquet --path .\data\sample.parquet

Run profiling for SQL:

datasage-ai --source-type sql --connection-uri "sqlite:///./data/example.db" --sql-query "SELECT * FROM table_name"
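
To try the SQL command above without an existing database, a small SQLite file can be created first. This is a hedged sketch: the helper name `make_example_db`, the column names, and the rows are made up for illustration; only the `table_name` table and the `./data/example.db` path come from the command.

```python
import sqlite3
from pathlib import Path

# Illustrative rows only; not real project data.
SAMPLE_ROWS = [(1.5, "a", 0), (2.7, "b", 1), (3.1, "a", 0)]


def make_example_db(db_path):
    """Create (or reset) a tiny SQLite database with a table named table_name."""
    path = Path(db_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(str(path))
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS table_name "
            "(feature_a REAL, feature_b TEXT, target INTEGER)"
        )
        conn.execute("DELETE FROM table_name")  # keep repeated runs idempotent
        conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", SAMPLE_ROWS)
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM table_name").fetchone()[0]
    finally:
        conn.close()
```

Calling `make_example_db("./data/example.db")` from the repository root produces a file that matches the `sqlite:///./data/example.db` connection URI.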

Run with reference dataset for drift detection:

datasage-ai --source-type csv --path .\data\current.csv --reference-path .\data\reference.csv --target-column target
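
Drift between the reference and current datasets is measured with PSI and the KS test. As a rough illustration of the PSI half only, the sketch below bins a numeric feature on the reference range and compares bin proportions. The function name, the bin count of 10, and the epsilon floor are assumptions and may differ from what DataSage AI does internally.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index of `current` against `reference` for one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor at eps so empty bins do not produce log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift; these cut-offs are conventions, not values taken from this repository.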

Force LangGraph orchestration:

datasage-ai --source-type csv --path .\data\sample.csv --orchestrator langgraph

Run tests:

python -m pytest -q

Outputs are generated under runs/<run_id>/artifacts and logs under runs/<run_id>/logs.

Run API locally:

pip install -e .[api]
uvicorn datasage_ai.api.app:app --host 0.0.0.0 --port 8000

Open UI:

http://localhost:8000/ui

Run worker locally:

datasage-ai-worker

Start API + worker with Docker:

docker compose up --build

Phase 2 adds a statistical diagnostics artifact:

runs/<run_id>/artifacts/statistics_report.json

Phase 3 adds orchestration/memory artifacts:

runs/<run_id>/artifacts/executive_summary.md
runs/<run_id>/artifacts/run_payload.json
runs/_index/run_history.db
runs/_memory/vector_memory.jsonl

Phase 4 adds drift and quality artifacts:

runs/<run_id>/artifacts/drift_report.json
runs/<run_id>/artifacts/quality_scorecard.json

Phase 5 adds recommendation and stakeholder reports:

runs/<run_id>/artifacts/model_recommendation_report.json
runs/<run_id>/artifacts/model_recommendation_report.md
runs/<run_id>/artifacts/model_recommendation_report.html
runs/<run_id>/artifacts/stakeholder_summary.json
runs/<run_id>/artifacts/stakeholder_summary.md
runs/<run_id>/artifacts/stakeholder_summary.html

Phase 6 adds hardening and regression artifacts:

runs/<run_id>/artifacts/run_comparison.json
runs/<run_id>/artifacts/run_comparison.md
Structured error capture in run state (errors) with retry/backoff execution policy
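
The retry/backoff policy with structured error capture can be pictured roughly as follows. This is a sketch under assumptions: the function name `run_with_retry`, the error record fields, and the exponential delay schedule are hypothetical and not taken from the DataSage AI source.

```python
import time


def run_with_retry(step, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call `step` until it succeeds or attempts run out.

    Returns (result, errors), where errors is a list of structured records,
    one per failed attempt, similar in spirit to an `errors` field in run state.
    """
    errors = []
    for attempt in range(1, max_attempts + 1):
        try:
            return step(), errors
        except Exception as exc:
            errors.append(
                {"attempt": attempt, "type": type(exc).__name__, "message": str(exc)}
            )
            if attempt < max_attempts:
                # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                sleep(base_delay * 2 ** (attempt - 1))
    return None, errors
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; a caller can inspect the returned `errors` list to decide whether the run should be marked failed.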

Phase 7 adds cloud-ready scaffolding:

FastAPI service endpoints for sync and async execution
SQLite-backed worker queue for API/worker split
Storage abstraction layer for local/object-store artifact backends
Dockerfile and docker-compose.yml
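
A SQLite-backed queue for the API/worker split can be as small as one table with a status column. The sketch below is illustrative only: the `jobs` table, its columns, and the helper names are assumptions, not the schema used by `datasage-ai-worker`.

```python
import sqlite3


def init_queue(conn):
    """Create the hypothetical jobs table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'queued')"
    )
    conn.commit()


def enqueue(conn, payload):
    """API side: add a job and return its id."""
    cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid


def claim_next(conn):
    """Worker side: mark the oldest queued job as running and return (id, payload)."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute(
            "UPDATE jobs SET status = 'running' WHERE id = ? AND status = 'queued'",
            (row[0],),
        )
    return row
```

With several worker processes, the select-then-update pair would need a stricter transaction (for example `BEGIN IMMEDIATE`) so two workers cannot claim the same job; the sketch assumes a single worker.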
