Production-grade, local-first, multi-agent Data Intelligence system for deep EDA, data diagnostics, and model strategy recommendations.
- Product and architecture documentation
- Agent-level specifications
- Implementation roadmap broken into delivery phases
- Ingest CSV, Parquet, and SQL datasets
- Run advanced EDA and data health diagnostics
- Detect:
- Missing values
- Skewness
- Class imbalance
- Multicollinearity (VIF)
- Outliers (IQR + Z-score)
- Target leakage risk
- Feature drift (PSI + KS test)
- Recommend:
- Feature engineering
- Encoding strategy
- Scaling approach
- Model family (regression / classification / time series)
- Generate:
- Structured EDA report
- Modeling recommendation report
- Executive summary
- Data quality score
- Local-first runtime
- Ollama LLMs (`llama3`, `mistral`, `phi3`)
- No paid APIs required
- Local vector DB (`FAISS` or `Chroma`)
- Multi-agent orchestration with `LangGraph`
- State/memory management and reasoning logs
- Cloud-scalable design for future phases
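For intuition, the PSI drift check listed above can be sketched in plain Python. This is a generic formulation — equal-width binning over the reference range and the `1e-6` floor are common choices for the example, not necessarily what datasage-ai implements:

```python
import math

def psi(reference, current, bins=10):
    """Population Stability Index between two numeric samples (sketch)."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0

    def frac(sample, i):
        # Fraction of sample in bin i, floored to avoid log(0).
        n = sum(1 for x in sample if lo + i * width <= x < lo + (i + 1) * width)
        if i == bins - 1:
            n += sum(1 for x in sample if x == hi)  # close the last bin
        return max(n / len(sample), 1e-6)

    return sum(
        (frac(current, i) - frac(reference, i))
        * math.log(frac(current, i) / frac(reference, i))
        for i in range(bins)
    )

# Identical distributions give PSI near 0; a shifted one scores much higher.
same = psi([1, 2, 3, 4, 5] * 20, [1, 2, 3, 4, 5] * 20)
shifted = psi([1, 2, 3, 4, 5] * 20, [3, 4, 5, 6, 7] * 20)
```

A common rule of thumb reads PSI below 0.1 as stable and above 0.25 as significant drift, though thresholds are project-specific.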
- `docs/01_product_scope.md`
- `docs/02_system_architecture.md`
- `docs/03_agent_contracts.md`
- `docs/04_orchestration_memory_logging.md`
- `docs/05_data_quality_scoring.md`
- `docs/06_implementation_phases.md`
- `docs/07_deployment_evolution.md`
- `docs/08_cloud_api_worker_scaffold.md`
- `docs/09_run_guide.md`
```
python -m venv .venv
.venv\Scripts\Activate.ps1
pip install -e .[dev,sql]
```

Run profiling for CSV:

```
datasage-ai --source-type csv --path .\data\sample.csv --target-column target
```

Run profiling for Parquet:

```
datasage-ai --source-type parquet --path .\data\sample.parquet
```

Run profiling for SQL:

```
datasage-ai --source-type sql --connection-uri "sqlite:///./data/example.db" --sql-query "SELECT * FROM table_name"
```

Run with a reference dataset for drift detection:

```
datasage-ai --source-type csv --path .\data\current.csv --reference-path .\data\reference.csv --target-column target
```

Force LangGraph orchestration:

```
datasage-ai --source-type csv --path .\data\sample.csv --orchestrator langgraph
```

Run tests:

```
python -m pytest -q
```

Outputs are generated under `runs/<run_id>/artifacts` and logs under `runs/<run_id>/logs`.
Run the API locally:

```
pip install -e .[api]
uvicorn datasage_ai.api.app:app --host 0.0.0.0 --port 8000
```

Open the UI at:
http://localhost:8000/ui
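Once the API is up it can be called from any HTTP client. As a minimal sketch — note that the `/runs` endpoint path and the payload fields below are assumptions for illustration, not the documented API contract — a request could be built like this:

```python
import json
from urllib import request

# Hypothetical endpoint and payload shape; check the project's API docs
# for the real contract before using this against a running service.
payload = {
    "source_type": "csv",
    "path": "./data/sample.csv",
    "target_column": "target",
}
req = request.Request(
    "http://localhost:8000/runs",          # assumed run-submission endpoint
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# response = request.urlopen(req)          # uncomment with the API running
```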
Run the worker locally:

```
datasage-ai-worker
```

Start the API and worker with Docker:

```
docker compose up --build
```

Phase 2 adds a statistical diagnostics artifact:

- `runs/<run_id>/artifacts/statistics_report.json`
Phase 3 adds orchestration/memory artifacts:
- `runs/<run_id>/artifacts/executive_summary.md`
- `runs/<run_id>/artifacts/run_payload.json`
- `runs/_index/run_history.db`
- `runs/_memory/vector_memory.jsonl`
Phase 4 adds drift and quality artifacts:
- `runs/<run_id>/artifacts/drift_report.json`
- `runs/<run_id>/artifacts/quality_scorecard.json`
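The actual scoring model is defined in `docs/05_data_quality_scoring.md`; as a hedged illustration of the composite-score idea only — the check names and weights below are invented for the example — a scorecard can aggregate per-check scores into one weighted number:

```python
# Hypothetical per-check scores in [0, 100] with weights summing to 1.0;
# the real check names and weights live in docs/05_data_quality_scoring.md.
checks = {
    "missing_values": (90.0, 0.3),   # (score, weight)
    "outliers":       (75.0, 0.2),
    "drift":          (60.0, 0.3),
    "leakage":        (100.0, 0.2),
}

total_weight = sum(w for _, w in checks.values())
quality_score = sum(s * w for s, w in checks.values()) / total_weight
```

Normalizing by the total weight keeps the score stable if checks are added or removed.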
Phase 5 adds recommendation and stakeholder reports:
- `runs/<run_id>/artifacts/model_recommendation_report.json`
- `runs/<run_id>/artifacts/model_recommendation_report.md`
- `runs/<run_id>/artifacts/model_recommendation_report.html`
- `runs/<run_id>/artifacts/stakeholder_summary.json`
- `runs/<run_id>/artifacts/stakeholder_summary.md`
- `runs/<run_id>/artifacts/stakeholder_summary.html`
Phase 6 adds hardening and regression artifacts:
- `runs/<run_id>/artifacts/run_comparison.json`
- `runs/<run_id>/artifacts/run_comparison.md`
- Structured error capture in run state (`errors`) with a retry/backoff execution policy
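The retry/backoff policy can be sketched generically — the attempt count, delays, and captured error shape below are illustrative, not the project's actual values:

```python
import time

def with_retry(fn, attempts=3, base_delay=0.01):
    """Run fn, retrying on failure with exponential backoff (sketch).

    Returns (result, errors): errors is a list of structured records,
    one per failed attempt, mirroring the idea of an `errors` run-state key.
    """
    errors = []
    for attempt in range(attempts):
        try:
            return fn(), errors
        except Exception as exc:
            errors.append({"attempt": attempt, "error": str(exc)})
            if attempt < attempts - 1:
                time.sleep(base_delay * (2 ** attempt))  # 1x, 2x, 4x, ...
    return None, errors

# A function that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result, errors = with_retry(flaky)
```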
Phase 7 adds cloud-ready scaffolding:
- FastAPI service endpoints for sync and async execution
- SQLite-backed worker queue for API/worker split
- Storage abstraction layer for local/object-store artifact backends
- `Dockerfile` and `docker-compose.yml`
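To make the SQLite-backed queue idea concrete — the table name and columns here are illustrative, not the project's actual schema — the API/worker split can rest on a jobs table the API inserts into and the worker polls:

```python
import sqlite3

# Illustrative schema; the real queue schema is project-specific.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs ("
    "id INTEGER PRIMARY KEY, payload TEXT, status TEXT DEFAULT 'queued')"
)

# API side: enqueue a run request.
conn.execute("INSERT INTO jobs (payload) VALUES (?)", ('{"source_type": "csv"}',))
conn.commit()

# Worker side: claim the oldest queued job, execute it, mark it done.
row = conn.execute(
    "SELECT id, payload FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
).fetchone()
if row:
    job_id, payload = row
    conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (job_id,))
    conn.commit()
    # ... run the profiling pipeline here, then:
    conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
    conn.commit()

status = conn.execute("SELECT status FROM jobs WHERE id = 1").fetchone()[0]
```

Using the database as the hand-off point keeps the API stateless and lets the worker run in a separate process or container.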