AsimAftab/DataSage-AI

DataSage AI

A production-grade, local-first, multi-agent data intelligence system for deep exploratory data analysis (EDA), data diagnostics, and model strategy recommendations.

What this repository contains

  • Product and architecture documentation
  • Agent-level specifications
  • Implementation roadmap broken into delivery phases

Core goals

  • Ingest CSV, Parquet, and SQL datasets
  • Run advanced EDA and data health diagnostics
  • Detect:
    • Missing values
    • Skewness
    • Class imbalance
    • Multicollinearity (VIF)
    • Outliers (IQR + Z-score)
    • Target leakage risk
    • Feature drift (PSI + KS test)
  • Recommend:
    • Feature engineering
    • Encoding strategy
    • Scaling approach
    • Model family (regression / classification / time series)
  • Generate:
    • Structured EDA report
    • Modeling recommendation report
    • Executive summary
    • Data quality score
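The IQR and Z-score outlier checks named in the detection list can be illustrated with the Python standard library alone. This is a minimal sketch, not DataSage AI's actual implementation: the function names, the 1.5 IQR multiplier, and the 3.0 Z-score threshold are conventional defaults assumed here.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (k is an assumed default)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute Z-score exceeds the (assumed) threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # a constant column has no Z-score outliers
    return [v for v in values if abs(v - mean) / std > threshold]


# Made-up sample with one obvious outlier.
sample = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 95]
print(iqr_outliers(sample))     # [95]
print(zscore_outliers(sample))  # [95]
```

Running both checks is useful because they fail differently: the Z-score is itself inflated by extreme values, while the IQR rule is based on quartiles and is less sensitive to them.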

Constraints

  • Local-first runtime
  • Local LLMs served through Ollama (llama3, mistral, phi3)
  • No paid APIs required
  • Local vector DB (FAISS or Chroma)
  • Multi-agent orchestration with LangGraph
  • State/memory management and reasoning logs
  • Design that can scale to the cloud in the future

Documentation Index

  • docs/01_product_scope.md
  • docs/02_system_architecture.md
  • docs/03_agent_contracts.md
  • docs/04_orchestration_memory_logging.md
  • docs/05_data_quality_scoring.md
  • docs/06_implementation_phases.md
  • docs/07_deployment_evolution.md
  • docs/08_cloud_api_worker_scaffold.md
  • docs/09_run_guide.md

Quick Start (Phases 0-6)

python -m venv .venv
.venv\Scripts\Activate.ps1
pip install -e .[dev,sql]

Run profiling for CSV:

datasage-ai --source-type csv --path .\data\sample.csv --target-column target

Run profiling for Parquet:

datasage-ai --source-type parquet --path .\data\sample.parquet

Run profiling for SQL:

datasage-ai --source-type sql --connection-uri "sqlite:///./data/example.db" --sql-query "SELECT * FROM table_name"
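The SQL command above expects a SQLite database at ./data/example.db containing a table called table_name. To try it without an existing database, a small one can be created with Python's built-in sqlite3 module. This is only a sketch: the column names and rows below are hypothetical and are not a schema that DataSage AI requires.

```python
import sqlite3
from pathlib import Path


def make_example_db(db_path="data/example.db"):
    """Create a tiny SQLite database with a table named table_name."""
    path = Path(db_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Hypothetical rows: two features and a binary target.
    rows = [
        (1.5, "a", 0),
        (2.0, "b", 1),
        (3.2, "a", 0),
        (4.8, "c", 1),
    ]
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DROP TABLE IF EXISTS table_name")
            conn.execute(
                "CREATE TABLE table_name "
                "(feature_a REAL, feature_b TEXT, target INTEGER)"
            )
            conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", rows)
    finally:
        conn.close()
    return path


if __name__ == "__main__":
    print(make_example_db())
```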

Run with reference dataset for drift detection:

datasage-ai --source-type csv --path .\data\current.csv --reference-path .\data\reference.csv --target-column target
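The goals list names PSI and the KS test as the drift measures applied between a reference and a current dataset. The sketch below shows what those two statistics compute for a single numeric feature, using only the standard library. It is illustrative rather than the project's code: the function names, the equal-width binning over the reference range, and the eps floor for empty bins are assumptions, and only the KS statistic is computed, not its p-value.

```python
import math
from bisect import bisect_right


def _bin_shares(values, edges, eps):
    """Share of values per bin, floored at eps so the log stays defined."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [max(c / len(values), eps) for c in counts]


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index with equal-width bins over the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins
    edges = [lo + width * i for i in range(1, bins)]  # interior bin edges
    ref = _bin_shares(reference, edges, eps)
    cur = _bin_shares(current, edges, eps)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def ks_statistic(reference, current):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    return max(
        abs(bisect_right(ref, x) / len(ref) - bisect_right(cur, x) / len(cur))
        for x in ref + cur
    )


# Made-up data: the current sample is the reference shifted by 30.
reference = list(range(100))
current = [x + 30 for x in reference]
print(psi(reference, reference))                    # 0.0, identical distributions
print(round(ks_statistic(reference, current), 3))   # 0.3
print(psi(reference, current) > 1.0)                # True, large shift
```

PSI compares binned proportions, so its value depends on the binning choice, while the KS statistic works on the raw empirical distributions. Reporting both gives two independent views of the same shift.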

Force LangGraph orchestration:

datasage-ai --source-type csv --path .\data\sample.csv --orchestrator langgraph

Run tests:

python -m pytest -q

Artifacts are written to runs/<run_id>/artifacts and logs to runs/<run_id>/logs.

Run API locally:

pip install -e .[api]
uvicorn datasage_ai.api.app:app --host 0.0.0.0 --port 8000

Open UI:

http://localhost:8000/ui

Run worker locally:

datasage-ai-worker

Start API + worker with Docker:

docker compose up --build

Phase 2 adds a statistical diagnostics artifact:

  • runs/<run_id>/artifacts/statistics_report.json

Phase 3 adds orchestration/memory artifacts:

  • runs/<run_id>/artifacts/executive_summary.md
  • runs/<run_id>/artifacts/run_payload.json
  • runs/_index/run_history.db
  • runs/_memory/vector_memory.jsonl

Phase 4 adds drift and quality artifacts:

  • runs/<run_id>/artifacts/drift_report.json
  • runs/<run_id>/artifacts/quality_scorecard.json

Phase 5 adds recommendation and stakeholder reports:

  • runs/<run_id>/artifacts/model_recommendation_report.json
  • runs/<run_id>/artifacts/model_recommendation_report.md
  • runs/<run_id>/artifacts/model_recommendation_report.html
  • runs/<run_id>/artifacts/stakeholder_summary.json
  • runs/<run_id>/artifacts/stakeholder_summary.md
  • runs/<run_id>/artifacts/stakeholder_summary.html

Phase 6 adds hardening and regression artifacts:

  • runs/<run_id>/artifacts/run_comparison.json
  • runs/<run_id>/artifacts/run_comparison.md
  • Structured error capture in the run state (errors), with a retry/backoff execution policy
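One way to picture the retry/backoff policy with structured error capture is the sketch below. It is a hypothetical illustration, not the project's implementation: the function name run_with_retry, the fields of each error record, and the exponential delay schedule are all assumptions. Only the idea of appending failures to an errors list in the run state comes from the item above.

```python
import time


def run_with_retry(step, state, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call step(state), retrying with exponential backoff.

    Every failure is appended to state["errors"] as a structured record.
    The last failure is re-raised once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return step(state)
        except Exception as exc:
            state.setdefault("errors", []).append(
                {
                    "step": getattr(step, "__name__", repr(step)),
                    "attempt": attempt,
                    "type": type(exc).__name__,
                    "message": str(exc),
                }
            )
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...


# Hypothetical step that fails twice before succeeding.
def flaky_profile(state):
    state["calls"] = state.get("calls", 0) + 1
    if state["calls"] < 3:
        raise RuntimeError("transient failure")
    return "profiled"


run_state = {}
result = run_with_retry(flaky_profile, run_state, sleep=lambda seconds: None)
print(result)                    # profiled
print(len(run_state["errors"]))  # 2
```

Passing sleep as a parameter keeps the sketch testable without real delays.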

Phase 7 adds cloud-ready scaffolding:

  • FastAPI service endpoints for sync and async execution
  • SQLite-backed worker queue for API/worker split
  • Storage abstraction layer for local/object-store artifact backends
  • Dockerfile and docker-compose.yml
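The API/worker split over a SQLite-backed queue can be sketched as follows. This is a minimal illustration of the pattern, not the project's code or schema: the jobs table, its columns, the status values, and the function names are hypothetical. The API side would call enqueue, and the worker would poll claim_next.

```python
import json
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    payload TEXT NOT NULL,
    status  TEXT NOT NULL DEFAULT 'queued'
)
"""


def open_queue(path=":memory:"):
    # isolation_level=None puts the connection in autocommit mode,
    # so transactions are controlled explicitly with BEGIN/COMMIT.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(SCHEMA)
    return conn


def enqueue(conn, payload):
    """API side: store a job and return its id."""
    cur = conn.execute(
        "INSERT INTO jobs (payload) VALUES (?)", (json.dumps(payload),)
    )
    return cur.lastrowid


def claim_next(conn):
    """Worker side: mark the oldest queued job as running and return it."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, payload FROM jobs "
            "WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],)
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else (row[0], json.loads(row[1]))


queue = open_queue()
first_id = enqueue(queue, {"source_type": "csv", "path": "data/sample.csv"})
print(claim_next(queue))  # (1, {'source_type': 'csv', 'path': 'data/sample.csv'})
print(claim_next(queue))  # None, the queue is empty
```

BEGIN IMMEDIATE is used so that two workers sharing the same database file cannot both select the same queued row before either one updates it.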
