X-Adaptive EDA

Explore. Adapt. Understand.

An adaptive data analysis tool that learns your priorities and recommends the most relevant analyses.

Quick Start • Features • Demo • Architecture • Contributing

What is X-Adaptive EDA?

X-Adaptive EDA is a Streamlit-based exploratory data analysis tool that goes beyond static reporting. It adapts to how you work — learning from your feedback, prioritizing what matters to you, and explaining why each recommendation scored the way it did.

Upload a dataset → Get intelligent recommendations → Explore with interactive charts → Chat with your data → Your preferences evolve as you go.

Demo

Screenshots coming soon. Run locally to see the full experience.

Features

Core Analytics

8 Analysis Types — Distribution, Correlation, Missing Values, Categorical, Outliers, Time Series, Clustering, Feature Importance
Adaptive Scoring — Recommendations learn from your feedback and adjust in real-time
Explainable Recommendations — Every score decomposes into its components with confidence intervals
Interactive Visualizations — Plotly charts with zoom, pan, hover, and download
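
This README does not say how the confidence intervals are computed, although the project structure lists a bootstrap CI in recommendation_engine.py. As a rough illustration only, a percentile bootstrap over a list of numeric values could look like the sketch below. The function name `bootstrap_ci`, its defaults, and the choice of the mean as the statistic are assumptions, not the project's actual code.

```python
import random
from statistics import fmean


def bootstrap_ci(values, stat=fmean, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` over `values` (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, recompute the statistic, and sort the estimates.
    estimates = sorted(
        stat(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 percentiles as the interval bounds.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

A narrow interval suggests the statistic is stable under resampling; a wide one suggests the score rests on few or noisy observations.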

Intelligence

AI-Powered Insights — LLM-generated observations for each analysis (Ollama, OpenRouter, Groq, or Custom API)
Chat with Your Data — Ask natural language questions about your dataset
Smart Column Naming — AI suggests names for unnamed columns
NLQ Classifier — Understands queries like "show me outliers in revenue"
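
To give a feel for how a dependency-free query classifier can work, here is a minimal sketch that tokenizes a query, applies a crude suffix stemmer, and counts keyword hits per analysis type. The keyword lists, the suffix rules, and the names `stem` and `classify` are illustrative assumptions; the real logic lives in nlq_engine.py and may differ.

```python
import re
from collections import Counter


def stem(token):
    """Strip one common English suffix (a deliberately crude, hypothetical stemmer)."""
    for suffix in ("ions", "ion", "ing", "ies", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


# Hypothetical keyword lists; each word is stemmed once so it matches stemmed tokens.
KEYWORDS = {
    label: {stem(word) for word in words}
    for label, words in {
        "outliers": ["outliers", "anomalies", "extreme"],
        "correlation": ["correlation", "relationship", "related"],
        "distribution": ["distribution", "histogram", "spread"],
        "missing_values": ["missing", "nulls", "empty"],
    }.items()
}


def classify(query):
    """Return the analysis type with the most keyword hits, or None if nothing matches."""
    tokens = [stem(t) for t in re.findall(r"[a-z0-9_]+", query.lower())]
    counts = Counter(tokens)
    scores = {
        label: sum(counts[k] for k in keys) for label, keys in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

With these assumed keyword lists, `classify("show me outliers in revenue")` resolves to `"outliers"`, and a query with no known keywords returns `None`.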

Adaptation

Preference Tracking — 👍/👎 feedback permanently adjusts analysis priorities
Temporal Decay — Older preferences fade over time
Novelty Dampening — Avoids repeating the same analyses
Column Affinity — Boosts analyses involving columns you frequently explore
ε-Greedy Exploration — Occasionally shows unexpected analyses to discover new insights
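
Two of the ideas above, temporal decay and ε-greedy exploration, can be sketched in a few lines. This is an illustration under stated assumptions: the 30-day half-life, the neutral weight of 0.5, the default ε of 0.1, and both function names are hypothetical and are not taken from preference_learner.py.

```python
import random


def decay_toward_neutral(weight, age_days, half_life_days=30.0, neutral=0.5):
    """Pull a preference weight back toward `neutral` as it ages (exponential decay)."""
    factor = 0.5 ** (age_days / half_life_days)
    return neutral + (weight - neutral) * factor


def epsilon_greedy_pick(scores, epsilon=0.1, rng=None):
    """Pick the top-scoring analysis, or a uniformly random one with probability epsilon."""
    rng = rng or random.Random()
    if rng.random() < epsilon:
        # Exploration: any analysis may be chosen, including low-scoring ones.
        return rng.choice(sorted(scores))
    # Exploitation: the analysis with the highest score.
    return max(scores, key=scores.get)
```

With `epsilon=0.0` the pick is always the best-scoring analysis; raising it trades some relevance for the chance of surfacing analyses the user has not tried.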

Data Quality

10-Step Quality Pipeline — Normalizes, deduplicates, infers types, and scores your data
Per-Row Outlier Explainability — See which column triggered each outlier and why
Progressive Sampling — Large datasets (>50k rows) offer stratified sampling to ~10k
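
The sampling idea can be sketched without any dependencies. The example below draws a proportional share from each group so that small groups stay represented. It works on a plain list of dicts to stay self-contained; the app itself works on pandas DataFrames, where a grouped sample would be the natural equivalent. The function name, the `key` parameter, and the default target of 10,000 rows are assumptions.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, target=10_000, seed=0):
    """Sample about `target` rows, keeping each group's share of the data (hypothetical helper)."""
    if len(rows) <= target:
        return list(rows)  # small datasets are returned unchanged
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for members in groups.values():
        # Proportional allocation, with at least one row per group.
        k = max(1, round(len(members) * target / len(rows)))
        sample.extend(rng.sample(members, k))
    return sample
```

Because every group contributes at least one row, the result can slightly exceed `target` when there are many tiny groups.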

Developer Experience

Session Persistence — Save/load via SQLite
68 Tests — Comprehensive test suite
Rate Limiting — Remote API calls capped at 10/minute
GPU Acceleration — Ollama auto-uses GPU with CPU fallback

Why This Project Exists

Most EDA tools give you static reports. X-Adaptive EDA does three things differently:

It learns — Every 👍/👎 shifts future recommendations toward what you care about
It explains — No black boxes. Every score shows its formula. Counterfactual sliders let you ask "what if?"
It adapts in real-time — No waiting for retraining. Feedback takes effect immediately.

This makes it ideal for:

Data scientists doing exploratory analysis
Analysts who need quick, relevant insights
Students learning data analysis
Teams exploring unfamiliar datasets

Tech Stack

Layer	Technology
UI	Streamlit (≥1.36)
Data	pandas, NumPy
Visualization	Plotly
LLM	Ollama (local), OpenRouter, Groq, Custom API
NLP	Custom tokenizer + stemmer (no external deps)
Persistence	SQLite, JSON
Testing	pytest-compatible test files

Architecture

┌─────────────────────────────────────────────────────────┐
│                    Streamlit UI (app.py)                │
│  Sidebar: Dataset • Priorities • AI • Sessions          │
│  Main: Recommendations • Visualizations • Chat          │
└─────────┬───────────────────────────────────┬───────────┘
          │                                   │
    ┌─────▼─────┐                     ┌───────▼───────┐
    │   Data     │                     │  Recommendation│
    │  Processor │                     │    Engine      │
    │  + Quality │                     │  (scoring,     │
    │  Pipeline  │                     │   ranking)     │
    └─────┬─────┘                     └───────┬───────┘
          │                                   │
    ┌─────▼─────┐                     ┌───────▼───────┐
    │    LLM     │                     │   Preference  │
    │   Adapter  │                     │    Tracker    │
    │  (insights, │                     │  (adaptation) │
    │   chat)    │                     └───────────────┘
    └───────────┘

Data Flow

Upload → Cleanse → Profile → Score → Rank → Visualize → Feedback → Adapt
                ↓                              ↑
          Quality Report              Counterfactual Slider

Quick Start

Prerequisites

Python 3.10+
(Optional) Ollama for local LLM features

Installation

# Clone the repository
git clone https://github.com/AshayK003/XadaptiveEDA.git
cd XadaptiveEDA

# Create virtual environment
python -m venv venv
.\venv\Scripts\activate      # Windows
# source venv/bin/activate   # macOS/Linux

# Install dependencies
pip install -r requirements.txt

# Run the app
streamlit run app.py

Open http://localhost:8501 in your browser.

Environment Setup (Optional)

For LLM features, copy .env.example to .env and add your API keys:

cp .env.example .env
# Edit .env with your keys

No API key needed for local Ollama — just install and run.

Usage

Basic Workflow

Upload a CSV, Excel, or JSON file
Rename unnamed columns (AI suggestions or manual)
Finalize to generate the full analysis
Explore recommended analyses ranked by relevance
Give feedback (👍/👎) to refine future recommendations
Chat with your data in natural language

Example Session

# The app runs via Streamlit — no Python code needed
# Just run:
streamlit run app.py

# Then in the browser:
# 1. Upload sales_data.csv
# 2. Click "Finalize Dataset"
# 3. Click 👍 on "Distribution Analysis"
# 4. Ask: "What's the correlation between price and quantity?"

Expert Mode

Toggle Dev Mode in the sidebar to reveal:

Raw DataFrame viewer
CSV download button
Full recommendation JSON with all scoring components

Configuration

Analysis Goals

Choose a preset goal to automatically weight analysis types:

Goal	Distribution	Correlation	Missing	Categorical	Outliers	Time Series	Clustering	Feature Imp
General	0.5	0.5	0.5	0.5	0.5	0.5	0.5	0.5
Distributions	0.9	0.3	0.3	0.3	0.8	0.3	0.3	0.4
Relationships	0.3	0.9	0.3	0.5	0.3	0.8	0.6	0.7
Data Quality	0.3	0.3	0.9	0.7	0.5	0.3	0.3	0.5

Scoring Formula

final_score = base_score × data_relevance × user_pref × quality_adj
            × diversity_penalty × novelty_penalty × avoidance_penalty × affinity_boost

All multipliers are documented in recommendation_engine.py.
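
As a worked illustration of the formula above, the sketch below multiplies the eight factors together. The field names mirror the formula; the class name `ScoreComponents` and the neutral default of 1.0 for each multiplier are assumptions made for this example, not the contents of recommendation_engine.py.

```python
from dataclasses import astuple, dataclass
from math import prod


@dataclass
class ScoreComponents:
    """The factors of the scoring formula; 1.0 is assumed to mean 'no adjustment'."""
    base_score: float
    data_relevance: float = 1.0
    user_pref: float = 1.0
    quality_adj: float = 1.0
    diversity_penalty: float = 1.0
    novelty_penalty: float = 1.0
    avoidance_penalty: float = 1.0
    affinity_boost: float = 1.0

    def final_score(self) -> float:
        # The formula is a plain product, so any factor of 0 zeroes the score.
        return prod(astuple(self))
```

For example, a base score of 0.8 with `user_pref=0.5` and every other factor left at 1.0 yields 0.4. Because the score is a product, each factor's contribution can be read off directly, which is what makes the decomposition easy to explain.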

LLM Providers

Provider	Key	Default Model	Rate Limit
Local (Ollama)	None	qwen2.5-coder:7b	Unlimited
OpenRouter	`OPENROUTER_API_KEY`	qwen/qwen2.5-7b-instruct	10/60s
Groq	`GROQ_API_KEY`	llama-3.3-70b-versatile	10/60s
Custom	`CUSTOM_API_KEY` + endpoint	Configurable	10/60s
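
A limit of 10 calls per 60 seconds is commonly enforced with a sliding window. The sketch below is one way to do that, assuming one limiter instance per remote provider; the class name and the injectable `clock` parameter are hypothetical, and the actual logic in llm_adapter.py may be implemented differently.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` within any `window_seconds` span (hypothetical helper)."""

    def __init__(self, max_calls=10, window_seconds=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self.calls = deque()

    def allow(self):
        """Record and permit a call if the window has room; otherwise refuse it."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window_seconds:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False
```

A caller would check `limiter.allow()` before each remote request and skip the limiter entirely for local Ollama, which the table lists as unlimited.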

Project Structure

x-adaptive-eda/
├── app.py                    # Streamlit UI (orchestration, ~970 lines)
├── data_processor.py         # File loading, cleansing, profiling
├── data_quality.py           # 10-step quality pipeline, QualityReport
├── recommendation_engine.py  # Scoring, ranking, penalties, bootstrap CI
├── preference_learner.py     # Fixed-delta adaptation, goals, decay
├── insight_generator.py      # Explainable recommendations, comparisons
├── visualization_generator.py# Plotly charts (8 types, k-means, MI)
├── constants.py              # Analysis types, preferences, goals
├── llm_adapter.py            # LLM integration, rate limiting, chat
├── nlq_engine.py             # NLP query classifier (no external deps)
├── session_persistence.py    # SQLite save/load for sessions
├── requirements.txt          # 6 dependencies
├── .env.example              # Environment variable template
├── LICENSE                   # MIT License
├── README.md                 # This file
├── test_phase1.py            # Core tests (diversity, tracking, explanations)
├── test_phase2.py            # Column interestingness, sampling, summary
├── test_phase3.py            # Goals, decay, save/load
├── test_phase4.py            # NLQ classifier (stemming, synonyms, TF)
├── test_data_quality.py      # 12 quality pipeline tests
├── test_session_persistence.py # SQLite persistence tests
└── test_rate_limit.py        # Rate limiting tests

Development Setup

# Install dependencies
pip install -r requirements.txt

# Run tests
python test_phase1.py && python test_phase2.py && python test_phase3.py && python test_phase4.py && python test_data_quality.py && python test_session_persistence.py && python test_rate_limit.py

# Run the app
streamlit run app.py

Code Style

snake_case for functions/variables
PascalCase for classes
Docstrings on all public functions
Structured logging via logging.getLogger(__name__)
No print() in source files (only in tests)

Testing

68 tests across 7 test files:

File	Tests	Coverage
test_phase1.py	5	Diversity, tracking, explanations, regression
test_phase2.py	5	Column interestingness, sampling, summary
test_phase3.py	7	Goals, decay, save/load round-trip
test_phase4.py	12	NLQ classifier (stemming, synonyms, TF scoring)
test_data_quality.py	12	10-step quality pipeline
test_session_persistence.py	7	SQLite save/load/list/delete
test_rate_limit.py	5	Rate limiting (local, remote, separate providers)

# Run all tests
python test_phase1.py && python test_phase2.py && python test_phase3.py && python test_phase4.py && python test_data_quality.py && python test_session_persistence.py && python test_rate_limit.py

Roadmap

Completed

Planned

Spearman/Kendall correlation options
Custom k-means cluster count slider
Export analysis report as PDF/HTML
Multi-dataset comparison
Dashboard mode (persistent charts)
Plugin system for custom analysis types
Collaborative sessions (multi-user)

Contributing

Contributions welcome! Here's how:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Guidelines

Follow existing code style
Add tests for new features
Update README if needed
Keep PRs focused (one feature per PR)

License

This project is licensed under the MIT License — see LICENSE for details.

Acknowledgements

Streamlit — Web framework
Plotly — Interactive visualizations
Ollama — Local LLM hosting
pandas — Data manipulation

FAQ

Q: Do I need an API key to use this? A: No. Local Ollama works without any API keys. API keys are only needed for OpenRouter, Groq, or Custom API providers.

Q: What file formats are supported? A: CSV, XLSX, XLS, and JSON files up to ~50 MB.

Q: How does the adaptation work? A: Fixed-delta adjustments (not ML). 👍 adds 0.10, 👎 subtracts 0.10, and selecting a column adds 0.03. All weights stay in [0.1, 1.0].

Q: Can I save my session? A: Yes. Click "Save Session" in the sidebar. Sessions persist in SQLite at ~/.eda_assistant_sessions.db.

Q: How accurate are the AI insights? A: Insights are generated from your actual data values — no pre-written templates. Quality depends on the LLM provider and model used.

Q: Is my data sent to external servers? A: Only if you use OpenRouter, Groq, or Custom API. Local Ollama keeps everything on your machine.
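
The fixed-delta adaptation described in the FAQ above can be sketched as follows. The deltas and the [0.1, 1.0] bounds come from the answer above; the function name and the event labels are assumptions made for this example.

```python
# Values stated in the FAQ above.
THUMBS_DELTA = 0.10
COLUMN_SELECT_DELTA = 0.03
MIN_WEIGHT, MAX_WEIGHT = 0.1, 1.0

# Hypothetical event labels mapped to their weight adjustments.
DELTAS = {
    "thumbs_up": +THUMBS_DELTA,
    "thumbs_down": -THUMBS_DELTA,
    "column_select": +COLUMN_SELECT_DELTA,
}


def apply_feedback(weight, event):
    """Shift a preference weight by a fixed delta and clamp it to the allowed range."""
    return min(MAX_WEIGHT, max(MIN_WEIGHT, weight + DELTAS[event]))
```

Starting from 0.5, one 👍 gives roughly 0.6. Repeated 👎 clicks bottom out at 0.1, so an analysis type is never suppressed completely.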

Troubleshooting

Issue	Solution
App won't start	Check Python version (3.10+), run `pip install -r requirements.txt`
Ollama not reachable	Run `ollama serve` in a terminal
GPU not detected	Install NVIDIA drivers, restart Ollama
Slow LLM responses	Use CPU mode: set `OLLAMA_NUM_GPU=0` before starting Ollama (`set OLLAMA_NUM_GPU=0` on Windows, `export OLLAMA_NUM_GPU=0` on macOS/Linux)
Large file warning	Files >50 MB may be slow; use sampling for datasets >50k rows
Import errors	Ensure virtual environment is activated

Security

API keys stored in .env (gitignored)
No hardcoded secrets in source code
Parameterized SQL queries (no injection risk)
Local Ollama keeps data on your machine
Remote API calls rate-limited to 10/60s
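
To illustrate what parameterized queries mean in practice, here is a minimal sketch of saving and loading a session with Python's built-in sqlite3 module. The table layout, the function names, and the JSON payload format are assumptions; see session_persistence.py for the actual schema.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a session store; the schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sessions ("
        "name TEXT PRIMARY KEY, payload TEXT NOT NULL)"
    )
    return conn


def save_session(conn, name, state):
    """Store `state` as JSON; `?` placeholders keep user input out of the SQL text."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO sessions (name, payload) VALUES (?, ?)",
            (name, json.dumps(state)),
        )


def load_session(conn, name):
    """Return the saved state for `name`, or None if no such session exists."""
    row = conn.execute(
        "SELECT payload FROM sessions WHERE name = ?", (name,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Because the session name is bound as a parameter rather than concatenated into the statement, a name containing quotes or SQL keywords is stored as plain text.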

Built with ❤️ for the data community

⭐ Star this repo • 🐛 Report Bug • 💡 Request Feature • ☕ Support the developer
