SQL Agent LLMOps

Ask anything about your data. Three fine-tuned LLMs convert plain English to SQL, run it on DuckDB, and render insight-driven charts.

Live demo · Models · Datasets · Architecture · Training · Quickstart

Team · Daniel Regalado Cardoso · Nefeli Zafeiri · Oliver Mazariegos · Eleniz Espina MSBA · University of Miami · 2026

🤗 Models on Hugging Face Hub

Model	Base	Status	Hub
SQL Generator	Qwen2.5-Coder-7B-Instruct	Trained · loss 0.27 · 672 k examples · 13.5 h on L40S	`sql-generator-qwen25-coder-7b-lora`
Chart Reasoner	Phi-3-Mini-4k-Instruct	Trained	merged · adapter only
SVG Renderer	DeepSeek-Coder-1.3B-Instruct	Trained	`svg-renderer-deepseek-coder-1.3b-lora`

📦 Datasets on Hugging Face Hub

Dataset	Rows	Purpose	Hub
text-to-sql-mix-v2	761,155	NL → SQL training (10 sources merged)	`text-to-sql-mix-v2`
chart-reasoning-mix-v1	~75,000	Chart spec reasoning (nvBench + GPT distillation)	`chart-reasoning-mix-v1`
svg-chart-render-v1	~25,000	Chart spec → inline SVG	`svg-chart-render-v1`

Full dataset documentation: DATASETS.md

Overview

SQL Agent LLMOps is a production-grade text-to-SQL agent. Upload a CSV, JSON, Parquet or Excel file, ask a question in English, and get back the SQL, the result table, an inline chart, and a 1-2 sentence analyst-style finding — all powered by three fine-tuned LoRAs running on Hugging Face ZeroGPU.

Design principles:

Multi-model orchestration — one specialist per task (SQL, chart spec, SVG), not one giant monolith
In-memory only — uploaded data lives in DuckDB RAM, never written to disk
Reproducible training — every dataset, training script, and model card is open-sourced
Apple × Claude UI — minimalist 2-column layout, monochrome with a single warm accent, light/dark theme
Self-correcting SQL — if a query fails, the agent retries up to 3× with the error fed back to the model

Live demo

🚀 huggingface.co/spaces/DanielRegaladoCardoso/sql-agent

Sign in to Hugging Face for full ZeroGPU quota (~25 min/day on Pro), or click Demo to try a sample dataset without uploading.

What you get for each query

The generated SQL with syntax highlighting and one-click copy
The result table as a downloadable CSV
An inline SVG chart with a downloadable standalone version
A short analyst narrative explaining the most interesting finding
Visible pipeline stages (Generating SQL → Running query → Designing chart → Rendering)

Architecture

flowchart LR
    U1["CSV / JSON / Parquet / XLSX"]
    U2["NL question"]

    SX["Schema Extractor"]
    DB[("DuckDB<br/>in-memory")]

    ORCH{{"Orchestrator"}}

    M1["SQL Generator<br/>Qwen 2.5 Coder 7B<br/>+ LoRA r=16"]
    M2["Chart Reasoner<br/>Phi-3 Mini 3.8B<br/>+ LoRA r=16"]
    M3["SVG Renderer<br/>DeepSeek Coder 1.3B<br/>+ LoRA r=16"]
    NARR["Narrator<br/>(Qwen, reused)"]

    OUT["Final UI<br/>chart + narration + downloads"]
    FB["Plotly fallback<br/>(themed)"]

    U1 --> SX --> DB
    U2 --> ORCH
    DB --> ORCH
    ORCH --> M1 -->|SQL| DB
    DB -->|result set| M2
    M2 -->|chart spec| M3 --> OUT
    M3 -.->|on invalid SVG| FB --> OUT
    DB --> NARR --> OUT

    classDef input fill:#FAFAF9,stroke:#5A5A5A,color:#0E0E0E
    classDef ingest fill:#FFFFFF,stroke:#C96442,color:#0E0E0E
    classDef orch fill:#FFFFFF,stroke:#0E0E0E,color:#0E0E0E
    classDef model fill:#FFFFFF,stroke:#C96442,color:#0E0E0E
    classDef output fill:#C96442,stroke:#0E0E0E,color:#FAFAF9
    classDef fallback fill:#FAFAF9,stroke:#5A5A5A,color:#5A5A5A

    class U1,U2 input
    class SX,DB ingest
    class ORCH orch
    class M1,M2,M3,NARR model
    class OUT output
    class FB fallback

Flow per query:

User uploads data; DuckDB ingests it as a typed table; the schema extractor produces CREATE-TABLE statements + sample rows + distinct-value hints for low-cardinality columns
SQL Generator (Qwen 7B + LoRA) takes the schema + question and emits SQL
DuckDB validates and executes the query; on failure, the agent retries (up to 3×) with the error fed back to the model
Chart Reasoner (Phi-3 + LoRA) picks chart_type, x/y columns, title from the result set
SVG Renderer (DeepSeek + LoRA) produces inline SVG; if invalid, the themed Plotly fallback takes over
Narrator (Qwen, reused) writes a 1-2 sentence analyst finding
UI renders everything with downloads, syntax-highlighted SQL, and collapsible data table

Quick Start

Run the live demo

No install needed. Open the Space: 👉 https://huggingface.co/spaces/DanielRegaladoCardoso/sql-agent

Run locally

git clone https://github.com/DanielRegaladoUMiami/sql-agent-llmops.git
cd sql-agent-llmops

python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

python app.py

Visit http://localhost:7860.

Note: a 16 GB Mac will OOM trying to load all three 7B/3.8B/1.3B models. For local dev use base models in 4-bit, or work directly via the live HF Space.

Inference with the SQL Generator only

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
model = PeftModel.from_pretrained(
    base,
    "DanielRegaladoCardoso/sql-generator-qwen25-coder-7b-lora",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct")

messages = [
    {"role": "system", "content": "You are a SQL expert. Return only the SQL."},
    {"role": "user", "content": "### Schema\nCREATE TABLE players (id INT, name VARCHAR, hometown VARCHAR);\n\n### Question\nList all players from Tampa, Florida."},
]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True))

Project structure

sql-agent-llmops/
├── app.py                       # Gradio app (HF Spaces entry point)
├── README.md                    # This file
├── README_HF_SPACE.md           # README with YAML frontmatter for the Space
├── DATASETS.md                  # Full dataset documentation
├── QUICKSTART.md                # Step-by-step setup + reproduction
├── CONTRIBUTING.md              # Contribution guide
├── LICENSE                      # Apache 2.0
├── requirements.txt             # Pinned runtime dependencies
├── pyproject.toml               # Package config
├── Dockerfile + docker-compose.yml
│
├── src/
│   ├── orchestrator/
│   │   └── pipeline.py          # 3-model pipeline + self-correction loop
│   ├── models/
│   │   ├── sql_generator.py     # Qwen 7B + LoRA via PeftModel
│   │   ├── chart_reasoner.py    # Phi-3 + LoRA
│   │   ├── svg_renderer.py      # DeepSeek + LoRA, with Plotly fallback
│   │   └── base.py
│   ├── rag/
│   │   ├── schema_extractor.py  # DuckDB DESCRIBE + distinct-value hints
│   │   └── engine.py
│   ├── utils/
│   │   ├── sql_executor.py      # In-memory DuckDB
│   │   └── logger.py
│   ├── data_processing/
│   │   └── loader.py            # CSV/JSON/Parquet/XLSX ingestion
│   └── visualization/
│       ├── plotly_fallback.py   # Themed Plotly renderer
│       └── svg_theme.py         # SVG post-processor (responsive, themed)
│
├── configs/
│   ├── model_config.yaml
│   ├── training_config.yaml
│   └── deployment_config.yaml
│
├── training/
│   ├── data_pipelines/          # UV scripts that build the 3 datasets
│   ├── jobs/                    # HF Jobs training scripts (production)
│   ├── notebooks/               # Colab notebooks (exploratory)
│   ├── sql_generator/
│   ├── chart_reasoner/
│   └── svg_renderer/
│
└── tests/
    ├── test_orchestrator.py
    ├── test_plotly_fallback.py
    ├── test_schema_extractor.py
    └── test_sql_executor.py

Tech stack

Layer	Choice	Why
UI	Gradio 5.12, custom CSS theme	Apple/Claude minimalism; light/dark via `prefers-color-scheme`
Inference runtime	HF Spaces ZeroGPU (half-H200, 70 GB VRAM)	Free, on-demand GPU per request
Model loading	`transformers` + `peft.PeftModel` at module level	Required ZeroGPU pattern (CUDA emulation at startup, real GPU inside `@spaces.GPU`)
Analytics engine	DuckDB in-memory	Native CSV/JSON/Parquet read, ANSI SQL, 10× faster than SQLite for analytics
Training	Unsloth QLoRA (4-bit base + r=16 adapter)	2× faster, fits 7 B on a single L40S
Training infra	Hugging Face Jobs	Pay-per-second L40S, no Colab session limits
Visualization	Plotly + custom SVG post-processor	Consistent theming across model output and fallback
Privacy	DuckDB ephemeral, no persistence	Uploaded data lives in RAM only

Training

All three models are fine-tuned with Unsloth (4-bit QLoRA) + TRL SFTTrainer. Production training runs on Hugging Face Jobs; exploratory work happens in training/notebooks/.

1. SQL Generator — Qwen2.5-Coder-7B

Setting	Value
Dataset	`text-to-sql-mix-v2` — 761,155 rows
Examples used (after seq-len filter ≤ 1024)	672,949 (93.1%)
Sequences after packing	154,462
LoRA	r=16, α=32, on `q/k/v/o/gate/up/down_proj`
Hardware	1× NVIDIA L40S (48 GB) on HF Jobs
Throughput	4.93 s/step (effective batch 16, seq 1024)
Total steps	9,654 (1 epoch)
Wall-clock time	13.5 hours
Final training loss	0.2658
Cost	~$24

Reproduce: training/jobs/train_sql_generator_job.py Launch instructions: training/jobs/README.md

2. Chart Reasoner — Phi-3-Mini-3.8B

Dataset: chart-reasoning-mix-v1 — ~75 k rows from nvBench (25 k real NL/chart pairs) plus GPT-4.1-nano knowledge distillation over text-to-sql-mix-v2 (50 k pairs generated with a Tufte/Knaflic/Few storytelling prompt)
Output: structured JSON spec (chart_type, encoding, title, sort, color_strategy, annotations, rationale)
Build script: training/data_pipelines/build_chart_mix.py

3. SVG Renderer — DeepSeek-Coder-1.3B

Dataset: svg-chart-render-v1 — ~25 k (chart_spec → SVG) pairs from nvBench configs re-rendered via matplotlib's SVG backend, plus chart-shaped SVGs filtered from umuthopeyildirim/svgen-500k
Output: inline SVG; falls back to themed Plotly when invalid
Build script: training/data_pipelines/build_svg_mix.py

Cost summary

Stage	Compute	Cost
SQL Generator training	HF Jobs L40S, 13.5 h	~$24
Chart Reasoner training	Colab / HF Jobs	~$3
SVG Renderer training	Colab / HF Jobs	~$1
Chart dataset OpenAI synthesis	gpt-4.1-nano Batch API, 50 k	~$2.50
Total training spend		~$30
Inference hosting	HF Spaces ZeroGPU	$0

Engineering notes

A few non-obvious things we learned shipping this:

Load models at module level on cuda, not lazily. ZeroGPU's CUDA emulation at startup makes module-level placement much faster than loading inside @spaces.GPU. Lazy loading was burning 30-60 s of GPU quota per query.
Apply LoRAs explicitly via PeftModel.from_pretrained(base, adapter_repo). Letting AutoModelForCausalLM auto-detect the adapter triggered a base/adapter rank mismatch (Unsloth's bnb-4bit base has a different scaffolding rank).
Pin Gradio SDK ≥ 5.12. Earlier versions tracked logged-in users by IP and bucketed them with anonymous, which broke ZeroGPU quota attribution. Bumping to 5.12 fixed this.
Skip hf_oauth: true. When ZeroGPU is configured natively, adding OAuth on top conflicts with the quota service; visitors authenticated via the parent huggingface.co session work better.
DuckDB > SQLite. Native CSV/JSON/Parquet ingestion, faster analytics, ANSI SQL — and the schema extractor is much cleaner with DESCRIBE + SHOW TABLES.
Sequence packing in TRL compressed our 723 k training rows into 154 k sequences, cutting training time by ~4×.
SQL self-correction with the error in context turned a ~80 % accuracy pipeline into ~95 % at zero extra training cost.

Data and Privacy

In-memory only — uploaded data lives in DuckDB RAM, never persisted to disk
No telemetry — no logging, no analytics
No retraining on user data — models are frozen between releases
Ephemeral sessions — everything is wiped when the Space restarts

Deployment

Hugging Face Spaces (current production)

The live Space at DanielRegaladoCardoso/sql-agent is configured with:

sdk: gradio
sdk_version: 5.12.0
hardware: zero-a10g
app_file: app.py

To replicate:

Fork this repo
Create a new Space → Gradio template
Link to your fork
Select ZeroGPU hardware (requires HF Pro)
Auto-deploys on push

Docker

docker build -t sql-agent .
docker run -p 7860:7860 -e HF_TOKEN=$HF_TOKEN sql-agent

Roadmap

Things on the next sprint:

Multi-turn conversation memory ("now by region", "filter to 2024")
Statistical summary panel on dataset upload (count / distinct / nulls / min-max per column)
Anomaly detection ("there's a 30% null rate in revenue")
Eval harness on Spider / WikiSQL / BIRD test splits
Saved queries / report bookmarking
Visible planning step before SQL ("I'll: filter Q4 → group by region → sort by total")

Contributions welcome — see CONTRIBUTING.md.

Citation

@misc{sqlagent2026,
  author       = {Regalado Cardoso, Daniel and Zafeiri, Nefeli and Mazariegos, Oliver and Espina, Eleniz},
  title        = {SQL Agent LLMOps: Multi-Model Orchestration for Text-to-SQL with Visualization},
  year         = {2026},
  institution  = {MSBA, University of Miami},
  howpublished = {\url{https://github.com/DanielRegaladoUMiami/sql-agent-llmops}},
}

License

Apache 2.0 — see LICENSE.

Acknowledgments

Unsloth — 2× faster QLoRA training
TRL — SFTTrainer, sequence packing
Hugging Face Jobs — training infrastructure
DuckDB — in-memory analytics
Gradio — UI framework
Qwen, Microsoft, DeepSeek teams — base models
All authors of the source datasets (see DATASETS.md)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SQL Agent LLMOps

🤗 Models on Hugging Face Hub

📦 Datasets on Hugging Face Hub

Overview

Live demo

What you get for each query

Architecture

Quick Start

Run the live demo

Run locally

Inference with the SQL Generator only

Project structure

Tech stack

Training

1. SQL Generator — Qwen2.5-Coder-7B

2. Chart Reasoner — Phi-3-Mini-3.8B

3. SVG Renderer — DeepSeek-Coder-1.3B

Cost summary

Engineering notes

Data and Privacy

Deployment

Hugging Face Spaces (current production)

Docker

Roadmap

Citation

License

Acknowledgments

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 74 Commits
.github/ISSUE_TEMPLATE		.github/ISSUE_TEMPLATE
configs		configs
presentation		presentation
src		src
tests		tests
training		training
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
DATASETS.md		DATASETS.md
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
QUICKSTART.md		QUICKSTART.md
README.md		README.md
README_HF_SPACE.md		README_HF_SPACE.md
app.py		app.py
docker-compose.yml		docker-compose.yml
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

SQL Agent LLMOps

🤗 Models on Hugging Face Hub

📦 Datasets on Hugging Face Hub

Overview

Live demo

What you get for each query

Architecture

Quick Start

Run the live demo

Run locally

Inference with the SQL Generator only

Project structure

Tech stack

Training

1. SQL Generator — Qwen2.5-Coder-7B

2. Chart Reasoner — Phi-3-Mini-3.8B

3. SVG Renderer — DeepSeek-Coder-1.3B

Cost summary

Engineering notes

Data and Privacy

Deployment

Hugging Face Spaces (current production)

Docker

Roadmap

Citation

License

Acknowledgments

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages