A set of demos illustrating the journey from a cloud LLM (GPT) to a locally fine-tuned student model. The task throughout is insurance damage extraction: given a narrative description of property damage, extract structured JSON with damage indicators and severity.
| Demo | What it shows |
|---|---|
| Demo 1 | Call a cloud LLM (OpenAI/Azure GPT) with a prompt + narrative; get structured JSON output |
| Demo 2 | Generate synthetic training data, train a LoRA adapter on Apple Silicon (MLX), and evaluate |
| Demo 3 | Run the same prompt as Demo 1 against the locally fine-tuned MLX model (no cloud) |
| Demo 4 | Same task as Demo 2, but train on Azure Machine Learning with PyTorch, Hugging Face, and PEFT |
| Demo 5 | Vision distillation: an Azure OpenAI vision teacher labels public Oxford-IIIT cat/dog images to produce chat-format training data for a Qwen2-VL student |
- Python 3.10+
- Demo 1: OpenAI API key (or Azure OpenAI)
- Demo 2 & 3: Apple Silicon Mac (MLX runs on M1/M2/M3)
- Demo 4: Azure subscription with an ML workspace and GPU compute
-
Clone and enter the demos directory
cd demos -
Create
.envfrom the examplecp .env.example .env
Edit
.envand set at leastOPENAI_API_KEYfor Demo 1. For Demo 4, add Azure credentials. -
Install dependencies
pip install -r requirements.txt
For Demo 2 & 3 (MLX), also install:
pip install "mlx-lm[train]"
What it shows: A baseline extraction using a cloud LLM. You send a prompt template plus a narrative to OpenAI (or Azure OpenAI) and receive structured JSON.
Run:
python demo1/demo1.pyHow it works:
- Loads
shared/demo1_prompt.txt(the extraction prompt and schema) - Injects
shared/demo1_narrative.txtinto the{narrative_text}placeholder - Sends the combined prompt to the configured model (
OPENAI_MODEL, defaultgpt-4o-mini) - Prints the model’s JSON response
Environment: OPENAI_API_KEY required. For Azure: set OPENAI_BASE_URL, OPENAI_API_VERSION, and optionally OPENAI_MODEL.
What it shows: The full pipeline for creating a locally fine-tuned model on Apple Silicon:
- Generate synthetic training data (narratives + correct JSON)
- Split into train/valid/test
- Train LoRA adapters on Qwen2.5-7B (4-bit quantized) via MLX
- Evaluate on the test set
Pipeline:
cd demo2
# 1. Generate 1000 synthetic prompt/completion pairs
python generate_training_data.py
# 2. Validate the data (optional)
python validate_training_data.py
# 3. Split into train (800) / valid (100) / test (100)
python split_training_data.py
# 4. Train LoRA adapters + evaluate
python train_and_eval_student.pyQuick demo (~2 min): Use train_and_eval_student_quick.py for a shorter run. It uses a separate adapter directory (adapters_qwen25_7b_damage_demo) so it does not overwrite the full adapters.
Output: LoRA adapters saved to adapters_qwen25_7b_damage/ (or adapters_qwen25_7b_damage_demo/ for the quick demo).
What it shows: The same extraction task as Demo 1, but using the locally fine-tuned MLX model instead of a cloud API. No network calls after the model is loaded.
Run:
python demo3/run.pyPrerequisites: Run Demo 2 first to produce demo2/adapters_qwen25_7b_damage/. Demo 3 loads Qwen2.5-7B with these adapters and runs inference on shared/demo1_narrative.txt.
Output: Pretty-printed JSON matching the Demo 1 schema.
What it shows: The same task as Demo 2 (insurance damage extraction) but training on Azure Machine Learning with PyTorch, Hugging Face Transformers, and PEFT LoRA.
Quick start:
cd demo4
python train.py --data-dir ../demo2/data --output-dir ./outputs --demoRequires Demo 2's demo2/data/ (run Demo 2 first). For full training and Azure ML job submission, an Azure subscription with a GPU compute cluster is required.
See demo4/README.md for local training, Azure ML job submission, env vars, and model variants.
What it shows: An Azure OpenAI vision model acts as a teacher, labelling public Oxford-IIIT cat/dog images and producing chat-format training data for fine-tuning an open-source vision-language model (e.g. Qwen2-VL) as a student. Unlike Demos 1-4 (text extraction), this demo is multimodal.
Quick start:
cd demo5
python create_training_data.py --dry-run --max-per-class cat:5,dog:5The first run downloads the Oxford-IIIT Pet Dataset (~800 MB, CC BY-SA 4.0).
See demo5/README.md for the full pipeline, CLI reference, environment variables, output schema, and metadata field reference.
| File | Purpose |
|---|---|
shared/demo1_prompt.txt |
Extraction prompt template with schema; {narrative_text} is replaced at runtime |
shared/demo1_narrative.txt |
Sample narrative used by Demo 1 and Demo 3 |
.env |
API keys and config (copy from .env.example) |
demos/
├── README.md
├── .env.example
├── .gitignore
├── requirements.txt
├── shared/
│ ├── demo1_prompt.txt
│ └── demo1_narrative.txt
├── demo1/
│ └── demo1.py # GPT extraction
├── demo2/
│ ├── generate_training_data.py
│ ├── validate_training_data.py
│ ├── split_training_data.py
│ ├── train_and_eval_student.py
│ ├── train_and_eval_student_quick.py
│ ├── training_data.jsonl # generated
│ ├── data/ # train.jsonl, valid.jsonl, test.jsonl
│ └── adapters_qwen25_7b_damage/ # LoRA adapters (generated)
├── demo3/
│ └── run.py # Local MLX inference
├── demo4/
│ ├── train.py # PyTorch/PEFT LoRA training
│ ├── submit_job.py # Azure ML job submission
│ ├── environment.yml # Conda env for Azure ML
│ └── README.md # Demo 4 details
└── demo5/
├── create_training_data.py # Teacher-labelling pipeline (cats vs dogs)
├── cat_dog_prompt.txt # Teacher prompt
├── README.md # Demo 5 details
└── dataset/ # Generated dataset + Oxford-IIIT cache (ignored)
Demos 1-4 share the same JSON schema for insurance damage extraction (Demo 5 uses its own schema; see demo5/README.md):
{
"damage": {
"broken_plaster": boolean,
"mould": boolean,
"floor_water_damage": boolean,
"electrical_damage": boolean,
"ceiling_damage": boolean,
"structural_crack": boolean,
"carpet_damage": boolean,
"cabinet_damage": boolean,
"appliance_damage": boolean,
"odor_present": boolean
},
"overall_severity": "low" | "moderate" | "high"
}