LLM Distillation Demos

A set of demos illustrating the journey from a cloud LLM (GPT) to a locally fine-tuned student model. The task throughout is insurance damage extraction: given a narrative description of property damage, extract structured JSON with damage indicators and severity.

Overview

Demo	What it shows
Demo 1	Call a cloud LLM (OpenAI/Azure GPT) with a prompt + narrative; get structured JSON output
Demo 2	Generate synthetic training data, train a LoRA adapter on Apple Silicon (MLX), and evaluate
Demo 3	Run the same prompt as Demo 1 against the locally fine-tuned MLX model (no cloud)
Demo 4	Same task as Demo 2, but train on Azure Machine Learning with PyTorch, Hugging Face, and PEFT
Demo 5	Vision distillation: an Azure OpenAI vision teacher labels public Oxford-IIIT cat/dog images to produce chat-format training data for a Qwen2-VL student

Prerequisites

Python 3.10+
Demo 1: OpenAI API key (or Azure OpenAI)
Demo 2 & 3: Apple Silicon Mac (MLX runs on M1/M2/M3)
Demo 4: Azure subscription with an ML workspace and GPU compute

Setup

Clone and enter the demos directory
```
cd demos
```
Create .env from the example
```
cp .env.example .env
```
Edit .env and set at least OPENAI_API_KEY for Demo 1. For Demo 4, add Azure credentials.

Install dependencies

pip install -r requirements.txt

For Demo 2 & 3 (MLX), also install:

pip install "mlx-lm[train]"

Demo 1: Cloud LLM (GPT) Extraction

What it shows: A baseline extraction using a cloud LLM. You send a prompt template plus a narrative to OpenAI (or Azure OpenAI) and receive structured JSON.

Run:

python demo1/demo1.py

How it works:

Loads shared/demo1_prompt.txt (the extraction prompt and schema)
Injects shared/demo1_narrative.txt into the {narrative_text} placeholder
Sends the combined prompt to the configured model (OPENAI_MODEL, default gpt-4o-mini)
Prints the model’s JSON response

Environment: OPENAI_API_KEY required. For Azure: set OPENAI_BASE_URL, OPENAI_API_VERSION, and optionally OPENAI_MODEL.

Demo 2: Training Data + MLX LoRA Fine-Tuning

What it shows: The full pipeline for creating a locally fine-tuned model on Apple Silicon:

Generate synthetic training data (narratives + correct JSON)
Split into train/valid/test
Train LoRA adapters on Qwen2.5-7B (4-bit quantized) via MLX
Evaluate on the test set

Pipeline:

cd demo2

# 1. Generate 1000 synthetic prompt/completion pairs
python generate_training_data.py

# 2. Validate the data (optional)
python validate_training_data.py

# 3. Split into train (800) / valid (100) / test (100)
python split_training_data.py

# 4. Train LoRA adapters + evaluate
python train_and_eval_student.py

Quick demo (~2 min): Use train_and_eval_student_quick.py for a shorter run. It uses a separate adapter directory (adapters_qwen25_7b_damage_demo) so it does not overwrite the full adapters.

Output: LoRA adapters saved to adapters_qwen25_7b_damage/ (or adapters_qwen25_7b_damage_demo/ for the quick demo).

Demo 3: Local MLX Inference

What it shows: The same extraction task as Demo 1, but using the locally fine-tuned MLX model instead of a cloud API. No network calls after the model is loaded.

Run:

python demo3/run.py

Prerequisites: Run Demo 2 first to produce demo2/adapters_qwen25_7b_damage/. Demo 3 loads Qwen2.5-7B with these adapters and runs inference on shared/demo1_narrative.txt.

Output: Pretty-printed JSON matching the Demo 1 schema.

Demo 4: Azure ML LoRA Training

What it shows: The same task as Demo 2 (insurance damage extraction) but training on Azure Machine Learning with PyTorch, Hugging Face Transformers, and PEFT LoRA.

Quick start:

cd demo4
python train.py --data-dir ../demo2/data --output-dir ./outputs --demo

Requires Demo 2's demo2/data/ (run Demo 2 first). For full training and Azure ML job submission, an Azure subscription with a GPU compute cluster is required.

See demo4/README.md for local training, Azure ML job submission, env vars, and model variants.

Demo 5: Vision Distillation (Cats vs Dogs)

What it shows: An Azure OpenAI vision model acts as a teacher, labelling public Oxford-IIIT cat/dog images and producing chat-format training data for fine-tuning an open-source vision-language model (e.g. Qwen2-VL) as a student. Unlike Demos 1-4 (text extraction), this demo is multimodal.

Quick start:

cd demo5
python create_training_data.py --dry-run --max-per-class cat:5,dog:5

The first run downloads the Oxford-IIIT Pet Dataset (~800 MB, CC BY-SA 4.0).

See demo5/README.md for the full pipeline, CLI reference, environment variables, output schema, and metadata field reference.

Shared Resources

File	Purpose
`shared/demo1_prompt.txt`	Extraction prompt template with schema; `{narrative_text}` is replaced at runtime
`shared/demo1_narrative.txt`	Sample narrative used by Demo 1 and Demo 3
`.env`	API keys and config (copy from `.env.example`)

Project Structure

demos/
├── README.md
├── .env.example
├── .gitignore
├── requirements.txt
├── shared/
│   ├── demo1_prompt.txt
│   └── demo1_narrative.txt
├── demo1/
│   └── demo1.py              # GPT extraction
├── demo2/
│   ├── generate_training_data.py
│   ├── validate_training_data.py
│   ├── split_training_data.py
│   ├── train_and_eval_student.py
│   ├── train_and_eval_student_quick.py
│   ├── training_data.jsonl    # generated
│   ├── data/                  # train.jsonl, valid.jsonl, test.jsonl
│   └── adapters_qwen25_7b_damage/   # LoRA adapters (generated)
├── demo3/
│   └── run.py                 # Local MLX inference
├── demo4/
│   ├── train.py               # PyTorch/PEFT LoRA training
│   ├── submit_job.py          # Azure ML job submission
│   ├── environment.yml       # Conda env for Azure ML
│   └── README.md              # Demo 4 details
└── demo5/
    ├── create_training_data.py # Teacher-labelling pipeline (cats vs dogs)
    ├── cat_dog_prompt.txt      # Teacher prompt
    ├── README.md               # Demo 5 details
    └── dataset/                # Generated dataset + Oxford-IIIT cache (ignored)

Schema (Demos 1-4 — Damage Extraction)

Demos 1-4 share the same JSON schema for insurance damage extraction (Demo 5 uses its own schema; see demo5/README.md):

{
  "damage": {
    "broken_plaster": boolean,
    "mould": boolean,
    "floor_water_damage": boolean,
    "electrical_damage": boolean,
    "ceiling_damage": boolean,
    "structural_crack": boolean,
    "carpet_damage": boolean,
    "cabinet_damage": boolean,
    "appliance_damage": boolean,
    "odor_present": boolean
  },
  "overall_severity": "low" | "moderate" | "high"
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LLM Distillation Demos

Overview

Prerequisites

Setup

Demo 1: Cloud LLM (GPT) Extraction

Demo 2: Training Data + MLX LoRA Fine-Tuning

Demo 3: Local MLX Inference

Demo 4: Azure ML LoRA Training

Demo 5: Vision Distillation (Cats vs Dogs)

Shared Resources

Project Structure

Schema (Demos 1-4 — Damage Extraction)

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
demo1		demo1
demo2		demo2
demo3		demo3
demo4		demo4
demo5		demo5
shared		shared
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

LLM Distillation Demos

Overview

Prerequisites

Setup

Demo 1: Cloud LLM (GPT) Extraction

Demo 2: Training Data + MLX LoRA Fine-Tuning

Demo 3: Local MLX Inference

Demo 4: Azure ML LoRA Training

Demo 5: Vision Distillation (Cats vs Dogs)

Shared Resources

Project Structure

Schema (Demos 1-4 — Damage Extraction)

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages