This repository provides a modular, Python-based LLM orchestration system that runs completely locally. The orchestrator maintains a single instance of an LLM base model (microsoft/phi-2) in memory, and dynamically routes specific queries to specialized LoRA adapters loaded from disk.
- 🧠 Single Base Model, Multiple Skills: Runs a single instance of microsoft/phi-2 and augments it with task-specific LoRA adapters.
- 🔀 Dynamic Query Routing: Uses an intent detection layer to classify queries and route them intelligently.
- 🧩 Plug-and-Play Adapters: Easily add, remove, or swap LoRA adapters without modifying core logic.
- ⚙️ PEFT-Powered Efficiency: Lightweight fine-tuning using the peft library for minimal resource overhead.
- 💻 Fully Local Execution: No external API calls; complete privacy and offline usability.
- 📦 Modular Architecture: Clean separation of components: Intent Detector, Router, Adapter Manager, and Model Layer.

- Task-specific AI assistants (math, summarization, coding, etc.)
- Offline AI systems for privacy-sensitive environments
- Research on modular LLM architectures
- Efficient deployment of multi-capability models on limited hardware

- Transformers – Base model handling
- PEFT (LoRA) – Adapter management
- PyTorch – Model execution
- Python CLI – Interface layer

The system is split into small, single-purpose components and relies on Transformers and PEFT for dynamic adapter management.

    +-----------------------------------------------------------+
    |                      User Interface                       |
    |                       (CLI main.py)                       |
    +-----------------------------+-----------------------------+
                                  |
    +-----------------------------v-----------------------------+
    |                      Intent Detector                      |
    |    (Classifies query -> base, adapter_load, preloaded)    |
    +-----------------------------+-----------------------------+
                                  |
    +-----------------------------v-----------------------------+
    |                          Router                           |
    |      (Orchestrates actions based on detected intent)      |
    +-----------------------------+-----------------------------+
                                  |
    +-----------------------------v-----------------------------+
    |                      Adapter Manager                      |
    |     (Manages active LoRA adapters using peft library)     |
    +--------------+------------------------------+-------------+
                   |                              |
    +--------------v---------------+ +------------v-------------+
    |       Adapter Storage        | |    Base Model (Phi-2)    |
    |    (LoRA weights on disk)    | |   (Loaded in VRAM/RAM)   |
    +------------------------------+ +--------------------------+

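
The Adapter Manager box can be pictured as a thin wrapper around PEFT's multi-adapter API. The sketch below is an assumption about how such a manager might look, not this repository's actual code: the `AdapterManager` class, its method names, and the `wrap` hook are hypothetical, while `PeftModel.from_pretrained`, `load_adapter`, `set_adapter`, and the `disable_adapter()` context manager are existing PEFT calls.

```python
from contextlib import nullcontext


def _peft_wrap(base_model, path, name):
    """Wrap the base model with its first adapter (peft imported lazily)."""
    from peft import PeftModel
    return PeftModel.from_pretrained(base_model, path, adapter_name=name)


class AdapterManager:
    """Hypothetical manager: one base model in memory, LoRA adapters swapped on demand."""

    def __init__(self, base_model, adapter_paths, wrap=_peft_wrap):
        self.model = base_model             # plain model until the first adapter is attached
        self.adapter_paths = adapter_paths  # e.g. {"math": "./adapters/math_adapter"}
        self.wrap = wrap                    # injectable so the class can be tested without peft
        self.loaded = set()
        self.active = None

    def activate(self, name):
        """Load the adapter from disk on first use, then make it the active one."""
        if name not in self.adapter_paths:
            raise KeyError(f"unknown adapter: {name}")
        path = self.adapter_paths[name]
        if not self.loaded:
            # First adapter: turn the base model into a PeftModel.
            self.model = self.wrap(self.model, path, name)
        elif name not in self.loaded:
            # Later adapters are added next to the ones already in memory.
            self.model.load_adapter(path, adapter_name=name)
        self.loaded.add(name)
        self.model.set_adapter(name)
        self.active = name

    def base_only(self):
        """Return a context manager that bypasses every adapter for plain base-model output."""
        if not self.loaded:
            return nullcontext()
        return self.model.disable_adapter()
```

Keeping already loaded adapters in memory means that switching back to one is only a `set_adapter` call; the disk read happens once per adapter.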

    User Query
        ↓
    Intent Detection (Rule-based or Phi-2 Fallback)
        ↓
    Router (Decides logic based on intent)
        ↓
    Adapter Manager (Loads or switches PEFT adapters from disk)
        ↓
    Base Model (Phi-2 + active adapter)
        ↓
    Output Response

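
The first two stages of this flow can be illustrated with a small rule-based detector and router. This is a sketch under assumptions: the three intent labels come from the diagram, but the keyword rules, the function names, and the reading of the labels (`preloaded` for an adapter already in memory, `adapter_load` for one that must be read from disk, `base` when no adapter applies) are hypothetical. The Phi-2 fallback classifier is left out.

```python
import re

# Hypothetical keyword rules; the project's real rules are not shown in this README.
ADAPTER_RULES = {
    "math": re.compile(r"\b(solve|calculate|equation|algebra)\b|\d\s*[-+*/]\s*\d", re.I),
    "summary": re.compile(r"\b(summarize|summarise|summary)\b", re.I),
}


def detect_intent(query, loaded_adapters):
    """Return (intent, adapter_name); intent is 'base', 'adapter_load', or 'preloaded'."""
    for name, pattern in ADAPTER_RULES.items():
        if pattern.search(query):
            intent = "preloaded" if name in loaded_adapters else "adapter_load"
            return intent, name
    return "base", None


def route(query, loaded_adapters):
    """Decide which adapter (if any) should answer, recording newly loaded ones."""
    intent, adapter = detect_intent(query, loaded_adapters)
    if intent == "adapter_load":
        loaded_adapters.add(adapter)  # stand-in for reading LoRA weights from disk
    return {"intent": intent, "adapter": adapter}


if __name__ == "__main__":
    loaded = set()
    for q in ["Please summarize this article", "Summarize this one too", "Tell me a joke"]:
        print(q, "->", route(q, loaded))
```

The second summarization query is reported as `preloaded` because the first one already triggered the load.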
- Install Requirements: Ensure you have Python 3.9+; an NVIDIA GPU is optional. Run:

      pip install -r requirements.txt

- Configure Adapters: The `config/adapters.json` file defines your adapters. It is generated automatically on your first run:

      {
        "math": {
          "path": "./adapters/math_adapter",
          "description": "Handles arithmetic and algebra problems"
        },
        "summary": {
          "path": "./adapters/summary_adapter",
          "description": "Summarizes long text"
        }
      }

- Running the Application: Run the CLI orchestrator:

      python main.py

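
For reference, reading the adapter configuration takes only a few lines of Python. The helper below is a hypothetical sketch, not the project's loader: the function name and the validation rule are assumptions, and only the file layout (adapter name mapped to `path` and `description`) is taken from the example configuration.

```python
import json
from pathlib import Path

REQUIRED_KEYS = {"path", "description"}


def load_adapter_config(config_path="config/adapters.json"):
    """Read the adapter registry and check that every entry has a path and a description."""
    registry = json.loads(Path(config_path).read_text(encoding="utf-8"))
    for name, entry in registry.items():
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise ValueError(f"adapter {name!r} is missing keys: {sorted(missing)}")
    return registry
```

Failing early on a malformed entry keeps configuration mistakes out of the routing path, where they would otherwise surface only when a query first needs that adapter.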
- Add new adapters by updating adapters.json
- Train custom LoRA adapters for new domains
- Improve intent detection using ML-based classifiers
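
Adding an adapter amounts to adding one entry to `config/adapters.json`. Editing the file by hand works; the sketch below does the same thing from Python. The `register_adapter` helper and the `code` adapter in the usage line are hypothetical and only illustrate the file format.

```python
import json
from pathlib import Path


def register_adapter(config_path, name, adapter_dir, description):
    """Add or overwrite one entry in the adapter registry and return the updated mapping."""
    config_file = Path(config_path)
    if config_file.exists():
        registry = json.loads(config_file.read_text(encoding="utf-8"))
    else:
        registry = {}
    registry[name] = {"path": adapter_dir, "description": description}
    config_file.parent.mkdir(parents=True, exist_ok=True)
    config_file.write_text(json.dumps(registry, indent=2), encoding="utf-8")
    return registry


if __name__ == "__main__":
    # Hypothetical adapter name and directory, used only as an example.
    register_adapter("config/adapters.json", "code", "./adapters/code_adapter",
                     "Answers programming questions")
```

Registering an adapter only makes it known to the configuration; the intent detector still needs a rule or classifier label that maps queries to the new name.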
Traditional LLM systems scale by increasing model size. This project explores a more efficient paradigm:
"One base model, many specialized behaviors."
By combining routing with adapters, the system gains new capabilities by adding small LoRA weight files rather than by growing or duplicating the base model.
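
A rough parameter count shows why adapters are cheap. The calculation below is a sketch under stated assumptions: it uses the rank-8 LoRA settings from the dummy-adapter script later in this README (fused QKV and output projections), and it assumes Phi-2's published hidden size of 2560 and 32 transformer layers. A LoRA pair on a weight of shape `d_out x d_in` adds `rank * (d_in + d_out)` parameters.

```python
def lora_params(d_in, d_out, rank):
    """Parameters added by LoRA: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


HIDDEN = 2560  # assumption: Phi-2 hidden size
LAYERS = 32    # assumption: number of Phi-2 transformer layers
RANK = 8       # r=8, as in the LoraConfig used for the dummy adapters

# Fused QKV projection maps HIDDEN -> 3 * HIDDEN; the output projection maps HIDDEN -> HIDDEN.
per_layer = lora_params(HIDDEN, 3 * HIDDEN, RANK) + lora_params(HIDDEN, HIDDEN, RANK)
total = per_layer * LAYERS

print(f"{total:,} LoRA parameters per adapter")  # 3,932,160 LoRA parameters per adapter
```

Against roughly 2.7 billion base parameters, that is about 0.15% extra weights per adapter under these assumptions.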
This project dynamically loads LoRA adapters from paths mapped in your configuration file. If you do not have trained adapters yet, you can create dummy PEFT weights to test the loading workflow:
- Create adapter directories locally:

      mkdir -p adapters/math_adapter
      mkdir -p adapters/summary_adapter

- Use the `peft` library in a brief script to create dummy adapters for your base model and save them into these folders:

      from peft import LoraConfig, get_peft_model
      from transformers import AutoModelForCausalLM

      model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", trust_remote_code=True)
      # target_modules must match the loaded Phi-2 implementation; check
      # model.named_modules() if PEFT reports that none were found.
      config = LoraConfig(
          r=8, lora_alpha=16, target_modules=["Wqkv", "out_proj"],
          lora_dropout=0.05, bias="none", task_type="CAUSAL_LM",
      )
      peft_model = get_peft_model(model, config)
      # Untrained LoRA weights are enough to exercise the loading workflow.
      peft_model.save_pretrained("./adapters/math_adapter")
      peft_model.save_pretrained("./adapters/summary_adapter")

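
After saving, it can help to confirm that each folder looks like a PEFT adapter before pointing the configuration at it. The check below is a minimal sketch: `is_peft_adapter_dir` is a hypothetical helper, and it relies on PEFT's usual output layout of an `adapter_config.json` file plus weights in `adapter_model.safetensors` (or `adapter_model.bin` when saved without safetensors).

```python
from pathlib import Path

# Weight file names PEFT typically writes, depending on the serialization format.
WEIGHT_FILES = ("adapter_model.safetensors", "adapter_model.bin")


def is_peft_adapter_dir(adapter_dir):
    """Return True if the folder holds an adapter config and at least one weight file."""
    folder = Path(adapter_dir)
    has_config = (folder / "adapter_config.json").is_file()
    has_weights = any((folder / name).is_file() for name in WEIGHT_FILES)
    return has_config and has_weights


if __name__ == "__main__":
    for path in ("./adapters/math_adapter", "./adapters/summary_adapter"):
        print(path, "ok" if is_peft_adapter_dir(path) else "missing files")
```

This only checks that the files exist; it does not verify that the weights match the base model.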