On-Device LLM Orchestrator System

This repository provides a modular, Python-based LLM orchestration system that runs completely locally. The orchestrator maintains a single instance of an LLM base model (microsoft/phi-2) in memory, and dynamically routes specific queries to specialized LoRA adapters loaded from disk.

⚡ Key Features

🧠 Single Base Model, Multiple Skills: Runs a single instance of microsoft/phi-2 and augments it with task-specific LoRA adapters.
🔀 Dynamic Query Routing: Uses an intent detection layer to classify each query and route it to the base model or to a matching adapter.
🧩 Plug-and-Play Adapters: Easily add, remove, or swap LoRA adapters without modifying core logic.
⚙️ PEFT-Powered Efficiency: Uses the peft library to attach lightweight LoRA adapters, which are far smaller than a full copy of the base model.
💻 Fully Local Execution: No external API calls; queries stay on the machine and the system can run offline.
📦 Modular Architecture: Clean separation of components: Intent Detector, Router, Adapter Manager, and Model Layer.

🧪 Use Cases
Task-specific AI assistants (math, summarization, coding, etc.)
Offline AI systems for privacy-sensitive environments
Research on modular LLM architectures
Efficient deployment of multi-capability models on limited hardware

🛠️ Tech Stack
Transformers – Base model handling
PEFT (LoRA) – Adapter management
PyTorch – Model execution
Python CLI – Interface layer

Architecture

The system uses a highly componentized setup relying on Transformers and PEFT for dynamic adapter management.

Architecture Diagram

+-----------------------------------------------------------+
|                      User Interface                       |
|                       (CLI main.py)                       |
+-----------------------------+-----------------------------+
                              |
+-----------------------------v-----------------------------+
|                     Intent Detector                       |
|   (Classifies query -> base, adapter_load, preloaded)     |
+-----------------------------+-----------------------------+
                              |
+-----------------------------v-----------------------------+
|                          Router                           |
|      (Orchestrates actions based on detected intent)      |
+-----------------------------+-----------------------------+
                              |
+-----------------------------v-----------------------------+
|                      Adapter Manager                      |
|      (Manages active LoRA adapters using peft library)    |
+--------------+------------------------------+-------------+
               |                              |
+--------------v---------------+ +------------v-------------+
|       Adapter Storage        | |    Base Model (Phi-2)    |
|   (LoRA weights on disk)     | |    (Loaded in VRAM/RAM)  |
+------------------------------+ +--------------------------+
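The Adapter Manager box can be pictured as a small bookkeeping layer over a PEFT model. The sketch below is an assumption about how such a layer might look, not the code in adapter_manager.py: the class name, the `activate` method, and the `adapter_paths` mapping are hypothetical. It expects an object that exposes `load_adapter(path, adapter_name=...)` and `set_adapter(name)`, which is the call shape `peft.PeftModel` provides; building that initial PeftModel is left out of the sketch.

```python
class AdapterManager:
    """Tracks which LoRA adapters are loaded and which one is active (illustrative sketch)."""

    def __init__(self, peft_model, adapter_paths):
        self.model = peft_model             # object exposing load_adapter / set_adapter
        self.adapter_paths = adapter_paths  # {name: path}, e.g. read from adapters.json
        self.loaded = set()                 # adapters already read from disk
        self.active = None                  # name of the adapter currently in use

    def activate(self, name):
        """Load the adapter from disk on first use, then make it the active one."""
        if name not in self.adapter_paths:
            raise KeyError(f"unknown adapter: {name}")
        if name not in self.loaded:
            # First request for this adapter: read its weights from disk.
            self.model.load_adapter(self.adapter_paths[name], adapter_name=name)
            self.loaded.add(name)
        if self.active != name:
            # Switching is cheap once the weights are in memory.
            self.model.set_adapter(name)
            self.active = name
        return name
```

Keeping the set of loaded names separate from the active name means a repeat request for the same adapter touches neither the disk nor the model.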

Flow Diagram

User Query
   ↓
Intent Detection (Rule-based or Phi-2 Fallback)
   ↓
Router (Decides logic based on intent)
   ↓
Adapter Manager (Loads PEFT adapters from disk or switches between loaded ones)
   ↓
Base Model (Phi-2 + active adapter)
   ↓
Output Response
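The intent detection step of this flow can be sketched with plain keyword rules. This is a minimal illustration under stated assumptions, not the logic in intent_detector.py: the keyword table and the function names are hypothetical, the Phi-2 fallback is omitted, and the three labels are read here as "no adapter needed" (base), "adapter must be loaded from disk" (adapter_load), and "adapter already in memory" (preloaded).

```python
from typing import Optional, Set, Tuple

# Hypothetical keyword rules; the real detector may use different ones.
KEYWORD_RULES = {
    "math": ("solve", "calculate", "equation", "algebra"),
    "summary": ("summarize", "summarise", "tl;dr", "shorten"),
}


def pick_adapter(query: str) -> Optional[str]:
    """Return the first adapter whose keywords appear in the query, else None."""
    text = query.lower()
    for adapter, keywords in KEYWORD_RULES.items():
        if any(keyword in text for keyword in keywords):
            return adapter
    return None


def detect_intent(query: str, loaded: Set[str]) -> Tuple[str, Optional[str]]:
    """Classify a query as 'base', 'adapter_load', or 'preloaded'."""
    adapter = pick_adapter(query)
    if adapter is None:
        return "base", None             # no rule matched: use the base model
    if adapter in loaded:
        return "preloaded", adapter     # adapter is already in memory
    return "adapter_load", adapter      # adapter must be read from disk first
```

A router could then branch on the returned label and pass the adapter name on to the Adapter Manager.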

Setup & Installation

Install Requirements: Python 3.9+ is required; an NVIDIA GPU is optional. Run:
```
pip install -r requirements.txt
```

Configure Adapters: config/adapters.json defines your adapters, mapping each adapter name to its path on disk and a short description. The file is generated automatically on first run:

```
{
  "math": {
    "path": "./adapters/math_adapter",
    "description": "Handles arithmetic and algebra problems"
  },
  "summary": {
    "path": "./adapters/summary_adapter",
    "description": "Summarizes long text"
  }
}
```
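A configuration of this shape can be read and sanity-checked with the standard library alone. The loader below is a minimal sketch; `load_adapter_config` and `missing_adapters` are hypothetical names, not functions from this repository, and treating `description` as optional is an assumption.

```python
import json
from pathlib import Path


def load_adapter_config(config_path):
    """Read adapters.json and return {name: {"path": Path, "description": str}}."""
    raw = json.loads(Path(config_path).read_text(encoding="utf-8"))
    adapters = {}
    for name, entry in raw.items():
        if "path" not in entry:
            raise ValueError(f"adapter '{name}' has no 'path' field")
        adapters[name] = {
            "path": Path(entry["path"]),
            # Assumption: a missing description is tolerated and stored as "".
            "description": entry.get("description", ""),
        }
    return adapters


def missing_adapters(adapters):
    """Return the sorted names of adapters whose directory does not exist."""
    return sorted(name for name, entry in adapters.items() if not entry["path"].is_dir())
```

Calling `missing_adapters(load_adapter_config("config/adapters.json"))` before starting the CLI would list any adapter folders that still need to be created.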

Running the Application: Start the CLI orchestrator:
```
python main.py
```

🔧 Extending the System

Add new adapters by updating adapters.json
Train custom LoRA adapters for new domains
Improve intent detection using ML-based classifiers
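Registering a new adapter amounts to adding one entry to adapters.json. The helper below is a hedged sketch of that step; `register_adapter` is a hypothetical name, and the "code" adapter in the usage note is an invented example rather than something shipped with this project.

```python
import json
from pathlib import Path


def register_adapter(config_path, name, adapter_path, description):
    """Add or overwrite one entry in adapters.json and return the updated mapping."""
    config_file = Path(config_path)
    # Start from the existing config when present, otherwise from an empty mapping.
    if config_file.exists():
        adapters = json.loads(config_file.read_text(encoding="utf-8"))
    else:
        adapters = {}
    adapters[name] = {"path": adapter_path, "description": description}
    # Create the config folder if needed, then write the file back in readable form.
    config_file.parent.mkdir(parents=True, exist_ok=True)
    config_file.write_text(json.dumps(adapters, indent=2) + "\n", encoding="utf-8")
    return adapters
```

For example, `register_adapter("config/adapters.json", "code", "./adapters/code_adapter", "Answers programming questions")` would add a third entry next to math and summary. The intent detector would presumably also need a rule that maps queries to the new name.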

💡 Why This Matters

Traditional LLM systems scale by increasing model size. This project explores a more efficient paradigm:

"One base model, many specialized behaviors."

By combining routing with adapters, you add new capabilities without loading a separate full-size model for each task.

Creating Dummy Adapters (Instructions)

This project dynamically loads LoRA adapters from paths mapped in your configuration file. If you do not have trained adapters yet, you can create dummy PEFT weights to test the loading workflow:

Create adapter directories locally:

```
mkdir -p adapters/math_adapter
mkdir -p adapters/summary_adapter
```

You can use the peft library in a brief script to generate untrained dummy adapters for your base model and save them into these folders:

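```
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model that the adapters attach to.
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", trust_remote_code=True)

# target_modules must match layer names in your Phi-2 build: "Wqkv"/"out_proj" fit the
# original remote-code model; native Transformers Phi uses "q_proj", "k_proj", "v_proj", "dense".
config = LoraConfig(r=8, lora_alpha=16, target_modules=["Wqkv", "out_proj"], lora_dropout=0.05, bias="none", task_type="CAUSAL_LM")

# Wrap the base model and save the untrained LoRA weights into each adapter folder.
peft_model = get_peft_model(model, config)
peft_model.save_pretrained("./adapters/math_adapter")
peft_model.save_pretrained("./adapters/summary_adapter")
```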
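After saving, it can help to confirm that each folder actually holds an adapter before pointing the orchestrator at it. The check below is a small sketch using only the standard library; `is_valid_adapter_dir` is a hypothetical helper. It relies on the file names that peft's `save_pretrained` writes: `adapter_config.json` plus a weights file, which is `adapter_model.safetensors` in recent versions and `adapter_model.bin` in older ones.

```python
from pathlib import Path

# File names written by peft's save_pretrained.
CONFIG_FILE = "adapter_config.json"
WEIGHT_FILES = ("adapter_model.safetensors", "adapter_model.bin")


def is_valid_adapter_dir(path):
    """Return True if the folder contains an adapter config and a weights file."""
    folder = Path(path)
    has_config = (folder / CONFIG_FILE).is_file()
    has_weights = any((folder / name).is_file() for name in WEIGHT_FILES)
    return has_config and has_weights
```

Running `is_valid_adapter_dir("./adapters/math_adapter")` after the script above should return True if the save succeeded. It only checks that the files exist, not that the weights match the base model.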
