Abhaykumar9035/SLMOrchestrator

On-Device LLM Orchestrator System

This repository provides a modular, Python-based LLM orchestration system that runs completely locally. The orchestrator maintains a single instance of an LLM base model (microsoft/phi-2) in memory, and dynamically routes specific queries to specialized LoRA adapters loaded from disk.

⚡ Key Features

  • 🧠 Single Base Model, Multiple Skills: Runs a single instance of microsoft/phi-2 and augments it with task-specific LoRA adapters.

  • 🔀 Dynamic Query Routing: Uses an intent detection layer to classify queries and route them intelligently.

  • 🧩 Plug-and-Play Adapters: Easily add, remove, or swap LoRA adapters without modifying core logic.

  • ⚙️ PEFT-Powered Efficiency: Lightweight fine-tuning using the peft library for minimal resource overhead.

  • 💻 Fully Local Execution: No external API calls, so queries never leave the machine and the system works offline.

  • 📦 Modular Architecture: Clean separation of components: Intent Detector, Router, Adapter Manager, and Model Layer.

🧪 Use Cases

  • Task-specific AI assistants (math, summarization, coding, etc.)

  • Offline AI systems for privacy-sensitive environments

  • Research on modular LLM architectures

  • Efficient deployment of multi-capability models on limited hardware

🛠️ Tech Stack

  • Transformers – Base model handling

  • PEFT (LoRA) – Adapter management

  • PyTorch – Model execution

  • Python CLI – Interface layer

Architecture

The system is split into four components (Intent Detector, Router, Adapter Manager, and Model Layer) and relies on Transformers and PEFT for dynamic adapter management.

Architecture Diagram

+-----------------------------------------------------------+
|                      User Interface                       |
|                       (CLI main.py)                       |
+-----------------------------+-----------------------------+
                              |
+-----------------------------v-----------------------------+
|                     Intent Detector                       |
|   (Classifies query -> base, adapter_load, preloaded)     |
+-----------------------------+-----------------------------+
                              |
+-----------------------------v-----------------------------+
|                          Router                           |
|      (Orchestrates actions based on detected intent)      |
+-----------------------------+-----------------------------+
                              |
+-----------------------------v-----------------------------+
|                      Adapter Manager                      |
|      (Manages active LoRA adapters using peft library)    |
+--------------+------------------------------+-------------+
               |                              |
+--------------v---------------+ +------------v-------------+
|       Adapter Storage        | |    Base Model (Phi-2)    |
|   (LoRA weights on disk)     | |    (Loaded in VRAM/RAM)  |
+------------------------------+ +--------------------------+
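
The rule-based side of the Intent Detector could look roughly like the sketch below. The keyword lists and the `detect_intent` function are hypothetical, and the reading of the three intents is an assumption: `base` means no adapter is needed, `adapter_load` means a matching adapter must first be loaded from disk, and `preloaded` means the matching adapter is already in memory.

```python
from typing import Optional, Set, Tuple

# Hypothetical keyword rules; the adapter names mirror the sample adapters.json.
KEYWORD_RULES = {
    "math": ("solve", "calculate", "equation", "integral", "sum of"),
    "summary": ("summarize", "summarise", "tl;dr", "key points"),
}


def detect_intent(query: str, loaded_adapters: Set[str]) -> Tuple[str, Optional[str]]:
    """Return (intent, adapter_name) for a query.

    intent is one of "base", "adapter_load", or "preloaded" (assumed meanings:
    no adapter needed, adapter must be loaded from disk, adapter already loaded).
    """
    text = query.lower()
    for adapter, keywords in KEYWORD_RULES.items():
        if any(keyword in text for keyword in keywords):
            if adapter in loaded_adapters:
                return "preloaded", adapter
            return "adapter_load", adapter
    # No rule matched: answer with the plain base model.
    return "base", None
```

Substring rules like these are cheap but brittle, which is presumably why the flow also allows a model-based fallback when no rule fires.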

Flow Diagram

User Query
   ↓
Intent Detection (Rule-based or Phi-2 Fallback)
   ↓
Router (Decides logic based on intent)
   ↓
Adapter Manager (Loads PEFT adapters from disk or switches the active one)
   ↓
Base Model (Phi-2 + active adapter)
   ↓
Output Response
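
The Router and Adapter Manager steps of this flow can be sketched as follows. This is not the repository's implementation: `AdapterManager`, `route`, and the `AdapterBackend` protocol are hypothetical names. The protocol only mirrors two methods that `peft.PeftModel` provides (`load_adapter` and `set_adapter`), so the caching logic can be read and tested without loading Phi-2.

```python
from typing import Dict, Optional, Protocol, Set


class AdapterBackend(Protocol):
    """The subset of the peft.PeftModel interface this sketch relies on."""

    def load_adapter(self, model_id: str, adapter_name: str) -> object: ...

    def set_adapter(self, adapter_name: str) -> None: ...


class AdapterManager:
    """Loads each adapter from disk at most once and tracks the active one."""

    def __init__(self, backend: AdapterBackend, adapter_paths: Dict[str, str]):
        self.backend = backend
        self.adapter_paths = adapter_paths  # adapter name -> directory on disk
        self.loaded: Set[str] = set()
        self.active: Optional[str] = None

    def activate(self, name: str) -> None:
        if name not in self.adapter_paths:
            raise KeyError(f"Unknown adapter: {name}")
        if name not in self.loaded:
            # First use: read the LoRA weights from disk.
            self.backend.load_adapter(self.adapter_paths[name], adapter_name=name)
            self.loaded.add(name)
        if self.active != name:
            # Switching between loaded adapters does not touch the disk.
            self.backend.set_adapter(name)
            self.active = name


def route(intent: str, adapter: Optional[str], manager: AdapterManager) -> Optional[str]:
    """Return the adapter to generate with, or None for the plain base model."""
    if intent == "base" or adapter is None:
        # The caller should generate with adapters disabled in this case.
        return None
    manager.activate(adapter)
    return adapter
```

With a real `PeftModel` as the backend, the base-model path would typically wrap generation in the model's `disable_adapter()` context manager so that a previously active adapter does not influence the answer.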

Setup & Installation

  1. Install Requirements: Ensure you have Python 3.9+. An NVIDIA GPU is optional. Run:

    pip install -r requirements.txt

  2. Configure Adapters: The config/adapters.json file defines your adapters. It is generated automatically on your first run:

    {
      "math": {
        "path": "./adapters/math_adapter",
        "description": "Handles arithmetic and algebra problems"
      },
      "summary": {
        "path": "./adapters/summary_adapter",
        "description": "Summarizes long text"
      }
    }

  3. Run the Application: Start the CLI orchestrator:

    python main.py
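
As an illustration of step 2, the sketch below reads config/adapters.json, writes the sample entries if the file does not exist yet, and checks that every adapter has the two fields shown above. The function name `load_adapter_config` and the validation rule are assumptions; the project's own first-run generation may work differently.

```python
import json
from pathlib import Path
from typing import Dict

# Defaults copied from the sample configuration in this README.
DEFAULT_ADAPTERS = {
    "math": {
        "path": "./adapters/math_adapter",
        "description": "Handles arithmetic and algebra problems",
    },
    "summary": {
        "path": "./adapters/summary_adapter",
        "description": "Summarizes long text",
    },
}


def load_adapter_config(path: str = "config/adapters.json") -> Dict[str, Dict[str, str]]:
    """Load the adapter registry, creating it with defaults on first run."""
    config_path = Path(path)
    if not config_path.exists():
        config_path.parent.mkdir(parents=True, exist_ok=True)
        config_path.write_text(json.dumps(DEFAULT_ADAPTERS, indent=2), encoding="utf-8")

    config = json.loads(config_path.read_text(encoding="utf-8"))
    for name, entry in config.items():
        missing = {"path", "description"} - set(entry)
        if missing:
            raise ValueError(f"Adapter '{name}' is missing fields: {sorted(missing)}")
    return config
```

Failing early on a malformed entry keeps path errors out of the model-loading code, where they are harder to diagnose.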

🔧 Extending the System

  • Add new adapters by updating adapters.json
  • Train custom LoRA adapters for new domains
  • Improve intent detection using ML-based classifiers
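
Registering a new adapter amounts to adding one entry to adapters.json. A minimal helper for that might look like the following; `register_adapter` and the `code` adapter in the usage note are hypothetical and only follow the entry format shown in the setup section.

```python
import json
from pathlib import Path
from typing import Dict


def register_adapter(config_path: str, name: str, adapter_path: str,
                     description: str) -> Dict[str, Dict[str, str]]:
    """Add one adapter entry to the JSON registry and return the updated registry."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    if name in config:
        raise ValueError(f"Adapter '{name}' is already registered")

    config[name] = {"path": adapter_path, "description": description}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `register_adapter("config/adapters.json", "code", "./adapters/code_adapter", "Answers programming questions")` would add a third adapter next to the two sample ones. The intent detector still needs a rule or classifier label that maps queries to the new name.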

💡 Why This Matters

Traditional LLM systems scale by increasing model size. This project explores a more efficient paradigm:

"One base model, many specialized behaviors."

By combining routing with adapters, each new skill adds only a small set of LoRA weights on top of the shared base model instead of another full model.
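
To put a rough number on the cost of one extra skill, the sketch below counts the trainable parameters of a LoRA adapter. The figures are assumptions for illustration: rank 8, four 2560×2560 attention projections per layer, 32 layers (the published Phi-2 dimensions), and roughly 2.7 billion base parameters. Verify them against the checkpoint and LoRA configuration you actually use.

```python
from typing import Iterable, Tuple


def lora_param_count(rank: int, shapes: Iterable[Tuple[int, int]], num_layers: int) -> int:
    """Count LoRA parameters: each adapted (d_in, d_out) layer adds rank * (d_in + d_out)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes)
    return per_layer * num_layers


HIDDEN = 2560                 # assumed Phi-2 hidden size
LAYERS = 32                   # assumed number of transformer layers
BASE_PARAMS = 2_700_000_000   # approximate size of the base model

# Assumption: LoRA is applied to four square attention projections per layer.
adapter_params = lora_param_count(8, [(HIDDEN, HIDDEN)] * 4, LAYERS)

print(adapter_params)                          # 5242880
print(f"{adapter_params / BASE_PARAMS:.2%}")   # 0.19%
```

Under these assumptions, each adapter is about 5 million parameters, a fraction of a percent of the base model, so keeping several on disk and swapping them in is cheap compared with hosting several full models.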

Creating Dummy Adapters (Instructions)

This project dynamically loads LoRA adapters from paths mapped in your configuration file. If you do not have trained adapters yet, you can create dummy PEFT weights to test the loading workflow:

  1. Create adapter directories locally:
    mkdir -p adapters/math_adapter
    mkdir -p adapters/summary_adapter
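  2. Use the peft library in a short script to create dummy adapters for your base model and save them into these folders:
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    # Load the base model once; both dummy adapters are derived from it.
    model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", trust_remote_code=True)

    # target_modules must match the layer names of the phi-2 implementation in use.
    # "Wqkv" and "out_proj" come from the older remote-code version; the implementation
    # built into recent transformers releases uses "q_proj", "k_proj", "v_proj", "dense".
    # Inspect model.named_modules() if get_peft_model reports missing target modules.
    config = LoraConfig(r=8, lora_alpha=16, target_modules=["Wqkv", "out_proj"], lora_dropout=0.05, bias="none", task_type="CAUSAL_LM")

    # Freshly initialized LoRA weights leave the model's outputs unchanged,
    # so these adapters only exercise the loading workflow.
    peft_model = get_peft_model(model, config)
    peft_model.save_pretrained("./adapters/math_adapter")
    peft_model.save_pretrained("./adapters/summary_adapter")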
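
After running the script, each folder should contain what PEFT's save_pretrained writes: an adapter_config.json plus a weights file (adapter_model.safetensors, or adapter_model.bin with older versions). A small check such as the one below can confirm that before the orchestrator starts; `is_peft_adapter_dir` is a hypothetical helper, not part of peft or of this repository.

```python
from pathlib import Path

# Weight file names written by PEFT's save_pretrained (newer and older formats).
WEIGHT_FILES = ("adapter_model.safetensors", "adapter_model.bin")


def is_peft_adapter_dir(path: str) -> bool:
    """Return True if the directory looks like a saved PEFT adapter."""
    directory = Path(path)
    has_config = (directory / "adapter_config.json").is_file()
    has_weights = any((directory / name).is_file() for name in WEIGHT_FILES)
    return has_config and has_weights
```

For example, `is_peft_adapter_dir("./adapters/math_adapter")` should return True once the dummy adapter has been saved.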
