Intent Analysis • Tiered Model Selection • Fail-Open Design
Design | Setup | Comparison | Integrations | 中文文档
OpenSage is a specialized routing decision engine engineered to optimize Large Language Model (LLM) workflows. Unlike traditional gateways that simply pass through requests, OpenSage analyzes the semantic intent of every prompt before execution.
It functions as a cognitive pre-processor that determines the optimal model for a given task, balancing cost, latency, and capability. By offloading routing logic to a local, lightweight oracle, OpenSage acts as an intelligent switchboard for your AI agent infrastructure.
Core Capabilities:
- Oracle Engine: Utilizes a local Small Language Model (qwen2.5:0.5b) to classify prompt complexity (1-10) and domain.
- Tiered Routing: Maps analysis results to three distinct performance tiers: Reflex, Standard, and Deep.
- Fail-Open Architecture: Ensures zero downtime by defaulting to a standard model if the local oracle is unresponsive.
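The Oracle call itself can be sketched against Ollama's `/api/generate` endpoint (a real API; `format: "json"` constrains the model to JSON output). The prompt wording and the helper names `buildOraclePrompt`, `parseJudgment`, and `judge` below are assumptions for illustration, not the actual implementation in `src/oracle/`:

```typescript
// Sketch of the Oracle classification step. The prompt format and helper
// names are hypothetical; only the Ollama endpoint shape is standard.
type Judgment = { complexity: number; domain: string };

// Build a strict-JSON classification prompt for the local SLM (assumed format).
function buildOraclePrompt(userPrompt: string): string {
  return [
    "Classify the following request.",
    'Reply with JSON only: {"complexity": <1-10>, "domain": "<one word>"}',
    `Request: ${userPrompt}`,
  ].join("\n");
}

// Parse the model's reply, clamping complexity into the documented 1-10 range.
function parseJudgment(raw: string): Judgment {
  const parsed = JSON.parse(raw);
  const complexity = Math.min(10, Math.max(1, Number(parsed.complexity) || 1));
  return { complexity, domain: String(parsed.domain ?? "general") };
}

// Call Ollama's /api/generate endpoint; format: "json" forces JSON output.
async function judge(userPrompt: string): Promise<Judgment> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    body: JSON.stringify({
      model: "qwen2.5:0.5b",
      prompt: buildOraclePrompt(userPrompt),
      format: "json",
      stream: false,
    }),
  });
  const body = await res.json();
  return parseJudgment(body.response);
}
```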
The current release (v1.0.0) includes the core routing logic, local oracle integration, and a terminal user interface (TUI) for real-time monitoring.
| Component | Status | Description |
|---|---|---|
| Oracle Engine | ✅ Ready | Local classification with strict 500ms timeout. |
| Tier Decision | ✅ Ready | Reflex (1-3), Standard (4-7), Deep (8-10). |
| Provider Parsing | ✅ Ready | Intelligent recursive splitting of provider strings. |
| Verification | 🚧 Planned | Automated output quality checks and retry logic. |
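Provider strings such as `openrouter:groq/llama-3-8b-8192` carry both a gateway and a nested `vendor/model` path. A simplified, non-recursive sketch of that split (the `parseProviderString` name and the default-provider choice are assumptions, not the actual parser):

```typescript
// Simplified sketch of provider-string parsing: split on the FIRST colon
// only, so the model part may itself contain a nested "vendor/model" path.
type ModelSpec = { provider: string; model: string };

function parseProviderString(spec: string): ModelSpec {
  const idx = spec.indexOf(":");
  if (idx === -1) {
    // No explicit provider prefix: assume a bare model id on a default
    // provider (hypothetical choice for this sketch).
    return { provider: "openrouter", model: spec };
  }
  return { provider: spec.slice(0, idx), model: spec.slice(idx + 1) };
}
```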
OpenSage operates a multi-stage pipeline designed to minimize latency while maximizing routing accuracy. The following diagram illustrates the critical path from user input to final response.
Workflow Safety: The system is designed to be "fail-open". If the local Ollama instance is unreachable or times out, the router automatically defaults to the Standard Tier, ensuring that the agent pipeline is never blocked by a routing failure.
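The fail-open path can be sketched by racing the oracle against its 500ms budget; the complexity thresholds come from the tier table above, while the `decideTier` shape is an assumption for illustration:

```typescript
// Minimal fail-open sketch: race the oracle against a 500 ms timeout and
// fall back to the Standard tier on any failure. Helper names are
// hypothetical; thresholds match the documented tier boundaries.
type Tier = "reflex" | "standard" | "deep";

function tierFor(complexity: number): Tier {
  if (complexity <= 3) return "reflex";   // Reflex: 1-3
  if (complexity <= 7) return "standard"; // Standard: 4-7
  return "deep";                          // Deep: 8-10
}

async function decideTier(
  judge: () => Promise<{ complexity: number }>
): Promise<Tier> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("oracle timeout")), 500);
  });
  try {
    const judgment = await Promise.race([judge(), timeout]);
    return tierFor(judgment.complexity);
  } catch {
    // Fail-open: never block the agent pipeline on a routing failure.
    return "standard";
  } finally {
    clearTimeout(timer);
  }
}
```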
The primary economic driver for OpenSage is the "80/20 rule" of LLM traffic: a significant portion of user interactions do not require state-of-the-art model capabilities.
By dynamically routing simple queries to free or low-cost models, organizations can achieve substantial cost reductions without compromising user experience on complex tasks.
| Request Type | Typical Volume | Traditional Cost Basis | OpenSage Optimized Cost |
|---|---|---|---|
| Conversational / Chit-chat | ~30% | $0.03 / req (GPT-4) | $0.00 (Local/Groq) |
| Standard Logic / Coding | ~50% | $0.03 / req (GPT-4) | $0.0002 (Llama 3) |
| Deep Reasoning | ~20% | $0.03 / req (GPT-4) | $0.03 (Claude 3.5) |
Projected Savings: Up to 80% reduction in API costs for mixed-workload institutional deployments.
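The blended per-request figure behind this projection can be checked directly from the table's volumes and costs:

```typescript
// Quick arithmetic check of the blended cost implied by the table above.
const traditional = 0.03; // every request routed to GPT-4
const optimized =
  0.3 * 0.0 +    // ~30% chit-chat  -> free local/Groq tier
  0.5 * 0.0002 + // ~50% standard   -> Llama 3
  0.2 * 0.03;    // ~20% deep       -> Claude 3.5
const savings = 1 - optimized / traditional;
console.log(savings.toFixed(2)); // ≈ 0.80, i.e. roughly 80% savings
```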
OpenSage requires a local inference engine to serve the Oracle model. We officially support Ollama for this purpose.
- Install Ollama: Follow the instructions at ollama.com.
- Pull the Oracle Model:
```bash
ollama pull qwen2.5:0.5b
```

Note: The `qwen2.5:0.5b` model is chosen for its exceptional balance of speed and classification accuracy.
Clone the repository and install dependencies:
```bash
git clone https://github.com/Vleonone/Opensage.git
cd Opensage
npm install
```

OpenSage is designed to be embedded directly into your agent's decision loop.
```typescript
import { CognitiveRouter } from "./src/router.js";

// Initialize the singleton router
const router = CognitiveRouter.getInstance();

// Route a prompt
const result = await router.route("Fix the race condition in this React hook");

// The result object contains the optimal provider and model
console.log(result);
// Output:
// {
//   provider: "openrouter",
//   model: "groq/llama-3-8b-8192",
//   tier: "reflex",
//   judgment: {
//     complexity: 3,
//     domain: "coding"
//   }
// }
```

For development and monitoring, OpenSage includes a high-fidelity terminal user interface.
```bash
npm run gui
```

To compile the TypeScript source for production deployment:
```bash
npm run build
node dist/tui_demo.js
```

In the AeonsagePro environment, OpenSage acts as a middleware interceptor in `src/commands/agent.ts`. It evaluates the user message and overrides the default model configuration before the session is initialized.
OpenSage is framework-agnostic. It can be integrated into any system that facilitates dynamic model selection.
```typescript
// Generic Integration Pattern
async function handleRequest(prompt: string) {
  const decision = await router.route(prompt);

  // Configure your LLM client with the decision
  const llmClient = new LLMClient({
    provider: decision.provider,
    model: decision.model,
  });

  return await llmClient.complete(prompt);
}
```

The mapping between performance tiers and specific model IDs is defined in `src/routing/cascading.ts`. This can be customized to match your available API keys and enterprise agreements.
```typescript
export const TIER_MODEL_MAP = {
  reflex: ["openrouter:groq/llama-3-8b-8192"],
  standard: ["gpt-4o-mini"],
  deep: ["claude-3-5-sonnet-20240620"],
};
```

- Verification Layer: Implementing the "Self-Correction" loop for automatic tier escalation upon failure.
- Plugin Architecture: Allowing external modules to inject custom routing logic.
- Telemetry: Built-in token accounting and real-time cost visualization.
- Python SDK: Native Python port for integration with PyTorch/TensorFlow pipelines.
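Since each tier in `TIER_MODEL_MAP` holds an ordered array of candidates, the `cascading.ts` naming suggests a first-healthy-candidate loop. A minimal sketch under that assumption, with a hypothetical `tryModel` health probe (not part of the actual codebase):

```typescript
// Sketch of cascading selection over a tier's candidate list. The loop
// shape and the tryModel probe are assumptions inferred from the file name.
const TIER_MODEL_MAP: Record<string, string[]> = {
  reflex: ["openrouter:groq/llama-3-8b-8192"],
  standard: ["gpt-4o-mini"],
  deep: ["claude-3-5-sonnet-20240620"],
};

async function pickModel(
  tier: string,
  tryModel: (spec: string) => Promise<boolean>
): Promise<string> {
  for (const spec of TIER_MODEL_MAP[tier]) {
    if (await tryModel(spec)) return spec; // first healthy candidate wins
  }
  // All candidates failed: fall back to the first Standard model, in the
  // spirit of the fail-open design.
  return TIER_MODEL_MAP.standard[0];
}
```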
The codebase is organized to separate core logic, local inference handling, and documentation.
- `src/router.ts`: Main entry point and singleton manager.
- `src/oracle/`: Contains the interface to the local Ollama instance.
- `src/routing/`: Implements the decision logic and tier mapping.
- `docs/`: Detailed technical documentation and architectural decision records.
- `examples/`: Reference implementations and demo scripts.
See CONTRIBUTING.md. We welcome pull requests for:
- New provider adapters (Google Gemini, Azure, Mistral)
- Oracle model benchmarks (Phi-3, Gemma-2b)
- Framework integration adapters (LangChainJS, Vercel AI SDK)
OpenSage is the open-source routing core of AeonsagePro. It relies on:
- Ollama β Local inference engine
- OpenRouter β Unified model marketplace
- Groq β Sub-second inference hardware
MIT - AeonSage Team

