Commit 5d73f86: Add blog post about Tool RAG dynamic tool discovery

1 parent: de82989
6 files changed: 225 additions & 14 deletions

**blog/index.md** (7 additions, 0 deletions)
@@ -6,6 +6,13 @@ Technical articles about building efficient AI agents with Agentic Forge.

 <div class="blog-list">

+### [Tool RAG: Dynamic Tool Discovery for AI Agents](/blog/tool-rag-dynamic-discovery)
+*January 2026*
+
+How we implemented Tool RAG in Agentic Forge—agents discover tools on-demand through semantic search instead of loading all tools upfront, reducing context usage and improving tool selection accuracy.
+
+---
+
 ### [TOON Format: Cutting Tokens Without Cutting Information](/blog/toon-format-support)
 *January 2026*

**blog/tool-rag-dynamic-discovery.md** (new file, 102 additions)
# Tool RAG: Dynamic Tool Discovery for AI Agents

*January 2026*

As tool libraries grow, AI agents face a problem: loading every tool definition into context wastes tokens and confuses models. We implemented Tool RAG in Agentic Forge to solve this—agents now discover tools on-demand through semantic search instead of loading them all upfront.
## The Problem: Tool Overload

Traditional tool-calling agents receive all available tools in their system prompt. This works fine with 5-10 tools, but becomes problematic as the tool library grows:

- **Context consumed by definitions** — Each tool's name, description, and parameter schema takes tokens. With 20+ tools, this can easily exceed 3000 tokens before the conversation even starts.
- **Model confusion** — Research shows LLMs perform worse when presented with many similar tools. They may call the wrong one or hallucinate parameters.
- **Doesn't scale** — An agent with access to 100 tools would spend most of its context on tool definitions, leaving little room for conversation history.
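The definition-overhead point can be made concrete with a rough back-of-the-envelope sketch. The tool schema and the ~4-characters-per-token heuristic below are illustrative assumptions (actual counts depend on the tokenizer and the real schemas):

```python
import json

# Rough illustration of how definition overhead scales: serialize each
# tool schema and estimate tokens with the common ~4-chars-per-token
# heuristic. This is an assumption for illustration, not a real count.
def estimate_definition_tokens(tools):
    return sum(len(json.dumps(t)) // 4 for t in tools)

# Hypothetical tool schema, roughly the shape used by MCP/OpenAI-style
# tool calling.
tool = {
    "name": "get_weather",
    "description": "Get the current weather for a given city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# Twenty similarly sized tools already cost hundreds of tokens of
# context before the conversation starts; richer schemas cost far more.
print(estimate_definition_tokens([tool] * 20))
```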
## What is Tool RAG?

Tool RAG applies Retrieval-Augmented Generation to tool selection. Instead of loading all tools upfront, the agent starts with a single meta-tool called `search_tools`. When the agent needs a capability, it describes what it's looking for, and semantic search returns only the relevant tools.

This approach comes from recent research. The [ToolRAG paper](https://arxiv.org/html/2509.20386) demonstrated a 3x improvement in tool selection accuracy and ~50% prompt token reduction compared to loading all tools.
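In RAG mode the agent's tool list contains just this one entry. A minimal sketch of what the meta-tool definition might look like — the name `search_tools` comes from the post, but the description text and schema details here are illustrative, not Armory's actual definition:

```python
# Hypothetical sketch of the single meta-tool exposed in RAG mode,
# following the common JSON-Schema shape used by MCP/OpenAI-style
# tool calling. Field wording is an assumption.
SEARCH_TOOLS_DEFINITION = {
    "name": "search_tools",
    "description": (
        "Find tools relevant to a capability you need. "
        "Describe the capability in natural language; matching "
        "tools are returned and become callable on the next turn."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Natural-language description of the needed capability",
            },
        },
        "required": ["query"],
    },
}
```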
## Architecture

The difference is straightforward:

![Traditional vs Tool RAG Architecture](/diagrams/tool-rag-architecture.svg)

**Traditional**: The LLM receives all tool definitions in its context. Every request pays the token cost for tools that won't be used.

**Tool RAG**: The LLM receives only the `search_tools` meta-tool. When it needs a capability, it searches for relevant tools, which are then loaded into context for the next turn.
## How It Works

Here's the flow when an agent handles a request like "What's the weather in London?":

![Tool RAG Flow](/diagrams/tool-rag-flow.svg)

1. **Initial context** — The agent sees only `search_tools` in its available tools
2. **Tool search** — The agent calls `search_tools` with a query like "get current weather for a city"
3. **Semantic matching** — The query is embedded and compared against tool description embeddings stored in pgvector
4. **Tools returned** — All tools above the similarity threshold are returned (default: 0.5)
5. **Second turn** — The discovered tools are available for the agent to call
6. **Task completion** — The agent calls the appropriate tool with the user's parameters

The extra round-trip for tool discovery is handled automatically with auto-continue, so the user experience is seamless.
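The semantic-matching step (3 and 4 above) boils down to embedding the query and keeping every tool whose description embedding clears the similarity threshold. In Armory this comparison runs inside pgvector; the sketch below does it in memory with toy three-dimensional vectors, so the tool names, registry shape, and embeddings are all illustrative:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search_tools(query_embedding, registry, threshold=0.5):
    """Return the names of all tools whose description embedding clears
    the similarity threshold (default 0.5, as in the post), best match
    first. In-memory stand-in for the pgvector query."""
    scored = [
        (cosine_similarity(query_embedding, emb), name)
        for name, emb in registry.items()
    ]
    return [name for score, name in sorted(scored, reverse=True) if score > threshold]

# Toy registry: pretend these are embeddings of tool descriptions.
registry = {
    "get_weather":   [0.9, 0.1, 0.0],
    "send_email":    [0.0, 0.9, 0.1],
    "convert_units": [0.3, 0.2, 0.9],
}

# An embedded query like "get current weather for a city" would land
# close to get_weather's vector and far from the others.
print(search_tools([0.95, 0.05, 0.1], registry))  # → ['get_weather']
```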
## Implementation in Forge Armory

Armory exposes Tool RAG through a mode parameter on the MCP endpoint:

| Endpoint | Behavior |
|----------|----------|
| `/mcp` | Standard mode — returns all tools |
| `/mcp?mode=rag` | RAG mode — returns only `search_tools` |

When RAG mode is enabled, the `search_tools` meta-tool performs semantic search against the tool registry. The similarity threshold is configurable through the admin UI.
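The mode switch in the table amounts to picking a tool list per request. A sketch of that dispatch, assuming the `?mode=` query parameter described above — the handler itself and its function name are illustrative, not Armory's actual code:

```python
from urllib.parse import parse_qs, urlparse

def tools_for_request(url, all_tools, search_tools_def):
    """Pick the advertised tool list from the ?mode= query parameter,
    mirroring the /mcp vs /mcp?mode=rag behavior. Illustrative sketch."""
    mode = parse_qs(urlparse(url).query).get("mode", ["standard"])[0]
    if mode == "rag":
        # RAG mode: advertise only the search_tools meta-tool.
        return [search_tools_def]
    # Standard mode: advertise the full registry.
    return list(all_tools)

all_tools = [{"name": "get_weather"}, {"name": "send_email"}]
meta = {"name": "search_tools"}

print([t["name"] for t in tools_for_request("/mcp", all_tools, meta)])
# → ['get_weather', 'send_email']
print([t["name"] for t in tools_for_request("/mcp?mode=rag", all_tools, meta)])
# → ['search_tools']
```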
![Armory Tool RAG Settings](/screens/forge-armory-tool-rag-settings.png)

The UI also displays a capability manifest—a system prompt template that includes the <code v-pre>{{TOOL_LIST}}</code> placeholder. This gets populated with discovered tools after each search, letting you customize how tool capabilities are presented to the model.
## Using Tool RAG

In forge-ui, Tool RAG can be enabled per-conversation with a toggle in the chat settings:

![forge-ui with Tool RAG](/screens/forge-ui-tool-rag.png)

When enabled, the conversation starts with minimal context. The agent discovers tools as needed based on user requests.
## When to Use Tool RAG

Tool RAG makes the most sense when:

- **Large tool libraries** — 10+ tools where loading all definitions is costly
- **Varied task types** — Different conversations need different tool subsets
- **Cost-sensitive applications** — Token savings compound across many requests
- **Context-limited scenarios** — Smaller models or long conversations where every token matters

It's less beneficial for:

- **Small fixed tool sets** — With 3-5 tools, the overhead of search may exceed the savings
- **Latency-critical applications** — The extra round-trip adds some latency (though auto-continue minimizes this)
- **Highly specialized agents** — If every conversation uses the same tools, dynamic discovery adds no value
## What's Next

In an upcoming post, we'll share concrete benchmarks comparing token usage across three configurations: standard tool calling, TOON format optimization, and TOON + Tool RAG combined. Preliminary results show up to 58% token reduction when both optimizations are applied.
## Source Code

- [forge-armory](https://github.com/agentic-forge/forge-armory) — MCP gateway with Tool RAG support
- [forge-orchestrator](https://github.com/agentic-forge/forge-orchestrator) — Agent loop with RAG mode integration
- [forge-ui](https://github.com/agentic-forge/forge-ui) — Chat interface with RAG toggle
## References

- [ToolRAG: Enhancing Large Language Model Tool Interaction](https://arxiv.org/html/2509.20386) — The research paper that inspired this implementation
- [Model Context Protocol](https://modelcontextprotocol.io/) — The protocol standard for tool integration
- [pgvector](https://github.com/pgvector/pgvector) — Vector similarity search for PostgreSQL
---

*This is part of a series on building [Agentic Forge](https://agentic-forge.github.io).*
**public/diagrams/tool-rag-flow.svg** (14 additions & 14 deletions)

Two further files appear in the diff view by size only (193 KB and 385 KB).