# Tool RAG: Dynamic Tool Discovery for AI Agents

*January 2026*

As tool libraries grow, AI agents face a problem: loading every tool definition into context wastes tokens and confuses models. We implemented Tool RAG in Agentic Forge to solve this—agents now discover tools on-demand through semantic search instead of loading them all upfront.

## The Problem: Tool Overload

Traditional tool-calling agents receive all available tools in their system prompt. This works fine with 5-10 tools, but becomes problematic as the tool library grows:

- **Context consumed by definitions** — Each tool's name, description, and parameter schema takes tokens. With 20+ tools, this can easily exceed 3000 tokens before the conversation even starts.
- **Model confusion** — Research shows LLMs perform worse when presented with many similar tools. They may call the wrong one or hallucinate parameters.
- **Doesn't scale** — An agent with access to 100 tools would spend most of its context on tool definitions, leaving little room for conversation history.
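
To make the token cost concrete, here is a rough back-of-the-envelope sketch. The tool shapes are invented and the 4-characters-per-token ratio is a crude heuristic, not a real tokenizer:

```python
import json

# Hypothetical tool definitions in the JSON-schema style used by most
# tool-calling APIs; names and fields are illustrative only.
def make_tool(name, description):
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string", "description": "Input value"}},
            "required": ["query"],
        },
    }

tools = [make_tool(f"tool_{i}", f"Does task number {i} with several options") for i in range(20)]

# Crude heuristic: roughly 4 characters per token for English-heavy JSON.
serialized = json.dumps(tools)
estimated_tokens = len(serialized) // 4
print(f"{len(tools)} tools ~= {estimated_tokens} tokens of context")
```

Real schemas are usually richer than this toy one (multiple parameters, enums, examples), so actual costs tend to run higher.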

## What is Tool RAG?

Tool RAG applies Retrieval-Augmented Generation to tool selection. Instead of loading all tools upfront, the agent starts with a single meta-tool called `search_tools`. When the agent needs a capability, it describes what it's looking for, and semantic search returns only the relevant tools.
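
A meta-tool like this might be defined as follows. The field names, description text, and `top_k` default are assumptions for illustration, not Armory's actual schema:

```python
# A plausible definition for the search_tools meta-tool, in the JSON-schema
# style common to tool-calling APIs. All field values here are assumptions.
search_tools = {
    "name": "search_tools",
    "description": (
        "Search the tool registry for tools matching a capability. "
        "Describe what you need in natural language, e.g. "
        "'get current weather for a city'."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Natural-language description of the needed capability",
            },
            "top_k": {
                "type": "integer",
                "description": "Maximum number of tools to return",
                "default": 5,
            },
        },
        "required": ["query"],
    },
}
print(search_tools["name"])
```

This single definition is the agent's entire tool surface at the start of a conversation.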

This approach comes from recent research. The [ToolRAG paper](https://arxiv.org/html/2509.20386) demonstrated 3x improvement in tool selection accuracy and ~50% prompt token reduction compared to loading all tools.

## Architecture

The difference is straightforward:

**Traditional**: The LLM receives all tool definitions in its context. Every request pays the token cost for tools that won't be used.

**Tool RAG**: The LLM receives only the `search_tools` meta-tool. When it needs a capability, it searches for relevant tools, which are then loaded into context for the next turn.

## How It Works

Here's the flow when an agent handles a request like "What's the weather in London?":

1. **Initial context** — The agent sees only `search_tools` in its available tools
2. **Tool search** — The agent calls `search_tools` with a query like "get current weather for a city"
3. **Semantic matching** — The query is embedded and compared against tool description embeddings stored in pgvector
4. **Tools returned** — All tools above the similarity threshold are returned (default: 0.5)
5. **Second turn** — The discovered tools are available for the agent to call
6. **Task completion** — The agent calls the appropriate tool with the user's parameters
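
Steps 3 and 4 can be sketched in miniature. The 3-dimensional embeddings below stand in for real model output stored in pgvector, but the threshold filtering works the same way:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy 3-d embeddings standing in for real embeddings of tool descriptions.
tool_embeddings = {
    "get_weather": [0.9, 0.1, 0.0],
    "send_email": [0.0, 0.9, 0.2],
    "get_forecast": [0.8, 0.2, 0.1],
}

query_embedding = [0.85, 0.15, 0.05]  # embedding of "get current weather for a city"
THRESHOLD = 0.5  # the default similarity threshold from step 4

# Rank every tool by similarity, then keep only those above the threshold.
matches = sorted(
    ((name, cosine_similarity(query_embedding, emb)) for name, emb in tool_embeddings.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
selected = [name for name, score in matches if score >= THRESHOLD]
print(selected)  # → ['get_weather', 'get_forecast']
```

With real embeddings, pgvector performs the same ranking in the database via its distance operators instead of in application code.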

The extra round-trip for tool discovery is handled automatically with auto-continue, so the user experience is seamless.
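
The auto-continue idea can be sketched with a stubbed model. This is illustrative only, not the actual orchestrator code: when the model's sole action was a `search_tools` call, the loop injects the results and re-invokes the model instead of returning control to the user.

```python
# Stub model: searches when it only has search_tools, answers otherwise.
def stub_model(messages, available_tools):
    if available_tools == ["search_tools"]:
        return {"tool_call": {"name": "search_tools", "query": "weather for a city"}}
    return {"content": "It is 12C and cloudy in London."}

def run_agent(user_message, registry_search):
    messages = [{"role": "user", "content": user_message}]
    available_tools = ["search_tools"]
    for _ in range(5):  # bounded auto-continue loop
        reply = stub_model(messages, available_tools)
        call = reply.get("tool_call")
        if call and call["name"] == "search_tools":
            discovered = registry_search(call["query"])
            available_tools = discovered           # swap in the discovered tools
            messages.append({"role": "tool", "content": str(discovered)})
            continue                               # auto-continue: no user turn needed
        return reply["content"]
    raise RuntimeError("agent did not converge")

answer = run_agent("What's the weather in London?", lambda q: ["get_weather"])
print(answer)
```

The loop bound guards against a model that searches forever without finding a usable tool.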

## Implementation in Forge Armory

Armory exposes Tool RAG through a mode parameter on the MCP endpoint:

| Endpoint | Behavior |
|----------|----------|
| `/mcp` | Standard mode — returns all tools |
| `/mcp?mode=rag` | RAG mode — returns only `search_tools` |

When RAG mode is enabled, the `search_tools` meta-tool performs semantic search against the tool registry. The similarity threshold is configurable through the admin UI.
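
Server-side, the mode gate might look like this minimal sketch. The real endpoint speaks MCP; the names and structure here are assumptions meant only to illustrate the branching in the table above:

```python
# Hypothetical registry of everything Armory could expose.
ALL_TOOLS = [{"name": "get_weather"}, {"name": "send_email"}, {"name": "search_tools"}]

def list_tools(mode=None):
    if mode == "rag":
        # RAG mode: expose only the search_tools meta-tool.
        return [t for t in ALL_TOOLS if t["name"] == "search_tools"]
    # Standard mode: expose every registered tool.
    return ALL_TOOLS

print([t["name"] for t in list_tools("rag")])  # → ['search_tools']
print([t["name"] for t in list_tools()])
```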

The UI also displays a capability manifest—a system prompt template that includes the `{{TOOL_LIST}}` placeholder. This gets populated with discovered tools after each search, letting you customize how tool capabilities are presented to the model.
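
Populating that placeholder might look like the sketch below. The `{{TOOL_LIST}}` placeholder is from the post; the template text and bullet formatting are assumptions:

```python
# Hypothetical capability-manifest template containing the placeholder.
MANIFEST_TEMPLATE = (
    "You have access to the following tools:\n"
    "{{TOOL_LIST}}\n"
    "Call search_tools again if none of these fit."
)

def render_manifest(template, tools):
    # Substitute the placeholder with one bullet per discovered tool.
    lines = "\n".join(f"- {t['name']}: {t['description']}" for t in tools)
    return template.replace("{{TOOL_LIST}}", lines)

discovered = [{"name": "get_weather", "description": "Current weather for a city"}]
print(render_manifest(MANIFEST_TEMPLATE, discovered))
```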

## Using Tool RAG

In forge-ui, Tool RAG can be enabled per-conversation with a toggle in the chat settings:

When enabled, the conversation starts with minimal context. The agent discovers tools as needed based on user requests.

## When to Use Tool RAG

Tool RAG makes the most sense when:

- **Large tool libraries** — 10+ tools where loading all definitions is costly
- **Varied task types** — Different conversations need different tool subsets
- **Cost-sensitive applications** — Token savings compound across many requests
- **Context-limited scenarios** — Smaller models or long conversations where every token matters

It's less beneficial for:

- **Small fixed tool sets** — With 3-5 tools, the overhead of search may exceed the savings
- **Latency-critical applications** — The extra round-trip adds some latency (though auto-continue minimizes this)
- **Highly specialized agents** — If every conversation uses the same tools, dynamic discovery adds no value

## What's Next

In an upcoming post, we'll share concrete benchmarks comparing token usage across three configurations: standard tool calling, TOON format optimization, and TOON + Tool RAG combined. Preliminary results show up to 58% token reduction when both optimizations are applied.

## Source Code

- [forge-armory](https://github.com/agentic-forge/forge-armory) — MCP gateway with Tool RAG support
- [forge-orchestrator](https://github.com/agentic-forge/forge-orchestrator) — Agent loop with RAG mode integration
- [forge-ui](https://github.com/agentic-forge/forge-ui) — Chat interface with RAG toggle

## References

- [ToolRAG: Enhancing Large Language Model Tool Interaction](https://arxiv.org/html/2509.20386) — The research paper that inspired this implementation
- [Model Context Protocol](https://modelcontextprotocol.io/) — The protocol standard for tool integration
- [pgvector](https://github.com/pgvector/pgvector) — Vector similarity search for PostgreSQL

---

*This is part of a series on building [Agentic Forge](https://agentic-forge.github.io).*