Commit 5d73f86: Add blog post about Tool RAG dynamic tool discovery

1 parent: de82989
6 files changed: 225 additions & 14 deletions

**blog/index.md** (7 additions, 0 deletions)
@@ -6,6 +6,13 @@ Technical articles about building efficient AI agents with Agentic Forge.

 <div class="blog-list">

+### [Tool RAG: Dynamic Tool Discovery for AI Agents](/blog/tool-rag-dynamic-discovery)
+*January 2026*
+
+How we implemented Tool RAG in Agentic Forge—agents discover tools on-demand through semantic search instead of loading all tools upfront, reducing context usage and improving tool selection accuracy.
+
+---
+
 ### [TOON Format: Cutting Tokens Without Cutting Information](/blog/toon-format-support)
 *January 2026*

**blog/tool-rag-dynamic-discovery.md** (new file, 102 additions)
# Tool RAG: Dynamic Tool Discovery for AI Agents

*January 2026*

As tool libraries grow, AI agents face a problem: loading every tool definition into context wastes tokens and confuses models. We implemented Tool RAG in Agentic Forge to solve this—agents now discover tools on-demand through semantic search instead of loading them all upfront.
## The Problem: Tool Overload

Traditional tool-calling agents receive all available tools in their system prompt. This works fine with 5-10 tools, but becomes problematic as the tool library grows:

- **Context consumed by definitions** — Each tool's name, description, and parameter schema takes tokens. With 20+ tools, this can easily exceed 3000 tokens before the conversation even starts.
- **Model confusion** — Research shows LLMs perform worse when presented with many similar tools. They may call the wrong one or hallucinate parameters.
- **Doesn't scale** — An agent with access to 100 tools would spend most of its context on tool definitions, leaving little room for conversation history.
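The definition-overhead point can be made concrete with a rough back-of-the-envelope sketch. The tool schema and the ~4-characters-per-token heuristic below are illustrative assumptions (actual counts depend on the tokenizer and the real schemas):

```python
import json

# Rough illustration of how definition overhead scales: serialize each
# tool schema and estimate tokens with the common ~4-chars-per-token
# heuristic. This is an assumption for illustration, not a real count.
def estimate_definition_tokens(tools):
    return sum(len(json.dumps(t)) // 4 for t in tools)

# Hypothetical tool schema, roughly the shape used by MCP/OpenAI-style
# tool calling.
tool = {
    "name": "get_weather",
    "description": "Get the current weather for a given city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# Twenty similarly sized tools already cost hundreds of tokens of
# context before the conversation starts; richer schemas cost far more.
print(estimate_definition_tokens([tool] * 20))
```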
## What is Tool RAG?

Tool RAG applies Retrieval-Augmented Generation to tool selection. Instead of loading all tools upfront, the agent starts with a single meta-tool called `search_tools`. When the agent needs a capability, it describes what it's looking for, and semantic search returns only the relevant tools.

This approach comes from recent research. The [ToolRAG paper](https://arxiv.org/html/2509.20386) demonstrated a 3x improvement in tool selection accuracy and ~50% prompt token reduction compared to loading all tools.
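In RAG mode the agent's tool list contains just this one entry. A minimal sketch of what the meta-tool definition might look like — the name `search_tools` comes from the post, but the description text and schema details here are illustrative, not Armory's actual definition:

```python
# Hypothetical sketch of the single meta-tool exposed in RAG mode,
# following the common JSON-Schema shape used by MCP/OpenAI-style
# tool calling. Field wording is an assumption.
SEARCH_TOOLS_DEFINITION = {
    "name": "search_tools",
    "description": (
        "Find tools relevant to a capability you need. "
        "Describe the capability in natural language; matching "
        "tools are returned and become callable on the next turn."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Natural-language description of the needed capability",
            },
        },
        "required": ["query"],
    },
}
```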
## Architecture

The difference is straightforward:

![Traditional vs Tool RAG Architecture](/diagrams/tool-rag-architecture.svg)

**Traditional**: The LLM receives all tool definitions in its context. Every request pays the token cost for tools that won't be used.

**Tool RAG**: The LLM receives only the `search_tools` meta-tool. When it needs a capability, it searches for relevant tools, which are then loaded into context for the next turn.
## How It Works

Here's the flow when an agent handles a request like "What's the weather in London?":

![Tool RAG Flow](/diagrams/tool-rag-flow.svg)

1. **Initial context** — The agent sees only `search_tools` in its available tools
2. **Tool search** — The agent calls `search_tools` with a query like "get current weather for a city"
3. **Semantic matching** — The query is embedded and compared against tool description embeddings stored in pgvector
4. **Tools returned** — All tools above the similarity threshold are returned (default: 0.5)
5. **Second turn** — The discovered tools are available for the agent to call
6. **Task completion** — The agent calls the appropriate tool with the user's parameters

The extra round-trip for tool discovery is handled automatically with auto-continue, so the user experience is seamless.
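The semantic-matching step (3 and 4 above) boils down to embedding the query and keeping every tool whose description embedding clears the similarity threshold. In Armory this comparison runs inside pgvector; the sketch below does it in memory with toy three-dimensional vectors, so the tool names, registry shape, and embeddings are all illustrative:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search_tools(query_embedding, registry, threshold=0.5):
    """Return the names of all tools whose description embedding clears
    the similarity threshold (default 0.5, as in the post), best match
    first. In-memory stand-in for the pgvector query."""
    scored = [
        (cosine_similarity(query_embedding, emb), name)
        for name, emb in registry.items()
    ]
    return [name for score, name in sorted(scored, reverse=True) if score > threshold]

# Toy registry: pretend these are embeddings of tool descriptions.
registry = {
    "get_weather":   [0.9, 0.1, 0.0],
    "send_email":    [0.0, 0.9, 0.1],
    "convert_units": [0.3, 0.2, 0.9],
}

# An embedded query like "get current weather for a city" would land
# close to get_weather's vector and far from the others.
print(search_tools([0.95, 0.05, 0.1], registry))  # → ['get_weather']
```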
## Implementation in Forge Armory

Armory exposes Tool RAG through a mode parameter on the MCP endpoint:

| Endpoint | Behavior |
|----------|----------|
| `/mcp` | Standard mode — returns all tools |
| `/mcp?mode=rag` | RAG mode — returns only `search_tools` |

When RAG mode is enabled, the `search_tools` meta-tool performs semantic search against the tool registry. The similarity threshold is configurable through the admin UI.
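The mode switch in the table amounts to picking a tool list per request. A sketch of that dispatch, assuming the `?mode=` query parameter described above — the handler itself and its function name are illustrative, not Armory's actual code:

```python
from urllib.parse import parse_qs, urlparse

def tools_for_request(url, all_tools, search_tools_def):
    """Pick the advertised tool list from the ?mode= query parameter,
    mirroring the /mcp vs /mcp?mode=rag behavior. Illustrative sketch."""
    mode = parse_qs(urlparse(url).query).get("mode", ["standard"])[0]
    if mode == "rag":
        # RAG mode: advertise only the search_tools meta-tool.
        return [search_tools_def]
    # Standard mode: advertise the full registry.
    return list(all_tools)

all_tools = [{"name": "get_weather"}, {"name": "send_email"}]
meta = {"name": "search_tools"}

print([t["name"] for t in tools_for_request("/mcp", all_tools, meta)])
# → ['get_weather', 'send_email']
print([t["name"] for t in tools_for_request("/mcp?mode=rag", all_tools, meta)])
# → ['search_tools']
```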
![Armory Tool RAG Settings](/screens/forge-armory-tool-rag-settings.png)

The UI also displays a capability manifest—a system prompt template that includes the <code v-pre>{{TOOL_LIST}}</code> placeholder. This gets populated with discovered tools after each search, letting you customize how tool capabilities are presented to the model.
## Using Tool RAG

In forge-ui, Tool RAG can be enabled per-conversation with a toggle in the chat settings:

![forge-ui with Tool RAG](/screens/forge-ui-tool-rag.png)

When enabled, the conversation starts with minimal context. The agent discovers tools as needed based on user requests.
## When to Use Tool RAG

Tool RAG makes the most sense when:

- **Large tool libraries** — 10+ tools where loading all definitions is costly
- **Varied task types** — Different conversations need different tool subsets
- **Cost-sensitive applications** — Token savings compound across many requests
- **Context-limited scenarios** — Smaller models or long conversations where every token matters

It's less beneficial for:

- **Small fixed tool sets** — With 3-5 tools, the overhead of search may exceed the savings
- **Latency-critical applications** — The extra round-trip adds some latency (though auto-continue minimizes this)
- **Highly specialized agents** — If every conversation uses the same tools, dynamic discovery adds no value
## What's Next

In an upcoming post, we'll share concrete benchmarks comparing token usage across three configurations: standard tool calling, TOON format optimization, and TOON + Tool RAG combined. Preliminary results show up to 58% token reduction when both optimizations are applied.
## Source Code

- [forge-armory](https://github.com/agentic-forge/forge-armory) — MCP gateway with Tool RAG support
- [forge-orchestrator](https://github.com/agentic-forge/forge-orchestrator) — Agent loop with RAG mode integration
- [forge-ui](https://github.com/agentic-forge/forge-ui) — Chat interface with RAG toggle
## References

- [ToolRAG: Enhancing Large Language Model Tool Interaction](https://arxiv.org/html/2509.20386) — The research paper that inspired this implementation
- [Model Context Protocol](https://modelcontextprotocol.io/) — The protocol standard for tool integration
- [pgvector](https://github.com/pgvector/pgvector) — Vector similarity search for PostgreSQL
---

*This is part of a series on building [Agentic Forge](https://agentic-forge.github.io).*
**public/diagrams/tool-rag-flow.svg** (14 additions & 14 deletions)

Two further files appear in the diff view by size only (193 KB and 385 KB).