Tools are the mechanism through which the agent interacts with the outside world. The model decides which tool to call and with what arguments; the agent executes it and feeds the result back into the conversation. This document covers the tool abstraction, the registry, all built-in tools, the shared execution backend, and how to add new tools.
┌──────────────────────────────────────────────────────┐
│ Agent Loop │
│ │
│ model response │
│ └─ tool_calls: [{name, args}, ...] │
│ │ │
│ ▼ │
│ ToolRegistry::execute(name, args) │
│ │ │
│ ▼ │
│ impl Tool::execute(args) ──► ExecutionContext │
│ │ │
│ ┌─────────┴──────────┐ │
│ │ Local │ Container │ │
│ │ │ SSH+tmux │ │
│ └─────────────────────┘ │
└──────────────────────────────────────────────────────┘
Every tool implements three methods:
#[async_trait]
pub trait Tool: Send + Sync {
/// Unique name the model uses when calling this tool.
fn name(&self) -> &'static str;
/// OpenAI-compatible function definition sent to the API in each request.
fn definition(&self) -> ToolDefinition;
/// Execute with the JSON argument string the model provided.
/// Returns a text result that is pushed back into the conversation.
async fn execute(&self, arguments: &str, context: &ToolContext) -> Result<String, ToolError>;
}Key design choices:
- Arguments arrive as a raw JSON string (OpenAI's double-encoding). Tools
deserialise with
serde_json::from_strand return aToolError::InvalidArgumentson parse failure. ToolContextprovides an optional stream sink for incremental events (started,stdout,stderr,info,completed) consumed by the runtime UI.- The return type is always
String. If execution fails and the error is not fatal, formatting the error as a string and returning it lets the model read the failure and decide what to do next. The agent loop formats hard errors as"Tool error: {e}"and continues the conversation. - Tools are
Send + Syncso they can be shared across async tasks.
The registry is a simple Vec<Box<dyn Tool>> with three operations:
registry.register(tool); // add a tool
registry.definitions(); // collect ToolDefinition for the API request
registry.execute(name, args).await // dispatch by name
registry.execute_with_context(name, args, &ctx).await // dispatch with stream sinkIf definitions() is called with no tools registered, the agent omits the
tools field from the API request entirely (providers reject an empty array).
To improve tool-selection reliability across providers/models, each tool definition description now includes explicit mini-sections:
When to useWhen NOT to useDisambiguationExampleorExamples
This content is sent in function definitions and acts as an in-band routing
guide for the model (for example, run_shell vs capture-pane vs send-keys,
or web_search vs fetch_url).
Schema-level safety requirements (for example required metadata on run_shell
and send-keys) remain encoded in each tool's JSON schema required list and
are covered by unit tests.
Twelve tools ship with the agent. Each is conditionally registered based on
config flags ([tools].shell_enabled, fetch_enabled, etc.).
Run a shell command and capture its output.
Arguments:
{
"command": "du -sh /var",
"session": "ops",
"pane": "worker",
"risk": "low",
"mutation": false,
"privesc": false,
"why": "Inspect disk usage before cleanup",
"wait": true
}| Field | Type | Default | Description |
|---|---|---|---|
command |
string | required | Shell command, executed via sh -c |
risk |
string | required | Estimated command risk: low, medium, high |
mutation |
bool | required | Whether command mutates system state |
privesc |
bool | required | Whether command uses privilege escalation |
why |
string | required | Short reason and risk justification |
session |
string | shared default session | Optional managed tmux session selector |
pane |
string | shared |
Optional managed tmux pane selector |
wait |
bool | string | int | true |
Waiting behaviour (see below) |
Wait modes:
| Value | Behaviour |
|---|---|
true (default) |
Block until command exits |
false |
Fire and forget; requires a tmux-backed target |
"30s", "10m", "1h" |
Block up to a timeout, then error |
500 (integer) |
Block up to N seconds |
Output format:
All tools return a JSON envelope:
{
"harness_timestamp": { "source": "harness", "unix_millis": 1772283708794 },
"result": {
"exit_code": 0,
"stdout": "512M\t/var",
"stderr": ""
}
}run_shell truncates both stdout and stderr to 4000 characters.
Approval flow:
When [tools].shell_confirm = true in config, the tool pauses before running
and waits for user approval. In interactive mode this goes through the REPL's
inline approval prompt; in one-shot mode it falls back to a simple stdin
prompt (Run: <cmd> [y/N]). Denied commands return
{"result":"Command execution denied by user.", ...}.
Spinner: run_shell manages its own spinner so that it can appear after
the approval prompt, not before.
Managed shared-pane guardrails:
When execution is tmux-managed, run_shell rejects shell-killing directives in
the shared shell context:
set -e/set -o errexit/setopt errexitexit/logoutexec ...(shell replacement)
Use subshell/heredoc forms (for example bash -lc 'set -e; ...') when strict
mode is needed without poisoning the parent shared shell.
Read the contents of a file.
Arguments:
{ "path": "/etc/hostname" }Returns the file's text content, truncated to 8000 characters with
...[truncated] appended if it exceeds the limit.
In container or SSH mode the read is performed via cat -- <path> on the
remote target.
Write content to a file, creating it if needed and overwriting if it exists.
Arguments:
{
"path": "/tmp/output.txt",
"content": "hello world\n"
}Returns "Wrote N bytes to /path" on success.
In container or SSH mode, the content is piped via stdin to cat > <path>.
Perform an HTTP GET and return the response body.
Arguments:
{ "url": "https://example.com/api/data" }Uses reqwest under the hood. No authentication is supported. Response body
is truncated to 8000 characters.
Useful for downloading configuration files, checking APIs, or fetching documentation pages.
Search the web via DuckDuckGo's HTML endpoint. No API key is required.
Arguments:
{ "query": "rust tokio tutorial" }Returns up to 8 results, each with title, URL, and snippet. The HTML is
parsed with scraper selectors, with a fallback extractor when the primary
result container layout changes.
Example output:
1. Tokio - An asynchronous Rust runtime
https://tokio.rs
Tokio is an event-driven, non-blocking I/O platform...
2. Tutorial | Tokio - An asynchronous Rust runtime
https://tokio.rs/tokio/tutorial
...
Capture a snapshot of a tmux pane's visible output. This tool is only
registered when a tmux pane is available (either locally via $TMUX_PANE, or
in buddy-managed local/container tmux execution, or on an SSH target with a
tmux session).
Arguments:
{
"delay": "2s",
"start": "-",
"end": "-",
"join_wrapped_lines": true
}| Field | Default | Description |
|---|---|---|
target |
active pane | Legacy raw tmux pane/session target (-t syntax) |
session |
shared default session | Optional managed tmux session selector |
pane |
shared |
Optional managed tmux pane selector |
start |
tmux default | Start line (-S); "-" = beginning of history |
end |
tmux default | End line (-E); "-" = end of visible area |
join_wrapped_lines |
true |
tmux -J flag — join soft-wrapped lines |
preserve_trailing_spaces |
false |
tmux -N flag |
include_escape_sequences |
false |
tmux -e flag (ANSI codes) |
escape_non_printable |
false |
tmux -C flag (octal encoding) |
include_alternate_screen |
false |
tmux -a flag |
delay |
none | Wait before capturing (for polling) |
Output is truncated to 8000 characters, keeping the tail (most recent
content), with [truncated N chars from start] prepended if clipped.
If include_alternate_screen is requested but no alternate screen is active,
the tool silently falls back to the main pane and appends a notice.
If an explicit managed target/session/pane selector points to a missing
managed pane, tmux_capture_pane retries once against the default shared pane
and prepends a missing-target fallback notice.
Common pattern — polling a background command:
run_shell({"command": "npm run build", "wait": false})
→ "command dispatched to tmux pane %1"
capture-pane({"delay": "5s"})
→ "[...build output so far...]"
Inject keystrokes into a tmux pane. Only available with a tmux backend.
Arguments:
{
"keys": ["C-c"],
"literal_text": "yes\n",
"enter": true,
"delay": "500ms",
"risk": "low",
"mutation": false,
"privesc": false,
"why": "Send Ctrl-C to cancel a hung command"
}| Field | Description |
|---|---|
target |
Legacy raw tmux pane/session target; defaults to active pane |
session |
Optional managed tmux session selector |
pane |
Optional managed tmux pane selector |
keys |
tmux key names: "C-c", "C-z", "Enter", "Up", "Down", etc. |
literal_text |
Literal text to type (uses tmux send-keys -l) |
enter |
Press Enter after other keys |
delay |
Wait before sending |
risk |
Required risk label: low, medium, high |
mutation |
Required mutation flag |
privesc |
Required privilege-escalation flag |
why |
Required short justification |
Keys are sent in order: literal_text first, then named keys, then Enter
if requested.
Unlike tmux_capture_pane (read-only), explicit missing managed selectors in
tmux_send_keys remain hard errors so key injection is never silently retargeted.
Common patterns:
// Cancel a stuck command
{"keys": ["C-c"]}
// Respond to an interactive prompt
{"literal_text": "yes", "enter": true}
// Navigate a menu
{"keys": ["Down", "Down", "Enter"]}Create or reuse a buddy-managed tmux session and ensure its shared pane is ready.
Required fields: session, risk, mutation, privesc, why.
Session names are canonicalized to the buddy owner prefix
(buddy-<agent.name>-...) and are bounded by [tmux].max_sessions.
Kill one buddy-managed tmux session.
- Cannot kill the default shared session.
- Fails for unmanaged sessions.
Required fields: session, risk, mutation, privesc, why.
Create or reuse a buddy-managed pane in a managed session.
Required fields: pane, risk, mutation, privesc, why.
Optional: session.
Pane names are canonicalized to buddy-managed names (except reserved
shared) and are bounded by [tmux].max_panes.
Kill one buddy-managed pane in a managed session.
- Default shared pane is protected from deletion.
- Fails for unmanaged panes.
Required fields: pane, risk, mutation, privesc, why.
Optional: session.
Return the current wall-clock time snapshot from the harness.
Arguments:
{}result includes common UTC/epoch fields (for example unix_millis,
iso_8601_utc, and rfc_2822_utc) wrapped in the standard envelope with
harness_timestamp.
This tool reports the harness wall-clock time, not the remote shell's time. It is useful when the model needs to timestamp actions or calculate durations without shelling out.
run_shell, read_file, write_file, capture-pane, and send-keys all
delegate to an ExecutionContext rather than running commands directly. This
single abstraction supports multiple execution backends transparently.
ExecutionContext now stores an internal trait object:
Arc<dyn ExecutionBackendOps>Concrete backend implementations currently include:
LocalBackendLocalTmuxContextContainerContext(docker/podman exec)ContainerTmuxContext(container exec + tmux)SshContext(SSH ControlMaster + tmux)
Shared command-oriented behavior is factored through a CommandBackend trait so
read_file/write_file and shell command execution paths are not duplicated
per backend.
All tools accept an ExecutionContext at construction time. The REPL
constructs the context based on CLI flags (--container, --ssh, --tmux)
and passes it to every tool. In current CLI flows with shell/files enabled,
local and container targets are tmux-managed by default.
Commands run via tokio::process::Command directly on the host. For
wait=false shell commands, the command is dispatched to the current tmux
pane via tmux send-keys.
File reads use tokio::fs::read_to_string; writes use tokio::fs::write.
Commands run via docker exec or podman exec. The engine is auto-detected
at startup by probing docker --version and podman --version.
docker exec <container> sh -lc '<command>'
For commands that need stdin (e.g., write_file), the interactive flag is
added: -i for Docker, --interactive for Podman.
In current CLI flows, container execution is tmux-backed by default; --tmux
is used to provide an explicit managed session name:
- a session is created/reused inside the container,
- commands are dispatched with
tmux send-keys, wait=false,capture-pane, andsend-keysbecome available.
The most sophisticated mode. See Remote Execution for the full design. In brief:
- An SSH ControlMaster socket is established at startup and reused for all subsequent commands.
- Commands are executed inside a persistent tmux pane rather than fresh SSH processes, so the operator can attach and observe what the agent is doing.
- Output is collected via a prompt-marker system that lets the agent reliably extract the output of each command from the tmux scrollback buffer.
// src/tools/my_tool.rs
use async_trait::async_trait;
use serde::Deserialize;
use super::{Tool, ToolContext};
use crate::error::ToolError;
use crate::types::{FunctionDefinition, ToolDefinition};
pub struct MyTool;
#[derive(Deserialize)]
struct Args {
message: String,
}
#[async_trait]
impl Tool for MyTool {
fn name(&self) -> &'static str {
"my_tool"
}
fn definition(&self) -> ToolDefinition {
ToolDefinition {
tool_type: "function".into(),
function: FunctionDefinition {
name: self.name().into(),
description: "Does something useful.".into(),
parameters: serde_json::json!({
"type": "object",
"properties": {
"message": {
"type": "string",
"description": "The message to process"
}
},
"required": ["message"]
}),
},
}
}
async fn execute(&self, arguments: &str, _context: &ToolContext) -> Result<String, ToolError> {
let args: Args = serde_json::from_str(arguments)
.map_err(|e| ToolError::InvalidArguments(e.to_string()))?;
Ok(format!("processed: {}", args.message))
}
}// src/tools/mod.rs
pub mod my_tool;// src/app/entry.rs (inside build_tools)
registry.register(MyTool);Add my_tool_enabled: bool to ToolsConfig in src/config/types.rs and wrap the
registration:
if config.tools.my_tool_enabled {
registry.register(MyTool);
}Tool output is truncated before being stored in conversation history. Keeping results small prevents accidental context exhaustion.
| Tool | Limit | Truncation style |
|---|---|---|
run_shell stdout |
4000 chars | head (appends ...[truncated]) |
run_shell stderr |
4000 chars | head |
read_file |
8000 chars | head |
fetch_url |
8000 chars | head |
capture-pane |
8000 chars | tail (prepends [truncated N chars from start]) |
capture-pane truncates from the tail instead of the head because the
most recent screen content is more relevant than old scrollback.
Tools can return two kinds of errors:
enum ToolError {
InvalidArguments(String), // bad JSON from model
ExecutionFailed(String), // runtime failure
}Neither variant aborts the agent loop. The agent formats errors as
"Tool error: {e}" and pushes the string as a tool_result message.
To reduce retry loops, Buddy suppresses repeated identical failures for the
same (tool_name, arguments) after a small threshold and returns a
deterministic remediation error instead of re-executing the tool call.