You can include files in a conversation using Paths:
from mcp_agent.core.prompt import Prompt
from pathlib import Path

plans = await agent.send(
    Prompt.user(
        "Summarise this PDF",
        Path("secret-plans.pdf")
    )
)

This works for any MIME type that can be tokenized by the model.
MCP Server resources can be conveniently included in a message with:
description = await agent.with_resource(
    "What is in this image?",
    "mcp_image_server",
    "resource://images/cat.png"
)

Prompt Files can include Resources:
agent_script.txt
---USER
Please extract the major colours from this CSS file:
---RESOURCE
index.css
They can either be loaded with the load_prompt_multipart function, or delivered via the built-in prompt-server.
Defining an agent is as simple as:
@fast.agent(
    instruction="Given an object, respond only with an estimate of its size."
)

We can then send messages to the Agent:

async with fast.run() as agent:
    moon_size = await agent("the moon")
    print(moon_size)

Or start an interactive chat with the Agent:

async with fast.run() as agent:
    await agent.interactive()

Here is the complete sizer.py Agent application, with boilerplate code:
sizer.py

import asyncio
from mcp_agent.core.fastagent import FastAgent

# Create the application
fast = FastAgent("Agent Example")

@fast.agent(
    instruction="Given an object, respond only with an estimate of its size."
)
async def main():
    async with fast.run() as agent:
        await agent()

if __name__ == "__main__":
    asyncio.run(main())

The Agent can then be run with uv run sizer.py.
Specify a model with the --model switch - for example uv run sizer.py --model sonnet.
To generate examples use fast-agent quickstart workflow. This example can be run with uv run workflow/chaining.py. fast-agent looks for configuration files in the current directory before checking parent directories recursively.
Agents can be chained to build a workflow, using MCP Servers defined in the fastagent.config.yaml file:
fastagent.config.yaml
# Example of a STDIO server named "fetch"
mcp:
  servers:
    fetch:
      command: "uvx"
      args: ["mcp-server-fetch"]
social.py
@fast.agent(
    "url_fetcher",
    "Given a URL, provide a complete and comprehensive summary",
    servers=["fetch"],  # Name of an MCP Server defined in fastagent.config.yaml
)
@fast.agent(
    "social_media",
    """
    Write a 280 character social media post for any given text.
    Respond only with the post, never use hashtags.
    """,
)
@fast.chain(
    name="post_writer",
    sequence=["url_fetcher", "social_media"],
)
async def main():
    async with fast.run() as agent:
        # using chain workflow
        await agent.post_writer("http://fast-agent.ai")

All Agents and Workflows respond to .send("message"). The agent app responds to .interactive() to start a chat session.
Saved as social.py, we can now run this workflow from the command line with:
uv run social.py --agent post_writer --message "<url>"
Add the --quiet switch to disable progress and message display and return only the final response - useful for simple automations.
Read more about running fast-agent agents here
fast-agent has built-in support for the patterns referenced in Anthropic's Building Effective Agents paper.
The chain workflow offers a declarative approach to calling Agents in sequence:
@fast.chain(
    "post_writer",
    sequence=["url_fetcher", "social_media"]
)

# we can then prompt it directly:
async with fast.run() as agent:
    await agent.interactive(agent="post_writer")

This starts an interactive session, which produces a short social media post for a given URL. If a chain is prompted, it returns to a chat with the last Agent in the chain. You can switch agents by typing @agent-name.
Chains can be incorporated in other workflows, or contain other workflow elements (including other Chains). You can set an instruction to describe its capabilities to other workflow steps if needed.
Chains are also helpful for capturing content before it is dispatched by a router, or for summarizing content before it is used downstream in the workflow.
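For illustration, here is a minimal sketch of a chain that declares an instruction so a router can delegate to it - the dispatcher router is hypothetical, while url_fetcher and social_media are the agents defined above:
@fast.chain(
    name="post_writer",
    sequence=["url_fetcher", "social_media"],
    instruction="Fetches a URL and writes a short social media post about its content.",
)
@fast.router(
    name="dispatcher",  # hypothetical router that can delegate to the chain
    agents=["post_writer", "social_media"],
)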
Agents can request Human Input to assist with a task or get additional context:
@fast.agent(
    instruction="An AI agent that assists with basic tasks. Request Human Input when needed.",
    human_input=True,
)

await agent("print the next number in the sequence")

In the example human_input.py, the Agent will prompt the User for additional information to complete the task.
The Parallel Workflow sends the same message to multiple Agents simultaneously (fan-out), then uses the fan-in Agent to process the combined content.
@fast.agent("translate_fr", "Translate the text to French")
@fast.agent("translate_de", "Translate the text to German")
@fast.agent("translate_es", "Translate the text to Spanish")
@fast.parallel(
name="translate",
fan_out=["translate_fr","translate_de","translate_es"]
)
@fast.chain(
"post_writer",
sequence=["url_fetcher","social_media","translate"]
)If you don't specify a fan-in agent, the parallel returns the combined Agent results verbatim.
parallel is also useful to ensemble ideas from different LLMs.
When using parallel in other workflows, specify an instruction to describe its operation.
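As a sketch, a fan-in agent and an instruction could be added to the translation example above (the summary_writer agent is hypothetical):
@fast.agent("summary_writer", "Combine the translations into a single multi-language post")
@fast.parallel(
    name="translate",
    fan_out=["translate_fr", "translate_de", "translate_es"],
    fan_in="summary_writer",  # hypothetical aggregator agent
    instruction="Translates the supplied text into French, German and Spanish.",
)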
Evaluator-Optimizers combine 2 agents: one to generate content (the generator), and the other to judge that content and provide actionable feedback (the evaluator). Messages are sent to the generator first, then the pair run in a loop until either the evaluator is satisfied with the quality, or the maximum number of refinements is reached. The final result from the Generator is returned.
If the Generator has use_history off, the previous iteration is returned when asking for improvements - otherwise conversational context is used.
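The example below refers to web_searcher and quality_assurance agents; a minimal sketch of how they might be defined (the instructions and server name are illustrative):
@fast.agent(
    name="web_searcher",
    instruction="Research the topic and produce a draft report.",
    servers=["fetch"],  # illustrative MCP Server
)
@fast.agent(
    name="quality_assurance",
    instruction="Rate the draft report and give specific, actionable feedback.",
)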
@fast.evaluator_optimizer(
    name="researcher",
    generator="web_searcher",
    evaluator="quality_assurance",
    min_rating="EXCELLENT",
    max_refinements=3,
)

async with fast.run() as agent:
    await agent.researcher.send("produce a report on how to make the perfect espresso")

When used in a workflow, it returns the last generator message as the result.
See the evaluator.py workflow example, or fast-agent quickstart researcher for a more complete example.
Routers use an LLM to assess a message, and route it to the most appropriate Agent. The routing prompt is automatically generated based on the Agent instructions and available Servers.
@fast.router(
    name="route",
    agents=["agent1", "agent2", "agent3"],
)

NB - If only one agent is supplied to the router, messages are forwarded to it directly.
Look at the router.py workflow for an example.
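Like other workflows, a router is prompted with send(); a minimal sketch:
async with fast.run() as agent:
    # the router picks the most appropriate agent for the request
    await agent.route.send("How big is the moon?")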
Given a complex task, the Orchestrator uses an LLM to generate a plan to divide the task amongst the available Agents. The planning and aggregation prompts are generated by the Orchestrator, which benefits from using more capable models. Plans can either be built once at the beginning (plan_type="full") or iteratively (plan_type="iterative").
@fast.orchestrator(
    name="orchestrate",
    agents=["task1", "task2", "task3"],
)

See the orchestrator.py or agent_build.py workflow example.
All definitions allow omitting the name and instruction arguments for brevity:

@fast.agent("You are a helpful agent")          # Create an agent with a default name.
@fast.agent("greeter", "Respond cheerfully!")   # Create an agent with the name "greeter"

moon_size = await agent("the moon")             # Call the default (first defined) agent with a message
result = await agent.greeter("Good morning!")   # Send a message to an agent by name using dot notation
result = await agent.greeter.send("Hello!")     # You can call 'send' explicitly
await agent["greeter"].send("Good Evening!")    # Dictionary access to agents is also supported

Read more about prompting agents here
@fast.agent(
    name="agent",                                   # name of the agent
    instruction="You are a helpful Agent",          # base instruction for the agent
    servers=["filesystem"],                         # list of MCP Servers for the agent
    model="o3-mini.high",                           # specify a model for the agent
    use_history=True,                               # agent maintains chat history
    request_params=RequestParams(temperature=0.7),  # additional parameters for the LLM (or RequestParams())
    human_input=True,                               # agent can request human input
)

@fast.chain(
    name="chain",                                   # name of the chain
    sequence=["agent1", "agent2", ...],             # list of agents in execution order
    instruction="instruction",                      # instruction to describe the chain for other workflows
    cumulative=False,                               # whether to accumulate messages through the chain
    continue_with_final=True,                       # open chat with agent at end of chain after prompting
)

@fast.parallel(
    name="parallel",                                # name of the parallel workflow
    fan_out=["agent1", "agent2"],                   # list of agents to run in parallel
    fan_in="aggregator",                            # name of agent that combines results (optional)
    instruction="instruction",                      # instruction to describe the parallel for other workflows
    include_request=True,                           # include original request in fan-in message
)

@fast.evaluator_optimizer(
    name="researcher",                              # name of the workflow
    generator="web_searcher",                       # name of the content generator agent
    evaluator="quality_assurance",                  # name of the evaluator agent
    min_rating="GOOD",                              # minimum acceptable quality (EXCELLENT, GOOD, FAIR, POOR)
    max_refinements=3,                              # maximum number of refinement iterations
)

@fast.router(
    name="route",                                   # name of the router
    agents=["agent1", "agent2", "agent3"],          # list of agent names router can delegate to
    instruction="routing instruction",              # any extra routing instructions
    servers=["filesystem"],                         # list of servers for the routing agent
    model="o3-mini.high",                           # specify routing model
    use_history=False,                              # whether the router maintains conversation history
    human_input=False,                              # whether router can request human input
)

@fast.orchestrator(
    name="orchestrator",                            # name of the orchestrator
    instruction="instruction",                      # base instruction for the orchestrator
    agents=["agent1", "agent2"],                    # list of agent names this orchestrator can use
    model="o3-mini.high",                           # specify orchestrator planning model
    use_history=False,                              # orchestrator doesn't maintain chat history (no effect)
    human_input=False,                              # whether orchestrator can request human input
    plan_type="full",                               # planning approach: "full" or "iterative"
    max_iterations=5,                               # maximum number of full plan attempts, or iterations
)

fast-agent provides a flexible MCP-based API for sending messages to agents, with convenience methods for handling Files, Prompts and Resources.
Read more about the use of MCP types in fast-agent here.
The simplest way of sending a message to an agent is the send method:
response: str = await agent.send("how are you?")

This returns the text of the agent's response as a string, making it ideal for simple interactions.
You can attach files by using the Prompt.user() method to construct your message:

from mcp_agent.core.prompt import Prompt
from pathlib import Path

plans: str = await agent.send(
    Prompt.user(
        "Summarise this PDF",
        Path("secret-plans.pdf")
    )
)

Prompt.user() automatically converts content to the appropriate MCP Type. For example, image/png becomes ImageContent and application/pdf becomes an EmbeddedResource.
You can also use MCP Types directly - for example:
from mcp.types import ImageContent, TextContent

mcp_text: TextContent = TextContent(type="text", text="Analyse this image.")
mcp_image: ImageContent = ImageContent(
    type="image",
    mimeType="image/png",
    data=base_64_encoded
)

response: str = await agent.send(
    Prompt.user(
        mcp_text,
        mcp_image
    )
)

Note: use Prompt.assistant() to produce messages for the assistant role.
The generate() method allows you to access multimodal content and Tool Calls from an agent, as well as send conversational pairs.
from mcp_agent.core.prompt import Prompt
from mcp_agent.mcp.prompt_message_multipart import PromptMessageMultipart
message = Prompt.user("Describe an image of a sunset")
response: PromptMessageMultipart = await agent.generate([message])
print(response.last_text())  # Main text response

The key difference between send() and generate() is that generate() returns a PromptMessageMultipart object, giving you access to the complete response structure:
- last_text(): Gets the main text response
- first_text(): Gets the first text content if multiple text blocks exist
- all_text(): Combines all text content in the response
- content: Direct access to the full list of content parts, including Images and EmbeddedResources
This is particularly useful when working with multimodal responses or tool outputs:
# Generate a response that might include multiple content types
response = await agent.generate([
    Prompt.user("Analyze this image", Path("chart.png"))
])

for content in response.content:
    if content.type == "text":
        print("Text response:", content.text[:100], "...")
    elif content.type == "image":
        print("Image content:", content.mimeType)
    elif content.type == "resource":
        print("Resource:", content.resource.uri)

You can also use generate() for multi-turn conversations by passing multiple messages:
messages = [
    Prompt.user("What is the capital of France?"),
    Prompt.assistant("The capital of France is Paris."),
    Prompt.user("And what is its population?")
]
response = await agent.generate(messages)

The generate() method provides the foundation for working with content returned by the LLM, and MCP Tool, Prompt and Resource calls.
When you need the agent to return data in a specific format, use the structured() method. This parses the agent's response into a Pydantic model:
from pydantic import BaseModel
from typing import List
# Define your expected response structure
class CityInfo(BaseModel):
    name: str
    country: str
    population: int
    landmarks: List[str]

# Request structured information
result, message = await agent.structured(
    [Prompt.user("Tell me about Paris")],
    CityInfo
)

# Now you have strongly typed data
if result:
    print(f"City: {result.name}, Population: {result.population:,}")
    for landmark in result.landmarks:
        print(f"- {landmark}")

The structured() method returns a tuple containing:
- The parsed Pydantic model instance (or None if parsing failed)
- The full PromptMessageMultipart response
This approach is ideal for:
- Extracting specific data points in a consistent format
- Building workflows where agents need structured inputs/outputs
- Integrating agent responses with typed systems
Always check if the first value is None to handle cases where the response couldn't be parsed into your model:
result, message = await agent.structured([Prompt.user("Describe Paris")], CityInfo)
if result is None:
    # Fall back to the text response
    print("Could not parse structured data, raw response:")
    print(message.last_text())

The structured() method provides the same request parameter options as generate().
Note
LLMs produce JSON when generating Structured responses, which can conflict with Tool Calls. Use a chain to combine Tool Calls with Structured Outputs, as in the sketch below.
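A sketch of that pattern, reusing the CityInfo model from the earlier example - the agent names are hypothetical, and it assumes the chain workflow exposes the same structured() call as individual agents:
@fast.agent("city_researcher", "Use the available tools to gather facts about the requested city.", servers=["fetch"])
@fast.agent("city_summariser", "Summarise the research as plain text, ready for extraction.")
@fast.chain(name="city_pipeline", sequence=["city_researcher", "city_summariser"])
...
async with fast.run() as agent:
    result, message = await agent.city_pipeline.structured(
        [Prompt.user("Tell me about Paris")], CityInfo
    )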
Apply a Prompt from an MCP Server to the agent with:
response: str = await agent.apply_prompt(
    "setup_sizing",
    arguments={"units": "metric"}
)

You can list and get Prompts from attached MCP Servers:

from mcp.types import GetPromptResult, PromptMessage

prompt: GetPromptResult = await agent.get_prompt("setup_sizing")
first_message: PromptMessage = prompt.messages[0]

and send the native MCP PromptMessage to the agent with:

response: str = await agent.send(first_message)

If the last message in the conversation is from the assistant, it is returned as the response.
Prompt.user also works with MCP Resources:
from mcp.types import ReadResourceResult

resource: ReadResourceResult = await agent.get_resource(
    "resource://images/cat.png", "mcp_server_name"
)
response: str = await agent.send(
    Prompt.user("What is in this image?", resource)
)

Alternatively, use the with_resource convenience method:

response: str = await agent.with_resource(
    "What is in this image?",
    "resource://images/cat.png",
    "mcp_server_name",
)

Long prompts can be stored in text files, and loaded with the load_prompt utility:
from mcp_agent.mcp.prompts import load_prompt
from mcp.types import PromptMessage

prompt: List[PromptMessage] = load_prompt(Path("two_cities.txt"))
result: str = await agent.send(prompt[0])

two_cities.txt
### The Period
It was the best of times, it was the worst of times, it was the age of
wisdom, it was the age of foolishness, it was the epoch of belief, it was
the epoch of incredulity, ...
Prompt files can contain conversations to aid in-context learning, or allow you to replay conversations with the Playback LLM:
sizing_conversation.txt
---USER
the moon
---ASSISTANT
object: MOON
size: 3,474.8
units: KM
---USER
the earth
---ASSISTANT
object: EARTH
size: 12,742
units: KM
---USER
how big is a tiger?
---ASSISTANT
object: TIGER
size: 1.2
units: M
Multiple messages (conversations) can be applied with the generate() method:
from mcp_agent.mcp.prompts import load_prompt
from mcp.types import PromptMessage
prompt: List[PromptMessage] = load_prompt(Path("sizing_conversation.txt"))
result: PromptMessageMultipart = await agent.generate(prompt)Conversation files can also be used to include resources:
prompt_secret_plans.txt
---USER
Please review the following documents:
---RESOURCE
secret_plan.pdf
---RESOURCE
repomix.xml
---ASSISTANT
Thank you for those documents, the PDF contains secret plans, and some
source code was attached to achieve those plans. Can I help further?
It is usually better (but not necessary) to use load_prompt_multipart:

from mcp_agent.mcp.prompts import load_prompt_multipart
from mcp_agent.mcp.prompt_message_multipart import PromptMessageMultipart

prompt: List[PromptMessageMultipart] = load_prompt_multipart(Path("prompt_secret_plans.txt"))
result: PromptMessageMultipart = await agent.generate(prompt)

File Format / MCP Serialization
If the filetype is json, then messages are deserialized using the MCP Prompt schema format. The load_prompt, load_prompt_multipart and prompt-server will load either the text or JSON format directly. See History Saving to learn how to save a conversation to a file for editing or playback.
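For example, a conversation saved as JSON can be reloaded in the same way as a text prompt file (the filename here is illustrative):
from pathlib import Path
from mcp_agent.mcp.prompts import load_prompt_multipart

# load_prompt_multipart detects the JSON format and deserializes using the MCP Prompt schema
history = load_prompt_multipart(Path("saved_history.json"))
result = await agent.generate(history)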
Prompt files can also be served using the inbuilt prompt-server. The prompt-server command is installed with fast-agent making it convenient to set up and use:
fastagent.config.yaml
mcp:
  servers:
    prompts:
      command: "prompt-server"
      args: ["prompt_secret_plans.txt"]
This configures an MCP Server that will serve a prompt_secret_plans MCP Prompt, and secret_plan.pdf and repomix.xml as MCP Resources.
If arguments are supplied in the template file, these are also handled by the prompt-server:
prompt_with_args.txt
---USER
Hello {{assistant_name}}, how are you?
---ASSISTANT
Great to meet you {{user_name}} how can I be of assistance?
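As a sketch, assuming the file above is served by a prompt-server entry and exposed as a prompt named prompt_with_args, it could then be applied with arguments (the prompt name and values are assumptions):
response: str = await agent.apply_prompt(
    "prompt_with_args",  # assumed prompt name derived from the file
    arguments={"assistant_name": "Sizer", "user_name": "Ada"},
)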
fast-agent provides flexible deployment options to meet a variety of use cases, from interactive development to production server deployments.
Run fast-agent programs interactively for development, debugging, or direct user interaction.
agent.py
import asyncio
from mcp_agent.core.fastagent import FastAgent

fast = FastAgent("My Interactive Agent")

@fast.agent(instruction="You are a helpful assistant")
async def main():
    async with fast.run() as agent:
        # Start interactive prompt
        await agent()

if __name__ == "__main__":
    asyncio.run(main())

When started with uv run agent.py, this begins an interactive prompt where you can chat directly with the configured agents, apply prompts, save history and so on.
fast-agent supports command-line arguments to run agents and workflows with specific messages.
# Send a message to a specific agent
uv run agent.py --agent default --message "Analyze this dataset"
# Override the default model
uv run agent.py --model gpt-4o --agent default --message "Complex question"
# Run with minimal output
uv run agent.py --quiet --agent default --message "Background task"
This is perfect for scripting, automation, or one-off queries.
The --quiet flag switches off the Progress, Chat and Tool displays.
Any fast-agent application can be deployed as an MCP (Model Context Protocol) server with a simple command-line switch.
# Start as an SSE server (HTTP)
uv run agent.py --server --transport sse --port 8080
# Start as a stdio server (for piping to other processes)
uv run agent.py --server --transport stdio
Each agent exposes an MCP Tool for sending messages to the agent, and a Prompt that returns the conversation history.
This enables cross-agent state transfer via the MCP Prompts.
The MCP Server can also be started programmatically.
import asyncio
from mcp_agent.core.fastagent import FastAgent

fast = FastAgent("Server Agent")

@fast.agent(instruction="You are an API agent")
async def main():
    # Start as a server programmatically
    await fast.start_server(
        transport="sse",
        host="0.0.0.0",
        port=8080,
        server_name="API-Agent-Server",
        server_description="Provides API access to my agent"
    )

if __name__ == "__main__":
    asyncio.run(main())

Embed fast-agent into existing Python applications to add MCP agent capabilities.
import asyncio
from mcp_agent.core.fastagent import FastAgent

fast = FastAgent("Embedded Agent")

@fast.agent(instruction="You are a data analysis assistant")
async def analyze_data(data):
    async with fast.run() as agent:
        result = await agent.send(f"Analyze this data: {data}")
        return result

# Use in your application
async def main():
    user_data = get_user_data()
    analysis = await analyze_data(user_data)
    display_results(analysis)

if __name__ == "__main__":
    asyncio.run(main())

Models in fast-agent are specified with a model string that takes the format provider.model_name.<reasoning_effort>.
Model specifications in fast-agent follow this precedence order (highest to lowest):
- Explicitly set in agent decorators
- Command line arguments with the --model flag
- Default model in fastagent.config.yaml
Model strings follow this format: provider.model_name.reasoning_effort
- provider: The LLM provider (e.g., anthropic, openai, deepseek, generic, openrouter, tensorzero)
- model_name: The specific model to use in API calls
- reasoning_effort (optional): Controls the reasoning effort for supported models
Examples:
- anthropic.claude-3-7-sonnet-latest
- openai.gpt-4o
- openai.o3-mini.high
- generic.llama3.2:latest
- openrouter.google/gemini-2.5-pro-exp-03-25:free
- tensorzero.my_tensorzero_function
For models that support it (o1, o1-preview and o3-mini), you can specify a reasoning effort of high, medium or low - for example openai.o3-mini.high. medium is the default if not specified.
For convenience, popular models have an alias set such as gpt-4o or sonnet. These are documented on the LLM Providers page.
You can set a default model for your application in your fastagent.config.yaml:
default_model: "openai.gpt-4o" # Default model for all agents
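Individual agents can still override the default via the decorator, which sits at the top of the precedence order described above; for example:
@fast.agent(
    name="summariser",                 # illustrative agent
    instruction="Summarise the supplied text.",
    model="sonnet",                    # overrides default_model from fastagent.config.yaml
)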
You can save the conversation history to a file by sending a ***SAVE_HISTORY <filename> message. This can then be reviewed, edited, loaded, or served with the prompt-server or replayed with the playback model.
File Format / MCP Serialization
If the filetype is json, then messages are serialized/deserialized using the MCP Prompt schema. The load_prompt, load_prompt_multipart and prompt-server will load either the text or JSON format directly.
This can be helpful when developing applications to:
- Save a conversation for editing
- Set up in-context learning
- Produce realistic test scenarios to exercise edge conditions etc. with the Playback model
fast-agent comes with two internal models to aid development and testing: passthrough and playback.
By default, the passthrough model echoes messages sent to it.
By sending a ***FIXED_RESPONSE <message> message, the model will return <message> to any request.
By sending a ***CALL_TOOL <tool_name> [<json>] message, the model will call the specified MCP Tool, and return a string containing the results.
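A minimal sketch of using the passthrough model in a test (the agent name is illustrative):
@fast.agent(name="echo", model="passthrough")
...
async with fast.run() as agent:
    reply = await agent.echo.send("hello, world")    # the passthrough model echoes this back
    await agent.echo.send("***FIXED_RESPONSE pong")  # subsequent requests now return "pong"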
The playback model replays the first conversation sent to it. A typical usage may look like this:
playback.txt
---USER
Good morning!
---ASSISTANT
Hello
---USER
Generate some JSON
---ASSISTANT
{
"city": "London",
"temperature": 72
}
This can then be served with the prompt-server; you can apply the MCP Prompt to the agent either programmatically with apply_prompt, or with the /prompts command in the interactive shell.
Alternatively, you can load the file with load_message_multipart.
JSON contents can be converted to structured outputs:
@fast.agent(name="playback",model="playback")
...
playback_messages: List[PromptMessageMultipart] = load_message_multipart(Path("playback.txt"))
# Set up the Conversation
assert ("HISTORY LOADED") == agent.playback.generate(playback_messages)
response: str = agent.playback.send("Good morning!") # Returns Hello
temperature, _ = agent.playback.structured("Generate some JSON")When the playback runs out of messages, it returns MESSAGES EXHAUSTED (list size [a]) ([b] overage).
List size is the total number of messages originally loaded, overage is the number of requests made after exhaustion.
For each model provider, you can configure parameters either through environment variables or in your fastagent.config.yaml file.
Be sure to run fast-agent check to troubleshoot API Key issues:
In your fastagent.config.yaml:
<provider>:
  api_key: "your_api_key" # Override with API_KEY env var
  base_url: "https://api.example.com" # Base URL for API calls
Anthropic models support Text, Vision and PDF content.
YAML Configuration:
anthropic:
  api_key: "your_anthropic_key" # Required
  base_url: "https://api.anthropic.com/v1" # Default, only include if required
Environment Variables:
- ANTHROPIC_API_KEY: Your Anthropic API key
- ANTHROPIC_BASE_URL: Override the API endpoint
Model Name Aliases:
| Model Alias | Maps to | Model Alias | Maps to |
| --- | --- | --- | --- |
| claude | claude-3-7-sonnet-latest | haiku | claude-3-5-haiku-latest |
| sonnet | claude-3-7-sonnet-latest | haiku3 | claude-3-haiku-20240307 |
| sonnet35 | claude-3-5-sonnet-latest | haiku35 | claude-3-5-haiku-latest |
| sonnet37 | claude-3-7-sonnet-latest | opus | claude-3-opus-latest |
| opus3 | claude-3-opus-latest | | |
fast-agent supports OpenAI gpt-4.1, gpt-4.1-mini, o1-preview, o1 and o3-mini models. Arbitrary model names are supported with openai.<model_name>. Supported modalities are model-dependent, check the OpenAI Models Page for the latest information.
Structured outputs use the OpenAI API Structured Outputs feature.
Future versions of fast-agent will have enhanced model capability handling.
YAML Configuration:
openai:
  api_key: "your_openai_key" # Required
  base_url: "https://api.openai.com/v1" # Default, only include if required
Environment Variables:
- OPENAI_API_KEY: Your OpenAI API key
- OPENAI_BASE_URL: Override the API endpoint
Model Name Aliases:
| Model Alias | Maps to | Model Alias | Maps to |
| --- | --- | --- | --- |
| gpt-4o | gpt-4o | gpt-4.1 | gpt-4.1 |
| gpt-4o-mini | gpt-4o-mini | gpt-4.1-mini | gpt-4.1-mini |
| o1 | o1 | gpt-4.1-nano | gpt-4.1-nano |
| o1-mini | o1-mini | o1-preview | o1-preview |
| o3-mini | o3-mini | | |
DeepSeek v3 is supported for Text and Tool calling.
YAML Configuration:
deepseek:
  api_key: "your_deepseek_key"
  base_url: "https://api.deepseek.com/v1"
Environment Variables:
- DEEPSEEK_API_KEY: Your DeepSeek API key
- DEEPSEEK_BASE_URL: Override the API endpoint
Model Name Aliases:
| Model Alias | Maps to |
| --- | --- |
| deepseek | deepseek-chat |
| deepseek3 | deepseek-chat |
Google is currently supported through the OpenAI compatibility endpoint, with first-party support planned soon.
YAML Configuration:
google:
  api_key: "your_google_key"
  base_url: "https://generativelanguage.googleapis.com/v1beta/openai"
Environment Variables:
- GOOGLE_API_KEY: Your Google API key
Model Name Aliases:
None mapped
Models prefixed with generic will use a generic OpenAI endpoint, with the defaults configured to work with Ollama OpenAI compatibility.
This means that to run Llama 3.2 latest you can specify generic.llama3.2:latest for the model string, and no further configuration should be required.
Warning
The generic provider is tested for tool calling and structured generation with qwen2.5:latest and llama3.2:latest. Other models and configurations may not work as expected - use at your own risk.
YAML Configuration:
generic:
  api_key: "ollama" # Default for Ollama, change as needed
  base_url: "http://localhost:11434/v1" # Default for Ollama
Environment Variables:
- GENERIC_API_KEY: Your API key (defaults to ollama for Ollama)
- GENERIC_BASE_URL: Override the API endpoint
Usage with other OpenAI API compatible providers: By configuring the base_url and appropriate api_key, you can connect to any OpenAI API-compatible provider.
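For example, pointing the generic provider at another OpenAI-compatible endpoint might look like this (the URL and key are placeholders):
generic:
  api_key: "your_provider_key"                      # key for the compatible provider
  base_url: "https://api.example-provider.com/v1"   # provider's OpenAI-compatible endpoint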
Uses the OpenRouter aggregation service. Models are accessed via an OpenAI-compatible API. Supported modalities depend on the specific model chosen on OpenRouter.
Models must be specified using the openrouter. prefix followed by the full model path from OpenRouter (e.g., openrouter.google/gemini-flash-1.5).
Warning
There is an issue between OpenRouter and Google Gemini models that causes large Tool Call block content to be removed.
YAML Configuration:
openrouter:
  api_key: "your_openrouter_key" # Required
  base_url: "https://openrouter.ai/api/v1" # Default, only include to override
Environment Variables:
- OPENROUTER_API_KEY: Your OpenRouter API key
- OPENROUTER_BASE_URL: Override the API endpoint
Model Name Aliases:
OpenRouter does not use aliases in the same way as Anthropic or OpenAI. You must always use the openrouter.provider/model-name format.
TensorZero is an open-source framework for building production-grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluations, and experimentation.
At the moment, you must run the TensorZero Gateway as a separate service (e.g. using Docker). See the TensorZero Quick Start and the TensorZero Gateway Deployment Guide for more information on how to deploy the TensorZero Gateway.
You can call a function defined in your TensorZero configuration (tensorzero.toml) with fast-agent by prefixing the function name with tensorzero. (e.g. tensorzero.my_function_name).
YAML Configuration:
tensorzero:
  base_url: "http://localhost:3000" # Optional, only include to override
Environment Variables:
None (model provider credentials should be provided to the TensorZero Gateway instead)
MCP Servers are configured in the fastagent.config.yaml file. Secrets can be kept in fastagent.secrets.yaml, which follows the same format (fast-agent merges the contents of the two files).
The example below shows how to configure an MCP Server named server_one.
fastagent.config.yaml
mcp:
  servers:
    # name used in agent servers array
    server_one:
      # command to run
      command: "npx"
      # list of arguments for the command
      args: ["@modelcontextprotocol/server-brave-search"]
      # key/value pairs of environment variables
      env:
        BRAVE_API_KEY: your_key
        KEY: value
    server_two:
      # and so on ...
This MCP Server can then be used with an agent as follows:
@fast.agent(name="Search", servers=["server_one"])To use SSE Servers, specify the sse transport and specify the endpoint URL and headers:
fastagent.config.yaml
mcp:
  servers:
    # name used in agent servers array
    server_two:
      transport: "sse"
      # url to connect
      url: "http://localhost:8000/sse"
      # timeout in seconds to use for sse sessions (optional)
      read_transport_sse_timeout_seconds: 300
      # request headers for connection
      headers:
        Authorization: "Bearer <secret>"
fast-agent supports MCP Roots. Roots are configured on a per-server basis:
fastagent.config.yaml
mcp:
  servers:
    server_three:
      transport: "sse"
      url: "http://localhost:8000/sse"
      roots:
        - uri: "file://...."
          name: Optional Name
          server_uri_alias: # optional
As per the MCP specification, root URIs MUST start with file://.
If a server_uri_alias is supplied, fast-agent presents this to the MCP Server instead, allowing you to present a consistent interface to the MCP Server. An example of this usage would be mounting a local directory to a docker volume, and presenting it as /mnt/data to the MCP Server for consistency.
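For instance, a root that maps a local directory to the path the server expects might look like this (the paths are illustrative):
roots:
  - uri: "file:///Users/me/project/data"
    name: "Project Data"
    server_uri_alias: "file:///mnt/data"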
The data analysis example (fast-agent quickstart data-analysis) has a working example of MCP Roots.
Sampling is configured by specifying a sampling model for the MCP Server.
fastagent.config.yaml
mcp:
  servers:
    server_four:
      transport: "sse"
      url: "http://localhost:8000/sse"
      sampling:
        model: "provider.model.<reasoning_effort>"
Read more about the model string and settings here. Sampling requests support vision - try @llmindset/mcp-webcam for an example.
Below are some recommended resources for developing with the Model Context Protocol (MCP):
| Resource | Description |
| --- | --- |
| Working with Files and Resources | Examining the options MCP Server and Host developers have for sharing rich content |
| PulseMCP Community | A community-focused site offering news, up-to-date directories and use-cases of MCP Servers |
| Basic Memory | High quality, markdown based knowledge base for LLMs - also good for Agent development |
| Repomix | Create LLM friendly files from folders or directly from GitHub. Include as an MCP Server - or run from a script beforehand to create Agent inputs |
| PromptMesh Tools | High quality tools and libraries at the cutting edge of MCP development |
| mcp-hfspace | Seamlessly connect to hundreds of Open Source models including Image and Audio generators and more |
| wong2 mcp-cli | A fast, lightweight, command line alternative to the official MCP Inspector |
In this quick start, we'll demonstrate how fast-agent can transfer state between two agents using MCP Prompts.
First, we'll start agent_one as an MCP Server, and send it some messages with the MCP Inspector tool.
Next, we'll run agent_two and transfer the conversation from agent_one using an MCP Prompt.
Finally, we'll take a look at fast-agent's prompt-server and how it can assist in building agent applications.
You'll need API Keys to connect to a supported model, or use Ollama's OpenAI compatibility mode to use local models.
The quick start also uses the MCP Inspector - check here for installation instructions.
# create, and change to a new directory
mkdir fast-agent && cd fast-agent
# create and activate a python environment
uv venv
source .venv/bin/activate
# setup fast-agent
uv pip install fast-agent-mcp
# create the state transfer example
fast-agent quickstart state-transfer
# create, and change to a new directory
md fast-agent && cd fast-agent
# create and activate a python environment
uv venv
.venv\Scripts\activate
# setup fast-agent
uv pip install fast-agent-mcp
# create the state transfer example
fast-agent quickstart state-transfer
Change to the state-transfer directory (cd state-transfer), rename fastagent.secrets.yaml.example to fastagent.secrets.yaml and enter the API Keys for the providers you wish to use.
The supplied fastagent.config.yaml file contains a default of gpt-4o - edit this if you wish.
Finally, run uv run agent_one.py and send a test message to make sure that everything is working. Enter stop to return to the command line.
To start "agent_one" as an MCP Server, run the following command:
# start agent_one as an MCP Server:
uv run agent_one.py --server --transport sse --port 8001
The agent is now available as an MCP Server.
Note
This example starts the server on port 8001. To use a different port, update the URLs in fastagent.config.yaml and the MCP Inspector.
From another command line, run the Model Context Protocol inspector to connect to the agent:
# run the MCP inspector
npx @modelcontextprotocol/inspector
Choose the SSE transport type, and the URL http://localhost:8001/sse. After clicking the connect button, you can interact with the agent from the tools tab. Use the agent_one_send tool to send the agent a chat message and see its response.
The conversation history can be viewed from the prompts tab. Use the agent_one_history prompt to view it.
Disconnect the Inspector, then press ctrl+c in the command window to stop the process.
We can now transfer and continue the conversation with agent_two.
Run agent_two with the following command:
# start agent_two as an MCP Server:
uv run agent_two.py
Once started, type '/prompts' to see the available prompts. Select 1 to apply the Prompt from agent_one to agent_two, transferring the conversation context.
You can now continue the chat with agent_two (potentially using different Models, MCP Tools or Workflow components).
fast-agent uses the following configuration file to connect to the agent_one MCP Server:
fastagent.config.yaml
# MCP Servers
mcp:
  servers:
    agent_one:
      transport: sse
      url: http://localhost:8001
agent_two then references the server in its definition:
# Define the agent
@fast.agent(name="agent_two",
            instruction="You are a helpful AI Agent",
            servers=["agent_one"])
async def main():
    # use the --model command line switch or agent arguments to change model
    async with fast.run() as agent:
        await agent.interactive()

fast-agent gives you the ability to save and reload conversations.
Enter ***SAVE_HISTORY history.json in the agent_two chat to save the conversation history in MCP GetPromptResult format.
You can also save it in a text format for easier editing.
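For example, using a .txt extension saves the conversation in the editable text format shown earlier (the filename is arbitrary):
***SAVE_HISTORY history.txt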
By using the supplied MCP prompt-server, we can reload the saved prompt and apply it to our agent. Add the following to your fastagent.config.yaml file:
# MCP Servers
mcp:
  servers:
    prompts:
      command: prompt-server
      args: ["history.json"]
    agent_one:
      transport: sse
      url: http://localhost:8001
And then update agent_two.py to use the new server:
# Define the agent
@fast.agent(name="agent_two",
            instruction="You are a helpful AI Agent",
            servers=["prompts"])

Run uv run agent_two.py, and you can then use the /prompts command to load the earlier conversation history, and continue where you left off.
Note that Prompts can contain any of the MCP Content types, so Images, Audio and other Embedded Resources can be included.
You can also use the Playback LLM to replay an earlier chat (useful for testing!)
FastAgent is built to seamlessly integrate with the MCP SDK type system:
Conversations with assistants are based on PromptMessageMultipart - an extension of the MCP PromptMessage type, with support for multiple content sections. This type is expected to become native in a future version of MCP: modelcontextprotocol/modelcontextprotocol#198
FastAgent makes it easy to transfer conversation history between agents:
history_transfer.py
@fast.agent(name="haiku", model="haiku")
@fast.agent(name="openai", model="o3-mini.medium")
async def main() -> None:
async with fast.run() as agent:
# Start an interactive session with "haiku"
await agent.prompt(agent_name="haiku")
# Transfer the message history top "openai" (using PromptMessageMultipart)
await agent.openai.generate(agent.haiku.message_history)
# Continue the conversation
await agent.prompt(agent_name="openai")