Agentic Context Engineering: Complete Guide

How ACE enables AI agents to improve through in-context learning instead of fine-tuning.

What is Agentic Context Engineering?

Agentic Context Engineering (ACE) is a framework introduced by researchers at Stanford University, SambaNova Systems, and UC Berkeley. It lets AI agents improve their performance by dynamically curating their own context based on execution feedback.

Key Innovation: Instead of updating model weights through expensive fine-tuning cycles, ACE treats context as a living "skillbook" that evolves based on what strategies actually work in practice.

Research Paper: Agentic Context Engineering (arXiv:2510.04618)

The Core Problem

Modern AI agents face a fundamental limitation: they don't learn from execution history. When an agent makes a mistake, developers must manually intervene—editing prompts, adjusting parameters, or fine-tuning the model.

Traditional approaches have major drawbacks:

Repetitive failures: Agents lack institutional memory
Manual intervention: Doesn't scale as complexity increases
Expensive adaptation: Fine-tuning can cost $10,000+ per cycle and take weeks
Black box improvement: Unclear what changed or why

How ACE Works

ACE introduces a three-agent architecture where specialized roles collaborate to build and maintain a dynamic knowledge base called the "skillbook."

The Three Agents

1. Agent - Task Execution

Performs the actual work using strategies from the skillbook
Operates like a traditional agent but with access to learned knowledge

2. Reflector - Performance Analysis

Analyzes execution outcomes without human supervision
Identifies which strategies worked, which failed, and why
Generates insights that inform skillbook updates

3. SkillManager - Knowledge Management

Adds new strategies based on successful executions
Removes or marks strategies that consistently fail
Merges semantically similar strategies to prevent redundancy

The Skillbook

The skillbook stores learned strategies as structured "skills"—discrete pieces of knowledge with metadata:

{
  "content": "When querying financial data, filter by date range first to reduce result set size",
  "helpful_count": 12,
  "harmful_count": 1,
  "section": "task_guidance"
}
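As a rough illustration, a skill with these fields could be modeled as a small Python dataclass. This is a minimal sketch, not the library's actual class: the name `Skill`, the default values, and the `helpful_ratio` helper are assumptions made for this example.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class Skill:
    """One skillbook entry, mirroring the JSON fields shown above."""
    content: str
    helpful_count: int = 0
    harmful_count: int = 0
    section: str = "task_guidance"

    def helpful_ratio(self) -> float:
        """Fraction of recorded uses that helped (hypothetical helper)."""
        total = self.helpful_count + self.harmful_count
        return self.helpful_count / total if total else 0.0


skill = Skill(
    content="When querying financial data, filter by date range first "
            "to reduce result set size",
    helpful_count=12,
    harmful_count=1,
)

# Serialize back to the same JSON shape used by the skillbook example.
print(json.dumps(asdict(skill), indent=2))
```

Keeping the counters next to the content is what lets later steps rank or prune skills by how often they actually helped.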

The Learning Cycle

1. Execution: The Agent receives a task and retrieves relevant skills from the skillbook
2. Action: The Agent executes the task using the retrieved strategies
3. Reflection: The Reflector analyzes the execution outcome
4. Curation: The SkillManager applies incremental update operations to the skillbook
5. Iteration: The process repeats, and the skillbook grows more refined over time
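One pass through this cycle can be sketched as a plain function that wires the three roles together. Everything here is illustrative: `learning_step` and the `toy_*` stand-ins are hypothetical names, the roles are ordinary functions rather than LLM calls, and retrieval is reduced to passing the whole skillbook.

```python
from typing import Callable

Skillbook = list[dict]


def learning_step(
    task: str,
    skillbook: Skillbook,
    agent: Callable[[str, Skillbook], str],
    reflector: Callable[[str, str], list[dict]],
    skill_manager: Callable[[Skillbook, list[dict]], None],
) -> str:
    answer = agent(task, skillbook)      # Execution + Action
    insights = reflector(task, answer)   # Reflection
    skill_manager(skillbook, insights)   # Curation
    return answer


def toy_agent(task: str, skillbook: Skillbook) -> str:
    hints = "; ".join(s["content"] for s in skillbook)
    return f"answer to {task!r} using [{hints}]"


def toy_reflector(task: str, answer: str) -> list[dict]:
    # A real Reflector would analyze the outcome; this one just names the task.
    return [{"content": f"Strategy learned from: {task}"}]


def toy_skill_manager(skillbook: Skillbook, insights: list[dict]) -> None:
    known = {s["content"] for s in skillbook}
    for insight in insights:
        if insight["content"] not in known:
            skillbook.append({**insight, "helpful_count": 0, "harmful_count": 0})
            known.add(insight["content"])


skillbook: Skillbook = []
# Iteration: repeated tasks add nothing new, novel tasks grow the skillbook.
for task in ["parse invoice", "parse invoice", "summarize report"]:
    learning_step(task, skillbook, toy_agent, toy_reflector, toy_skill_manager)
print(len(skillbook))
```

The point of the shape is that only the skill manager mutates the skillbook; the agent and reflector just read from it.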

Insight Levels

The Reflector can analyze execution at three different levels of scope, producing insights of varying depth:

Level	Scope	What's Analyzed	Learning Quality
Micro	Single interaction + environment	Request → response → ground truth/feedback	Learns from correctness
Meso	Full agent run	Reasoning traces (thoughts, tool calls, observations)	Learns from execution patterns
Macro	Cross-run analysis	Patterns across multiple executions	Comprehensive (future)

Micro-level insights come from the full ACE adaptation loop with environment feedback and ground truth. The Reflector knows whether the answer was correct and learns from that evaluation. OfflineACE and OnlineACE operate at this level.

Meso-level insights come from full agent runs with intermediate steps (the agent's thoughts, tool calls, and observations) but without external ground truth. The Reflector learns from the execution patterns themselves. Integration wrappers such as ACELangChain with AgentExecutor operate at this level.

Macro-level insights (future) will compare patterns across multiple runs to identify systemic improvements.

Key Technical Innovations

Update Operations (Preventing Context Collapse)

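A critical insight from the ACE paper: when asked to rewrite their context wholesale, LLMs exhibit brevity bias. They compress information and drop crucial details, and repeated rewrites can erode the context until little useful knowledge remains (context collapse).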

ACE solves this through update operations—incremental modifications that never ask the LLM to regenerate entire contexts:

Add: Insert new skill to skillbook
Remove: Delete specific skill by ID
Modify: Update specific fields (helpful_count, content refinement)

This preserves the exact wording and structure of learned knowledge.
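A minimal sketch of how such operations might be applied follows. The operation format (`type`, `id`, `content`, `fields`) and the `apply_update` function are assumptions for illustration, not the library's actual schema; the skillbook is a plain dict keyed by skill ID.

```python
def apply_update(skillbook: dict[str, dict], op: dict) -> None:
    """Apply one incremental operation; untouched skills are never rewritten."""
    kind = op["type"]
    if kind == "add":
        skillbook[op["id"]] = {
            "content": op["content"],
            "helpful_count": 0,
            "harmful_count": 0,
        }
    elif kind == "remove":
        skillbook.pop(op["id"], None)
    elif kind == "modify":
        # Only the named fields change; everything else keeps its exact wording.
        skillbook[op["id"]].update(op["fields"])
    else:
        raise ValueError(f"unknown operation type: {kind!r}")


skillbook: dict[str, dict] = {}
operations = [
    {"type": "add", "id": "s1", "content": "Filter by date range first"},
    {"type": "add", "id": "s2", "content": "Retry once on timeout"},
    {"type": "modify", "id": "s1", "fields": {"helpful_count": 1}},
    {"type": "remove", "id": "s2"},
]
for op in operations:
    apply_update(skillbook, op)
print(skillbook)
```

Because each operation touches one entry, the LLM only has to produce a small delta, and the stored text of every other skill stays byte-for-byte identical.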

Semantic Deduplication

As agents learn, they may generate similar but differently-worded strategies. ACE prevents skillbook bloat through embedding-based deduplication, keeping the skillbook concise while capturing diverse knowledge.
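The idea can be sketched with cosine similarity over vectors. To stay self-contained, the example below uses word counts as a toy stand-in for a real embedding model, and the `0.8` threshold, the function names, and the keep-the-first-seen policy are all assumptions; a production system would use learned embeddings and might merge duplicates rather than drop them.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def deduplicate(skills: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a skill only if it is not too similar to one already kept."""
    kept: list[str] = []
    for skill in skills:
        vec = embed(skill)
        if all(cosine(vec, embed(k)) < threshold for k in kept):
            kept.append(skill)
    return kept


skills = [
    "Filter by date range first to reduce result size",
    "First filter by date range to reduce the result size",
    "Retry the API call once after a timeout",
]
unique = deduplicate(skills)
print(unique)
```

Here the second skill is a reworded copy of the first and is dropped, while the unrelated third skill is kept.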

Hybrid Retrieval

Instead of dumping the entire skillbook into context, ACE uses hybrid retrieval to select only the most relevant skills. This:

Keeps context windows manageable
Prioritizes proven strategies
Reduces token costs
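The text does not spell out what "hybrid" combines, so the sketch below makes an assumption: it blends keyword overlap with the task (relevance) and each skill's helpful/harmful track record. The function names, the `alpha` weight, and the scoring formula are hypothetical, not ACE's actual retrieval logic.

```python
import re


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_overlap(task: str, content: str) -> float:
    """Jaccard overlap between task words and skill words."""
    a, b = tokens(task), tokens(content)
    return len(a & b) / len(a | b) if a | b else 0.0


def track_record(skill: dict) -> float:
    """Share of recorded uses that helped; 0.5 when there is no history."""
    uses = skill["helpful_count"] + skill["harmful_count"]
    return skill["helpful_count"] / uses if uses else 0.5


def retrieve(task: str, skillbook: list[dict], top_k: int = 2,
             alpha: float = 0.7) -> list[dict]:
    scored = []
    for skill in skillbook:
        relevance = keyword_overlap(task, skill["content"])
        if relevance == 0.0:
            continue  # skip skills that share no words with the task
        score = alpha * relevance + (1 - alpha) * track_record(skill)
        scored.append((score, skill))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [skill for _, skill in scored[:top_k]]


skillbook = [
    {"content": "Filter financial queries by date range first",
     "helpful_count": 12, "harmful_count": 1},
    {"content": "Escalate billing disputes to a human",
     "helpful_count": 3, "harmful_count": 0},
    {"content": "Filter financial queries by region",
     "helpful_count": 1, "harmful_count": 9},
]
selected = retrieve("Query financial data for the last quarter by date", skillbook)
print([s["content"] for s in selected])
```

Only the selected skills would be placed in the prompt, which is what keeps the context small even as the skillbook grows.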

Async Learning Mode

For latency-sensitive applications, ACE supports async learning where the Agent returns immediately while Reflector and SkillManager process in the background:

┌───────────────────────────────────────────────────────────────────────┐
│                       ASYNC LEARNING PIPELINE                         │
├───────────────────────────────────────────────────────────────────────┤
│                                                                       │
│  Sample 1 ──► Agent ──► Env ──► Reflector ─┐                         │
│  Sample 2 ──► Agent ──► Env ──► Reflector ─┼──► Queue ──► SkillManager│
│  Sample 3 ──► Agent ──► Env ──► Reflector ─┘           (serialized)   │
│             (parallel)        (parallel)                              │
│                                                                       │
└───────────────────────────────────────────────────────────────────────┘

Why this architecture:

Parallel Reflectors: Safe to parallelize (read-only analysis, no skillbook writes)
Serialized SkillManager: Must be sequential (writes to skillbook, handles deduplication)
Up to 3x faster learning with three workers: Reflector LLM calls run concurrently instead of back to back
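The parallel-reflect, serialized-write pattern can be sketched with the Python standard library alone. This is an illustration of the architecture, not ACE's implementation: `reflect`, `skill_manager_worker`, and `run_pipeline` are hypothetical names, and the Reflector is a stub instead of an LLM call.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def reflect(sample: str) -> str:
    """Stand-in for a Reflector LLM call; read-only, so safe to parallelize."""
    return f"insight from {sample}"


def skill_manager_worker(insights: queue.Queue, skillbook: list[str]) -> None:
    """Single consumer: the only thread that writes to the skillbook."""
    while True:
        insight = insights.get()
        if insight is None:  # sentinel: no more insights are coming
            break
        if insight not in skillbook:  # trivial exact-match deduplication
            skillbook.append(insight)


def run_pipeline(samples: list[str], max_reflector_workers: int = 3) -> list[str]:
    skillbook: list[str] = []
    insights: queue.Queue = queue.Queue()
    writer = threading.Thread(
        target=skill_manager_worker, args=(insights, skillbook)
    )
    writer.start()
    # Reflections run concurrently; results are queued in input order.
    with ThreadPoolExecutor(max_workers=max_reflector_workers) as pool:
        for insight in pool.map(reflect, samples):
            insights.put(insight)
    insights.put(None)
    writer.join()  # block until every queued insight has been applied
    return skillbook


print(run_pipeline(["sample 1", "sample 2", "sample 1"]))
```

Funneling all writes through one consumer thread avoids locking the skillbook while still letting the slow reflection calls overlap.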

Usage:

adapter = OfflineACE(
    skillbook=skillbook,
    agent=agent,
    reflector=reflector,
    skill_manager=skill_manager,
    async_learning=True,        # Enable async mode
    max_reflector_workers=3,    # Parallel Reflector threads
)

results = adapter.run(samples, environment)  # Fast - learning in background

# Control methods
adapter.learning_stats       # Check progress
adapter.wait_for_learning()  # Block until complete
adapter.stop_async_learning() # Shutdown pipeline

Performance Results

The paper's authors evaluated ACE across multiple benchmarks:

AppWorld Agent Benchmark:

+17.1 percentage-point improvement over the base LLM (≈40% relative improvement)
Tested on complex multi-step tasks requiring tool use and reasoning

Finance Domain (FiNER):

+8.6 percentage-point improvement on financial reasoning tasks

Adaptation Efficiency:

86.9% lower adaptation latency compared to existing context-adaptation methods

Key Insight: Performance improvements compound over time. As the skillbook grows, agents make fewer mistakes on similar tasks, creating a positive feedback loop.

When to Use ACE

Best Fit Use Cases

Software Development Agents

Learn project-specific patterns (naming conventions, error handling)
Build knowledge of common bugs and solutions
Accumulate code review guidelines

Customer Support Automation

Learn which issues need human escalation
Discover effective communication patterns
Build institutional knowledge of edge cases

Data Analysis Agents

Learn efficient query patterns
Discover which visualizations work for which data types
Build baseline expectations from execution history

Research Assistants

Learn effective search strategies per domain
Discover citation patterns and summarization techniques
Build knowledge of reliable sources

When NOT to Use ACE

ACE may not be the right fit when:

Single-use tasks: There is no benefit from learning if the task never repeats
Perfect first-time execution required: ACE learns through iteration, so early runs may still contain mistakes
Purely factual retrieval: Traditional RAG may be more appropriate

ACE vs. Other Approaches

vs. Fine-Tuning

Aspect	ACE	Fine-Tuning
Speed	Immediate (after single execution)	Days to weeks
Cost	Inference only	$10K+ per iteration
Interpretability	Readable skillbook	Black box weights
Reversibility	Edit/remove strategies easily	Requires retraining

vs. RAG

Aspect	ACE	RAG
Knowledge Source	Learned from execution	Static documents
Update Mechanism	Autonomous skill updates	Manual updates
Content Type	Strategies, patterns	Facts, references
Optimization	Self-improving	Requires query tuning

Getting Started

Ready to build self-learning agents? Check out these resources:

Quick Start Guide - Get running in 5 minutes
Integration Guide - Add ACE to existing agents
API Reference - Complete API documentation
Examples - Ready-to-run code examples

Additional Resources

Research

Original ACE Paper (arXiv)

Community

Last Updated: November 2025
