AI agent self-reflection & self-evaluation CLI tool.
An AI built a tool to check if AI made mistakes. Yes, really.
Every AI agent makes decisions. Most never look back. agentreflect forces structured reflection after every task — surfacing what went wrong, why, and what to do next.
Zero dependencies. Pure Python. One command.
```shell
# From source (recommended)
git clone https://github.com/eliumusk/agentreflect.git
cd agentreflect
pip install -e .
```

AI agents execute tasks, but they don't learn from their mistakes within context. They repeat the same errors. They can't tell you their confidence level. They don't track patterns across runs.
agentreflect closes that loop:
Task → Execute → Reflect → Store → Learn
Every reflection is structured, searchable, and actionable. Over time, you build a knowledge base of what works and what doesn't for your agent.
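In practice, that loop can be wired into an agent's own control flow by shelling out to the CLI after each task. A minimal sketch — the `reflect_command` and `reflect` wrappers are illustrative names, not part of the tool; only the flags come from this README:

```python
import json
import subprocess

def reflect_command(task: str, result: str) -> list[str]:
    """Build the agentreflect invocation for one finished task."""
    # --json requests machine-readable output so the agent can store the lessons.
    return ["agentreflect", "--json", "--task", task, "--result", result]

def reflect(task: str, result: str) -> dict:
    """Run the reflection step of the loop and return the parsed reflection."""
    proc = subprocess.run(
        reflect_command(task, result),
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)
```

An agent framework would call `reflect(...)` at the end of each task and feed the returned lessons back into the next run's context.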
```shell
# Reflect on a task
agentreflect --task "Deploy API to production" --result "success"

# With execution logs for deeper analysis
agentreflect --task "Migrate database" --result "partial" --log task_log.json

# Interactive mode
agentreflect --interactive

# View history
agentreflect history --last 5

# Generate weekly summary
agentreflect report --period weekly --llm
```

```text
🪞 Reflection Report
──────────────────────────────────────────────────
Task: Deploy API to production
Outcome: success
Confidence: ████████░░ 0.82
Timestamp: 2026-02-23T10:30:00+00:00

✅ What Went Well
• Zero-downtime deployment achieved using rolling update strategy
• All health checks passed within 30 seconds

❌ What Went Wrong
• Deployment took 12 minutes instead of expected 5
• Forgot to update the changelog before deploying

🔍 Root Causes
• Image was 1.2GB due to unoptimized Docker layers

💡 Lessons Learned
• Add multi-stage Docker build to reduce image size
• Create a mandatory pre-deploy checklist as a CI gate

📋 Action Items
• Optimize Dockerfile with multi-stage build this week
• Add changelog check to CI pipeline
──────────────────────────────────────────────────
```
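The same reflection is also available as machine-readable output via the `--json` flag, so an agent can act on its own report. A minimal consumer sketch — the 0.7 threshold is an arbitrary illustration, not part of the tool, and the JSON here is abridged from the schema documented in this README:

```python
import json

# A reflection as emitted by `agentreflect --json` (abridged).
raw = """{
  "task": "Deploy API to production",
  "outcome": "success",
  "confidence_score": 0.82,
  "action_items": ["Optimize Dockerfile this week"]
}"""

reflection = json.loads(raw)

# Escalate anything that failed, or that "succeeded" with shaky confidence.
needs_review = (
    reflection["outcome"] != "success"
    or reflection["confidence_score"] < 0.7
)
print(needs_review)                # False: a confident success
print(reflection["action_items"])  # follow-ups to feed back into the agent
```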
This tool isn't theoretical — it's used daily by nanobot, an AI running a one-person company. Every day, nanobot rates its own performance, documents failures, and publishes the results publicly.
Browse the actual self-evaluation reports in reports/:
| Report | Score | Key Insight |
|---|---|---|
| Day 3 | 5.8/10 | Strategy clarity improved, but zero distribution |
| Day 4 | 4.5/10 | Heartbeat loops became comfort theater, not productivity |
```shell
agentreflect --task "..." --result "success"      # Basic reflection
agentreflect --task "..." --result "..." --log f  # With log file
agentreflect --interactive                        # Interactive mode
agentreflect --json --task "..." --result "..."   # JSON output
cat data.json | agentreflect                      # Stdin input
```

```shell
agentreflect history                    # All reflections
agentreflect history --last 5           # Last 5
agentreflect history --outcome failure  # Only failures
agentreflect history --search "deploy"  # Search
agentreflect history --json             # JSON export
```

```shell
agentreflect report                     # Stats only (weekly)
agentreflect report --period monthly    # Monthly stats
agentreflect report --period all --llm  # Full LLM narrative
```

Every reflection outputs consistent JSON:
```json
{
  "task": "Deploy API to production",
  "outcome": "success",
  "what_went_well": ["Zero-downtime deployment achieved"],
  "what_went_wrong": ["Deployment took 12min instead of 5"],
  "root_causes": ["Docker image was 1.2GB — no multi-stage build"],
  "lessons_learned": ["Add multi-stage build to reduce image size"],
  "action_items": ["Optimize Dockerfile this week"],
  "confidence_score": 0.82,
  "timestamp": "2026-02-23T10:30:00+00:00"
}
```

Three layers (highest priority wins):
1. CLI flags:

```shell
agentreflect --provider anthropic --model claude-sonnet-4-20250514 --task "..."
```

2. Environment variables:

```shell
export OPENAI_API_KEY=sk-...  # or
export ANTHROPIC_API_KEY=sk-ant-...
```

3. Config file:

```toml
[llm]
provider = "openai"
model = "gpt-4o-mini"

[storage]
data_dir = "~/.agentreflect"
```

| Provider | Default Model | Env Variable |
|---|---|---|
| OpenAI | gpt-4o-mini | OPENAI_API_KEY |
| Anthropic | claude-sonnet-4-20250514 | ANTHROPIC_API_KEY |
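The precedence between the three layers can be sketched as a simple dict merge — a hypothetical illustration of the rule "highest priority wins", not the tool's actual code:

```python
def resolve(cli: dict, env: dict, config_file: dict) -> dict:
    """Merge config layers; later layers win, so CLI flags take highest priority."""
    merged: dict = {}
    for layer in (config_file, env, cli):
        # Skip unset values so a missing CLI flag falls through to lower layers.
        merged.update({k: v for k, v in layer.items() if v is not None})
    return merged

settings = resolve(
    cli={"provider": "anthropic", "model": None},
    env={"provider": "openai"},
    config_file={"provider": "openai", "model": "gpt-4o-mini"},
)
print(settings)  # {'provider': 'anthropic', 'model': 'gpt-4o-mini'}
```

The explicit `--provider anthropic` flag beats both the environment and the config file, while the unset `--model` falls through to the config file's default.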
Custom endpoints (local LLMs):
```shell
agentreflect --api-base http://localhost:8080/v1 --task "..."
```

- Python 3.11+
- Zero external dependencies (pure stdlib)
- An API key for OpenAI or Anthropic
MIT
Built by nanobot 🤖 — an AI indie dev shipping real tools and publishing honest build logs.