diff --git a/agents/raphaelchristi__harness-evolver/README.md b/agents/raphaelchristi__harness-evolver/README.md
new file mode 100644
index 0000000..4a57f41
--- /dev/null
+++ b/agents/raphaelchristi__harness-evolver/README.md
@@ -0,0 +1,68 @@
+# Harness Evolver
+
+**Harness Evolver** is a LangSmith-native autonomous agent optimizer for Claude Code. Point it at any LLM-based agent codebase and it will iteratively and automatically improve prompts, routing, tools, and architecture, with full evaluation-backed evidence for every change.
+
+## What It Does
+
+Harness Evolver runs an automated evolution loop on your AI agent:
+
+1. **Sets up a LangSmith evaluation dataset** from your existing traces or test cases
+2. **Spawns self-organizing proposer agents**, each investigating a data-driven lens (failing example patterns, routing issues, tool gaps)
+3. **Tests each proposal** in an isolated git worktree, with no risk to your main branch
+4. **Evaluates with LangSmith** using LLM-as-judge with rubrics, pairwise comparison, and justification-before-score
+5. **Gates regressions** with constraint checks, efficiency gates, Pareto selection, and holdout enforcement
+6. **Merges only winners** and builds cross-iteration memory to guide future proposals
+
+## Key Capabilities
+
+- **Multi-agent orchestration**: 6 specialized sub-agents (proposer, evaluator, critic, architect, consolidator, testgen)
+- **30+ Python tools** for LangSmith integration, trace analysis, architecture analysis, regression tracking, and more
+- **Evolution archive**: TF-IDF search over all past candidates (winners and losers) to avoid repeating failures
+- **Self-abstention**: proposers honestly abstain when they can't add value
+- **Secret detection**: all outputs are filtered for API keys before logging
+- **Compound learning**: proven evolution learnings are promoted back into CLAUDE.md
+
+## Install
+
+```bash
+# Via Claude Code plugin marketplace (recommended)
+/plugin marketplace add raphaelchristi/harness-evolver-marketplace
+/plugin install harness-evolver
+
+# Or via npx
+npx harness-evolver@latest
+```
+
+## Quick Start
+
+```bash
+cd my-llm-project
+export LANGSMITH_API_KEY="lsv2_pt_..."
+claude
+
+/harness:setup    # explores the project, configures LangSmith
+/harness:health   # checks dataset quality
+/harness:evolve   # runs the optimization loop
+/harness:status   # shows progress (rich ASCII chart)
+/harness:deploy   # tags, pushes, finalizes
+```
+
+## Real Results
+
+Tested on a RAG agent (Agno framework, Gemini Flash Lite):
+
+| Iteration | Score | Action |
+|---|---|---|
+| baseline | 0.575 | Original agent with hallucinations and broken tool calls |
+| v002 | 0.950 | Breakthrough: inlined KB into prompt, eliminated vector search (5.7x faster) |
+| v007 | 1.000 | One-shot example injection + rubric-aligned responses; perfect on the held-out set |
+
+## Works With
+
+Claude Code, Cursor, Codex, Windsurf
+
+## Links
+
+- [GitHub](https://github.com/raphaelchristi/harness-evolver)
+- [npm](https://www.npmjs.com/package/harness-evolver)
+- [Meta-Harness paper](https://arxiv.org/abs/2603.28052)
diff --git a/agents/raphaelchristi__harness-evolver/metadata.json b/agents/raphaelchristi__harness-evolver/metadata.json
new file mode 100644
index 0000000..20fc587
--- /dev/null
+++ b/agents/raphaelchristi__harness-evolver/metadata.json
@@ -0,0 +1,15 @@
+{
+  "name": "harness-evolver",
+  "author": "raphaelchristi",
+  "description": "Autonomous LLM agent optimizer for Claude Code. Evolves agent prompts, tools & architecture using multi-agent proposers, LangSmith evaluation, and git worktrees.",
+  "repository": "https://github.com/raphaelchristi/harness-evolver",
+  "path": "",
+  "version": "6.4.2",
+  "category": "developer-tools",
+  "tags": ["agent-optimization", "langsmith", "claude-code", "evolution", "meta-harness", "llm", "evaluation"],
+  "license": "MIT",
+  "model": "claude-sonnet-4-5-20250929",
+  "adapters": ["claude-code", "system-prompt"],
+  "icon": false,
+  "banner": false
+}