One Command (oh) to Launch OpenHarness and Unlock All Agent Harnesses.
Supports CLI agent integration including OpenClaw, nanobot, Cursor, and more.
| Claude Code | OpenHarness | |
|---|---|---|
| Lines of Code | 512,664 | 11,733 (44x lighter) |
| Files | 1,884 | 163 |
| Language | TypeScript | Python |
| Tools | ~44 | 43 (98%) |
| Commands | ~88 | 54 (61%) |
| Skills Compatible | β | β anthropics/skills |
| Plugin Compatible | β | β claude-code/plugins |
| Tests | β | 114 unit + 6 E2E suites |
Leverages Python's power with pure focus on Harness architectureβstripped of enterprise overhead like telemetry, OAuth complexity, and hundreds of React components.
An Agent Harness is the complete infrastructure that wraps around an LLM to make it a functional agent. The model provides intelligence; the harness provides hands, eyes, memory, and safety boundaries.
OpenHarness is an open-source Python implementation designed for researchers, builders, and the community:
- Understand how production AI agents work under the hood
- Experiment with cutting-edge tools, skills, and agent coordination patterns
- Extend the harness with custom plugins, providers, and domain knowledge
- Build specialized agents on top of proven architecture
β’ Streaming Tool-Call Cycle β’ API Retry with Exponential Backoff β’ Parallel Tool Execution β’ Token Counting & Cost Tracking |
β’ 43 Tools (File, Shell, Search, Web, MCP) β’ On-Demand Skill Loading (.md) β’ Plugin Ecosystem (Skills + Hooks + Agents) β’ Compatible with anthropics/skills & plugins |
β’ CLAUDE.md Discovery & Injection β’ Context Compression (Auto-Compact) β’ MEMORY.md Persistent Memory β’ Session Resume & History |
β’ Multi-Level Permission Modes β’ Path-Level & Command Rules β’ PreToolUse / PostToolUse Hooks β’ Interactive Approval Dialogs |
β’ Subagent Spawning & Delegation β’ Team Registry & Task Management β’ Background Task Lifecycle β’ ClawTeam Integration (Roadmap) |
- 2026-04-01 π¨ v0.1.0 β Initial OpenHarness open-source release featuring complete Harness architecture:
- Python 3.11+ and uv
- Node.js 18+ (for the React terminal UI)
- An LLM API key
# Clone and install
git clone https://github.com/HKUDS/OpenHarness.git
cd OpenHarness
uv sync --extra dev
# Example: use Kimi as the backend
export ANTHROPIC_BASE_URL=https://api.moonshot.cn/anthropic
export ANTHROPIC_API_KEY=your_kimi_api_key
export ANTHROPIC_MODEL=kimi-k2.5
# Launch
oh # if venv is activated
uv run oh # without activating venv# Single prompt β stdout
oh -p "Explain this codebase"
# JSON output for programmatic use
oh -p "List all functions in main.py" --output-format json
# Stream JSON events in real-time
oh -p "Fix the bug" --output-format stream-jsonOpenHarness implements the core Agent Harness pattern with 10 subsystems:
openharness/
engine/ # π§ Agent Loop β query β stream β tool-call β loop
tools/ # π§ 43 Tools β file I/O, shell, search, web, MCP
skills/ # π Knowledge β on-demand skill loading (.md files)
plugins/ # π Extensions β commands, hooks, agents, MCP servers
permissions/ # π‘οΈ Safety β multi-level modes, path rules, command deny
hooks/ # β‘ Lifecycle β PreToolUse/PostToolUse event hooks
commands/ # π¬ 54 Commands β /help, /commit, /plan, /resume, ...
mcp/ # π MCP β Model Context Protocol client
memory/ # π§ Memory β persistent cross-session knowledge
tasks/ # π Tasks β background task management
coordinator/ # π€ Multi-Agent β subagent spawning, team coordination
prompts/ # π Context β system prompt assembly, CLAUDE.md, skills
config/ # βοΈ Settings β multi-layer config, migrations
ui/ # π₯οΈ React TUI β backend protocol + frontend
The heart of the harness. One loop, endlessly composable:
while True:
response = await api.stream(messages, tools)
if response.stop_reason != "tool_use":
break # Model is done
for tool_call in response.tool_uses:
# Permission check β Hook β Execute β Hook β Result
result = await harness.execute_tool(tool_call)
messages.append(tool_results)
# Loop continues β model sees results, decides next actionThe model decides what to do. The harness handles how β safely, efficiently, with full observability.
| Category | Tools | Description |
|---|---|---|
| File I/O | Bash, Read, Write, Edit, Glob, Grep | Core file operations with permission checks |
| Search | WebFetch, WebSearch, ToolSearch, LSP | Web and code search capabilities |
| Notebook | NotebookEdit | Jupyter notebook cell editing |
| Agent | Agent, SendMessage, TeamCreate/Delete | Subagent spawning and coordination |
| Task | TaskCreate/Get/List/Update/Stop/Output | Background task management |
| MCP | MCPTool, ListMcpResources, ReadMcpResource | Model Context Protocol integration |
| Mode | EnterPlanMode, ExitPlanMode, Worktree | Workflow mode switching |
| Schedule | CronCreate/List/Delete, RemoteTrigger | Scheduled and remote execution |
| Meta | Skill, Config, Brief, Sleep, AskUser | Knowledge loading, configuration, interaction |
Every tool has:
- Pydantic input validation β structured, type-safe inputs
- Self-describing JSON Schema β models understand tools automatically
- Permission integration β checked before every execution
- Hook support β PreToolUse/PostToolUse lifecycle events
Skills are on-demand knowledge β loaded only when the model needs them:
Available Skills:
- commit: Create clean, well-structured git commits
- review: Review code for bugs, security issues, and quality
- debug: Diagnose and fix bugs systematically
- plan: Design an implementation plan before coding
- test: Write and run tests for code
- simplify: Refactor code to be simpler and more maintainable
- pdf: PDF processing with pypdf (from anthropics/skills)
- xlsx: Excel operations (from anthropics/skills)
- ... 40+ more
Compatible with anthropics/skills β just copy .md files to ~/.openharness/skills/.
Compatible with claude-code plugins. Tested with 12 official plugins:
| Plugin | Type | What it does |
|---|---|---|
commit-commands |
Commands | Git commit, push, PR workflows |
security-guidance |
Hooks | Security warnings on file edits |
hookify |
Commands + Agents | Create custom behavior hooks |
feature-dev |
Commands | Feature development workflow |
code-review |
Agents | Multi-agent PR review |
pr-review-toolkit |
Agents | Specialized PR review agents |
# Manage plugins
oh plugin list
oh plugin install <source>
oh plugin enable <name>Multi-level safety with fine-grained control:
| Mode | Behavior | Use Case |
|---|---|---|
| Default | Ask before write/execute | Daily development |
| Auto | Allow everything | Sandboxed environments |
| Plan Mode | Block all writes | Large refactors, review first |
Path-level rules in settings.json:
{
"permission": {
"mode": "default",
"path_rules": [{"pattern": "/etc/*", "allow": false}],
"denied_commands": ["rm -rf /", "DROP TABLE *"]
}
}React/Ink TUI with full interactive experience:
- Command picker: Type
/β arrow keys to select β Enter - Permission dialog: Interactive y/n with tool details
- Mode switcher:
/permissionsβ select from list - Session resume:
/resumeβ pick from history - Animated spinner: Real-time feedback during tool execution
- Keyboard shortcuts: Shown at the bottom, context-aware
oh [OPTIONS] COMMAND [ARGS]
Session: -c/--continue, -r/--resume, -n/--name
Model: -m/--model, --effort, --max-turns
Output: -p/--print, --output-format text|json|stream-json
Permissions: --permission-mode, --dangerously-skip-permissions
Context: -s/--system-prompt, --append-system-prompt, --settings
Advanced: -d/--debug, --mcp-config, --bare
Subcommands: oh mcp | oh plugin | oh auth
| Suite | Tests | Status |
|---|---|---|
| Unit + Integration | 114 | β All passing |
| CLI Flags E2E | 6 | β Real model calls |
| Harness Features E2E | 9 | β Retry, skills, parallel, permissions |
| React TUI E2E | 3 | β Welcome, conversation, status |
| TUI Interactions E2E | 4 | β Commands, permissions, shortcuts |
| Real Skills + Plugins | 12 | β anthropics/skills + claude-code/plugins |
# Run all tests
uv run pytest -q # 114 unit/integration
python scripts/test_harness_features.py # Harness E2E
python scripts/test_real_skills_plugins.py # Real plugins E2Efrom pydantic import BaseModel, Field
from openharness.tools.base import BaseTool, ToolExecutionContext, ToolResult
class MyToolInput(BaseModel):
query: str = Field(description="Search query")
class MyTool(BaseTool):
name = "my_tool"
description = "Does something useful"
input_model = MyToolInput
async def execute(self, arguments: MyToolInput, context: ToolExecutionContext) -> ToolResult:
return ToolResult(output=f"Result for: {arguments.query}")Create ~/.openharness/skills/my-skill.md:
---
name: my-skill
description: Expert guidance for my specific domain
---
# My Skill
## When to use
Use when the user asks about [your domain].
## Workflow
1. Step one
2. Step two
...Create .openharness/plugins/my-plugin/.claude-plugin/plugin.json:
{
"name": "my-plugin",
"version": "1.0.0",
"description": "My custom plugin"
}Add commands in commands/*.md, hooks in hooks/hooks.json, agents in agents/*.md.
OpenHarness is a community-driven research project. We welcome contributions in:
| Area | Examples |
|---|---|
| Tools | New tool implementations for specific domains |
| Skills | Domain knowledge .md files (finance, science, DevOps...) |
| Plugins | Workflow plugins with commands, hooks, agents |
| Providers | Support for more LLM backends (OpenAI, Ollama, etc.) |
| Multi-Agent | Coordination protocols, team patterns |
| Testing | E2E scenarios, edge cases, benchmarks |
| Documentation | Architecture guides, tutorials, translations |
# Development setup
git clone https://github.com/HKUDS/OpenHarness.git
cd openharness
uv sync --extra dev
uv run pytest -q # Verify everything worksMIT β see LICENSE.
Oh my Harness!
The model is the agent. The code is the harness.









