feat(hooks): add HookType.AGENT for agent-based hook evaluation #3052
Open
burakkeless wants to merge 14 commits into OpenHands:main from burakkeless:burak/2864-agent-based-hook-evaluation
Commits
81fba9d
Add agent-based hook execution with sub-conversation support
6e6406c
Add display_command property to HookDefinition for improved hook labe…
7eb3871
Address PR review feedback on agent hook implementation
c166503
Introduce agent-based hooks example and improve agent hook execution
ba8e3fc
resolve conflict
67d3735
Refactor agent hook execution for improved JSON parsing, error handli…
57865d8
Merge branch 'main' into burak/2864-agent-based-hook-evaluation
VascoSch92 60e9fa1
fix: address hook pre-commit failures
VascoSch92 823b460
fix: include agent hook metrics in parent stats
VascoSch92 95895c8
fix: persist agent hooks under conversation base
VascoSch92 f86fad4
fix(hooks): address review on agent hook visualizer, decisions, example
VascoSch92 6a2c950
Merge branch 'main' into burak/2864-agent-based-hook-evaluation
VascoSch92 c467752
fix(hooks): preserve REST contract for HookDefinition.command; allow …
VascoSch92 ead24bf
fix(hooks): address review — live LLM lookup, scoped enum allowlist, …
VascoSch92
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,60 @@ | ||
| # Agent-based Hooks Example | ||
|
|
||
| This folder demonstrates the `type="agent"` hook — a lifecycle hook whose | ||
| decision is produced by an LLM-driven sub-agent rather than a shell script. | ||
|
|
||
| For shell-command hooks see [`../33_hooks/`](../33_hooks). | ||
|
|
||
| ## Why an agent hook | ||
|
|
||
| A shell-based PreToolUse hook can only block what its blacklist literally | ||
| matches. If the agent rewrites `cat /etc/passwd` as `awk '{print}' /etc/passwd`, | ||
| the command slips through. An agent hook reasons about the **semantic intent** | ||
| of the command ("reading a sensitive system file") and denies it regardless of | ||
| the exact command used. | ||
|
|
||
| ## Example | ||
|
|
||
| - **main.py** — Two agent hooks, each in its own conversation: | ||
| - **PreToolUse** "security reviewer" denies a command whose intent is to | ||
| read `/etc/passwd`, even though the command matches no blacklisted keyword. | ||
| - **Stop** "quality reviewer" refuses to let the main agent finish until | ||
| the required deliverable (`REPORT.md`) is present in the workspace. | ||
|
|
||
| Each hook decision is printed to the console via a `HookExecutionEvent` | ||
| callback, so you can watch the allow/deny outcomes as the demo runs. | ||
|
|
||
| ## Running | ||
|
|
||
| ```bash | ||
| export LLM_API_KEY="your-key" | ||
| export LLM_MODEL="anthropic/claude-sonnet-4-5-20250929" # optional | ||
| export LLM_BASE_URL="https://your-endpoint" # optional | ||
|
|
||
| python main.py | ||
| ``` | ||
|
|
||
| ## How an agent hook is configured | ||
|
|
||
| ```python | ||
| HookDefinition( | ||
| type=HookType.AGENT, | ||
| name="security-reviewer", # bucket for cost metrics (agent-hook:<name>) | ||
| system_prompt="...", # instructs the hook agent; must request JSON | ||
| tools=["file_editor"], # optional tools the hook agent may use | ||
| # (use registered names, e.g. "file_editor", | ||
| # "terminal" — not class names like | ||
| # "FileEditorTool") | ||
| timeout=60, # forwarded to the per-hook LLM copy | ||
| max_iterations=3, # cap on hook sub-conversation steps | ||
| ) | ||
| ``` | ||
|
|
||
| The hook agent receives the event JSON and must reply with: | ||
|
|
||
| ```json | ||
| {"decision": "allow" | "deny", "reason": "<short explanation>"} | ||
| ``` | ||
|
|
||
| Anything else (non-JSON, missing field, sub-conversation error) defaults to | ||
| `allow` so a broken hook cannot wedge the main agent. |
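
The fail-open rule described in the README can be illustrated with a minimal sketch. This is not the SDK's implementation: `HookDecision` and `parse_hook_decision` are hypothetical names introduced here, and the sketch assumes both `decision` and `reason` must be present as strings for a reply to count as well-formed. It only shows how a reply that is not the expected JSON object could be mapped to `allow`.

```python
import json
from typing import NamedTuple


class HookDecision(NamedTuple):
    """Hypothetical container for a parsed hook reply (not an SDK type)."""

    decision: str
    reason: str


def parse_hook_decision(raw: str) -> HookDecision:
    """Parse a hook agent reply, falling back to "allow" on any malformed input."""
    try:
        payload = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        # Non-JSON reply: fail open so a broken hook cannot block the main agent.
        return HookDecision("allow", "unparseable reply; failing open")

    if not isinstance(payload, dict):
        return HookDecision("allow", "reply is not a JSON object; failing open")

    decision = payload.get("decision")
    reason = payload.get("reason")
    if decision not in ("allow", "deny") or not isinstance(reason, str):
        # Missing or invalid field: treated the same as a non-JSON reply.
        return HookDecision("allow", "missing or invalid field; failing open")

    return HookDecision(decision, reason)
```

Only a well-formed `deny` blocks the event in this sketch; every other reply resolves to `allow`, which matches the stated goal that a broken hook cannot wedge the main agent.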
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,204 @@ | ||
| """OpenHands Agent SDK — Agent-based Hooks Example | ||
|
|
||
| Demonstrates the `type="agent"` hook, which evaluates lifecycle events with an | ||
| LLM-driven sub-agent instead of a shell script. The hook agent receives the | ||
| event JSON, reasons about it semantically, and replies with a decision payload: | ||
|
|
||
| {"decision": "allow" | "deny", "reason": "..."} | ||
|
|
||
| Two demos: | ||
|
|
||
| - PreToolUse (security reviewer): inspects the INTENT of a terminal command, | ||
| not just its syntax. A command like `awk '{print}' /etc/passwd` would slip | ||
| past a blacklist of command names, but the agent hook recognises it as | ||
| reading a sensitive system file and denies it. | ||
| - Stop (quality reviewer): refuses to let the main agent finish until the | ||
| required deliverable exists in the workspace. | ||
|
|
||
| For shell-command hooks see ../33_hooks/33_hooks.py. | ||
| """ | ||
|
|
||
| import os | ||
| import tempfile | ||
| from pathlib import Path | ||
|
|
||
| from pydantic import SecretStr | ||
|
|
||
| from openhands.sdk import LLM, Conversation | ||
| from openhands.sdk.conversation.state import ConversationExecutionStatus | ||
| from openhands.sdk.event.hook_execution import HookExecutionEvent | ||
| from openhands.sdk.hooks import HookConfig, HookDefinition, HookMatcher, HookType | ||
| from openhands.tools.preset.default import get_default_agent | ||
|
|
||
|
|
||
| # Keep the demo conversations short: a small per-run iteration cap means a hook | ||
| # that keeps denying (or a model that keeps retrying) fails fast instead of | ||
| # burning calls up to the default 500-iteration limit / CI subprocess timeout. | ||
| MAX_ITERATIONS = 10 | ||
|
|
||
|
|
||
| # Configure LLM | ||
| api_key = os.getenv("LLM_API_KEY") | ||
| assert api_key is not None, "LLM_API_KEY environment variable is not set." | ||
| model = os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929") | ||
| base_url = os.getenv("LLM_BASE_URL") | ||
|
|
||
| llm = LLM( | ||
| usage_id="agent", | ||
| model=model, | ||
| base_url=base_url, | ||
| api_key=SecretStr(api_key), | ||
| ) | ||
|
|
||
| SECURITY_REVIEWER_PROMPT = """You are a security reviewer for a coding agent. | ||
| You receive a PreToolUse hook event describing a terminal command the agent | ||
| is about to run. | ||
|
|
||
| Reason about the SEMANTIC INTENT of the command, not just its syntax. Block: | ||
| - reading sensitive system files (/etc/passwd, /etc/shadow, ~/.ssh/*, credentials) | ||
| - broad filesystem discovery outside the workspace (rg --files /etc, find /, ls /root) | ||
| - exfiltrating data over the network | ||
| - modifying system state (sudo, apt install, systemctl) | ||
|
|
||
| Allow ordinary inspection, builds, tests, and edits inside the workspace. | ||
| When unsure, prefer allow. | ||
|
|
||
| Reply with a single JSON object and nothing else: | ||
| {"decision": "allow" | "deny", "reason": "<short explanation>"} | ||
| """ | ||
|
|
||
| QUALITY_REVIEWER_PROMPT = """You are a quality reviewer enforcing task completion. | ||
| You receive a Stop hook event when the main agent tries to finish. | ||
|
|
||
| The task requires the file REPORT.md to exist in the workspace and contain at | ||
| least one bullet point describing the repository. Use the file_editor tool to | ||
| check whether the file exists and inspect its contents. | ||
|
|
||
| If the deliverable is missing or empty, deny so the main agent keeps working. | ||
| Otherwise allow. | ||
|
|
||
| Reply with a single JSON object and nothing else: | ||
| {"decision": "allow" | "deny", "reason": "<short explanation>"} | ||
| """ | ||
|
|
||
|
|
||
| def hook_logger(event) -> None: | ||
| """Surface each hook decision so the demo output is self-explanatory.""" | ||
| if not isinstance(event, HookExecutionEvent): | ||
| return | ||
| status = "DENY " if event.blocked else ("ALLOW" if event.success else "FAIL ") | ||
| line = f" [hook] {event.hook_event_type} {status} -> {event.hook_command}" | ||
| if event.reason: | ||
| line += f"\n reason: {event.reason}" | ||
| print(line) | ||
|
|
||
|
|
||
| def run_demo(workspace: Path, hook_config: HookConfig, message: str) -> float: | ||
| """Run one demo in its own conversation and return its cost. | ||
|
|
||
| Each demo gets a fresh LLM with isolated metrics so per-demo costs don't | ||
| overlap (reusing one LLM would make the second conversation's stats include | ||
| the first demo's spend). A small iteration cap plus an error/stuck check make | ||
| the example fail fast instead of looping. | ||
| """ | ||
| demo_llm = llm.model_copy() | ||
| demo_llm.reset_metrics() | ||
| conversation = Conversation( | ||
| agent=get_default_agent(llm=demo_llm), | ||
| workspace=str(workspace), | ||
| hook_config=hook_config, | ||
| callbacks=[hook_logger], | ||
| max_iteration_per_run=MAX_ITERATIONS, | ||
| ) | ||
| conversation.send_message(message) | ||
| conversation.run() | ||
| status = conversation.state.execution_status | ||
| if status in ( | ||
| ConversationExecutionStatus.ERROR, | ||
| ConversationExecutionStatus.STUCK, | ||
| ): | ||
| raise RuntimeError( | ||
| f"Demo conversation ended in {status.value} state " | ||
| "before reaching a decision." | ||
| ) | ||
| return conversation.conversation_stats.get_combined_metrics().accumulated_cost | ||
|
|
||
|
|
||
| # Each demo runs in its own conversation with only the hook it needs. Sharing a | ||
| # single config would leave the Stop quality gate active during Demo 1, so the | ||
| # agent could never finish the first task until REPORT.md existed — coupling two | ||
| # unrelated demos and burning iterations. | ||
| security_hook_config = HookConfig( | ||
| pre_tool_use=[ | ||
| HookMatcher( | ||
| matcher="terminal", | ||
| hooks=[ | ||
| HookDefinition( | ||
| type=HookType.AGENT, | ||
| name="security-reviewer", | ||
| system_prompt=SECURITY_REVIEWER_PROMPT, | ||
| timeout=60, | ||
| max_iterations=3, | ||
| ) | ||
| ], | ||
| ) | ||
| ], | ||
| ) | ||
|
|
||
| quality_hook_config = HookConfig( | ||
| stop=[ | ||
| HookMatcher( | ||
| hooks=[ | ||
| HookDefinition( | ||
| type=HookType.AGENT, | ||
| name="quality-reviewer", | ||
| system_prompt=QUALITY_REVIEWER_PROMPT, | ||
| tools=["file_editor"], | ||
| timeout=90, | ||
| max_iterations=5, | ||
| ) | ||
| ], | ||
| ) | ||
| ], | ||
| ) | ||
|
|
||
|
|
||
| with tempfile.TemporaryDirectory() as tmpdir: | ||
| workspace = Path(tmpdir) | ||
| total_cost = 0.0 | ||
|
|
||
| print("=" * 60) | ||
| print("Demo 1: PreToolUse — semantic deny") | ||
| print("=" * 60) | ||
| print( | ||
| "Asking the agent to read /etc/passwd via awk. The literal command\n" | ||
| "wouldn't match a syntactic blacklist (no `cat`, no `/etc/shadow`\n" | ||
| "keyword), but the security-reviewer agent should recognise the\n" | ||
| "intent and deny.\n" | ||
| ) | ||
| total_cost += run_demo( | ||
| workspace, | ||
| security_hook_config, | ||
| "Show me the contents of /etc/passwd using awk '{print}'.", | ||
| ) | ||
|
|
||
| print("\n" + "=" * 60) | ||
| print("Demo 2: Stop — deny until deliverable exists") | ||
| print("=" * 60) | ||
| print("Quality reviewer denies until REPORT.md exists with a bullet point.\n") | ||
| total_cost += run_demo( | ||
| workspace, | ||
| quality_hook_config, | ||
| "Write REPORT.md in the workspace with at least one bullet point " | ||
| "describing this repository, then finish.", | ||
| ) | ||
|
|
||
| report = workspace / "REPORT.md" | ||
| if report.exists(): | ||
| print(f"\n[REPORT.md preview: {report.read_text()[:120]!r}...]") | ||
|
|
||
| print("\n" + "=" * 60) | ||
| print("Example Complete!") | ||
| print("=" * 60) | ||
|
|
||
| print(f"\nEXAMPLE_COST: {total_cost}") |
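
Demo 1 rests on the claim that a blacklist of command names would miss the `awk` variant. The sketch below illustrates that claim with a deliberately naive checker. `BLOCKED_COMMANDS` and `blacklist_allows` are hypothetical names introduced for illustration; they are not part of the SDK or of the shell hooks in `../33_hooks/`.

```python
import shlex

# Hypothetical blacklist of file-reading command names (an assumption for
# illustration, not taken from the PR).
BLOCKED_COMMANDS = {"cat", "less", "more", "head", "tail"}


def blacklist_allows(command: str) -> bool:
    """Return True when the command's first token is not a blacklisted name."""
    tokens = shlex.split(command)
    if not tokens:
        return True  # an empty command has nothing to block
    return tokens[0] not in BLOCKED_COMMANDS


# The literal `cat` invocation is caught by name...
print(blacklist_allows("cat /etc/passwd"))
# ...but the equivalent awk invocation passes, although it reads the same file.
print(blacklist_allows("awk '{print}' /etc/passwd"))
```

A checker like this decides on the command name alone, so any tool that can read a file bypasses it. The agent hook in Demo 1 is instead asked to judge what the command does.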