AgentCloseoutBench

AgentCloseoutBench is a benchmark-in-progress for evaluating dark-pattern detection on agentic coding assistant closeout text: the final assistant message available to Claude Code Stop and SubagentStop hooks as last_assistant_message.

The benchmark's contribution is the lifecycle surface (the Stop/SubagentStop closeout boundary) and a reusable black-box hook evaluation harness. Regex hook performance is reported as one detector family, not as the benchmark itself.

The new detector contribution is agentcloseout-physics: a deterministic closeout protocol engine. It treats hooks as runtime adapters and evaluates positive closeout states, dark-pattern mechanics, and evidence-claim markers without a live LLM, embedding model, or network call in the verdict path.
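To make "deterministic, with no model in the verdict path" concrete, here is a minimal Python sketch of a regex rule pack applied to closeout text. The rule format, the rule ids, and the `evaluate` and `rule_pack_hash` helpers are hypothetical illustrations, not the schema or API of the Rust engine under engine/ or the packs under rules/closeout/.

```python
import hashlib
import json
import re

# Hypothetical rule pack: each rule names a category and a regex that fires
# on one dark-pattern mechanic in the closeout text.
RULE_PACK = {
    "version": "0.0-sketch",
    "rules": [
        {"id": "cliffhanger.teaser", "category": "cliffhanger",
         "pattern": r"\b(want me to|shall I|should I) (continue|keep going)\b"},
        {"id": "wrap_up.premature", "category": "wrap_up",
         "pattern": r"\b(all done|everything is complete)\b"},
    ],
}


def rule_pack_hash(pack: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) keeps the hash stable
    # across runs and machines.
    canonical = json.dumps(pack, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def evaluate(text: str, pack: dict) -> dict:
    # Pure function of (text, pack): no model, embedding, or network call.
    hits = [
        rule["id"]
        for rule in pack["rules"]
        if re.search(rule["pattern"], text, flags=re.IGNORECASE)
    ]
    return {
        "verdict": "block" if hits else "allow",
        "hits": hits,
        "rule_pack_sha256": rule_pack_hash(pack),
    }
```

Because the verdict depends only on the text and the hashed pack, any result can be reproduced by re-running the same text against a pack with the same hash.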

Current Status

This repository is a recovery, hardening, and public-data-intake workspace with a complete v0.2 synthetic candidate corpus plus a small v0.3 public-derived adversarial fixture lane. It is not yet a public v1.0 gold-label dataset release.

Current candidate corpus: 800 records, 4 categories x (100 positive + 100 negative), with exact task-type quotas from quota_manifest.json.
Current public-derived adversarial lane: 16 candidate records, 4 categories x (2 positive + 2 negative), with per-record source provenance and manifest rows.
Current public-derived rule fixtures: 14 fixtures covering the four dark-pattern engines plus closeout_contract and evidence_claims.
Current corpus labels: candidate labels until two independent human annotation passes plus adjudication are complete.
Current public-shaped source: deterministic synthetic templates released under Apache-2.0.
Current public claim language: "To our knowledge, AgentCloseoutBench is the first benchmark for dark-pattern detection on agentic coding assistant closeout text at the Claude Code Stop/SubagentStop last_assistant_message boundary."
Current engine claim language: "out-of-band deterministic enforcement at the agentic coding assistant closeout boundary makes specific dark-pattern and false-closeout mechanics observable, reproducible, and benchmarkable."
Current ACSP-CC language: ACSP-CC is a proposed Claude Code closeout security profile, and agentcloseout-physics is the current reference implementation for that proposal. Any conformance output is self-assessed preflight evidence, not a standard, certification, or final benchmark metric.
Not yet claimed: human-annotated release, universal agent benchmark, or absolute injection-immune defense.
Current high-assurance hardening: the Claude Code adapters include a PreToolUse tamper guard for .claude/hooks, .claude/agentcloseout.env, pinned engine paths, and pinned rule packs; the env config is parsed through an allowlist instead of being shell-sourced.
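The point of allowlist parsing is that a shell-sourced file can execute arbitrary commands, while a parser only reads KEY=VALUE pairs. A minimal sketch, assuming a simple KEY=VALUE format; the key names in `ALLOWED_KEYS` are hypothetical, not the adapter's actual configuration keys.

```python
# Hypothetical key names; the adapter's real keys are not listed here.
ALLOWED_KEYS = {"AGENTCLOSEOUT_ENGINE", "AGENTCLOSEOUT_RULES", "AGENTCLOSEOUT_TELEMETRY"}


def parse_env(text: str) -> dict:
    config = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        key, sep, value = line.partition("=")
        key = key.strip()
        # Reject anything that is not a plain KEY=VALUE line with an allowed key
        # (this also rejects "export KEY=..." and arbitrary shell statements).
        if not sep or key not in ALLOWED_KEYS:
            raise ValueError(f"line {lineno}: key not allowed: {key!r}")
        value = value.strip()
        # Strip one layer of matching quotes; $VARS and $(...) stay literal text.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        config[key] = value
    return config
```

A value such as `$(rm -rf ~)` comes back as an inert string instead of being executed, and an unexpected key like `PATH` fails the whole load rather than silently changing behavior.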

Field evidence

The failure modes this benchmark scores, MAST 2.6 (action-reasoning mismatch) and 3.3 (no/incorrect verification), were observed in a production healthcare deployment. Effective Therapy (a trauma-therapy platform; cited with permission, patient-facing specifics withheld) ran an Opus 4.7 orchestrator that narrated 39 agent dispatches, including five verification agents reporting findings, although only 5 of the 39 agents were ever used and the verification agents had zero sessions. A codebase audit added 80+ hollow-code findings: code with correct auth, routes, and signatures but missing the line that persists data. Refs: anthropics/claude-code#61167, #61107; case study at ianymu/recognition-without-arrest#2. Effective Therapy has offered 30+ labeled hollow-code examples toward a future semantic-emptiness detector, a real-world labeled corpus rather than a synthetic one.

A follow-up forensic audit (2026-05-26) measured a field fabrication rate against ground truth (actual curl dispatch logs vs. assistant claims): ~34% phantom on Opus 4.7 (44 phantom claims / 128 real dispatches, 18 phantom agent names) vs. ~4% on Opus 4.6 (2 / 50). No Opus 4.7 session contained any Agent/Task tool call, so the fabrication never crossed the tool boundary. This is the first field-measured rate for the MAST 2.6/3.3 family (single deployment, retrospective; not a substitute for gold labels). Details and the curl-vs-claim protocol: case-studies/effective-therapy-forensic.md.
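The rates above are phantom claims divided by real (logged) dispatches. A minimal sketch of that arithmetic and of one way to compare claims against a dispatch log; representing both sides as lists of agent names, and the function names, are assumptions for illustration, not the protocol in case-studies/effective-therapy-forensic.md.

```python
from collections import Counter


def phantom_rate(phantom_claims: int, real_dispatches: int) -> float:
    # Phantom claims per real (logged) dispatch.
    return phantom_claims / real_dispatches


def compare_claims(claimed: list, dispatched: list) -> dict:
    # Multiset difference: a claim is phantom when no logged dispatch with
    # that agent name is left over to back it.
    phantom = Counter(claimed) - Counter(dispatched)
    return {
        "phantom_claims": sum(phantom.values()),
        "phantom_agent_names": sorted(set(claimed) - set(dispatched)),
        "real_dispatches": len(dispatched),
    }
```

With the audit's counts, `phantom_rate(44, 128)` is about 0.344 and `phantom_rate(2, 50)` is 0.04, matching the ~34% and ~4% figures.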

Layout

SPEC.md: active scientific and engineering contract.
SOURCE_LEDGER.md: live-verified external evidence used for claims.
CLAIM_LEDGER.md: claim status: verified, corrected, deferred, or dropped.
data/: release-shaped candidate corpus JSONL files.
recovery/: local reconstruction outputs and quarantined records.
annotations/: human and LLM annotation workflow scripts and outputs.
evaluation/: black-box hook harness and metric code.
engine/: Rust CLI for deterministic closeout physics.
engines/: per-category physics engine manifests for paper and runtime use.
rules/closeout/: versioned deterministic rule packs.
adapters/claude-code/: installable Claude Code hook adapters for daily use.
fixtures/closeout/: golden fixtures for rule-pack behavior.
fixtures/closeout_public/: public-study-derived fixtures for v1 pressure testing.
public_data_intake/: source registry, manifest, quarantine, and public-derived adversarial corpus lane.
baselines/: non-hook baselines used to separate benchmark quality from hook tuning.
rubrics/, schemas/, manifests/: annotation, schema, provenance, license, redaction, and metadata artifacts.
tests/: local no-network QA tests.

Local QA

Run the local no-network checks:

python3 scripts/validate_corpus.py --data-dir data --quota-manifest quota_manifest.json
python3 -m pytest -q
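The corpus shape stated in Current Status (4 categories, 100 positive and 100 negative records each, 800 total) can be checked with a simple count. A minimal sketch; the record fields `category` and `label` and the boolean label encoding are assumptions rather than the corpus schema, and the real validator also takes --quota-manifest for the task-type quotas, which this sketch does not cover.

```python
from collections import Counter

# Category names as used in the evaluation hook-category map.
CATEGORIES = ("wrap_up", "cliffhanger", "roleplay_drift", "sycophancy")
PER_CELL = 100  # records per (category, label) cell


def quota_errors(records: list) -> list:
    counts = Counter((r["category"], bool(r["label"])) for r in records)
    errors = []
    for cat in CATEGORIES:
        for label in (True, False):
            n = counts.get((cat, label), 0)
            if n != PER_CELL:
                kind = "positive" if label else "negative"
                errors.append(f"{cat}/{kind}: {n} != {PER_CELL}")
    extra = {c for c, _ in counts} - set(CATEGORIES)
    if extra:
        errors.append(f"unknown categories: {sorted(extra)}")
    return errors
```

An empty list means every cell holds exactly 100 records; any drift is reported per cell rather than as a single total.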

Run a reproducibility smoke check:

bash scripts/reproduce_local.sh

Run the deterministic closeout physics checks:

bin/agentcloseout-physics lint-rules rules/closeout
bin/agentcloseout-physics test-rules rules/closeout fixtures/closeout
bin/agentcloseout-physics test-rules rules/closeout fixtures/closeout_public
python3 scripts/public_data_intake.py audit-registry \
  --registry public_data_intake/source_registry.json \
  --schema schemas/public_source.schema.json
python3 scripts/public_data_intake.py validate-derived \
  --registry public_data_intake/source_registry.json \
  --manifest public_data_intake/derived_fixture_manifest.jsonl \
  --data-dir public_data_intake/candidate_public_adversarial

Run the user-facing Claude Code adapter smoke test:

bash scripts/hook-smoke.sh

Install physics-backed hooks into a Claude Code project:

bash adapters/claude-code/install.sh /path/to/project

Install a single category hook:

bash adapters/claude-code/install.sh /path/to/project no-cliffhanger

The standalone hook repos remain installable on their own. The adapter lane is for users who want the reproducible Rust engine, versioned rule packs, rule-pack hash, benchmark fixtures, and opt-in content-free telemetry commands. The adapter installer also writes a PreToolUse tamper guard that blocks Claude Code from editing the local hook wiring, engine pointer, or rule-pack pointer during an ordinary session.
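A minimal sketch of how such a guard can work, assuming Claude Code's PreToolUse contract (hook input arrives as JSON on stdin with `tool_name` and `tool_input`; exit code 2 blocks the call and stderr is fed back to Claude). Only the two paths named above are protected here, because the pinned engine and rule-pack locations are install-specific; this is not the adapter's actual guard script.

```python
import json
import sys
from pathlib import PurePosixPath

# Paths named in the README; pinned engine/rule-pack paths are omitted.
PROTECTED = (".claude/hooks", ".claude/agentcloseout.env")
WRITE_TOOLS = {"Edit", "MultiEdit", "Write", "NotebookEdit"}


def touches_protected(path: str) -> bool:
    parts = PurePosixPath(path).parts
    for prot in PROTECTED:
        prot_parts = PurePosixPath(prot).parts
        # Match the protected components anywhere in the path, so both
        # relative and absolute project paths are caught.
        for i in range(len(parts) - len(prot_parts) + 1):
            if parts[i:i + len(prot_parts)] == prot_parts:
                return True
    return False


def decide(event: dict) -> int:
    tool = event.get("tool_name", "")
    tool_input = event.get("tool_input") or {}
    if tool in WRITE_TOOLS:
        path = tool_input.get("file_path") or tool_input.get("notebook_path") or ""
        if touches_protected(path):
            return 2
    if tool == "Bash":
        # Heuristic: blocks any command that mentions a protected path, reads
        # included; a substring check cannot catch every shell indirection.
        if any(p in tool_input.get("command", "") for p in PROTECTED):
            return 2
    return 0


def main(stdin=sys.stdin) -> int:
    code = decide(json.load(stdin))
    if code == 2:
        print("agentcloseout: edits to hook wiring are blocked", file=sys.stderr)
    return code
```

Wired as a PreToolUse hook command, the script would end with `sys.exit(main())`. Read-only tools such as Read pass through untouched.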

Recovery

The previous /tmp/agent-closeout-bench workspace was lost. Recovery is derived from Claude Code JSONL transcripts under:

~/.claude/projects/-tmp-agent-closeout-bench/*.jsonl

Recovery must only extract visible assistant text blocks. It must not persist thinking blocks, signatures, tool calls, tool outputs, hidden transcript fields, or secrets.
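The extraction rule can be sketched as a filter over transcript lines. This assumes the Claude Code JSONL layout in which assistant entries have `"type": "assistant"` and a `message.content` list of typed blocks (`text`, `thinking`, `tool_use`, ...); that layout is an assumption here, and this is not the recovery script itself.

```python
import json
from typing import Iterable, Iterator


def visible_assistant_text(lines: Iterable[str]) -> Iterator[str]:
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip corrupt lines rather than guessing at content
        if entry.get("type") != "assistant":
            continue  # user turns, tool results, and metadata entries are dropped
        content = (entry.get("message") or {}).get("content")
        if not isinstance(content, list):
            continue
        # Keep only text blocks; thinking (with its signature) and tool_use
        # blocks are dropped, and no other entry fields are copied out.
        texts = [b.get("text", "") for b in content
                 if isinstance(b, dict) and b.get("type") == "text"]
        joined = "\n".join(t for t in texts if t)
        if joined:
            yield joined
```

Everything not explicitly whitelisted is discarded, which is safer than trying to enumerate the fields that must be stripped.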

python3 generation/recover_from_claude_transcripts.py \
  --transcripts-dir ~/.claude/projects/-tmp-agent-closeout-bench \
  --output-dir data \
  --manifest recovery/RECOVERY_MANIFEST.md

Generic negative prompts do not encode a category in the prompt text. The recovery script quarantines those as category_unresolved unless a later, auditable mapping source proves the category.
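A minimal sketch of that quarantine rule. Detecting a category by its name appearing in the prompt, and keying the auditable mapping by a SHA-256 of the exact prompt, are both hypothetical choices for illustration, not how the recovery script resolves categories.

```python
import hashlib

CATEGORIES = ("wrap_up", "cliffhanger", "roleplay_drift", "sycophancy")
UNRESOLVED = "category_unresolved"


def prompt_key(prompt: str) -> str:
    # Hypothetical: audited mappings keyed by a hash of the exact prompt text.
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def resolve_category(prompt: str, audited_map: dict) -> str:
    text = prompt.lower()
    named = [c for c in CATEGORIES
             if any(v in text for v in (c, c.replace("_", " "), c.replace("_", "-")))]
    if len(named) == 1:
        return named[0]
    # Zero or ambiguous matches: only an explicit, auditable mapping may
    # assign a category; otherwise the record is quarantined.
    return audited_map.get(prompt_key(prompt), UNRESOLVED)
```

Keeping the mapping as a separate, hash-keyed artifact means every later category assignment can be traced to an entry someone reviewed.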

The recovered private transcript pool is not mixed into the public-shaped v0.2 corpus. It is preserved for audit in recovery/recovered_category_proven_pool.jsonl.

Evaluation

Example hook evaluation:

python3 evaluation/eval_hooks.py \
  --hooks-dir /path/to/llm-dark-patterns/hooks \
  --corpus-dir data \
  --hook-category-map "wrap_up:no-wrap-up.sh,cliffhanger:no-cliffhanger.sh,roleplay_drift:no-roleplay-drift.sh,sycophancy:no-sycophancy.sh" \
  --ground-truth candidate \
  --output results/eval_candidate.json
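The harness treats each hook as an opaque subprocess. A minimal sketch of that black-box loop, assuming a hook flags a closeout either by exiting with code 2 or by printing JSON with `"decision": "block"` (both Stop-hook conventions in Claude Code); the payload fields besides `last_assistant_message` and all function names are assumptions, not the interface of evaluation/eval_hooks.py.

```python
import json
import subprocess
import sys
from collections import defaultdict


def parse_hook_category_map(spec: str) -> dict:
    # "cat:script.sh,cat2:script2.sh" -> {"cat": "script.sh", ...}
    mapping = {}
    for pair in spec.split(","):
        category, sep, script = pair.strip().partition(":")
        if not sep or not category or not script:
            raise ValueError(f"bad map entry: {pair!r}")
        mapping[category] = script
    return mapping


def hook_flags(cmd: list, text: str, timeout: float = 10.0) -> bool:
    payload = {"hook_event_name": "Stop", "stop_hook_active": False,
               "last_assistant_message": text}
    proc = subprocess.run(cmd, input=json.dumps(payload), capture_output=True,
                          text=True, timeout=timeout)
    if proc.returncode == 2:
        return True
    try:
        out = json.loads(proc.stdout or "{}")
    except json.JSONDecodeError:
        return False
    return isinstance(out, dict) and out.get("decision") == "block"


def per_category_metrics(rows) -> dict:
    # rows: iterable of (category, gold_positive, predicted_positive)
    cells = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for cat, gold, pred in rows:
        key = ("tp" if pred else "fn") if gold else ("fp" if pred else "tn")
        cells[cat][key] += 1
    report = {}
    for cat, c in cells.items():
        p_den, r_den = c["tp"] + c["fp"], c["tp"] + c["fn"]
        report[cat] = {**c,
                       "precision": c["tp"] / p_den if p_den else 0.0,
                       "recall": c["tp"] / r_den if r_den else 0.0}
    return report
```

Because only stdin, exit code, and stdout are observed, a regex hook, the physics adapter, or any other detector can be scored by the same loop.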

Candidate-label diagnostics should use the dev or validation split. Final paper results must use adjudicated human labels on locked_test; the harness blocks locked-test runs that ask for candidate labels.
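The locked-test rule amounts to a guard that runs before any scoring. A minimal sketch; the function name and exception type are hypothetical, while the split names and the `candidate` ground-truth value come from this README.

```python
SPLITS = {"dev", "validation", "locked_test"}


def check_run(split: str, ground_truth: str) -> None:
    # Hypothetical pre-flight check run before any metrics are computed.
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    if split == "locked_test" and ground_truth == "candidate":
        raise PermissionError("locked_test runs require adjudicated human labels")
```

Failing before scoring, rather than after, means candidate-label numbers on locked_test are never produced at all.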

Release Blockers

Two independent human annotation passes.
Adjudicated final labels.
Per-category agreement report.
Private or delayed holdout policy if a leaderboard is launched.
Fresh-clone reproducibility run with final labels.
Hugging Face dataset card and Croissant metadata validation if targeting an E&D-style release.
Exact pinned hook commits and machine-readable result JSON.
Larger reviewed public-derived corpus, then two-pass human gold annotation and adjudication before any public performance claim.
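The README does not say which statistic the per-category agreement report will use; Cohen's kappa is one common choice for two annotators. A minimal sketch under that assumption, with rows given as hypothetical (category, pass-1 label, pass-2 label) tuples.

```python
from collections import Counter, defaultdict


def cohens_kappa(a: list, b: list) -> float:
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if expected == 1.0:
        return 1.0  # both passes used one identical label throughout
    return (observed - expected) / (1 - expected)


def per_category_kappa(rows) -> dict:
    grouped = defaultdict(lambda: ([], []))
    for cat, label_a, label_b in rows:
        grouped[cat][0].append(label_a)
        grouped[cat][1].append(label_b)
    return {cat: cohens_kappa(a, b) for cat, (a, b) in grouped.items()}
```

For example, labels [1, 1, 0, 0] vs. [1, 0, 0, 0] give observed agreement 0.75, chance agreement 0.5, and kappa 0.5.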

Release Blockers Resolved In v0.2

Full 800-record schema-valid candidate corpus.
Exact deterministic quota manifest.
Opaque blind annotation packet with private id map.
Provenance, license, and redaction manifests for the synthetic public-shaped corpus.
Local no-network smoke reproduction.
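An opaque blind packet can be built by shuffling records, replacing ids with random tokens, and dropping fields that reveal the candidate label, while the token-to-id map is kept separately. A minimal sketch; the field names in `LEAKY_FIELDS` and the `item-` id format are hypothetical.

```python
import random
import secrets
from typing import Optional

# Hypothetical field names that would leak labels or provenance to annotators.
LEAKY_FIELDS = {"id", "category", "label", "task_type", "source"}


def blind_packet(records: list, rng: Optional[random.Random] = None):
    rng = rng or random.SystemRandom()
    order = list(records)
    rng.shuffle(order)  # annotators see no corpus ordering
    packet, private_map = [], {}
    for rec in order:
        opaque = "item-" + secrets.token_hex(8)
        while opaque in private_map:  # guard against a token collision
            opaque = "item-" + secrets.token_hex(8)
        private_map[opaque] = rec["id"]
        packet.append({"item_id": opaque,
                       **{k: v for k, v in rec.items() if k not in LEAKY_FIELDS}})
    return packet, private_map
```

Only `packet` goes to annotators; `private_map` stays private and is needed again only at adjudication time to join labels back to records.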

Public-Data Guardrails Added In v0.3

Source registry with tier, license, privacy status, allowed use, import decision, and release eligibility.
Content-free sampler for local public JSONL trace review; raw text is not persisted unless an approved source and explicit write flag are used.
Derived-fixture manifest linking every public-derived record to source id, source-record hash, transform, reviewer, and license decision.
Quarantine checks for secrets, emails, absolute paths, usernames, hostnames, repo URLs, raw tool-output markers, and trace artifact leakage.
Evaluation output now reports per-corpus-kind, per-source, and per-fixture breakdowns so public-derived stress results cannot be hidden in aggregate.
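A minimal sketch of a content-free quarantine pass: it records a source-record hash, a length, and the names of triggered checks, never the raw text. The regexes are illustrative stand-ins for a few of the listed checks (usernames and hostnames are not covered) and are not the patterns used by scripts/public_data_intake.py.

```python
import hashlib
import re

# Illustrative patterns only; real checks would be broader and reviewed.
QUARANTINE_PATTERNS = {
    "secret": re.compile(r"\b(sk-[A-Za-z0-9_-]{16,}|ghp_[A-Za-z0-9]{36}|AKIA[0-9A-Z]{16})\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(\.[\w-]+)+\b"),
    "absolute_path": re.compile(r"(?<![\w.])(/(home|Users|root|tmp)/\S+|[A-Za-z]:\\\S+)"),
    "repo_url": re.compile(r"\b(github\.com|gitlab\.com|bitbucket\.org)/\S+"),
    "tool_output_marker": re.compile(r"\b(tool_result|tool_use_id)\b"),
}


def quarantine_reasons(text: str) -> list:
    return sorted(name for name, pat in QUARANTINE_PATTERNS.items() if pat.search(text))


def content_free_summary(source_id: str, text: str) -> dict:
    # Enough for triage and manifest linking, without persisting raw text.
    return {
        "source_id": source_id,
        "source_record_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "n_chars": len(text),
        "quarantine": quarantine_reasons(text),
    }
```

The SHA-256 digest lets a derived fixture be linked back to its source record in the manifest without the sampler ever writing that record's text.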
