feat: RCE recorded-trace verifier v0 by Haserjian · Pull Request #59 · Haserjian/assay

Haserjian · 2026-04-06T23:55:44Z

Summary

Adds rce_verify.py: recorded-trace RCE verifier implementing the 4-phase verification order from RCE Profile v0.1
Adds JSON Schemas for Episode Contract and replay result
Adds hidden assay rce-verify <pack_dir> --out-dir <dir> CLI command
Extends proof-pack receipt-type acceptance for rce.*/v0 receipts
6 focused tests covering all verdict paths

Verification phases

Phase	Action	Failure
1	Episode Contract schema + DAG validation	INTEGRITY_FAIL
2	Proof pack integrity (existing `verify_proof_pack`)	INTEGRITY_FAIL
3	Receipt completeness + derived hash recomputation	INTEGRITY_FAIL
4	Recorded-trace replay comparison	DIVERGE or MATCH
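The phase order in the table can be read as a short-circuiting pipeline: the first failing integrity phase ends verification, and the comparison is only reached when Phases 1 to 3 pass. The following is a minimal sketch under that reading. The function name, the phase callables, and the return shape are hypothetical and are not taken from `rce_verify.py`.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical type: an integrity phase returns a list of error strings.
IntegrityPhase = Callable[[], List[str]]


def run_phases(
    integrity_phases: Sequence[Tuple[str, IntegrityPhase]],
    compare: Callable[[], List[str]],
) -> Tuple[str, Optional[str], List[str]]:
    """Return (verdict, failed_phase, details).

    Phases 1-3 short-circuit to INTEGRITY_FAIL. Phase 4 (compare) returns the
    ids of divergent steps and decides between DIVERGE and MATCH.
    """
    for name, phase in integrity_phases:
        errors = phase()
        if errors:
            # Comparison is never reached once an integrity phase fails.
            return "INTEGRITY_FAIL", name, errors
    divergent = compare()
    if divergent:
        return "DIVERGE", None, divergent
    return "MATCH", None, []
```

Because the comparison callable is only invoked after every integrity phase returns no errors, an INTEGRITY_FAIL result never carries comparison data, which matches the `claim_check=null` rule for that verdict.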

Key contract surfaces

SKIPPED steps: output_hash=null, excluded from outputs_hash and Phase 4
INTEGRITY_FAIL: claim_check=null (comparison not reached)
Hash format: sha256:<64-char-hex> throughout
DIVERGE: exhaustive collection policy, at least one divergent step
Per-step comparator tier resolution from comparator_tiers_by_step
outputs_hash recomputed from step receipt payloads (not replay artifacts)
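To illustrate the SKIPPED and hash-format rules above, here is a sketch of how an `outputs_hash` could be recomputed from step receipt payloads. The payload field `status`, the helper names, and the aggregation layout (a sorted list of step id and hash pairs) are assumptions for illustration only. Sorted-key compact `json.dumps` stands in for the JCS canonicalization mentioned in the review comments and is not a full RFC 8785 implementation.

```python
import hashlib
import json
from typing import Any, Dict, List


def canonical_sha256(value: Any) -> str:
    """Hash a JSON value in the `sha256:<64-char-hex>` format.

    Assumption: sorted-key compact JSON approximates JCS for simple payloads.
    """
    encoded = json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return "sha256:" + hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def recompute_outputs_hash(step_payloads: List[Dict[str, Any]]) -> str:
    """Aggregate per-step output hashes, leaving SKIPPED steps out."""
    included = [
        {"step_id": p["step_id"], "output_hash": p["output_hash"]}
        for p in step_payloads
        if p.get("status") != "SKIPPED"  # SKIPPED steps carry output_hash=null
    ]
    included.sort(key=lambda item: item["step_id"])
    return canonical_sha256(included)
```

With this layout, appending a SKIPPED step to a pack leaves the aggregate hash unchanged, which is the property the "excluded from outputs_hash" rule asks for.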

Residual gap

dispute.replay_pack_root_sha256 emitted as null — replay-bundle packing not yet implemented.

Test plan

MATCH verdict for valid recorded-trace pack (exit 0)
DIVERGE verdict with exhaustive divergence collection (exit 1)
INTEGRITY_FAIL for mismatched derived hashes (exit 2)
SKIPPED steps excluded from replay with correct null semantics
Writer materializes receipt + details JSON
CLI exit codes match verdict semantics
Full repo test suite regression check
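The verdict-to-exit-code mapping that the CLI test exercises fits in a few lines. The codes come from this PR description; the constant and function names are hypothetical.

```python
# Exit codes as stated in the test plan: MATCH=0, DIVERGE=1, INTEGRITY_FAIL=2.
EXIT_CODES = {"MATCH": 0, "DIVERGE": 1, "INTEGRITY_FAIL": 2}


def exit_code_for(verdict: str) -> int:
    """Map a verdict string to a process exit code, rejecting unknown verdicts."""
    try:
        return EXIT_CODES[verdict]
    except KeyError:
        raise ValueError(f"unknown verdict: {verdict!r}") from None
```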

🤖 Generated with Claude Code

Implements the 4-phase verification order from RCE Profile v0.1:
- Phase 1: Episode Contract schema + DAG validation (cycles, uniqueness, terminal EMIT_OUTPUT)
- Phase 2: Proof pack integrity via existing verify_proof_pack()
- Phase 3: Receipt completeness + derived hash recomputation (episode_spec_hash, env_fingerprint_hash, inputs_hash, script_hash, outputs_hash from step receipt payloads)
- Phase 4: Recorded-trace replay comparison with per-step comparator tier resolution

Verdicts: MATCH (exit 0), DIVERGE (exit 1), INTEGRITY_FAIL (exit 2).

SKIPPED steps: output_hash=null, excluded from outputs_hash and Phase 4.
INTEGRITY_FAIL: claim_check=null (comparison not reached).
Hash format: sha256:<64-char-hex> throughout.
DIVERGE: exhaustive collection, at least one divergent step required.

New files:
- src/assay/rce_verify.py: verifier engine + receipt writer
- src/assay/schemas/rce_episode_contract.schema.json: Episode Contract schema
- src/assay/schemas/rce_replay_result_v0.1.schema.json: replay result schema
- tests/assay/test_rce_verify.py: 6 tests (match, diverge, integrity_fail, skipped, writer, CLI)

Modified:
- src/assay/__init__.py: export rce_verify public API
- src/assay/commands.py: hidden `assay rce-verify` command
- src/assay/proof_pack.py: accept slash-versioned RCE receipt types

Residual: dispute.replay_pack_root_sha256 emitted as null until replay-bundle packing surface exists.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
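For the cycle check named under Phase 1, one possible approach is the standard-library `graphlib` module (Python 3.9+). This sketch assumes each step is a dict with `step_id` and `depends_on`, as seen in the review diffs. The function name and error strings are hypothetical, and the terminal EMIT_OUTPUT check is left out because its field layout is not shown in this PR.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Any, Dict, List


def validate_step_dag(steps: List[Dict[str, Any]]) -> List[str]:
    """Return errors for unknown dependencies and dependency cycles."""
    errors: List[str] = []
    known = {str(step.get("step_id") or "") for step in steps}
    graph: Dict[str, List[str]] = {}
    for step in steps:
        step_id = str(step.get("step_id") or "")
        deps = [str(dep) for dep in (step.get("depends_on") or [])]
        for dep in deps:
            if dep not in known:
                errors.append(f"{step_id}: depends_on unknown step {dep}")
        # Only known dependencies become graph edges.
        graph[step_id] = [dep for dep in deps if dep in known]
    try:
        # static_order() raises CycleError when the graph is not a DAG.
        list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        cycle = " -> ".join(str(node) for node in exc.args[1])
        errors.append(f"replay_script.steps: dependency cycle: {cycle}")
    return errors
```

Returning a list of errors rather than raising keeps the helper consistent with the validator style discussed in the review below.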

github-actions · 2026-04-06T23:55:53Z

AgentMesh Lineage Check

Lineage coverage: 0/3 commits (0%)

No AgentMesh-Episode: trailers found.
Install AgentMesh to enable commit lineage tracking.

Copilot

Pull request overview

Adds an initial “recorded-trace” Replay-Constrained Episode (RCE) verifier, including contract/result JSON Schemas, a hidden CLI entrypoint, and tests to cover the main verdict paths.

Changes:

Introduces assay.rce_verify implementing 4-phase verification (contract + DAG, proof-pack integrity, receipt/artifact completeness, recorded-trace comparison) and writing a replay-result receipt + details sidecar.
Adds JSON Schemas for the RCE Episode Contract and the replay-result receipt format.
Extends proof-pack receipt-type acceptance to allow namespaced types with /... suffixes and adds a hidden assay rce-verify CLI command with tests.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

File	Description
tests/assay/test_rce_verify.py	Adds focused tests for MATCH/DIVERGE/INTEGRITY_FAIL paths, skipped-step semantics, writer output, and CLI exit codes.
src/assay/rce_verify.py	Implements the RCE verifier, schema validation, phase logic, receipt emission, and disk writer.
src/assay/schemas/rce_episode_contract.schema.json	Defines the Episode Contract schema used in Phase 1.
src/assay/schemas/rce_replay_result_v0.1.schema.json	Defines the replay-result receipt schema used for verifier outputs.
src/assay/commands.py	Adds hidden `assay rce-verify` CLI command that invokes the verifier/writer.
src/assay/proof_pack.py	Broadens allowed namespaced receipt type regex to accept optional `/...` suffix (e.g., `rce.*/v0`).
src/assay/__init__.py	Re-exports the new verifier symbols from the top-level package.
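The actual regex in `proof_pack.py` is not shown on this page. As a purely illustrative sketch of what accepting an optional `/...` suffix on a namespaced type might look like, the pattern, the function name, and the example type names below are all hypothetical.

```python
import re

# Hypothetical pattern: dot-separated lowercase namespace segments, followed
# by an optional "/<version>" suffix. Not the regex used by proof_pack.py.
RECEIPT_TYPE_RE = re.compile(
    r"[a-z][a-z0-9_]*(?:\.[a-z][a-z0-9_]*)+(?:/[a-z0-9][a-z0-9._-]*)?"
)


def is_accepted_receipt_type(receipt_type: str) -> bool:
    """Accept namespaced types with or without a slash-version suffix."""
    return RECEIPT_TYPE_RE.fullmatch(receipt_type) is not None
```

Under this pattern a made-up type such as `rce.replay_result/v0` is accepted, while a dangling slash (`rce.replay_result/`) is rejected.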

Copilot · 2026-04-07T00:02:13Z

+    steps_replayed = int(receipt.get("steps_replayed", 0) or 0)
+    steps_matched = int(receipt.get("steps_matched", 0) or 0)
+    steps_diverged = int(receipt.get("steps_diverged", 0) or 0)
+    divergent_step_ids = cast(List[str], receipt.get("divergent_step_ids") or [])


validate_rce_replay_result is meant to return a list of validation errors, but it can currently raise (e.g., int("abc") if a caller passes an invalid receipt). This makes the helper unsafe to use as a validator on untrusted input. Consider wrapping the numeric coercions (and any other type assumptions) in try/except and appending errors instead of throwing, so the function always returns List[str].

Suggested change

-    steps_replayed = int(receipt.get("steps_replayed", 0) or 0)
-    steps_matched = int(receipt.get("steps_matched", 0) or 0)
-    steps_diverged = int(receipt.get("steps_diverged", 0) or 0)
-    divergent_step_ids = cast(List[str], receipt.get("divergent_step_ids") or [])
+    try:
+        steps_replayed = int(receipt.get("steps_replayed", 0) or 0)
+    except (TypeError, ValueError):
+        errors.append("steps_replayed: must be an integer")
+        steps_replayed = 0
+    try:
+        steps_matched = int(receipt.get("steps_matched", 0) or 0)
+    except (TypeError, ValueError):
+        errors.append("steps_matched: must be an integer")
+        steps_matched = 0
+    try:
+        steps_diverged = int(receipt.get("steps_diverged", 0) or 0)
+    except (TypeError, ValueError):
+        errors.append("steps_diverged: must be an integer")
+        steps_diverged = 0
+    raw_divergent_step_ids = receipt.get("divergent_step_ids")
+    if raw_divergent_step_ids is None:
+        divergent_step_ids = []
+    elif isinstance(raw_divergent_step_ids, list):
+        divergent_step_ids = cast(List[str], raw_divergent_step_ids)
+    else:
+        errors.append("divergent_step_ids: must be a list")
+        divergent_step_ids = []

Copilot · 2026-04-07T00:02:13Z

+        steps_replayed += 1
+        expected_input_hashes = [
+            cast(str, step_payloads[dependency].get("output_hash"))
+            for dependency in cast(List[str], step.get("depends_on") or [])
+        ]
+        observed_output_hash = _canonical_sha256(parsed_traces[step_id])
+        reasons: List[str] = []
+        if cast(List[str], payload.get("input_hashes") or []) != expected_input_hashes:
+            reasons.append("input hash chain mismatch")
+        if observed_output_hash != _payload_string(payload, "output_hash"):
+            reasons.append("JCS output hash mismatch")


The Phase 4 divergence logic treats an input_hashes chain mismatch as a DIVERGE verdict (reasons.append("input hash chain mismatch")). This mismatch is independent of recorded-trace replay (it’s an internal receipt/contract consistency issue), and can cause packs whose step outputs match recorded traces to be labeled DIVERGE instead of INTEGRITY_FAIL. Consider moving this check into Phase 3 (receipt completeness) and failing verification with INTEGRITY_FAIL when the step receipt’s input_hashes don’t match the receipt-derived dependency output hashes (and/or when a step depends on a SKIPPED step).
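A Phase 3 style check along the lines suggested here might look like the sketch below. It reuses the `depends_on`, `input_hashes`, and `output_hash` fields visible in the diff. The function name and the error string format are hypothetical, and the handling of SKIPPED dependencies is deliberately left out.

```python
from typing import Any, Dict, List


def check_input_hash_chain(
    steps: List[Dict[str, Any]],
    step_payloads: Dict[str, Dict[str, Any]],
) -> List[str]:
    """Return integrity errors where a step's input_hashes do not equal the
    output_hash values of its dependencies, in depends_on order."""
    errors: List[str] = []
    for step in steps:
        step_id = str(step.get("step_id") or "")
        payload = step_payloads.get(step_id, {})
        expected = [
            step_payloads.get(dep, {}).get("output_hash")
            for dep in (step.get("depends_on") or [])
        ]
        if (payload.get("input_hashes") or []) != expected:
            errors.append(f"{step_id}: input hash chain mismatch")
    return errors
```

Any non-empty result would map to INTEGRITY_FAIL, leaving Phase 4 to compare only recorded-trace output hashes.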

Copilot · 2026-04-07T00:02:13Z

+              }
+            },
+            "replay_pack_root_sha256": {
+              "type": ["string", "null"]


dispute.replay_pack_root_sha256 allows any string when non-null. Since the rest of the format consistently uses sha256:<64-hex>, it would be better for the schema to enforce the same pattern when a value is present (e.g., anyOf of {type: "string", pattern: ...} and {type: "null"}). This prevents accepting malformed hashes once replay-pack bundling is implemented.

Suggested change

-              "type": ["string", "null"]
+              "anyOf": [
+                {
+                  "type": "string",
+                  "pattern": "^sha256:[0-9a-f]{64}$"
+                },
+                {
+                  "type": "null"
+                }
+              ]
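The same nullable-hash rule can be mirrored in plain Python when a schema validator is not at hand. This is a sketch with a hypothetical helper name. It uses `re.fullmatch` rather than `^...$` because Python's `$` also matches just before a trailing newline.

```python
import re

# Same character class and length as the schema pattern; fullmatch supplies
# the anchoring at both ends.
SHA256_RE = re.compile(r"sha256:[0-9a-f]{64}")


def is_valid_nullable_hash(value: object) -> bool:
    """Accept None, or a string in the `sha256:<64-char-hex>` format."""
    if value is None:
        return True
    return isinstance(value, str) and SHA256_RE.fullmatch(value) is not None
```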

Copilot · 2026-04-07T00:02:13Z

+    step_ids = [str(step.get("step_id") or "") for step in steps]
+    duplicates = sorted({step_id for step_id in step_ids if step_ids.count(step_id) > 1 and step_id})
+    if duplicates:
+        errors.append(f"replay_script.steps: duplicate step_id values: {', '.join(duplicates)}")


Duplicate step_id detection is currently O(n^2) due to repeated step_ids.count(step_id) calls. For larger ReplayScripts this can become a noticeable bottleneck during Phase 1 validation. Consider using a single pass with a Counter/dict of counts to detect duplicates in O(n).
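A single-pass version along the lines of this suggestion could use `collections.Counter`. The function name is hypothetical; the empty-id filter and the sorted output follow the original expression.

```python
from collections import Counter
from typing import Any, Dict, List


def find_duplicate_step_ids(steps: List[Dict[str, Any]]) -> List[str]:
    """Return sorted non-empty step_id values that occur more than once."""
    counts = Counter(str(step.get("step_id") or "") for step in steps)
    return sorted(
        step_id for step_id, count in counts.items() if count > 1 and step_id
    )
```

Counting once and then filtering is O(n) for the count plus O(k log k) to sort the k duplicates, instead of one full `list.count` scan per step.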

Phase 3 now validates the profile's failure-propagation invariant:
- Steps depending on FAIL or SKIPPED ancestors MUST be SKIPPED
- SKIPPED status is only valid when at least one dependency is FAIL/SKIPPED
- A dependent step marked PASS after an upstream FAIL is rejected as INTEGRITY_FAIL (the receipt chain is structurally invalid)

This is a proof-tier correction: without it, the receipt chain could look orderly while the episode semantics were lying. The schema cannot express cross-step execution semantics; this belongs in the verifier.

Added focused regression: _build_failed_dependency_pack constructs a pack where a dependent step is incorrectly marked PASS after its upstream FAIL, and the verifier correctly rejects it.

7/7 tests pass. 47/47 adjacent tests (replay_judge + episode) unaffected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
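The failure-propagation invariant can be checked against direct dependencies only: if every step obeys the rule with respect to its own `depends_on`, the rule holds for all ancestors by induction. The sketch below assumes a `statuses` mapping from step id to PASS, FAIL, or SKIPPED. That mapping, the function name, and the error strings are hypothetical.

```python
from typing import Any, Dict, List

BLOCKING = {"FAIL", "SKIPPED"}


def check_failure_propagation(
    steps: List[Dict[str, Any]],
    statuses: Dict[str, str],
) -> List[str]:
    """Return integrity errors for steps that violate failure propagation."""
    errors: List[str] = []
    for step in steps:
        step_id = str(step.get("step_id") or "")
        status = statuses.get(step_id)
        upstream_blocked = any(
            statuses.get(dep) in BLOCKING
            for dep in (step.get("depends_on") or [])
        )
        if upstream_blocked and status != "SKIPPED":
            errors.append(
                f"{step_id}: must be SKIPPED because a dependency is FAIL or SKIPPED"
            )
        elif status == "SKIPPED" and not upstream_blocked:
            errors.append(f"{step_id}: SKIPPED without a FAIL or SKIPPED dependency")
    return errors
```

A pack where a dependent step is marked PASS after an upstream FAIL produces one error here, which corresponds to the regression case described in the commit message.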

…or; tighten schema

Merge-blocker fixes from PR review:

1. input_hashes chain mismatch → INTEGRITY_FAIL (Phase 3)
   A mismatch between step.input_hashes and the dependency's output_hash in the receipt graph is an internal receipt inconsistency, not a replay-comparison disagreement. Moving this check to Phase 3 gives it the correct INTEGRITY_FAIL verdict. Phase 4 now only checks JCS output hash mismatch → DIVERGE.
   New test: test_input_hash_chain_mismatch_is_integrity_fail

2. validate_rce_replay_result never raises on malformed input
   int("abc") / int([]) raised ValueError/TypeError on malformed steps_replayed, steps_matched, steps_diverged fields. Replaced with _safe_int() helper that appends to the error list instead. Callers can treat the return value as an exhaustive error surface.
   New test: test_validate_replay_result_never_raises_on_malformed_input

3. replay_pack_root_sha256 schema pattern (forward-compat)
   When non-null, the value must match sha256:<64hex>. Previously typed as ["string", "null"] with no format constraint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Haserjian · 2026-04-07T02:10:04Z

Address review items by reclassifying hash-chain mismatches as integrity failures, hardening replay-result validation to fail by returned errors instead of exceptions, and tightening replay-pack SHA schema validation.

Commit ddf37ce on feat/rce-verifier-v0. 9/9 tests green. Branch is pushed.

Copilot AI review requested due to automatic review settings April 6, 2026 23:55

Copilot started reviewing on behalf of Haserjian April 6, 2026 23:56

Copilot AI reviewed Apr 7, 2026

This was referenced Apr 7, 2026

feat: RCE replay scenarios (match, diverge, tampered) Haserjian/assay-proof-gallery#4 (Open)

docs: RCE replay mode action contract (PR E gate) Haserjian/assay-verify-action#3 (Merged)

Haserjian merged commit 6e92cf7 into main Apr 7, 2026
19 checks passed

Haserjian merged 3 commits into main from feat/rce-verifier-v0

2 participants

