fix: make evidence optional in record_decision to unblock drift test by acailic · Pull Request #207 · acailic/agent_debugger

acailic · 2026-06-05T12:49:07Z

Summary

Makes evidence an optional keyword arg (default None → []) in RecordingMixin.record_decision — non-breaking since all existing callers pass it explicitly
Adds _drift_events list and _drift_compare_index to TraceContext.restore so drift collection is wired up
Extends record_decision to detect and accumulate drift events when a _drift_detector is active

Test plan

tests/test_replay_depth_l3.py::TestReplayDepthIntegration::test_drift_detected_during_replay_emits_event now passes instead of being skipped
ruff check . passes with no issues
python3 -m pytest -q full suite passes

Closes #205

🤖 Generated with Claude Code

Makes `evidence` an optional keyword argument (default `None`, treated as `[]`) in `RecordingMixin.record_decision`. All existing callers already pass evidence explicitly so this is non-breaking. Also adds lightweight drift-event collection to `record_decision` and wires `_drift_events`/`_drift_compare_index` onto `TraceContext.restore` so the previously-skipped drift-emission test now passes. Closes #205 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

This PR aims to unblock drift-related replay testing by making TraceContext.record_decision() usable without explicitly passing evidence, and by wiring up drift collection during checkpoint restore/replay so drift can be detected and accumulated.

Changes:

Updates RecordingMixin.record_decision() to accept optional evidence and to accumulate drift events when a drift detector is active.
Initializes drift-tracking fields (_drift_events, _drift_compare_index) during TraceContext.restore().
Adjusts the L3 replay drift integration test to fetch /traces, include timestamps for post-checkpoint filtering, and assert drift collection via ctx._drift_events.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
`tests/test_replay_depth_l3.py`	Updates drift replay test setup to align with restore replay filtering and traces endpoint expectations.
`agent_debugger_sdk/core/recorders.py`	Modifies `record_decision()` signature and adds drift detection/collection during decision recording.
`agent_debugger_sdk/core/context/trace_context.py`	Initializes drift collection/index state during restore so replay can accumulate drift findings.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

acailic · 2026-06-05T18:21:17Z

    async def record_decision(
        self,
        reasoning: str,
        confidence: float,
-        evidence: list[dict[str, Any]],
        chosen_action: str,
+        evidence: list[dict[str, Any]] | None = None,
        evidence_event_ids: list[str] | None = None,


Fixed in f380508. Added * after chosen_action in record_decision to make evidence and all remaining parameters keyword-only. This prevents positional passing of evidence while keeping the existing positional order for reasoning, confidence, and chosen_action safe.

acailic · 2026-06-05T18:21:21Z

+        # Detect drift against the original execution if a detector is active
+        drift_detector = getattr(self, "_drift_detector", None)
+        if drift_detector is not None:
+            drift_index = getattr(self, "_drift_compare_index", 0)
+            event_dict = {
+                "event_type": "decision",
+                "data": {"chosen_action": chosen_action, "confidence": confidence},
+            }
+            drift = drift_detector.compare(event_dict, drift_index)
+            self._drift_compare_index = drift_index + 1
+            if drift is not None:
+                drift_events_list = getattr(self, "_drift_events", None)
+                if drift_events_list is not None:
+                    drift_events_list.append(drift)


Fixed in f380508. Three changes: (1) drift event_dict now uses event.confidence (the clamped value from max(0.0, min(1.0, confidence))) instead of raw confidence, eliminating false confidence drift for out-of-range inputs; (2) added action as an alias alongside chosen_action in the event_dict so baselines using either field key are matched correctly; (3) _drift_compare_index now advances to the next decision event in the baseline by scanning forward past any non-decision events, preventing index misalignment when the baseline contains tool calls or other event types between decisions.

…ison fixes - Add `*` after `chosen_action` in `record_decision` to make `evidence` and remaining params keyword-only, preventing accidental positional use and protecting existing positional callers - Use clamped `event.confidence` instead of raw `confidence` in drift event_dict to match what is actually persisted - Add `action` alias alongside `chosen_action` in drift event_dict so baselines using either key are matched - Advance `_drift_compare_index` to the next decision event in the baseline (skipping non-decision events) to prevent index misalignment Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings June 5, 2026 12:49

Copilot started reviewing on behalf of acailic June 5, 2026 12:49 View session

Copilot AI reviewed Jun 5, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: make evidence optional in record_decision to unblock drift test#207

fix: make evidence optional in record_decision to unblock drift test#207
acailic wants to merge 2 commits into
mainfrom
fix/issue-205-evidence-optional

acailic commented Jun 5, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

acailic Jun 5, 2026

Uh oh!

acailic Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

acailic commented Jun 5, 2026

Summary

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

acailic Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

acailic Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants