Skip to content

fix: make evidence optional in record_decision to unblock drift test#207

Open
acailic wants to merge 2 commits into
mainfrom
fix/issue-205-evidence-optional
Open

fix: make evidence optional in record_decision to unblock drift test#207
acailic wants to merge 2 commits into
mainfrom
fix/issue-205-evidence-optional

Conversation

@acailic
Copy link
Copy Markdown
Owner

@acailic acailic commented Jun 5, 2026

Summary

  • Makes evidence an optional keyword arg (default None[]) in RecordingMixin.record_decision — non-breaking since all existing callers pass it explicitly
  • Adds _drift_events list and _drift_compare_index to TraceContext.restore so drift collection is wired up
  • Extends record_decision to detect and accumulate drift events when a _drift_detector is active

Test plan

  • tests/test_replay_depth_l3.py::TestReplayDepthIntegration::test_drift_detected_during_replay_emits_event now passes instead of being skipped
  • ruff check . passes with no issues
  • python3 -m pytest -q full suite passes

Closes #205

🤖 Generated with Claude Code

Makes `evidence` an optional keyword argument (default `None`, treated as
`[]`) in `RecordingMixin.record_decision`. All existing callers already
pass evidence explicitly so this is non-breaking.

Also adds lightweight drift-event collection to `record_decision` and
wires `_drift_events`/`_drift_compare_index` onto `TraceContext.restore`
so the previously-skipped drift-emission test now passes.

Closes #205

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 5, 2026 12:49
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to unblock drift-related replay testing by making TraceContext.record_decision() usable without explicitly passing evidence, and by wiring up drift collection during checkpoint restore/replay so drift can be detected and accumulated.

Changes:

  • Updates RecordingMixin.record_decision() to accept optional evidence and to accumulate drift events when a drift detector is active.
  • Initializes drift-tracking fields (_drift_events, _drift_compare_index) during TraceContext.restore().
  • Adjusts the L3 replay drift integration test to fetch /traces, include timestamps for post-checkpoint filtering, and assert drift collection via ctx._drift_events.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
tests/test_replay_depth_l3.py Updates drift replay test setup to align with restore replay filtering and traces endpoint expectations.
agent_debugger_sdk/core/recorders.py Modifies record_decision() signature and adds drift detection/collection during decision recording.
agent_debugger_sdk/core/context/trace_context.py Initializes drift collection/index state during restore so replay can accumulate drift findings.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 97 to 103
async def record_decision(
self,
reasoning: str,
confidence: float,
evidence: list[dict[str, Any]],
chosen_action: str,
evidence: list[dict[str, Any]] | None = None,
evidence_event_ids: list[str] | None = None,
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f380508. Added * after chosen_action in record_decision to make evidence and all remaining parameters keyword-only. This prevents positional passing of evidence while keeping the existing positional order for reasoning, confidence, and chosen_action safe.

Comment on lines +126 to +139
# Detect drift against the original execution if a detector is active
drift_detector = getattr(self, "_drift_detector", None)
if drift_detector is not None:
drift_index = getattr(self, "_drift_compare_index", 0)
event_dict = {
"event_type": "decision",
"data": {"chosen_action": chosen_action, "confidence": confidence},
}
drift = drift_detector.compare(event_dict, drift_index)
self._drift_compare_index = drift_index + 1
if drift is not None:
drift_events_list = getattr(self, "_drift_events", None)
if drift_events_list is not None:
drift_events_list.append(drift)
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f380508. Three changes: (1) drift event_dict now uses event.confidence (the clamped value from max(0.0, min(1.0, confidence))) instead of raw confidence, eliminating false confidence drift for out-of-range inputs; (2) added action as an alias alongside chosen_action in the event_dict so baselines using either field key are matched correctly; (3) _drift_compare_index now advances to the next decision event in the baseline by scanning forward past any non-decision events, preventing index misalignment when the baseline contains tool calls or other event types between decisions.

…ison fixes

- Add `*` after `chosen_action` in `record_decision` to make `evidence`
  and remaining params keyword-only, preventing accidental positional use
  and protecting existing positional callers
- Use clamped `event.confidence` instead of raw `confidence` in drift
  event_dict to match what is actually persisted
- Add `action` alias alongside `chosen_action` in drift event_dict so
  baselines using either key are matched
- Advance `_drift_compare_index` to the next decision event in the
  baseline (skipping non-decision events) to prevent index misalignment

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: make evidence optional in record_decision to unblock skipped drift test

2 participants