feat: add low_information_session integrity flag to harden sparse-session scoring #19
…dening

Agent-Logs-Url: https://github.com/Code-Alchemist101/managed-ai-assessment-platform/sessions/46fde10c-67e9-4312-9f94-4104dbed9b88
Co-authored-by: Code-Alchemist101 <113128088+Code-Alchemist101@users.noreply.github.com>
@copilot This is directionally good, and the root-cause analysis looks real, but I don’t want to merge this version yet. The current patch does more than add sparse-session visibility. It also changes integrity routing broadly by turning any session with fewer than 10 total events into …

Please refine this once before merge:

I want this PR to stay conservative and reviewable: make uncertainty visible, but avoid overly broad policy changes unless they are clearly grounded.
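A minimal sketch of the conservative routing being asked for, assuming hypothetical verdict names `accept`/`review`/`reject` and a hypothetical `route_verdict` helper (neither is taken from the actual `integrity.py`). Only the narrow `low_information_session` flag escalates a verdict to `review`; sessions without it keep whatever the existing policy produced, however short they are.

```python
from typing import Iterable

# Hypothetical verdict ordering (least to most severe); only "review" is named in the PR.
SEVERITY = {"accept": 0, "review": 1, "reject": 2}


def route_verdict(base_verdict: str, integrity_flags: Iterable[str]) -> str:
    """Escalate to "review" only when the narrow sparse-session flag is present."""
    if "low_information_session" not in set(integrity_flags):
        # No flag: leave the existing policy result untouched (no broad rerouting).
        return base_verdict
    # Flag present: raise to "review", but never soften a stricter verdict.
    # Unknown verdict names are treated as least severe (an assumption).
    if SEVERITY.get(base_verdict, 0) < SEVERITY["review"]:
        return "review"
    return base_verdict


print(route_verdict("accept", []))                           # accept
print(route_verdict("accept", ["low_information_session"]))  # review
print(route_verdict("reject", ["low_information_session"]))  # reject
```

Keeping the "no flag" branch as an early return makes it easy to confirm in review that non-sparse sessions are not affected by the change.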
Agent-Logs-Url: https://github.com/Code-Alchemist101/managed-ai-assessment-platform/sessions/3b75ffa3-3bf0-465d-9512-e581d04b561f
Co-authored-by: Code-Alchemist101 <113128088+Code-Alchemist101@users.noreply.github.com>
Refined in 688b531.
@copilot One last cleanup before merge: in `services/analytics-py/tests/test_pipeline.py`, please remove the extra `unittest.main()` call from the footer. The narrowed low-information rule looks good otherwise.

Checked this directly on HEAD and there is only a single footer call now (…).
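A quick way to double-check a footer like this is to count `unittest.main(...)` call sites with the standard-library `ast` module. This is a minimal sketch; the helper name `count_unittest_main_calls` is hypothetical and not part of the repository.

```python
import ast


def count_unittest_main_calls(source: str) -> int:
    """Count call sites spelled `unittest.main(...)` in a module's source."""
    tree = ast.parse(source)
    count = 0
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "main"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "unittest"
        ):
            count += 1
    return count


footer_with_duplicate = '''
import unittest

if __name__ == "__main__":
    unittest.main()
    unittest.main()
'''
print(count_unittest_main_calls(footer_with_duplicate))  # 2
```

Against the real file this would be `count_unittest_main_calls(pathlib.Path("services/analytics-py/tests/test_pipeline.py").read_text())`. It only recognizes the `unittest.main` spelling, not `from unittest import main`.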
Pull request overview
Adds a reviewer-visible integrity signal for sparse/low-information sessions to reduce over-trust in archetype scoring when telemetry is too thin, and backs it with analytics + reviewer-web tests.
Changes:
- Add a `low_information_session` integrity flag when a known sparse-session signature is detected (short session + typing-only edits + no paste + no AI prompts).
- Expose a reviewer-facing description for the new integrity flag in reviewer-web view-model labeling.
- Add regression tests in analytics and reviewer-web to validate the flag behavior and UI label rendering.
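The four-part signature above can be sketched as a single predicate that fires only when every condition holds, which is what keeps it narrower than a plain event-count cutoff. This is a sketch under stated assumptions: the event-type names (`ide.edit`, `ide.paste`, `ai.prompt`), the `change_kind` payload key, the `max_events=10` cutoff, and the function name are all hypothetical, not the actual schema or code in `integrity.py`.

```python
from typing import Any, Mapping, Sequence

# Hypothetical event-type and payload names; the real pipeline's schema may differ.
EDIT_EVENT = "ide.edit"
PASTE_EVENT = "ide.paste"
AI_PROMPT_EVENT = "ai.prompt"


def is_low_information_session(
    events: Sequence[Mapping[str, Any]], max_events: int = 10
) -> bool:
    """Return True only when every part of the sparse-session signature holds."""
    # Short session: hypothetical cutoff on total event count.
    if len(events) >= max_events:
        return False
    event_types = {event.get("event_type") for event in events}
    # No paste and no AI prompts: either one means the session carries real signal.
    if PASTE_EVENT in event_types or AI_PROMPT_EVENT in event_types:
        return False
    # Typing-only edits. A session with no edits at all also passes this check
    # (an assumption: it is at least as sparse as a typing-only one).
    edits = [event for event in events if event.get("event_type") == EDIT_EVENT]
    return all(edit.get("payload", {}).get("change_kind") == "typing" for edit in edits)


typing_only = [
    {"event_type": "session.started", "payload": {"status": "active"}},
    {"event_type": EDIT_EVENT, "payload": {"change_kind": "typing"}},
]
print(is_low_information_session(typing_only))                                   # True
print(is_low_information_session(typing_only + [{"event_type": PASTE_EVENT}]))   # False
```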
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/web/view-models.test.ts | Adds a unit test for buildIntegrityFlagLabels() to ensure low_information_session renders a clear reviewer-facing label and unknown flags humanize correctly. |
| services/analytics-py/tests/test_pipeline.py | Extends pipeline tests with short-session fixtures and regression coverage for sparse-session integrity flagging and policy behavior. |
| services/analytics-py/assessment_analytics/integrity.py | Implements the low_information_session detection logic and attaches reviewer notes; ensures the verdict becomes review when flagged. |
| apps/reviewer-web/src/view-model.ts | Adds a description string for low_information_session so the reviewer UI shows a clear explanation. |
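The labeling behavior tested for `buildIntegrityFlagLabels()` (a known description for `low_information_session`, a humanized fallback for unknown flags) can be sketched in Python as follows. This is an illustrative analogue, not the TypeScript code in `view-model.ts`: the function names, the description text, and the humanizing rule are assumptions, and `paste_burst_detected` is a made-up flag used only to exercise the fallback.

```python
from typing import Dict, Iterable, List

# Placeholder text: the real description for this flag lives in
# apps/reviewer-web/src/view-model.ts and is not reproduced here.
FLAG_DESCRIPTIONS: Dict[str, str] = {
    "low_information_session": (
        "Low-information session: too little telemetry for a confident archetype score."
    ),
}


def humanize_flag(flag: str) -> str:
    """Turn an unknown snake_case or kebab-case flag into a readable label."""
    words = [word for word in flag.replace("-", "_").split("_") if word]
    text = " ".join(words)
    return text[:1].upper() + text[1:]


def build_integrity_flag_labels(flags: Iterable[str]) -> List[str]:
    """Prefer a known description; fall back to a humanized flag name."""
    return [FLAG_DESCRIPTIONS.get(flag) or humanize_flag(flag) for flag in flags]


print(build_integrity_flag_labels(["low_information_session", "paste_burst_detected"]))
```

The fallback means a new backend flag still renders as something readable before the UI gains a dedicated description for it.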
```python
base_ts = "2026-04-14T06:00:{:02d}Z"
events = [
    {
        "event_id": "desktop-1",
        "session_id": "short-session-1",
        "timestamp_utc": base_ts.format(0),
        "source": "desktop",
        "event_type": "session.started",
        "sequence_no": 1,
        "artifact_ref": "session",
        "payload": {"status": "active"},
        "client_version": "0.1.0",
        "integrity_hash": "h1",
        "policy_context": {},
    },
    {
        "event_id": "desktop-2",
        "session_id": "short-session-1",
        "timestamp_utc": base_ts.format(5),
        "source": "desktop",
        "event_type": "session.heartbeat",
        "sequence_no": 2,
        "artifact_ref": "session",
        "payload": {"status": "active"},
        "client_version": "0.1.0",
        "integrity_hash": "h2",
        "policy_context": {},
    },
]
for index in range(event_count - 2):
    events.append({
        "event_id": f"ide-{index + 1}",
        "session_id": "short-session-1",
        "timestamp_utc": base_ts.format(10 + index * 5),
        "source": "ide",
        # … (the excerpt is cut off here in the review view)
```
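For experimenting locally, the fixture excerpt above can be completed into a self-contained builder. Everything after `"source": "ide"` is cut off in the excerpt, so the IDE event's `event_type` (`ide.edit`), `artifact_ref` (`src/solution.py`), and payload (`{"change_kind": "typing"}`) below are hypothetical, as is the function name. One detail worth noting: the timestamps are written into a fixed minute (`06:00:SS`), so `event_count` above 12 would produce an invalid second value of 60 or more; the sketch rejects that case.

```python
from typing import Any, Dict, List

BASE_TS = "2026-04-14T06:00:{:02d}Z"


def build_short_session_events(event_count: int) -> List[Dict[str, Any]]:
    """Two desktop events followed by event_count - 2 IDE events in one session."""
    # IDE events start at 10s and step by 5s inside a fixed minute; stay below 60s.
    if not 2 <= event_count <= 12:
        raise ValueError("event_count must be between 2 and 12 for this timestamp scheme")

    def make_event(event_id, seconds, source, event_type, sequence_no, artifact_ref, payload):
        return {
            "event_id": event_id,
            "session_id": "short-session-1",
            "timestamp_utc": BASE_TS.format(seconds),
            "source": source,
            "event_type": event_type,
            "sequence_no": sequence_no,
            "artifact_ref": artifact_ref,
            "payload": payload,
            "client_version": "0.1.0",
            "integrity_hash": f"h{sequence_no}",
            "policy_context": {},
        }

    events = [
        make_event("desktop-1", 0, "desktop", "session.started", 1, "session", {"status": "active"}),
        make_event("desktop-2", 5, "desktop", "session.heartbeat", 2, "session", {"status": "active"}),
    ]
    for index in range(event_count - 2):
        # Hypothetical IDE fields: the original excerpt ends after "source": "ide".
        events.append(make_event(
            f"ide-{index + 1}", 10 + index * 5, "ide", "ide.edit", index + 3,
            "src/solution.py", {"change_kind": "typing"},
        ))
    return events
```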
- `services/analytics-py/tests/test_pipeline.py` footer for duplicate `unittest.main()`
- `unittest.main()` call if present

Original prompt
Audit and harden short-session behavior in the scoring pipeline.
Sequence:
Branch-only, PR into main, small, reviewable atomic patch only.
The following is the prior conversation context from the user's chat exploration (may be truncated):
User: @copilot Accepted Confirmation: Confirm agent session
Assistant: Work is now in progress to expose durable reviewer notes in the decision panel with a minimal, reviewer-focused addition.
You can track progress and review the upcoming branch/draft PR here:
I will only touch reviewer-web and supporting API as necessary for end-to-end reviewer notes; existing decision logic will not be changed or expanded.
User: Replace the decorative 10-event timeline with something useful

Message: Work from the current main, but do not commit directly to main. Create a new dedicated branch for this task, implement only on that branch, and open a draft PR back into main. I want a reviewer trust improvement PR focused on the session timeline.

Problem: The current reviewer timeline is capped at 10 events, which makes it mostly decorative and not useful for real session review.

Goal: Replace or improve this with the smallest safe reviewer-facing change that gives meaningfully better behavioral visibility.

Please do this in order:
1. Inspect the current actual timeline implementation on main.
2. Confirm exactly where the 10-event cap is applied.
3. Propose the smallest safe improvement, preferring one of:
   - paginated timeline
   - expandable full list
   - richer summary by source/event type
4. Explain which option is best for a small PR.
5. Implement the smallest safe choice.
6. Add/update tests if needed.

Important:
- Do not redesign the whole reviewer app.
- Prioritize usefulness over visual polish.
- Keep performance reasonable for larger sessions.
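Two of the options listed in that prompt, a paginated timeline and a summary by source/event type, can be sketched in a few lines. This is a minimal sketch, not the reviewer-web implementation: the function names and the default page size of 50 are hypothetical. Slicing one page at a time keeps the cost per view proportional to the page size rather than the session length.

```python
from collections import Counter
from typing import Any, Dict, List, Mapping, Sequence, Tuple


def page_count(total_events: int, page_size: int) -> int:
    """Number of pages needed to show every event (ceiling division)."""
    return -(-total_events // page_size)


def paginate(events: Sequence[Any], page: int, page_size: int = 50) -> List[Any]:
    """Return one zero-indexed page; only that slice is materialized."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    start = page * page_size
    return list(events[start:start + page_size])


def summarize_by_source_and_type(
    events: Sequence[Mapping[str, Any]],
) -> Dict[Tuple[str, str], int]:
    """Count events per (source, event_type) pair for a compact session overview."""
    return dict(Counter(
        (event.get("source", "unknown"), event.get("event_type", "unknown"))
        for event in events
    ))


timeline = (
    [{"source": "ide", "event_type": "ide.edit"}] * 120
    + [{"source": "desktop", "event_type": "session.heartbeat"}] * 5
)
print(page_count(len(timeline), 50))    # 3
print(len(paginate(timeline, page=2)))  # 25
print(summarize_by_source_and_type(timeline))
```

A summary like this can sit above a paginated list, so a reviewer sees the overall shape of the session before paging through individual events.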
Assistant: I'm inspecting the actual main branch timeline code to locate the 10-event cap and will propose the smallest meaningful fix (likely expandable or paginated timeline), explain the best option for a minimal PR, and implement it on a dedicated branch with a draft PR into main. Test coverage and performance for larger sessions will be considered. You’ll get an explicit summary and branch PR link as soon as it’s ready—no redesign, just a focused visibility/productivity improvement.
User: @copilot Accepted Confirmation: Confirm agent session
Assistant: The inspection and improvement for the reviewer timeline event visibility (replacing the 10-event cap) is underway. I’ll propose and implement the smallest practical reviewer-focused timeline enhancement—without any main branch commits.
Track progress and access the draft PR for review here:
You’ll get an explicit summary, rationale, and branch PR link when ready—focused on practical utility over visuals and keeping reviewer trust central.
User: Add reviewer guidance for dual scoring mode comparison

Message: Work from the current main, but do not commit directly to main. Create a new dedicated branch for this task, implement only on that branch, and open a draft PR back into main. I want a small reviewer UX PR that makes dual scoring mode comparison understandable.

Problem: The reviewer panel now shows both heuristic and trained-model results, but there is little guidance about:
- which one is the active policy result
- what to do when they disagree
- how the reviewer should interpret the comparison

Go...
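The first two gaps in that list (which result is active, and whether the two modes disagree) can be made explicit with a small comparison summary that the reviewer panel could render. This is a sketch under assumptions: the `ModeResult` shape (an archetype label only), the mode names `heuristic`/`trained_model`, the placeholder archetype strings, and the function name are all hypothetical. It deliberately does not encode what a reviewer should do on disagreement, since the prompt leaves that open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeResult:
    """Hypothetical per-mode output: just the predicted archetype label."""
    archetype: str


@dataclass(frozen=True)
class ModeComparison:
    active_mode: str        # which mode the policy actually uses
    active_archetype: str   # the result the verdict is based on
    modes_agree: bool       # False when the two modes predict different archetypes


def compare_scoring_modes(
    heuristic: ModeResult, trained_model: ModeResult, active_mode: str
) -> ModeComparison:
    if active_mode not in ("heuristic", "trained_model"):
        raise ValueError(f"unknown active_mode: {active_mode!r}")
    active = heuristic if active_mode == "heuristic" else trained_model
    return ModeComparison(
        active_mode=active_mode,
        active_archetype=active.archetype,
        modes_agree=heuristic.archetype == trained_model.archetype,
    )


print(compare_scoring_modes(ModeResult("archetype_a"), ModeResult("archetype_b"), "heuristic"))
```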
This pull request was created from Copilot chat.