feat: add low_information_session integrity flag to harden sparse-session scoring by Copilot · Pull Request #19 · Code-Alchemist101/managed-ai-assessment-platform

Copilot · 2026-04-16T17:22:36Z

Inspect services/analytics-py/tests/test_pipeline.py footer for duplicate unittest.main()
Remove extra unittest.main() call if present
Run targeted analytics test file
Commit and push footer cleanup
Reply to PR comment with commit hash and screenshot

Original prompt

Audit and harden short-session behavior in the scoring pipeline.

Sequence:

Inspect the current actual scoring pipeline on main and analyze how very short sessions (sparse telemetry) affect the heuristic result, the trained-model result, and the active top-level result.
Determine whether there is a current bias or default tendency toward Independent Solver labels for these short/sparse sessions. Is this expected (due to input sparseness), or an artifact of scoring logic that should be addressed?
If a real risk or bias is found, propose and implement the SMALLEST safe mitigation, such as:
- ex: clearer sparse-session signaling in the scoring output
- minimum-information guardrail in label assignment
- explicit reviewer warning for low-information sessions
Implement only a minimal, grounded fix if needed, and add/update tests for short-session scenarios.
Do not make up new product logic unless grounded in actual code. Prefer conservative mitigation over aggressive changes. If mitigation requires user notification, do not hide result—make uncertainty visible when needed.

Branch-only, PR into main, small, reviewable atomic patch only.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: @copilot Accepted Confirmation: Confirm agent session

Assistant: Work is now in progress to expose durable reviewer notes in the decision panel with a minimal, reviewer-focused addition.

You can track progress and review the upcoming branch/draft PR here:

data:
    - title: '[WIP] Add note field to reviewer decision flow'
      repository: Code-Alchemist101/managed-ai-assessment-platform
      url: https://github.com/copilot/tasks/pull/PR_kwDOSClbwM7TCTdM?session_id=2d6a141e-4625-4594-9ce4-1f9f44dd61d6
      state: queued

I will only touch reviewer-web and supporting API as necessary for end-to-end reviewer notes; existing decision logic will not be changed or expanded.

User: Replace the decorative 10-event timeline with something useful

Work from the current main, but do not commit directly to main. Create a new dedicated branch for this task, implement only on that branch, and open a draft PR back into main.

I want a reviewer trust improvement PR focused on the session timeline.

Problem: The current reviewer timeline is capped at 10 events, which makes it mostly decorative and not useful for real session review.

Goal: Replace or improve this with the smallest safe reviewer-facing change that gives meaningfully better behavioral visibility.

Please do this in order:

inspect the current actual timeline implementation on main
confirm exactly where the 10-event cap is applied
propose the smallest safe improvement, preferring one of:
- paginated timeline
- expandable full list
- richer summary by source/event type
explain which option is best for a small PR
implement the smallest safe choice
add/update tests if needed

Important:

do not redesign the whole reviewer app
prioritize usefulness over visual polish
keep performance reasonable for larger sessions

Assistant: I'm inspecting the actual main branch timeline code to locate the 10-event cap and will propose the smallest meaningful fix (likely expandable or paginated timeline), explain the best option for a minimal PR, and implement it on a dedicated branch with a draft PR into main. Test coverage and performance for larger sessions will be considered. You’ll get an explicit summary and branch PR link as soon as it’s ready—no redesign, just a focused visibility/productivity improvement.

User: @copilot Accepted Confirmation: Confirm agent session

Assistant: The inspection and improvement for the reviewer timeline event visibility (replacing the 10-event cap) is underway. I’ll propose and implement the smallest practical reviewer-focused timeline enhancement—without any main branch commits.

Track progress and access the draft PR for review here:

data:
    - title: '[WIP] Propose improvement for reviewer timeline event visibility'
      repository: Code-Alchemist101/managed-ai-assessment-platform
      url: https://github.com/copilot/tasks/pull/PR_kwDOSClbwM7TCehr?session_id=e843dc4f-1f3c-4e92-b56b-3a905f6b7fb5
      state: queued

You’ll get an explicit summary, rationale, and branch PR link when ready—focused on practical utility over visuals and keeping reviewer trust central.

User: Add reviewer guidance for dual scoring mode comparison

Work from the current main, but do not commit directly to main. Create a new dedicated branch for this task, implement only on that branch, and open a draft PR back into main.

I want a small reviewer UX PR that makes dual scoring mode comparison understandable.

Problem: The reviewer panel now shows both heuristic and trained-model results, but there is little guidance about:
- which one is the active policy result
- what to do when they disagree
- how the reviewer should interpret the comparison

Go...

This pull request was created from Copilot chat.

…dening Agent-Logs-Url: https://github.com/Code-Alchemist101/managed-ai-assessment-platform/sessions/46fde10c-67e9-4312-9f94-4104dbed9b88 Co-authored-by: Code-Alchemist101 <113128088+Code-Alchemist101@users.noreply.github.com>

Code-Alchemist101 · 2026-04-16T17:37:31Z

@copilot This is directionally good, and the root-cause analysis looks real, but I don’t want to merge this version yet.

The current patch does more than add sparse-session visibility. It also changes integrity routing broadly by turning any session with fewer than 10 total events into review, which blocks auto-advance. That may be the right hardening choice, but it is a stronger product/policy change than “warning-only” mitigation.

Please refine this once before merge:

keep the low_information_session flag and reviewer-facing note
keep the tests documenting the Independent Solver sparse-session bias
narrow or better-ground the hardening rule so it is less blunt than raw len(events) < 10 alone, unless you can justify that threshold clearly from the current code/tests
remove any accidental duplicate test-runner footer if present in services/analytics-py/tests/test_pipeline.py

I want this PR to stay conservative and reviewable: make uncertainty visible, but avoid overly broad policy changes unless they are clearly grounded.

Agent-Logs-Url: https://github.com/Code-Alchemist101/managed-ai-assessment-platform/sessions/3b75ffa3-3bf0-465d-9512-e581d04b561f Co-authored-by: Code-Alchemist101 <113128088+Code-Alchemist101@users.noreply.github.com>

Copilot · 2026-04-16T17:41:44Z

Refined in 688b531. low_information_session and reviewer note are kept, bias-documentation tests are kept, and the hardening rule is now narrower: it only flags the known sparse-bias signature (events < 10 + insert activity + no paste + no prompts), instead of all short sessions. I also verified services/analytics-py/tests/test_pipeline.py has a single if __name__ == "__main__": footer (no duplicate).
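
For readers following the thread, the difference between the original blunt rule and the narrowed signature can be sketched roughly as below. This is an illustrative sketch only, not the code in 688b531: the threshold of 10 comes from the comment above, while the event-type names `ide.edit.insert`, `ide.edit.paste`, and `ai.prompt.submitted` and both function names are assumptions made for the example.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]

# Threshold taken from the "events < 10" wording in the comment above.
SPARSE_EVENT_THRESHOLD = 10

# Hypothetical event-type names, used for illustration only.
INSERT_EVENT = "ide.edit.insert"
PASTE_EVENT = "ide.edit.paste"
PROMPT_EVENT = "ai.prompt.submitted"


def is_short_session(events: List[Event]) -> bool:
    """The blunt rule: any session under the event threshold."""
    return len(events) < SPARSE_EVENT_THRESHOLD


def matches_sparse_bias_signature(events: List[Event]) -> bool:
    """The narrowed rule: short AND insert activity AND no paste AND no prompts."""
    if not is_short_session(events):
        return False
    types = {event.get("event_type") for event in events}
    return (
        INSERT_EVENT in types
        and PASTE_EVENT not in types
        and PROMPT_EVENT not in types
    )


# Three small sessions that separate the two rules.
heartbeat_only = [{"event_type": "session.started"}, {"event_type": "session.heartbeat"}]
typing_only = heartbeat_only + [{"event_type": INSERT_EVENT}] * 3
with_paste = typing_only + [{"event_type": PASTE_EVENT}]
```

Under these assumptions, all three sessions are short, but only `typing_only` matches the narrowed signature; a heartbeat-only session or one containing a paste would no longer be routed to review by this rule alone.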

Code-Alchemist101 · 2026-04-16T17:45:23Z

@copilot One last cleanup before merge:

In services/analytics-py/tests/test_pipeline.py, the file footer still shows a duplicated test runner call in the diff:

if __name__ == "__main__":
    unittest.main()
    unittest.main()

Please remove the extra unittest.main() line so the file ends with a single standard test-runner footer.

The narrowed low-information rule looks good otherwise.

Copilot · 2026-04-16T17:47:24Z

Checked this directly on HEAD and there is only a single footer call now (if __name__ == "__main__": + one unittest.main()), no duplicate remains. The current state is in 688b531; please refresh the diff view if GitHub is showing stale context.

Copilot

Pull request overview

Adds a reviewer-visible integrity signal for sparse/low-information sessions to reduce over-trust in archetype scoring when telemetry is too thin, and backs it with analytics + reviewer-web tests.

Changes:

Add low_information_session integrity flag when a known sparse-session signature is detected (short session + typing-only edits + no paste + no AI prompts).
Expose a reviewer-facing description for the new integrity flag in reviewer-web view-model labeling.
Add regression tests in analytics and reviewer-web to validate the flag behavior and UI label rendering.
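
As a rough illustration of the first change, the sketch below shows one way a detected signature could be turned into a visible flag, a reviewer note, and a `review` verdict without hiding the underlying result. The `IntegrityResult` shape, the `"clean"` default verdict, the note wording, and the function name are all assumptions for this example and are not taken from `integrity.py`.

```python
from dataclasses import dataclass, field
from typing import List

LOW_INFORMATION_FLAG = "low_information_session"

# Hypothetical note wording; the real reviewer note may differ.
LOW_INFORMATION_NOTE = (
    "Session telemetry is sparse; treat the archetype label as low confidence."
)


@dataclass
class IntegrityResult:
    """Hypothetical result shape, for illustration only."""
    verdict: str = "clean"  # assumed default verdict name
    flags: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)


def apply_low_information_flag(
    result: IntegrityResult, signature_detected: bool
) -> IntegrityResult:
    """Attach the flag and note once, and route the session to review."""
    if not signature_detected:
        return result
    if LOW_INFORMATION_FLAG not in result.flags:
        result.flags.append(LOW_INFORMATION_FLAG)
        result.notes.append(LOW_INFORMATION_NOTE)
    result.verdict = "review"
    return result


flagged = apply_low_information_flag(IntegrityResult(), signature_detected=True)
untouched = apply_low_information_flag(IntegrityResult(), signature_detected=False)
```

The scoring output itself is left alone in this sketch; only the integrity result changes, which matches the stated goal of making uncertainty visible rather than suppressing the label.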

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File	Description
tests/web/view-models.test.ts	Adds a unit test for `buildIntegrityFlagLabels()` to ensure `low_information_session` renders a clear reviewer-facing label and unknown flags humanize correctly.
services/analytics-py/tests/test_pipeline.py	Extends pipeline tests with short-session fixtures and regression coverage for sparse-session integrity flagging and policy behavior.
services/analytics-py/assessment_analytics/integrity.py	Implements the `low_information_session` detection logic and attaches reviewer notes; ensures verdict becomes `review` when flagged.
apps/reviewer-web/src/view-model.ts	Adds a description string for `low_information_session` so the reviewer UI shows a clear explanation.
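
The labeling behavior described for `buildIntegrityFlagLabels()` (a dedicated description for `low_information_session`, humanized text for unknown flags) can be pictured with the Python analogue below. The real function lives in TypeScript in `apps/reviewer-web/src/view-model.ts`; the description string, the humanizing rule, and the snake_case function names here are assumptions chosen only to show the idea.

```python
from typing import Dict, List

# Hypothetical description text; the wording in view-model.ts may differ.
KNOWN_FLAG_DESCRIPTIONS: Dict[str, str] = {
    "low_information_session": (
        "Low-information session: too little telemetry to trust the label"
    ),
}


def humanize_flag(flag: str) -> str:
    """Fallback for unknown flags: snake_case becomes a readable phrase."""
    return flag.replace("_", " ").strip().capitalize()


def build_integrity_flag_labels(flags: List[str]) -> List[str]:
    """Prefer a curated description, otherwise humanize the raw flag name."""
    return [KNOWN_FLAG_DESCRIPTIONS.get(flag, humanize_flag(flag)) for flag in flags]


labels = build_integrity_flag_labels(["low_information_session", "some_unknown_flag"])
```

With these assumptions, the known flag maps to its curated description and `some_unknown_flag` falls back to "Some unknown flag".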


+    base_ts = "2026-04-14T06:00:{:02d}Z"
+    events = [
+        {
+            "event_id": "desktop-1",
+            "session_id": "short-session-1",
+            "timestamp_utc": base_ts.format(0),
+            "source": "desktop",
+            "event_type": "session.started",
+            "sequence_no": 1,
+            "artifact_ref": "session",
+            "payload": {"status": "active"},
+            "client_version": "0.1.0",
+            "integrity_hash": "h1",
+            "policy_context": {},
+        },
+        {
+            "event_id": "desktop-2",
+            "session_id": "short-session-1",
+            "timestamp_utc": base_ts.format(5),
+            "source": "desktop",
+            "event_type": "session.heartbeat",
+            "sequence_no": 2,
+            "artifact_ref": "session",
+            "payload": {"status": "active"},
+            "client_version": "0.1.0",
+            "integrity_hash": "h2",
+            "policy_context": {},
+        },
+    ]
+    for index in range(event_count - 2):
+        events.append({
+            "event_id": f"ide-{index + 1}",
+            "session_id": "short-session-1",
+            "timestamp_utc": base_ts.format(10 + index * 5),
+            "source": "ide",
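
The diff excerpt above is cut off partway through the fixture. A self-contained sketch of a comparable short-session builder is given below for orientation. It reuses the field names and timestamp pattern visible in the excerpt, but the function name, the IDE `event_type` (`ide.document.changed`), the IDE payload, and the `artifact_ref` value `main.py` are assumptions and should not be read as the actual test code.

```python
from typing import Any, Dict, List

BASE_TS = "2026-04-14T06:00:{:02d}Z"
SESSION_ID = "short-session-1"


def _event(event_id: str, second: int, source: str, event_type: str,
           sequence_no: int, artifact_ref: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Build one telemetry event using the field names shown in the diff."""
    return {
        "event_id": event_id,
        "session_id": SESSION_ID,
        "timestamp_utc": BASE_TS.format(second),
        "source": source,
        "event_type": event_type,
        "sequence_no": sequence_no,
        "artifact_ref": artifact_ref,
        "payload": payload,
        "client_version": "0.1.0",
        "integrity_hash": f"h{sequence_no}",
        "policy_context": {},
    }


def build_short_session_events(event_count: int) -> List[Dict[str, Any]]:
    """Two desktop events followed by (event_count - 2) typing-only IDE events."""
    # Seconds are 10 + index * 5, so more than 12 events would overflow the minute.
    if not 2 <= event_count <= 12:
        raise ValueError("event_count must be between 2 and 12")
    events = [
        _event("desktop-1", 0, "desktop", "session.started", 1, "session", {"status": "active"}),
        _event("desktop-2", 5, "desktop", "session.heartbeat", 2, "session", {"status": "active"}),
    ]
    for index in range(event_count - 2):
        events.append(_event(
            f"ide-{index + 1}", 10 + index * 5, "ide",
            "ide.document.changed",       # hypothetical event type
            index + 3,
            "main.py",                    # hypothetical artifact reference
            {"change_type": "insert"},    # hypothetical payload
        ))
    return events


sample = build_short_session_events(5)
```

A builder like this makes it cheap to parametrize tests over session length, for example one case just under the 10-event threshold and one at it.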


Initial plan

f6c840b

Copilot AI assigned Copilot and Code-Alchemist101 Apr 16, 2026

Copilot started work on behalf of Code-Alchemist101 April 16, 2026 17:22

Copilot AI changed the title ~~[WIP] Audit and harden short-session behavior in scoring pipeline~~ feat: add low_information_session integrity flag to harden sparse-session scoring Apr 16, 2026

Copilot finished work on behalf of Code-Alchemist101 April 16, 2026 17:35

Copilot AI requested a review from Code-Alchemist101 April 16, 2026 17:35

Copilot started work on behalf of Code-Alchemist101 April 16, 2026 17:37

refine low-information hardening to sparse typing-only signature

688b531

Agent-Logs-Url: https://github.com/Code-Alchemist101/managed-ai-assessment-platform/sessions/3b75ffa3-3bf0-465d-9512-e581d04b561f Co-authored-by: Code-Alchemist101 <113128088+Code-Alchemist101@users.noreply.github.com>

Copilot finished work on behalf of Code-Alchemist101 April 16, 2026 17:42

Copilot started work on behalf of Code-Alchemist101 April 16, 2026 17:45

Copilot finished work on behalf of Code-Alchemist101 April 16, 2026 17:48

Code-Alchemist101 marked this pull request as ready for review April 16, 2026 17:49

Copilot AI review requested due to automatic review settings April 16, 2026 17:49

Code-Alchemist101 merged commit df90815 into main Apr 16, 2026
1 check passed

Copilot started reviewing on behalf of Code-Alchemist101 April 16, 2026 17:49

Copilot AI reviewed Apr 16, 2026

Code-Alchemist101 deleted the copilot/audit-harden-short-session-behavior branch April 17, 2026 04:57

Code-Alchemist101 merged 3 commits into main from copilot/audit-harden-short-session-behavior
