Assessment Platform

Score how a candidate worked, not just what they submitted.

This repository contains the local v1 assessment platform for managed AI-assisted coding sessions. It is intentionally isolated from the research assets in the parent folder.

Published repository:

managed-ai-assessment-platform

Current Status

The project is at release-candidate quality for the local v1 goal:

Replay-fixture regression remains intact as the baseline path.
The control plane is the source of truth for manifests, session state, completeness, missing streams, and scoring metadata.
The desktop + VS Code live path is working and has multiple clean local sessions.
The full desktop + VS Code + Edge live path is implemented, build-green, integration-green, and has clean local runtime baselines for session-scoped browser bootstrap.
The latest human-driven full-manifest session also completed end to end and scored successfully, but it landed in review rather than clean because that session visited unsupported sites and surfaced sequence-gap integrity flags.
Reviewer and admin views support real live-session triage instead of only demo-style latest-session behavior.
The desktop controller now includes a manifest picker, explicit browser readiness messaging, and an operator recovery action for stuck sessions.
Provider-specific browser prompt/response capture is additive evidence only. It does not change stream-completeness rules for the full manifest.

Core Local URLs

Control plane: http://127.0.0.1:4010
Ingestion: http://127.0.0.1:4020
Ingestion event endpoint: http://127.0.0.1:4020/api/events
Analytics: http://127.0.0.1:4030
Reviewer: http://127.0.0.1:4173
Admin: http://127.0.0.1:4174

Repo Layout

apps/desktop-controller: Electron managed session launcher
apps/reviewer-web: reviewer console
apps/admin-web: admin console
extensions/vscode-assessment: VS Code live telemetry extension
extensions/edge-managed: managed Edge extension
packages/contracts: shared contracts and signal taxonomy
services/control-plane-api: manifests, sessions, bootstrap, scoring orchestration
services/ingestion-api: telemetry ingestion
services/analytics-py: 51-signal extraction, integrity, HACI, archetype scoring
fixtures: replay-fixture regression payloads

Supporting Docs

Local Setup

cd "C:\Users\hosan\Desktop\Research Project\assessment-platform"
npm install
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r services\analytics-py\requirements.txt

Build And Automated Verification

Run these before manual validation and again before final handoff:

npm run build
npm run test:web
npm run test:integration
npm run test:analytics

What these cover:

build: TypeScript workspaces plus web builds
test:web: reviewer/admin view-models, desktop-controller helper logic, Edge bootstrap/provider helpers
test:integration: replay-fixture regression, live desktop + IDE, invalid missing-IDE flow, full live browser bootstrap flow
test:analytics: Python feature extraction and integrity checks
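The four verification commands can also be chained from a small script. The sketch below is one possible wrapper, not something shipped in this repository: it runs the npm scripts named above in order and stops at the first failure. The helper names (`run_npm_script`, `run_verification`) are hypothetical.

```python
import subprocess
from typing import Callable, Sequence

# The four scripts named in this README, in the order they are listed.
VERIFICATION_SCRIPTS = ["build", "test:web", "test:integration", "test:analytics"]


def run_npm_script(script: str) -> int:
    """Run `npm run <script>` from the current directory and return its exit code."""
    # shell=True lets Windows resolve npm.cmd; the script name is a fixed constant.
    return subprocess.run(f"npm run {script}", shell=True).returncode


def run_verification(
    scripts: Sequence[str] = VERIFICATION_SCRIPTS,
    runner: Callable[[str], int] = run_npm_script,
) -> list[tuple[str, int]]:
    """Run each script in order and stop at the first non-zero exit code."""
    results = []
    for script in scripts:
        code = runner(script)
        results.append((script, code))
        if code != 0:
            break  # later steps are skipped once one step fails
    return results
```

Calling `run_verification()` from the repo root returns a list of `(script, exit_code)` pairs. The run was fully green only if the list has four entries and every exit code is 0.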

Run The Local Stack

Build first:

npm run build

Or use the one-command local stack launcher, which builds first, starts the backend/web services in one console, and then launches the desktop controller automatically:

npm run dev:stack

Use the full live manifest directly from startup with:

npm run dev:stack:full

If you want the stack to run in the background and manage it more like a local app, use:

npm run dev:stack:start
npm run dev:stack:status
npm run dev:stack:stop

On Windows, npm run dev:stack:start and npm run dev:stack:start:full are the least noisy startup paths because they hide the helper shells used to launch the local services. You should still expect the intentional product surfaces to open during a live run:

Electron desktop controller
VS Code Extension Development Host
managed Edge browser window

Start the background stack directly into the full manifest with:

npm run dev:stack:start:full

Alternatively, after building, start each service manually in its own PowerShell window from the repo root:

npm run dev:analytics
npm run dev:ingestion
npm run dev:control-plane
npm run dev:reviewer
npm run dev:admin
npm run dev:desktop-controller

For a non-interactive startup sanity check without opening the desktop controller, use:

npm run dev:stack:smoke

For automated local smoke runs, the desktop controller also supports:

ASSESSMENT_AUTO_START_WORKSPACE
ASSESSMENT_AUTO_START_DELAY_MS
ASSESSMENT_AUTO_END_WHEN_READY
ASSESSMENT_AUTO_END_DELAY_MS

These hooks are useful for managed local smoke validation when you want the controller to start a session from a known workspace and end it automatically once all required streams are present.
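As an illustration, a smoke script could assemble these variables before launching the controller. This is a minimal sketch under stated assumptions: the README names the four variables but does not document their value formats, so the truthy value `"1"` and the millisecond defaults below are guesses to verify against the desktop-controller source. `build_smoke_env` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def build_smoke_env(
    workspace: str,
    start_delay_ms: int = 2000,   # assumed default, not taken from the repo
    end_delay_ms: int = 5000,     # assumed default, not taken from the repo
    base_env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Return a copy of the environment with the auto-start/auto-end hooks set."""
    env = dict(os.environ if base_env is None else base_env)
    env["ASSESSMENT_AUTO_START_WORKSPACE"] = str(workspace)
    env["ASSESSMENT_AUTO_START_DELAY_MS"] = str(start_delay_ms)
    # Assumption: the controller treats "1" as enabled; the accepted values are not documented here.
    env["ASSESSMENT_AUTO_END_WHEN_READY"] = "1"
    env["ASSESSMENT_AUTO_END_DELAY_MS"] = str(end_delay_ms)
    return env
```

The resulting dictionary can be passed as `env=` to `subprocess.run("npm run dev:desktop-controller", shell=True, env=...)` so the hooks apply only to that one launch.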

The local runtime data root used by the built-in service runner is:

C:\Users\hosan\Desktop\Research Project\assessment-platform\.runtime-data\local-dev

Desktop Controller Workflow

The desktop controller is the primary manual entry point for local live sessions.

What it now supports:

manifest picker loaded from the control-plane manifest inventory
default selection of manifest-python-cli-live-desktop-ide
explicit browser readiness messaging for the full live manifest
Abandon Session for stuck active runs
score gating until all required non-desktop streams are present
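The score-gating rule in the last item can be pictured as a set difference. The sketch below is illustrative only and is not the controller's actual code; it assumes streams are identified by the short names used elsewhere in this README (`desktop`, `ide`, `browser`), and both function names are hypothetical.

```python
from typing import Iterable, List


def missing_required_streams(required: Iterable[str], seen: Iterable[str]) -> List[str]:
    """Return the required non-desktop streams that have not produced telemetry yet."""
    seen_set = set(seen)
    return sorted(s for s in set(required) if s != "desktop" and s not in seen_set)


def can_score(required: Iterable[str], seen: Iterable[str]) -> bool:
    """Scoring is allowed only when no required non-desktop stream is missing."""
    return not missing_required_streams(required, seen)
```

For the full manifest, `can_score(["desktop", "ide", "browser"], ["desktop", "ide"])` is `False` because `browser` is still missing. It becomes `True` once browser telemetry appears.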

Important notes:

The manifest picker is the preferred manual path.
ASSESSMENT_MANIFEST_ID is still supported for scripted or automation runs.
The controller prefers native Code.exe on Windows when resolving VS Code.

Manual Acceptance

Use the local live test folder when prompted unless you are validating another workspace:

C:\Users\hosan\Desktop\Research Project\Test_folder

Default Manifest Acceptance

Start the stack and open the desktop controller.
Leave the picker on manifest-python-cli-live-desktop-ide.
Click Start Live Session and choose the test folder.
Confirm the controller progresses through:
- launching
- awaiting_ide_stream
- ready_to_score
Confirm VS Code opens automatically.
Make at least one edit/save in the Extension Development Host window.
Click End And Score Session.
Confirm reviewer/admin show:
- desktop + ide
- no missing required streams
- integrity verdict clean
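The three reviewer/admin checks above can be expressed as a small predicate over a session summary. This is a sketch only: the dictionary keys (`streams`, `missing_streams`, `integrity_verdict`) are hypothetical placeholders and are not the control plane's documented response schema.

```python
from typing import Iterable, List, Mapping


def acceptance_failures(
    summary: Mapping[str, object],
    expected_streams: Iterable[str] = ("desktop", "ide"),
) -> List[str]:
    """Return human-readable reasons why a session summary fails acceptance."""
    failures = []
    streams = set(summary.get("streams", []))  # hypothetical key
    if streams != set(expected_streams):
        failures.append(f"unexpected streams: {sorted(streams)}")
    if summary.get("missing_streams"):  # hypothetical key
        failures.append(f"missing required streams: {summary['missing_streams']}")
    if summary.get("integrity_verdict") != "clean":  # hypothetical key
        failures.append(f"integrity verdict is {summary.get('integrity_verdict')!r}")
    return failures
```

An empty list means the session meets the acceptance criteria. For a full-manifest run, pass `expected_streams=("desktop", "ide", "browser")`.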

Full Manifest Acceptance

Start the stack and open the desktop controller.
Use the manifest picker to select manifest-python-cli-live-full.
Click Start Live Session and choose the test folder.
Confirm VS Code opens automatically.
Confirm Edge opens in the managed session profile.
Stay on allowlisted sites if you want a clean acceptance run:
- chat.openai.com
- claude.ai
- gemini.google.com
- stackoverflow.com
- developer.mozilla.org
- docs.python.org
- www.google.com
Avoid non-allowlisted browsing such as bing.com or w3schools.com during a clean acceptance pass because the policy layer can intentionally downgrade the session to review.
Confirm the controller does not allow scoring until both IDE and browser telemetry appear.
Confirm reviewer/admin show:
- desktop + ide + browser
- no missing required streams
- integrity verdict clean
If the run gets stuck with a missing stream, use Abandon Session and start a fresh run.
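To sanity-check a URL against the allowlist in the steps above before visiting it during a clean run, a helper along these lines may be useful. It assumes exact hostname matching, which may differ from how the policy layer actually matches sites (for example, subdomain handling), so treat it as an approximation.

```python
from urllib.parse import urlsplit

# Hosts copied from the allowlist in this README.
ALLOWLISTED_HOSTS = frozenset({
    "chat.openai.com",
    "claude.ai",
    "gemini.google.com",
    "stackoverflow.com",
    "developer.mozilla.org",
    "docs.python.org",
    "www.google.com",
})


def is_allowlisted(url: str) -> bool:
    """Return True if the URL's hostname is exactly one of the allowlisted hosts."""
    # urlsplit only finds a hostname when the URL includes a scheme such as https://
    host = urlsplit(url).hostname
    return host is not None and host in ALLOWLISTED_HOSTS
```

A URL without a scheme, such as `docs.python.org/3/`, is reported as not allowlisted because no hostname can be parsed from it.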

Provider Prompt/Response Sanity Check

This is a supplemental browser-evidence check on top of the full manifest acceptance.

During a full-manifest run, use the managed Edge window.
Open one supported provider page:
- chat.openai.com
- claude.ai
- gemini.google.com
Sign in inside that managed session profile if required.
Send one prompt and wait for one visible response.
Confirm the active session records browser.ai.prompt and browser.ai.response without invalidating the session.

Provider capture is additive evidence only. A session can still be clean without provider prompt/response events if browser completeness and required streams are otherwise satisfied.
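One way to confirm the capture is to count the two event types in the session's event list (for example, the output of the events endpoint under Useful Checks). The sketch below assumes each event is a JSON object with a `type` field holding the event name. That field name is a guess, not a documented part of the event contract.

```python
from typing import Dict, Iterable, Mapping

PROVIDER_EVENT_TYPES = ("browser.ai.prompt", "browser.ai.response")


def provider_capture_summary(events: Iterable[Mapping[str, object]]) -> Dict[str, int]:
    """Count provider prompt/response events; all other event types are ignored."""
    counts = {event_type: 0 for event_type in PROVIDER_EVENT_TYPES}
    for event in events:
        event_type = event.get("type")  # hypothetical field name
        if event_type in counts:
            counts[event_type] += 1
    return counts
```

Zero counts do not by themselves make a session invalid, since provider capture is additive evidence only.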

For VS Code:

the strongest first-class prompt/response telemetry currently comes from the assessment extension's own Assessment Platform: Open AI Assist command
third-party VS Code chat panes can still influence coding behavior, but they may not always emit first-class ide.ai.prompt or ide.ai.response events

Recovery Acceptance

Start a live session and intentionally leave one required stream missing.
Confirm the controller explains which stream is still missing.
Click Abandon Session.
Confirm the session is marked invalid and a new session can be started immediately.

Useful Checks

Health checks:

curl.exe -s http://127.0.0.1:4030/health
curl.exe -s http://127.0.0.1:4020/health
curl.exe -s http://127.0.0.1:4010/health
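The same three health checks can be scripted. The sketch below treats an HTTP 200 from `/health` as healthy, which is an assumption: the response body is not inspected, and the services may report degraded state in ways this does not detect. The function names are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Mapping, Optional

# Base URLs from the Core Local URLs section.
SERVICES = {
    "analytics": "http://127.0.0.1:4030",
    "ingestion": "http://127.0.0.1:4020",
    "control-plane": "http://127.0.0.1:4010",
}


def health_url(base: str) -> str:
    """Build the /health URL for a service base URL."""
    return base.rstrip("/") + "/health"


def http_status(url: str, timeout: float = 3.0) -> Optional[int]:
    """Return the HTTP status code, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the service answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, and similar failures


def check_health(
    services: Mapping[str, str] = SERVICES,
    fetch: Callable[[str], Optional[int]] = http_status,
) -> Dict[str, bool]:
    """Map each service name to True when its /health endpoint returns 200."""
    return {name: fetch(health_url(base)) == 200 for name, base in services.items()}
```

With the stack running, `check_health()` returns a dictionary with one boolean per service. Any `False` entry points at the service to restart first.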

Runtime and session inspection:

curl.exe -s http://127.0.0.1:4010/api/runtime
curl.exe -s http://127.0.0.1:4010/api/manifests
curl.exe -s http://127.0.0.1:4010/api/sessions
curl.exe -s http://127.0.0.1:4010/api/sessions/<sessionId>
curl.exe -s http://127.0.0.1:4010/api/sessions/<sessionId>/bootstrap
curl.exe -s http://127.0.0.1:4010/api/sessions/<sessionId>/scoring
curl.exe -s http://127.0.0.1:4010/api/sessions/<sessionId>/events
npm run session:report -- <sessionId>
npm run session:report:latest
npm run session:report:latest:json

session:report reads the saved local runtime files and prints one operator summary with HACI, archetype, integrity flags, missing streams, source mix, unsupported browser sites, and sequence anomaly hints. Use npm run session:report:latest for the latest scored session, and npm run session:report:latest:json for machine-readable output. If you need JSON for a specific session ID, run node scripts/session-report.mjs <sessionId> --json.
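For scripted use, the per-session JSON form can be wrapped as follows. The command line is the one given above; the sketch additionally assumes the script prints only the JSON document to stdout, and it makes no assumption about the report's fields. Both helper names are hypothetical.

```python
import json
import subprocess
from typing import Any, List


def session_report_command(session_id: str) -> List[str]:
    """Build the documented command for a machine-readable report of one session."""
    return ["node", "scripts/session-report.mjs", session_id, "--json"]


def load_session_report(session_id: str, repo_root: str = ".") -> Any:
    """Run the report script from the repo root and parse its stdout as JSON."""
    completed = subprocess.run(
        session_report_command(session_id),
        cwd=repo_root,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
    # Assumption: stdout contains the JSON document and nothing else.
    return json.loads(completed.stdout)
```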

Managed Edge Notes

The managed Edge extension is loaded only in the isolated browser instance launched by the desktop controller.
It is not installed into your normal everyday Edge profile.
Each full-manifest session uses its own profile directory under:

C:\Users\hosan\Desktop\Research Project\assessment-platform\.runtime-data\local-dev\browser-profiles\<sessionId>

If you open your regular Edge and check extensions, you will not see the assessment extension there.
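When cleaning up or inspecting a session's browser profile, the directory can be derived from the runtime data root and the session ID using the layout shown above. A minimal sketch, with a hypothetical helper name:

```python
from pathlib import PureWindowsPath


def browser_profile_dir(runtime_root: str, session_id: str) -> PureWindowsPath:
    """Return <runtime_root>\\browser-profiles\\<session_id> as a Windows path."""
    return PureWindowsPath(runtime_root) / "browser-profiles" / session_id
```

`PureWindowsPath` only builds the path and never touches the file system, so the helper behaves the same on any operating system.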

Acceptance Baselines

Saved local runtime data currently contains these useful clean baselines:

Default live baseline from earlier manual validation:
- 4f80c709-ba1b-422d-ac3d-471aef6a48bf
Fresh default live clean baseline on the integrated build:
- f5455aeb-91d5-4261-a49d-b8f5c42136a2
Full live clean baseline validating session-scoped browser bootstrap and clean scoring:
- d0ad26fb-7a63-47f1-9763-9aaaf849f7be
Latest automated clean full-manifest baseline on the integrated build:
- c5ebe45c-2888-4af7-8d1c-447709e8a12c
Latest human-driven full-manifest session:
- 36e6bd86-2423-49b7-9da1-9247d7f62e04
- status: scored
- verdict: review
- reason: unsupported_site_visited plus sequence_gap_detected

These full-manifest baselines validate the managed Edge bootstrap path and clean browser-complete scoring. Provider prompt/response capture should still be rechecked on the next signed-in manual full-manifest run.

Known Limitations And Deferred Items

Native Windows idle/focus hooks remain deferred for post-v1 hardening.
The analytics pipeline processes all 51 signal slots, but some signals still rely on generic or partial live evidence until native OS hooks are added.
Provider-specific browser capture is best-effort on supported provider pages and is intentionally limited to additive prompt/response evidence.
Third-party VS Code AI chat panes may influence coding behavior without always producing first-class ide.ai.prompt or ide.ai.response telemetry unless the interaction flows through the assessment extension's own managed AI panel.
One recent human-driven full-manifest session scored successfully but landed in review because it visited unsupported sites and surfaced browser/IDE sequence gaps. That is a real operational caveat and should be part of any honest handoff.
Browser completeness for the full manifest remains based on the existing managed browser events, not on provider capture.
Replay-fixture regression must remain untouched except for compatibility and regression protection.

Final Local v1 Definition Of Done

For this repository, local v1 is considered done when all of the following are true:

npm run build succeeds
npm run test:web passes
npm run test:integration passes
npm run test:analytics passes
default manifest manually validated clean on the latest build
full manifest manually validated clean on the latest build
reviewer/admin usable for successful and failed live-session triage
the repo documentation is sufficient for another operator to run and validate the system locally
