Score how a candidate worked, not just what they submitted.
This repository contains the local v1 assessment platform for managed AI-assisted coding sessions. It is intentionally isolated from the research assets in the parent folder.
Published repository:
The project is in release-candidate territory for the local v1 goal:
- Replay-fixture regression remains intact as the baseline path.
- The control plane is the source of truth for manifests, session state, completeness, missing streams, and scoring metadata.
- The desktop + VS Code live path is working and has multiple clean local sessions.
- The full desktop + VS Code + Edge live path is implemented, build-green, integration-green, and has clean local runtime baselines for session-scoped browser bootstrap.
- The latest human-driven full-manifest session also completed end to end and scored successfully, but it landed in `review` rather than `clean` because that session visited unsupported sites and surfaced sequence-gap integrity flags.
- Reviewer and admin views support real live-session triage instead of only demo-style latest-session behavior.
- The desktop controller now includes a manifest picker, explicit browser readiness messaging, and an operator recovery action for stuck sessions.
- Provider-specific browser prompt/response capture is additive evidence only. It does not change stream-completeness rules for the full manifest.
Local service endpoints:
- Control plane: `http://127.0.0.1:4010`
- Ingestion: `http://127.0.0.1:4020`
- Ingestion event endpoint: `http://127.0.0.1:4020/api/events`
- Analytics: `http://127.0.0.1:4030`
- Reviewer: `http://127.0.0.1:4173`
- Admin: `http://127.0.0.1:4174`
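The endpoints above can be captured in a small lookup table for scripted checks. The following is a minimal Python sketch, not part of the repository: `SERVICES`, `health_url`, and `is_up` are illustrative names, and the assumption that a healthy service answers `/health` with HTTP 200 is ours. Only the three backend services are given a health URL, because those are the only ones this README queries with `/health`.

```python
import urllib.request

# Local endpoints as listed in this README (keys are illustrative labels).
SERVICES = {
    "control-plane": "http://127.0.0.1:4010",
    "ingestion": "http://127.0.0.1:4020",
    "analytics": "http://127.0.0.1:4030",
    "reviewer": "http://127.0.0.1:4173",
    "admin": "http://127.0.0.1:4174",
}

# Only these services have a /health endpoint documented in this README.
HEALTH_CHECKED = ("analytics", "ingestion", "control-plane")


def health_url(service: str) -> str:
    """Build the /health URL for one of the three backend services."""
    if service not in HEALTH_CHECKED:
        raise ValueError(f"no documented health endpoint for {service!r}")
    return SERVICES[service] + "/health"


def is_up(service: str, timeout: float = 2.0) -> bool:
    """Return True if the service answers its health URL with HTTP 200 (assumed)."""
    try:
        with urllib.request.urlopen(health_url(service), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, refused connections, and timeouts are all OSError subclasses.
        return False
```

Calling `is_up("analytics")` before a manual run gives a quick yes/no without opening a browser.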
Repository layout:
- `apps/desktop-controller`: Electron managed session launcher
- `apps/reviewer-web`: reviewer console
- `apps/admin-web`: admin console
- `extensions/vscode-assessment`: VS Code live telemetry extension
- `extensions/edge-managed`: managed Edge extension
- `packages/contracts`: shared contracts and signal taxonomy
- `services/control-plane-api`: manifests, sessions, bootstrap, scoring orchestration
- `services/ingestion-api`: telemetry ingestion
- `services/analytics-py`: 51-signal extraction, integrity, HACI, archetype scoring
- `fixtures`: replay-fixture regression payloads
- Platform Brief
- Release Status
- Integration Blueprint
- Demo Script
- AntiGravity KT Handoff
- Operator Manual
- Architecture Notes
- Signal Catalog
cd "C:\Users\hosan\Desktop\Research Project\assessment-platform"
npm install
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r services\analytics-py\requirements.txt
Run these before manual validation and again before final handoff:
npm run build
npm run test:web
npm run test:integration
npm run test:analytics
What these cover:
- `build`: TypeScript workspaces plus web builds
- `test:web`: reviewer/admin view-models, desktop-controller helper logic, Edge bootstrap/provider helpers
- `test:integration`: replay-fixture regression, live desktop + IDE, invalid missing-IDE flow, full live browser bootstrap flow
- `test:analytics`: Python feature extraction and integrity checks
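If you prefer to run the four gates as one step, a small wrapper can stop at the first failure. This is a hedged sketch rather than a script shipped with the repository: `GATES`, `run_gates`, and the injectable `runner` parameter are illustrative names.

```python
import os
import subprocess
from typing import Callable, List, Optional

# The four gate commands from this README, in the documented order.
GATES: List[List[str]] = [
    ["npm", "run", "build"],
    ["npm", "run", "test:web"],
    ["npm", "run", "test:integration"],
    ["npm", "run", "test:analytics"],
]


def _run(cmd: List[str]) -> int:
    """Run one command from the current directory and return its exit code."""
    # On Windows npm is a .cmd shim, so it has to be launched through the shell.
    return subprocess.run(cmd, shell=(os.name == "nt")).returncode


def run_gates(runner: Callable[[List[str]], int] = _run) -> Optional[List[str]]:
    """Run the gates in order; return the first failing command, or None."""
    for cmd in GATES:
        if runner(cmd) != 0:
            return cmd
    return None
```

Run it from the repo root, for example `failed = run_gates()`, and treat any non-`None` result as a reason not to proceed to manual validation. The `runner` parameter exists only so the ordering logic can be exercised without invoking npm.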
Build first:
npm run build
Or use the one-command local stack launcher, which builds first, starts the backend/web services in one console, and then launches the desktop controller automatically:
npm run dev:stack
Use the full live manifest directly from startup with:
npm run dev:stack:full
If you want the stack to run in the background and manage it more like a local app, use:
npm run dev:stack:start
npm run dev:stack:status
npm run dev:stack:stop
On Windows, `npm run dev:stack:start` and `npm run dev:stack:start:full` are now the least noisy startup path because they hide the helper shells used to launch the local services. You should still expect the intentional product surfaces to open during a live run:
- Electron desktop controller
- VS Code Extension Development Host
- managed Edge browser window
Start the background stack directly into the full manifest with:
npm run dev:stack:start:full
To run the services manually instead, build first, then start each service in its own PowerShell window from the repo root:
npm run dev:analytics
npm run dev:ingestion
npm run dev:control-plane
npm run dev:reviewer
npm run dev:admin
npm run dev:desktop-controller
For a non-interactive startup sanity check without opening the desktop controller, use:
npm run dev:stack:smoke
For automated local smoke runs, the desktop controller also supports:
- `ASSESSMENT_AUTO_START_WORKSPACE`
- `ASSESSMENT_AUTO_START_DELAY_MS`
- `ASSESSMENT_AUTO_END_WHEN_READY`
- `ASSESSMENT_AUTO_END_DELAY_MS`
These hooks are useful for managed local smoke validation when you want the controller to start a session from a known workspace and end it automatically once all required streams are present.
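A smoke script might assemble these variables before launching the controller. The sketch below is illustrative only: `smoke_env` is a hypothetical helper, and the value encodings (milliseconds as decimal strings, `"1"`/`"0"` for the boolean hook) are assumptions, not documented behavior. Check the desktop-controller source for the encodings it actually accepts.

```python
import os
from typing import Dict, Mapping, Optional


def smoke_env(
    workspace: str,
    start_delay_ms: int,
    end_delay_ms: int,
    auto_end: bool = True,
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with the four auto-run hooks set."""
    env = dict(os.environ if base is None else base)
    env["ASSESSMENT_AUTO_START_WORKSPACE"] = workspace
    # Assumption: delays are passed as decimal millisecond strings.
    env["ASSESSMENT_AUTO_START_DELAY_MS"] = str(start_delay_ms)
    env["ASSESSMENT_AUTO_END_DELAY_MS"] = str(end_delay_ms)
    # Assumption: "1"/"0" is an accepted boolean encoding for this hook.
    env["ASSESSMENT_AUTO_END_WHEN_READY"] = "1" if auto_end else "0"
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run` when starting `npm run dev:desktop-controller` from a script.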
The local runtime data root used by the built-in service runner is:
C:\Users\hosan\Desktop\Research Project\assessment-platform\.runtime-data\local-dev
The desktop controller is the primary manual entry point for local live sessions.
What it now supports:
- manifest picker loaded from the control-plane manifest inventory
- default selection of `manifest-python-cli-live-desktop-ide`
- explicit browser readiness messaging for the full live manifest
- `Abandon Session` for stuck active runs
- score gating until all required non-desktop streams are present
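The score-gating rule in the last item can be pictured as a set difference. This is a minimal sketch of the idea, not the controller's actual code: the function names are hypothetical, and the stream labels `desktop`, `ide`, and `browser` are taken from the stream mixes shown elsewhere in this README.

```python
from typing import Iterable, List


def missing_streams(required: Iterable[str], observed: Iterable[str]) -> List[str]:
    """Required streams not yet observed, ignoring the desktop stream."""
    # The desktop stream is excluded because the gate only covers non-desktop streams.
    return sorted(set(required) - set(observed) - {"desktop"})


def can_score(required: Iterable[str], observed: Iterable[str]) -> bool:
    """Scoring is allowed once no required non-desktop stream is missing."""
    return not missing_streams(required, observed)
```

For a full-manifest session that has seen only desktop and IDE telemetry, `missing_streams(["desktop", "ide", "browser"], ["desktop", "ide"])` yields `["browser"]`, so scoring stays blocked.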
Important notes:
- The manifest picker is the preferred manual path.
- `ASSESSMENT_MANIFEST_ID` is still supported for scripted or automation runs.
- The controller prefers native `Code.exe` on Windows when resolving VS Code.
Use the local live test folder when prompted unless you are validating another workspace:
C:\Users\hosan\Desktop\Research Project\Test_folder
- Start the stack and open the desktop controller.
- Leave the picker on `manifest-python-cli-live-desktop-ide`.
- Click `Start Live Session` and choose the test folder.
- Confirm the controller progresses through:
  - `launching`
  - `awaiting_ide_stream`
  - `ready_to_score`
- Confirm VS Code opens automatically.
- Make at least one edit/save in the Extension Development Host window.
- Click `End And Score Session`.
- Confirm reviewer/admin show:
  - `desktop + ide`
  - no missing required streams
  - integrity verdict `clean`
- Start the stack and open the desktop controller.
- Use the manifest picker to select `manifest-python-cli-live-full`.
- Click `Start Live Session` and choose the test folder.
- Confirm VS Code opens automatically.
- Confirm Edge opens in the managed session profile.
- Stay on allowlisted sites if you want a clean acceptance run:
  - `chat.openai.com`
  - `claude.ai`
  - `gemini.google.com`
  - `stackoverflow.com`
  - `developer.mozilla.org`
  - `docs.python.org`
  - `www.google.com`
- Avoid non-allowlisted browsing such as `bing.com` or `w3schools.com` during a clean acceptance pass because the policy layer can intentionally downgrade the session to `review`.
- Confirm the controller does not allow scoring until both IDE and browser telemetry appear.
- Confirm reviewer/admin show:
  - `desktop + ide + browser`
  - no missing required streams
  - integrity verdict `clean`
- If the run gets stuck with a missing stream, use `Abandon Session` and start a fresh run.
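To check a URL against the allowlist before a clean acceptance pass, a host comparison is enough for a rough pre-flight. This sketch is an illustration under stated assumptions: `is_allowlisted` is a hypothetical helper, and exact host matching is our simplification. The real policy layer may treat subdomains or redirects differently.

```python
from urllib.parse import urlsplit

# Hosts listed as allowlisted in this README.
ALLOWLIST = frozenset({
    "chat.openai.com",
    "claude.ai",
    "gemini.google.com",
    "stackoverflow.com",
    "developer.mozilla.org",
    "docs.python.org",
    "www.google.com",
})


def is_allowlisted(url: str) -> bool:
    """True if the URL's host exactly matches an allowlisted host (assumed rule)."""
    host = (urlsplit(url).hostname or "").lower()
    return host in ALLOWLIST
```

A URL without a scheme, such as a bare `bing.com`, has no parsed host and is treated as not allowlisted.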
This is a supplemental browser-evidence check on top of the full manifest acceptance.
- During a full-manifest run, use the managed Edge window.
- Open one supported provider page:
  - `chat.openai.com`
  - `claude.ai`
  - `gemini.google.com`
- Sign in inside that managed session profile if required.
- Send one prompt and wait for one visible response.
- Confirm the active session records `browser.ai.prompt` and `browser.ai.response` without invalidating the session.
Provider capture is additive evidence only. A session can still be clean without provider prompt/response events if browser completeness and required streams are otherwise satisfied.
For VS Code:
- the strongest first-class prompt/response telemetry currently comes from the assessment extension's own `Assessment Platform: Open AI Assist` command
- third-party VS Code chat panes can still influence coding behavior, but they may not always emit first-class `ide.ai.prompt` or `ide.ai.response` events
- Start a live session and intentionally leave one required stream missing.
- Confirm the controller explains which stream is still missing.
- Click `Abandon Session`.
- Confirm the session is marked invalid and a new session can be started immediately.
Health checks:
curl.exe -s http://127.0.0.1:4030/health
curl.exe -s http://127.0.0.1:4020/health
curl.exe -s http://127.0.0.1:4010/health
Runtime and session inspection:
curl.exe -s http://127.0.0.1:4010/api/runtime
curl.exe -s http://127.0.0.1:4010/api/manifests
curl.exe -s http://127.0.0.1:4010/api/sessions
curl.exe -s http://127.0.0.1:4010/api/sessions/<sessionId>
curl.exe -s http://127.0.0.1:4010/api/sessions/<sessionId>/bootstrap
curl.exe -s http://127.0.0.1:4010/api/sessions/<sessionId>/scoring
curl.exe -s http://127.0.0.1:4010/api/sessions/<sessionId>/events
npm run session:report -- <sessionId>
npm run session:report:latest
npm run session:report:latest:json
`session:report` reads the saved local runtime files and prints one operator summary with HACI, archetype, integrity flags, missing streams, source mix, unsupported browser sites, and sequence anomaly hints. Use `npm run session:report:latest` for the latest scored session, and `npm run session:report:latest:json` for machine-readable output. If you need JSON for a specific session ID, run `node scripts/session-report.mjs <sessionId> --json`.
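The per-session inspection URLs above follow one pattern, so a script can derive them from a session ID. A minimal sketch, assuming the control plane returns JSON on these routes (the README does not state the response format); `session_urls` and `fetch_json` are illustrative names.

```python
import json
import urllib.request
from typing import Any, Dict
from urllib.parse import quote

CONTROL_PLANE = "http://127.0.0.1:4010"


def session_urls(session_id: str) -> Dict[str, str]:
    """Build the four per-session inspection URLs listed in this README."""
    base = f"{CONTROL_PLANE}/api/sessions/{quote(session_id, safe='')}"
    return {
        "session": base,
        "bootstrap": base + "/bootstrap",
        "scoring": base + "/scoring",
        "events": base + "/events",
    }


def fetch_json(url: str, timeout: float = 5.0) -> Any:
    """GET a URL and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With the stack running, `fetch_json(session_urls(session_id)["scoring"])` retrieves the same payload as the matching `curl.exe` call.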
- The managed Edge extension is loaded only in the isolated browser instance launched by the desktop controller.
- It is not installed into your normal everyday Edge profile.
- Each full-manifest session uses its own profile directory under:
C:\Users\hosan\Desktop\Research Project\assessment-platform\.runtime-data\local-dev\browser-profiles\<sessionId>
- If you open your regular Edge and check extensions, you will not see the assessment extension there.
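When cleaning up or inspecting a session's browser profile, the directory can be derived from the session ID. This is a small sketch under two assumptions: session IDs are UUIDs (all IDs shown in this README are), and the runtime root is the local path documented above. `browser_profile_dir` is a hypothetical helper.

```python
import uuid
from pathlib import PureWindowsPath

# Local runtime data root as documented in this README.
RUNTIME_ROOT = PureWindowsPath(
    r"C:\Users\hosan\Desktop\Research Project\assessment-platform\.runtime-data\local-dev"
)


def browser_profile_dir(
    session_id: str, root: PureWindowsPath = RUNTIME_ROOT
) -> PureWindowsPath:
    """Return the per-session managed Edge profile directory."""
    # Assumption: IDs are UUIDs. Parsing rejects strings such as "..\\other",
    # which would otherwise escape the browser-profiles folder.
    canonical = str(uuid.UUID(session_id))
    return root / "browser-profiles" / canonical
```

`uuid.UUID` raises `ValueError` for anything that is not a valid UUID string, so a malformed ID fails before any path is built.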
Saved local runtime data currently contains these useful clean baselines:
- Default live baseline from earlier manual validation: `4f80c709-ba1b-422d-ac3d-471aef6a48bf`
- Fresh default live clean baseline on the integrated build: `f5455aeb-91d5-4261-a49d-b8f5c42136a2`
- Full live clean baseline validating session-scoped browser bootstrap and clean scoring: `d0ad26fb-7a63-47f1-9763-9aaaf849f7be`
- Latest automated clean full-manifest baseline on the integrated build: `c5ebe45c-2888-4af7-8d1c-447709e8a12c`
- Latest human-driven full-manifest session: `36e6bd86-2423-49b7-9da1-9247d7f62e04`
  - status: scored
  - verdict: `review`
  - reason: `unsupported_site_visited` plus `sequence_gap_detected`
These full-manifest baselines validate the managed Edge bootstrap path and clean browser-complete scoring. Provider prompt/response capture should still be rechecked on the next signed-in manual full-manifest run.
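To make the `sequence_gap_detected` flag more concrete, the sketch below shows one way a gap check could work. It is an illustration only: it assumes each stream numbers its events with consecutive integers, which this README does not confirm, and `find_sequence_gaps` is a hypothetical name rather than the analytics service's function.

```python
from typing import Iterable, List, Tuple


def find_sequence_gaps(seqs: Iterable[int]) -> List[Tuple[int, int]]:
    """Return inclusive (first_missing, last_missing) ranges between seen numbers."""
    ordered = sorted(set(seqs))  # tolerate duplicates and out-of-order arrival
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur != prev + 1:
            gaps.append((prev + 1, cur - 1))
    return gaps
```

For example, a stream that delivered events 1, 2, 3, 6, 7, and 10 is missing the ranges 4 to 5 and 8 to 9, so the function returns `[(4, 5), (8, 9)]`.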
- Native Windows idle/focus hooks remain deferred for post-v1 hardening.
- The analytics pipeline processes all 51 signal slots, but some signals still rely on generic or partial live evidence until native OS hooks are added.
- Provider-specific browser capture is best-effort on supported provider pages and is intentionally limited to additive prompt/response evidence.
- Third-party VS Code AI chat panes may influence coding behavior without always producing first-class `ide.ai.prompt` or `ide.ai.response` telemetry unless the interaction flows through the assessment extension's own managed AI panel.
- One recent human-driven full-manifest session scored successfully but landed in `review` because it visited unsupported sites and surfaced browser/IDE sequence gaps. That is a real operational caveat and should be part of any honest handoff.
- Browser completeness for the full manifest remains based on the existing managed browser events, not on provider capture.
- Replay-fixture regression must remain untouched except for compatibility and regression protection.
For this repository, local v1 is considered done when all of the following are true:
- `npm run build`
- `npm run test:web`
- `npm run test:integration`
- `npm run test:analytics`
- default manifest manually validated clean on the latest build
- full manifest manually validated clean on the latest build
- reviewer/admin usable for successful and failed live-session triage
- the repo documentation is sufficient for another operator to run and validate the system locally