feat(studio): per-run comparison with retroactive labelling by christso · Pull Request #1040 · EntityProcess/agentv

christso · 2026-04-10T22:54:06Z

Note: this PR evolved significantly during review. It was originally built around a singular label concept (matching the issue description) and later pivoted to plural tags to match the Langfuse / W&B / GitHub convention for multi-valued post-hoc annotations. The commit history preserves the full arc; this body reflects the final merged state.

Summary

Studio Compare tab gains a Per run mode alongside the existing (experiment × target) aggregated matrix. Users can select 2+ individual runs via checkboxes and see them side-by-side, so running the same (experiment, target) twice no longer collapses into a single cell.
Runs can be retroactively tagged (multi-valued) via a tags.json sidecar written next to index.jsonl. Each run can carry up to 20 tags (≤60 chars each, control-char rejected, deduped). Mutation is exposed through PUT / DELETE /api/runs/:filename/tags (plus benchmark-scoped twins). Remote runs are read-only.
CompareTab.tsx was rewritten to match the existing Studio aesthetic — gray-950 canvas, cyan-400 accent, shared PassRatePill, single system-ui font stack — so the Compare tab is visually indistinguishable from ExperimentsTab / TargetsTab.
A new apps/studio/DESIGN.md documents Studio's actual design language (dark + cyan, canonical Tailwind patterns, do/don't list) so future agents can keep new Studio surfaces consistent.
No changes to eval YAML schema, no new CLI commands, no new tracker fields. Aggregated view semantics are unchanged — only the new per-run mode plus the tags annotation layer.
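For illustration, the tag limits listed in the Summary could be enforced roughly as below. This is a minimal sketch, not the merged `normalizeTags()` from `run-tags.ts`: throwing on invalid input, skipping blank entries, and case-sensitive dedupe are assumptions, and `MAX_TAG_LENGTH` and `hasControlChar` are hypothetical names.

```typescript
const MAX_TAGS_PER_RUN = 20;
const MAX_TAG_LENGTH = 60; // hypothetical constant name; the PR only states the 60-char limit

// True when the string contains a char code < 0x20 or == 0x7f.
function hasControlChar(value: string): boolean {
  for (let i = 0; i < value.length; i++) {
    const code = value.charCodeAt(i);
    if (code < 0x20 || code === 0x7f) return true;
  }
  return false;
}

// Trim, validate, and dedupe tags while preserving first-seen order.
function normalizeTags(input: readonly string[]): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const raw of input) {
    const tag = raw.trim();
    if (tag.length === 0) continue; // assumption: blank entries are skipped
    if (tag.length > MAX_TAG_LENGTH) {
      throw new Error(`tag exceeds ${MAX_TAG_LENGTH} characters`);
    }
    if (hasControlChar(tag)) {
      throw new Error("tag contains a control character");
    }
    if (seen.has(tag)) continue; // assumption: dedupe is case-sensitive
    seen.add(tag);
    result.push(tag);
  }
  if (result.length > MAX_TAGS_PER_RUN) {
    throw new Error(`a run can carry at most ${MAX_TAGS_PER_RUN} tags`);
  }
  return result;
}
```

Because trimming happens first, a trailing newline is dropped rather than rejected; only control characters inside the tag cause an error in this sketch.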

Files touched

apps/cli/src/commands/results/run-tags.ts (new) — sidecar read/write/delete helpers with per-tag validation (length, control chars, dedupe, MAX_TAGS_PER_RUN=20). Writing an empty array deletes the sidecar.
apps/cli/src/commands/results/serve.ts — handleCompare now also emits runs[], handleRuns attaches tags, new handleRunTagsPut / handleRunTagsDelete handlers wired behind the existing read-only check.
apps/studio/src/lib/types.ts — CompareRunEntry, RunTagsResponse, RunMeta.tags?: string[].
apps/studio/src/lib/api.ts — saveRunTagsApi / deleteRunTagsApi mutations that invalidate compare + runs query keys.
apps/studio/src/components/CompareTab.tsx — rewritten with the Studio Tailwind aesthetic; inline chip-based TagsEditor, per-run selection, side-by-side compare view with tags rendered as chips under the immutable timestamp header.
apps/studio/src/routes/index.tsx + apps/studio/src/routes/projects/$benchmarkId.tsx — forward benchmarkId and readOnly to CompareTab so tag mutations in benchmark-scoped studio routes hit the correct endpoints.
apps/studio/DESIGN.md (new) — brand-aligned design system reference.
apps/web/src/content/docs/docs/tools/studio.mdx + 3 new screenshots in apps/web/src/assets/screenshots/studio-compare-*.png — user-facing docs for the Compare tab's two modes and the retroactive tag annotation workflow.
docs/plans/1037-per-run-compare.md — design plan (historical; preserved in the squash commit).
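The sidecar behaviour described for `run-tags.ts` (a `tags.json` next to `index.jsonl`, with an empty array deleting the file) might look roughly like this. The function names and `RUN_TAGS_FILENAME` come from the PR; the signatures, the synchronous `node:fs` calls, and the exact file shape (`tags` plus an ISO `updated_at`) are assumptions.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const RUN_TAGS_FILENAME = "tags.json";

// Assumed on-disk shape; the PR mentions a tags array and an updated_at field.
interface RunTagsFile {
  tags: string[];
  updated_at: string;
}

function tagsPath(runDir: string): string {
  return path.join(runDir, RUN_TAGS_FILENAME);
}

// Returns [] when the sidecar is missing or unreadable.
function readRunTags(runDir: string): string[] {
  try {
    const parsed = JSON.parse(fs.readFileSync(tagsPath(runDir), "utf8")) as Partial<RunTagsFile>;
    return Array.isArray(parsed.tags) ? parsed.tags : [];
  } catch {
    return [];
  }
}

// Idempotent: removing a sidecar that does not exist is not an error.
function deleteRunTags(runDir: string): void {
  fs.rmSync(tagsPath(runDir), { force: true });
}

// Writing an empty array takes the delete path, so there is a single way to clear tags.
function writeRunTags(runDir: string, tags: string[]): void {
  if (tags.length === 0) {
    deleteRunTags(runDir);
    return;
  }
  const body: RunTagsFile = { tags, updated_at: new Date().toISOString() };
  fs.writeFileSync(tagsPath(runDir), `${JSON.stringify(body, null, 2)}\n`);
}

// Usage against a throwaway directory standing in for a run directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "run-tags-demo-"));
writeRunTags(demoDir, ["improved-prompt", "v2"]);
const afterSave = readRunTags(demoDir);
writeRunTags(demoDir, []);
const sidecarGone = !fs.existsSync(tagsPath(demoDir));
```

Routing the empty-array case through the delete helper keeps "Clear all" and "Save with no chips" on the same code path.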

Verification

bun run build, typecheck, lint, test — all green (1976 tests pass).
Pre-push hook (prek) ran Build / Typecheck / Lint / Test / Validate eval YAML — passed on every push.
apps/web builds cleanly with the new screenshots embedded.
Live manual UAT via agent-browser --cdp 9222 against bun apps/cli/src/cli.ts studio --port 9100 --single on a 4-run synthetic fixture (two sharing (exp-a, claude-sonnet) to specifically exercise the #1037 collapse).

Test plan — verified

Post-merge interactive UAT against main (commit 016607e7) with real click/keystroke dispatching. Full evidence and per-flow screenshots in #1040 (comment).

Follow-up

Tracked as #1041: Filter compare views by tag. Tag filtering (chip row above the compare view to narrow both matrix and per-run table to runs matching a selected tag set) was discussed in this PR's thread and intentionally held out of scope.

🤖 Generated with Claude Code

Adds a per-run mode to the Studio Compare tab so users can select 2+ individual runs and see them side-by-side, independent of the existing (experiment, target) aggregation. Runs can be retroactively labelled via a sidecar label.json written next to index.jsonl; the label replaces the timestamp in compare column headers.

Backend:
- `apps/cli/src/commands/results/run-label.ts`: sidecar read/write/delete helpers (label.json next to manifest, 120-char cap, JSON schema).
- `serve.ts`: /api/compare now returns a `runs[]` array with per-run entries (one per workspace), and enriches /api/runs with any label.
- New endpoints: `PUT/DELETE /api/runs/:filename/label` and the benchmark-scoped variants. Remote runs are read-only.

Frontend:
- `CompareTab.tsx` completely reworked with an "Editorial Data Terminal" aesthetic: Fraunces serif display, JetBrains Mono tabular numerals, warm off-black canvas, antique gold accents. Scoped via inline styles under `[data-compare-root]` so it does not bleed into other surfaces.
- Two modes: Aggregated (default, existing matrix re-skinned) and Per run (checkbox-selectable runs table + sticky Compare N bar + inline label editor). Compare view renders one column per selected run with label-or-timestamp headers and reuses the existing test breakdown.
- API hooks `saveRunLabelApi` / `deleteRunLabelApi` invalidate compare and runs caches on mutation.

Closes #1037
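To make the collapse concrete, the sketch below contrasts grouping by (experiment, target) with listing one entry per run. It is illustrative only: `RunSummary`, `aggregatedCells`, and `perRunEntries` are hypothetical names, the fixture values other than `(exp-a, claude-sonnet)` are made up, and the real `handleCompare` may aggregate differently.

```typescript
// Hypothetical minimal shape; the real CompareRunEntry carries more fields.
interface RunSummary {
  filename: string;
  experiment: string;
  target: string;
}

// Aggregated mode: runs sharing (experiment, target) land in the same cell.
function aggregatedCells(runs: readonly RunSummary[]): Map<string, string[]> {
  const cells = new Map<string, string[]>();
  for (const run of runs) {
    const key = JSON.stringify([run.experiment, run.target]);
    const bucket = cells.get(key) ?? [];
    bucket.push(run.filename);
    cells.set(key, bucket);
  }
  return cells;
}

// Per-run mode: every run keeps its own entry, so nothing collapses.
function perRunEntries(runs: readonly RunSummary[]): RunSummary[] {
  return runs.map((run) => ({ ...run }));
}

const fixture: RunSummary[] = [
  { filename: "run-1", experiment: "exp-a", target: "claude-sonnet" },
  { filename: "run-2", experiment: "exp-a", target: "claude-sonnet" },
  { filename: "run-3", experiment: "exp-a", target: "target-b" },
  { filename: "run-4", experiment: "exp-b", target: "claude-sonnet" },
];

const cellCount = aggregatedCells(fixture).size; // 3: run-1 and run-2 share a cell
const runCount = perRunEntries(fixture).length; // 4: one entry per run
```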

cloudflare-workers-and-pages · 2026-04-10T22:54:25Z

Deploying agentv with Cloudflare Pages

Latest commit:	`7fd9f06`
Status:	✅ Deploy successful!
Preview URL:	https://1a2c54c4.agentv.pages.dev
Branch Preview URL:	https://feat-1037-per-run-compare.agentv.pages.dev


- CompareTab AggregatedView: hoist useMemo above the early return so adding a second experiment/target after the initial render does not violate the Rules of Hooks.
- Pass `benchmarkId` and `readOnly` through to CompareTab from both routes (single-project and benchmark-scoped). Previously label mutations in the benchmark view routed to the unscoped endpoint and either 404'd or wrote the sidecar into the wrong run directory.
- LabelEditor: short-circuit Save/Clear onClick handlers on `busy` to avoid a save-then-clear race where both mutations could be in flight simultaneously.
- writeRunLabel: reject control characters in labels so they cannot break compare column headers or confuse test assertions.

christso · 2026-04-10T23:06:33Z

Review follow-up (`c993a20`)

Addressed blockers from internal code review:

🔴 B1 — Rules-of-Hooks in AggregatedView: hoisted useMemo above the early-return guard in CompareTab.tsx:145. Verified with a single-experiment/single-target fixture — the "Not enough variation" notice now renders cleanly and switching to a multi-target project no longer risks a "Rendered more hooks than during the previous render" crash.
🔴 B2 — callers not forwarding benchmarkId / readOnly: fixed in apps/studio/src/routes/index.tsx and apps/studio/src/routes/projects/$benchmarkId.tsx. Label mutations in the benchmark-scoped view now hit /api/benchmarks/:benchmarkId/runs/:runId/label and invalidate the correct query keys; the readOnly prop propagates from the existing useStudioConfig() read.
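The B2 fix comes down to choosing the scoped or unscoped endpoint from whether a `benchmarkId` was forwarded. A minimal sketch of that choice follows; `runAnnotationUrl` is a hypothetical helper, the URL-encoding of path segments is an assumption, and the `resource` parameter covers both the `/label` suffix used at this point in the review and the later `/tags` rename.

```typescript
type AnnotationResource = "label" | "tags";

// Builds the mutation URL for a run annotation.
// With a benchmarkId the request goes to the benchmark-scoped twin;
// without one it goes to the unscoped single-project endpoint.
function runAnnotationUrl(
  runId: string,
  benchmarkId?: string,
  resource: AnnotationResource = "tags",
): string {
  const run = encodeURIComponent(runId);
  if (benchmarkId !== undefined && benchmarkId !== "") {
    return `/api/benchmarks/${encodeURIComponent(benchmarkId)}/runs/${run}/${resource}`;
  }
  return `/api/runs/${run}/${resource}`;
}
```

If a caller omits `benchmarkId` in a benchmark-scoped route, the helper silently falls back to the unscoped path, which is the failure mode the review flagged.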

Also addressed two should-fix items while I was in there:

🟡 S2 — control-character sanitization in writeRunLabel (run-label.ts): rejects any char code < 0x20 or == 0x7f.
🟡 S4 — LabelEditor save/clear race: Save and Clear onClick handlers now short-circuit on busy so a double-click in the same tick cannot fire both mutations in flight.
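The S4 guard can be sketched outside React as a shared `busy` flag that both handlers check before starting a mutation. This is an assumption-laden illustration: `createGuardedActions` is a hypothetical name, and in the component `busy` would come from mutation state rather than a closure variable.

```typescript
type Mutation = () => Promise<void>;

// Wraps save/clear so that only one mutation can be in flight at a time.
function createGuardedActions(save: Mutation, clear: Mutation) {
  let busy = false;

  const guard = (mutation: Mutation) => async (): Promise<boolean> => {
    if (busy) return false; // short-circuit: another mutation is already running
    busy = true;
    try {
      await mutation();
      return true;
    } finally {
      busy = false;
    }
  };

  return {
    onSave: guard(save),
    onClear: guard(clear),
    isBusy: (): boolean => busy,
  };
}
```

The flag is set synchronously before the first `await`, so a second click dispatched in the same tick already sees `busy === true` and returns without starting its mutation.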

Deferred to follow-up (tracked mentally, not blocking):

S1 — handleCompare "last-wins" experiment/target aggregation when a run has mixed records. Rare in practice; worth a header note + first-wins switch in a cleanup PR.
S3 — compareOptions periodic refetch is pre-existing behaviour, not worth changing here.
S5 — no new unit tests. Real regret; I'll add run-label.test.ts + the /api/compare golden-fixture assertion in a small follow-up.

Verification after the fixes:

bun run lint ✅
bun run typecheck ✅
bun run test ✅ (1476 core + 67 eval + 433 cli = 1976 passing)
bun run build ✅ (studio bundle 468.08 kB / 129.90 kB gzip)
Pre-push hook (Build / Typecheck / Lint / Test / Validate eval YAML) — all Passed
Live UAT re-run: aggregated view OK in both 1×1 notice mode and 2×2 matrix mode, per-run selection + label edit + label clear + side-by-side compare view all still work

- Replace the single compare screenshot with three fresh shots at 1680x1000: the side-by-side per-run view (hero), the aggregated matrix, and the per-run list with labels.
- Expand the Studio `## Compare` section to describe both modes, when to use per-run mode, how the sticky Compare N flow works, and how retroactive labels persist as sidecar `label.json` files.
- While in CompareTab.tsx: honor `prefers-reduced-motion` (disables entrance animations, row stagger, hover translations), and restore focus to the row's label trigger button when the inline label editor closes so keyboard users don't lose their place.

Rewrites CompareTab markup from scratch using the same Tailwind patterns as the rest of Studio (ExperimentsTab, TargetsTab, RunList, PassRatePill) so the Compare tab is visually consistent with the rest of the app.

Before: the component carried its own "Editorial Data Terminal" theme (Fraunces serif, JetBrains Mono, warm off-black canvas, antique gold hairlines), scoped via inline <style> on [data-compare-root]. This was jarringly off-brand.

After: plain Tailwind utilities sourced from the existing Studio palette:
- Surfaces: rounded-lg border border-gray-800 on gray-900/50 backgrounds
- Tables: divide-y divide-gray-800/50 with hover:bg-gray-900/30
- Accents: cyan-400 / cyan-500 for interactive and selected states
- Tones: emerald-400 (pass), red-400 (fail), yellow-400 (warn), matching ExperimentsTab and the existing Legend swatches
- Pass rates: reuse the shared PassRatePill component everywhere
- Selection highlight: cyan-950/20 row tint with a sticky cyan action bar
- Label chip: cyan-bordered pill, matching cyan link styling elsewhere

Drops the entire ScopedStyles block and the data-compare-root wrapper. Functional behavior (state, mutations, keyboard handlers, focus return, Rules-of-Hooks order, control-char validation) is preserved.

Studio JS bundle drops ~31 KB (468 KB → 438 KB) from removing the embedded <style> string; CSS grows slightly from new Tailwind utilities.

Screenshots in apps/web/src/assets/screenshots/studio-compare-*.png are re-captured to reflect the corrected styling.

Replaces the single-valued `label` feature with multi-valued `tags`, matching the Langfuse / W&B / GitHub convention for mutable post-hoc run annotations. A singular label boxed us in for future use cases like `regression + slow + v2-prompt`-style cross-cutting filters; tags keep the door open without blocking the current compare-column-header use case.

Rationale:
- "Label" (singular) is an uncommon vocabulary: Langfuse, W&B, and GitHub all use plural `tags`, and MLflow uses a singular `runName` only for the immutable display identity (not post-hoc annotations).
- Experiment (set at eval-run time) is the run's grouping key; tags layer mutable cross-cutting attributes on top without touching the JSONL manifest.
- Per-run compare already solved the ad-hoc comparison mechanics; this rename gives the UX a richer identity layer.

Backend:
- `apps/cli/src/commands/results/run-label.ts` → `run-tags.ts`:
  - `RUN_LABEL_FILENAME` → `RUN_TAGS_FILENAME` (`label.json` → `tags.json`)
  - `RunLabelFile { label: string }` → `RunTagsFile { tags: string[] }`
  - `readRunLabel/writeRunLabel/deleteRunLabel` → `readRunTags/writeRunTags/deleteRunTags`
  - New `normalizeTags()` helper: trim, dedupe, validate per-tag length (≤60 chars), reject control chars, enforce MAX_TAGS_PER_RUN (20).
  - Writing an empty array deletes the sidecar (single idempotent path).
- `serve.ts`:
  - `PUT/DELETE /api/runs/:filename/label` → `/tags` (plus benchmark-scoped).
  - `handleRunLabelPut/Delete` → `handleRunTagsPut/Delete`.
  - `CompareRunEntry.label?` → `CompareRunEntry.tags?: string[]`.
  - `handleRuns` / `handleCompare` read `readRunTags` and surface `tags[]`.
  - `CompareRunEntry` and `RunMeta` wire-format fields updated accordingly.

Frontend:
- `types.ts`: `RunMeta.label?` → `tags?: string[]`; `RunLabelResponse { label }` → `RunTagsResponse { tags: string[] }`.
- `api.ts`: `saveRunLabelApi/deleteRunLabelApi` → `saveRunTagsApi/deleteRunTagsApi`; URL paths `/label` → `/tags`; request body `{ tags }`.
- `CompareTab.tsx`:
  - Table column "Label" → "Tags".
  - Per-run row: cell shows every tag as a cyan-bordered chip (wraps for long lists); placeholder "+ tags" dashed pill when empty.
  - New `TagsEditor` replaces `LabelEditor`: inline chip-based editor with staged `string[]` state, Enter/comma commits new tag, Backspace on empty input removes the last chip, × on each chip removes that specific tag, Clear all wipes the sidecar, Save persists the array.
  - `RunColumnHeader` (side-by-side view): timestamp stays as the primary identifier, tags render as small chips below it (was single label replacing the timestamp; now both coexist so the run's immutable identity is always visible).
  - Focus-restore on editor close preserved for keyboard users.

Docs:
- `apps/web/src/content/docs/docs/tools/studio.mdx`:
  - "Retroactive labels" section → "Retroactive tags"; explains the multi-valued model, the limits (20 tags × 60 chars), and the chip editor shortcuts.
  - Features bullet updated.
  - Alt text and surrounding prose reworded (`labelled` → `tagged`, `label cell` → `Tags cell`).
- Screenshots recaptured: per-run view now shows one run with two tags (`improved-prompt`, `v2`) and another with one tag (`baseline`) so the multi-valued pattern is visible; side-by-side view shows tag chips directly under each column's timestamp.
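The `TagsEditor` keyboard rules above (Enter or comma commits, Backspace on an empty input removes the last chip, duplicates are dropped while the input still clears) can be modelled as a pure state transition. This is a sketch under assumptions, not the component's actual `onKeyDown`: `EditorState` and `applyKey` are hypothetical names, and `hasChanges` is assumed to be an order-sensitive comparison.

```typescript
// Hypothetical staged-editor state: committed chips plus the text being typed.
interface EditorState {
  tags: string[];
  input: string;
}

// Applies one key press to the staged state and returns the next state.
function applyKey(state: EditorState, key: string): EditorState {
  if (key === "Enter" || key === ",") {
    const tag = state.input.trim();
    if (tag === "") return { tags: state.tags, input: "" };
    // Duplicates are not added, but the input still clears.
    const tags = state.tags.includes(tag) ? state.tags : [...state.tags, tag];
    return { tags, input: "" };
  }
  if (key === "Backspace" && state.input === "") {
    return { tags: state.tags.slice(0, -1), input: "" };
  }
  return state; // other keys are left to the text input itself
}

// Save stays disabled until the staged list differs from the persisted one.
function hasChanges(saved: readonly string[], staged: readonly string[]): boolean {
  return saved.length !== staged.length || saved.some((tag, i) => tag !== staged[i]);
}
```

Escape and Cancel need no transition here: discarding the staged state and keeping the persisted list gives the same result.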

Generated the file skeleton via `npx getdesign@latest add minimax` and then rewrote every section to describe AgentV Studio's actual design language rather than MiniMax's marketing-page aesthetic. The result is a practical reference that future agents and humans can drop into a Claude Code / Cursor session to keep new Studio surfaces consistent with the existing ones.

Contents:
- Color palette (gray-950 canvas, gray-900 surfaces, single cyan-400 accent, emerald/yellow/red data tones, blue gradient reserved for PassRatePill)
- Typography (single system-ui stack, no webfonts, text-sm default, tabular-nums mandatory on numeric columns, font-medium over bold)
- Canonical component patterns copied verbatim from ExperimentsTab, TargetsTab, RunList, and PassRatePill so new code can lift them without reinventing
- Do / don't list codifying the hard rules: one accent, no shadows for elevation, no rounded-xl, no webfonts, PassRatePill is the only blue in the app, data tones never leak into interactive chrome
- Responsive + layout principles matching the dense, desktop-first inspector posture of the current Studio UI
- Agent prompt guide with ready-to-paste snippets for tables, primary buttons, segmented controls, tag chips, empty states, and form rows

Placed at apps/studio/DESIGN.md (scoped to the studio app) so it lives next to the code it describes. This is documentation only, with no runtime or build impact.

christso · 2026-04-11T06:48:42Z

Ready for merge.

Final state after review + design pivots

Original feature (commit `aee2186`): per-run compare mode + retroactive run annotations, fixes the collapse bug described in #1037.
Review fixes (commit `c993a20`): Rules-of-Hooks violation in `AggregatedView`, missing `benchmarkId`/`readOnly` prop forwarding in both callers, LabelEditor save/clear race, control-character sanitization.
Style rework (commit `0b732db`): rewrote `CompareTab` from scratch with Tailwind utilities matching the rest of Studio (`gray-950` canvas, `cyan-400` accent, shared `PassRatePill`). Dropped the earlier "Editorial Data Terminal" theme that had drifted off-brand. JS bundle dropped ~31 KB as a bonus.
Rename to tags[] (commit `5c48a53`): pivoted from singular `label` to plural `tags` to match the Langfuse / W&B / GitHub convention for mutable post-hoc run annotations. Each run can now carry up to 20 tags (≤60 chars each, control-char rejected, deduped). Chip-based inline editor replaces the single-input label editor; compare column headers now show tags as chips below the immutable timestamp instead of replacing it. Screenshots re-captured to show one run tagged `[improved-prompt, v2]` and another tagged `[baseline]` so the multi-valued pattern is visible.
DESIGN.md (commit `7fd9f06`): scaffolded via `npx getdesign@latest add minimax` and rewritten to document Studio's actual dark + cyan style. Placed at `apps/studio/DESIGN.md` as a reference for future agents working on Studio UI.

Verification

`bun run build`, `typecheck`, `lint`, `test` all green (1976 tests pass)
prek pre-push hook (Build / Typecheck / Lint / Test / Validate eval YAML) passed on every push
`apps/web` builds cleanly with the new screenshots embedded
Live manual UAT via `agent-browser --cdp 9222` against `bun apps/cli/src/cli.ts studio --port 9100 --single` on a 4-run fixture, covering:
- Aggregated matrix (2×2) renders correctly in the new cyan style
- Per-run list shows all 4 runs with the Tags column and chip affordances
- Multi-valued tag editor: add, remove via ×, remove last via Backspace, Clear all, Save, Cancel
- Side-by-side compare view with chips under each column's timestamp
- Aggregated 1×1 "Not enough variation" edge case (verifies the Rules-of-Hooks fix)
- Flip back-and-forth between modes — no regressions
CI green (Check Links / Validate Marketplace / Validate Evals / Cloudflare Pages all pass)

Deferred / follow-up

Tracked as #1041: Filter compare views by tag. Tag filtering (chip row above the compare view to narrow both matrix and per-run table to runs matching a selected tag set) was discussed in this PR's thread and intentionally held out of scope — #1037 is a collapse-bug fix, the tag filter is adjacent but not required, and no concrete user has asked for it yet. The issue documents the design direction (filter, not dimension), the recommended OR semantics, and an implementation sketch.
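For reference, the OR semantics mentioned for the deferred tag filter could look like the sketch below. Nothing here is from #1041 itself: `filterRunsByTags` is a hypothetical name, and treating an empty selection as "no filter" is an assumption.

```typescript
// A run matches when it carries at least one of the selected tags (OR semantics).
function filterRunsByTags<T extends { tags?: string[] }>(
  runs: readonly T[],
  selected: readonly string[],
): T[] {
  if (selected.length === 0) return [...runs]; // assumption: empty selection shows everything
  const wanted = new Set(selected);
  return runs.filter((run) => (run.tags ?? []).some((tag) => wanted.has(tag)));
}
```

Under this reading, untagged runs drop out as soon as any tag is selected, since they cannot match.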

Squash-merging now.

christso · 2026-04-11T07:26:30Z

Post-merge manual UAT (agent-browser, interactive)

I ran the full interactive verification I should have done before merge — not just screenshot-rendering, but clicking every button and pressing every key — against the merged main (commit `016607e7`). All 11 interactive flows pass. Specific evidence below.

Setup:

Rebuilt `apps/studio/dist` from merged source (the previous dist I'd been serving was stale from April 9, which is what tripped me up in the first screenshot attempt).
Ran `bun apps/cli/src/cli.ts studio --port 9100 --single` against a fresh fixture at `/tmp/1037-uat-fixture` with 4 synthetic runs (2 sharing `(exp-a, claude-sonnet)`).
Drove via `agent-browser --cdp 9222` with manual click/keystroke dispatching for every interaction.

| # | Flow | Evidence | Status |
| --- | --- | --- | --- |
| 1 | Click `+ tags` cell opens the inline `TagsEditor` below the row | Editor row appears with "TAG RUN" label, focused input, disabled Save button (no changes yet) | ✅ |
| 2 | Type + Enter commits a chip to the staged list | Typed `improved-prompt`, pressed Enter → chip appeared, input cleared, Save button enabled | ✅ |
| 3 | Comma commits a chip | Typed `v2`, pressed comma key → second chip appeared, input cleared. Note: `agent-browser keyboard type "v2,"` pastes the comma literally (no `keydown` per-char), so I used `agent-browser press ","` to dispatch a real `keydown` event. The component's `onKeyDown` handler intercepts `e.key === ','` correctly for real user input. | ✅ |
| 4 | Save persists the full array and closes the editor | Clicked Save → editor closed, 11:00 row shows both chips, `tags.json` on disk contains `["improved-prompt","v2"]` with fresh `updated_at`, focus returned to the Tags button (cyan outline visible in screenshot) | ✅ |
| 5 | × on a specific chip removes just that tag | Reopened editor, clicked × on `v2` → `v2` removed from staged list while `improved-prompt` stayed, Save enabled, click Save → sidecar on disk now `["improved-prompt"]` only | ✅ |
| 6 | Backspace on empty input removes the last chip | Reopened editor, pressed Backspace on empty input → `improved-prompt` chip removed, staged list now empty | ✅ |
| 7 | Clear all deletes the sidecar | Repopulated tags, clicked Clear all → editor closed, `tags.json` removed from disk (only `index.jsonl` in the run dir), row shows `+ tags` placeholder again | ✅ |
| 8 | Cancel discards staged changes | Opened editor, Backspace'd the chip, clicked Cancel → sidecar on disk unchanged (same `updated_at`), row still shows `improved-prompt` | ✅ |
| 9 | Escape discards staged changes | Opened editor, typed `draft-tag`, pressed Escape → editor closed, input + typed text discarded, sidecar on disk unchanged | ✅ |
| 10 | Duplicate tag silently deduped | Tried to re-add `improved-prompt` while it was already in the staged list → chip count stays at 1, input clears (user gets feedback that the action "took" without adding anything) | ✅ |
| 11 | Save disabled until `hasChanges` is true | Reopened editor without touching anything → `saveButton.disabled === true` via DOM inspection | ✅ |

Also re-verified during the same session:

The aggregated matrix renders correctly (2×2, `PassRatePill` blue gradient, cyan Compare tab accent, legend in gray)
Mode toggle switches between Aggregated and Per-run without dropping state
Focus-return on editor close works (keyboard users don't lose their place in the table)

Finding

One real bug: I was serving a stale studio dist. The `apps/cli/src/cli.ts studio` command serves `apps/studio/dist/` as static assets, and the dist folder is build output (gitignored). After pulling main post-merge, I didn't rebuild the studio bundle, so my first UAT attempt was driving against a pre-#1040 build that didn't have `TagsEditor` at all. Rebuilding fixed it. This is a gotcha worth documenting — the current AGENTS.md guidance for functional CLI testing says "From TypeScript source (preferred): `bun apps/cli/src/cli.ts …`", which works for CLI logic but does not rebuild the embedded studio UI bundle. Working in a fresh worktree where you run `bun install` + `bun run build` anyway would have caught this; in the primary checkout I skipped the rebuild and got burned.

I've noted this in my project memory but it might be worth a one-line addition to the AGENTS.md "Functional Testing (CLI)" section, e.g. "If you are testing Studio UI changes, rebuild the studio bundle first: `cd apps/studio && bun run build`. The studio CLI serves static assets from `apps/studio/dist/` — it does NOT recompile on change like the Vite dev server does." Happy to open a tiny follow-up PR for that if you agree.

No code changes needed

All flows work as designed. Nothing to hot-fix. The tag editor is behaving correctly post-merge.

Running `bun apps/cli/src/cli.ts studio` only live-reloads the CLI and backend routes. The Studio web UI is served as a static bundle from `apps/studio/dist/`, which is build output and does not recompile on source changes. Without a manual `bun run build` in `apps/studio`, `agentv studio` silently serves whatever JS/CSS was last built — which may be from a different branch, before the merge you just pulled, or simply stale. This bit the post-merge UAT on #1040: the TagsEditor component was correctly in the source but not in the dist, so the driven-browser session kept rendering an older Compare tab and looked like a feature regression. Cost ~15 minutes of confusion to diagnose. Adds a paragraph under the existing "Functional Testing (CLI)" section so the next agent (or human) knows to rebuild the Studio dist before screenshotting or driving `agent-browser` against Studio.


devbox2-codex added 4 commits April 11, 2026 03:58

christso mentioned this pull request Apr 11, 2026

Filter compare views by tag #1041

Closed

christso marked this pull request as ready for review April 11, 2026 06:48

christso merged commit 016607e into main Apr 11, 2026
4 checks passed

christso deleted the feat/1037-per-run-compare branch April 11, 2026 06:48

christso mentioned this pull request Apr 11, 2026

docs(agents): note Studio stale-dist trap under Functional Testing #1042

Merged

4 tasks

christso mentioned this pull request Apr 11, 2026

feat(studio): filter compare views by tag #1043

Merged

christso merged 6 commits into main from feat/1037-per-run-compare
