MerverliPy · Jul 3, 2026
@@ -0,0 +1 @@
+legacy-peer-deps=true
@@ -46,7 +46,7 @@ Current Phase
 
 Phases 0–26 are complete.
 
-Phase 27 (remote access & collaboration) is complete. Phase 29 (model experimentation & eval) is next. See docs/27_PROJECT_ROADMAP.md for the full roadmap through Phase 30.
+Phase 27 (remote access & collaboration) is complete. Phase 29 (model experimentation & eval) is in progress. See docs/27_PROJECT_ROADMAP.md for the full roadmap through Phase 30.
 
 Protocol Rules
 

@@ -7,7 +7,7 @@
     <a href="package.json"><img src="https://img.shields.io/badge/Bun-%3E%3D1.0-cc00ff?logo=bun" alt="Bun" /></a>
     <a href="tsconfig.base.json"><img src="https://img.shields.io/badge/TypeScript-6.0-3178C6?logo=typescript" alt="TypeScript" /></a>
     <a href="biome.json"><img src="https://img.shields.io/badge/code_style-Biome-60a5fa?logo=biome" alt="Code Style" /></a>
-    <a href="#"><img src="https://img.shields.io/badge/tests-602%20passing-brightgreen" alt="Tests" /></a>
+    <a href="#"><img src="https://img.shields.io/badge/tests-604%20passing-brightgreen" alt="Tests" /></a>
     <a href="#"><img src="https://img.shields.io/badge/packages-20-blue" alt="Packages" /></a>
     <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue" alt="License" /></a>
     <a href="CONTRIBUTING.md"><img src="https://img.shields.io/badge/PRs-welcome-brightgreen" alt="PRs Welcome" /></a>
@@ -16,7 +16,7 @@
 
 ---
 
-> **Status:** Phases 0–27 complete · **602 tests, 0 failures** · Phase 29 (model eval) next
+> **Status:** Phases 0–27 complete · **604 tests, 0 failures** · Phase 29 (model eval) in progress
 
 ---
 
@@ -71,7 +71,7 @@ bun install
 # Build all workspace packages
 bash scripts/build-all.sh
 
-# Run the full test suite (523 tests, all passing)
+# Run the full test suite (604 tests, all passing)
 bun test
 
 # Start the server (Terminal 1)
@@ -254,7 +254,7 @@ All core systems are implemented and tested:
 - ✅ **Multi-session & workspaces** — side-by-side sessions, workspace management, bulk operations
 - ✅ **Observability** (packages/telemetry) — OpenTelemetry tracing, Prometheus metrics, error reporting, audit log
 - ✅ **Plugin system** (packages/plugin-sdk) — tool, provider, hook, and panel extension points; CLI management; sandbox permissions
-- ✅ **Automated testing** — 523 tests (unit, integration, e2e)
+- ✅ **Automated testing** — 604 tests (unit, integration, e2e)
 - ✅ **CI/CD pipeline** — GitHub Actions with static check + typecheck + tests + E2E
 
 ---
@@ -319,7 +319,7 @@ When continuing this project via an AI agent:
 
 ```bash
 # Full test suite
-bun test                           # 523 tests, 0 failures, 1495 expect() calls
+bun test                           # 604 tests, 0 failures, 1686 expect() calls
 
 # Build everything
 bash scripts/build-all.sh

@@ -13,7 +13,7 @@
     "typecheck": "tsc --noEmit"
   },
   "dependencies": {
-    "@agent-workbench/plugin-sdk": "workspace:*"
+    "@agent-workbench/plugin-sdk": "*"
   },
   "devDependencies": {
     "@types/bun": "^1.3.14"

@@ -1,6 +1,6 @@
-# my-bun-project
+# Bun Template
 
-Scaffolded with `agent-workbench init bun`.
+Scaffolded with `agent-workbench init bun`. This is a template for new Bun projects with TypeScript, testing, and watch mode pre-configured.
 
 ## Quick Start
 
@@ -10,3 +10,13 @@ bun run start
 bun run dev    # Watch mode
 bun test       # Run tests
 ```
+
+## Template Structure
+
+- `src/` — Application source code
+- `src/hello.ts` — Entry point with sample code
+- `src/hello.test.ts` — Sample test
+- `tsconfig.json` — TypeScript configuration
+- `package.json` — Project metadata with dev/watch/test scripts
+
+Customize `package.json` with your project name and description after scaffolding.
@@ -1,6 +1,6 @@
-# my-project
+# TypeScript Template
 
-Scaffolded with `agent-workbench init typescript`.
+Scaffolded with `agent-workbench init typescript`. This is a template for new TypeScript projects.
 
 ## Quick Start
 
@@ -9,3 +9,11 @@ bun install
 bun run build
 bun run src/index.ts
 ```
+
+## Template Structure
+
+- `src/` — Application source code
+- `tsconfig.json` — TypeScript configuration
+- `package.json` — Project metadata and scripts
+
+Customize `package.json` with your project name and description after scaffolding.
@@ -1,4 +1,4 @@
-# @agent-workbench/dashboard
+# 📊 @agent-workbench/dashboard
 
 Web-based monitoring dashboard for the agent-workbench server. Provides real-time visibility into agent sessions, system metrics, and provider status via SSE-driven live updates.
 
@@ -9,7 +9,6 @@ Web-based monitoring dashboard for the agent-workbench server. Provides real-tim
 cd apps/server && bun run dev
 
 # Dashboard available at http://localhost:8787/dashboard
-# Point your browser to the /dashboard route after starting the server
 ```
 
 ## Features
@@ -19,11 +18,16 @@ cd apps/server && bun run dev
 - **Provider status**: Health checks for all configured model providers
 - **Live updates**: SSE-based real-time data streaming
 
-## Scope
+## Architecture
 
-- Real-time session monitoring
-- System metrics visualization
-- Provider status dashboard
-- SSE-based live updates
+Built with SolidJS + Tailwind CSS. Consumes the typed SDK (`@agent-workbench/sdk`) to connect to the local server. All data flows through SSE event streams — no polling.
+
+## Development
+
+```bash
+cd apps/dashboard && bun run dev     # Dev server with hot reload
+cd apps/dashboard && bun run build   # Production build
+cd apps/dashboard && bun run typecheck
+```
 
 Part of **Phase 25** (observability & production readiness).
@@ -11,8 +11,8 @@
     "typecheck": "tsc --noEmit"
   },
   "dependencies": {
-    "@agent-workbench/protocol": "workspace:*",
-    "@agent-workbench/sdk": "workspace:*",
+    "@agent-workbench/protocol": "*",
+    "@agent-workbench/sdk": "*",
     "marked": "^18.0.5",
     "solid-js": "^1.9.14"
   },

@@ -23,20 +23,20 @@
     "typecheck": "tsc --noEmit"
   },
   "dependencies": {
-    "@agent-workbench/cache": "workspace:*",
-    "@agent-workbench/core": "workspace:*",
-    "@agent-workbench/events": "workspace:*",
-    "@agent-workbench/models": "workspace:*",
-    "@agent-workbench/permissions": "workspace:*",
-    "@agent-workbench/protocol": "workspace:*",
-    "@agent-workbench/shell": "workspace:*",
-    "@agent-workbench/storage": "workspace:*",
-    "@agent-workbench/tokens": "workspace:*",
-    "@agent-workbench/tools": "workspace:*",
-    "@agent-workbench/telemetry": "workspace:*",
-    "@agent-workbench/plugin-sdk": "workspace:*",
-    "@agent-workbench/auth": "workspace:*",
-    "@agent-workbench/collab": "workspace:*",
+    "@agent-workbench/cache": "*",
+    "@agent-workbench/core": "*",
+    "@agent-workbench/events": "*",
+    "@agent-workbench/models": "*",
+    "@agent-workbench/permissions": "*",
+    "@agent-workbench/protocol": "*",
+    "@agent-workbench/shell": "*",
+    "@agent-workbench/storage": "*",
+    "@agent-workbench/tokens": "*",
+    "@agent-workbench/tools": "*",
+    "@agent-workbench/telemetry": "*",
+    "@agent-workbench/plugin-sdk": "*",
+    "@agent-workbench/auth": "*",
+    "@agent-workbench/collab": "*",
     "hono": "^4.12.27",
     "ulid": "^2.3.0",
     "zod": "^4.4.3"

@@ -10,9 +10,9 @@
     "typecheck": "tsc --noEmit"
   },
   "dependencies": {
-    "@agent-workbench/eval": "workspace:*",
-    "@agent-workbench/protocol": "workspace:*",
-    "@agent-workbench/sdk": "workspace:*",
+    "@agent-workbench/eval": "*",
+    "@agent-workbench/protocol": "*",
+    "@agent-workbench/sdk": "*",
     "@opentui/core": "0.4.2",
     "@opentui/solid": "0.4.2",
     "solid-js": "1.9.14"

@@ -1,23 +1,34 @@
-# Benchmarks for agent-workbench
-#
-# Run with: bun vitest bench --reporter=verbose
-# (Requires vitest — not a dependency yet)
+# Benchmarks
 
-## Server benchmarks (planned)
+Performance benchmarks for agent-workbench packages and apps.
+
+## Running
+
+```bash
+# Run the benchmark suite
+bun run benchmarks/benchmark-runner.ts
+```
+
+## What's Benchmarked
+
+### Build & Type System
+- Build time per package (`tsc`)
+- Typecheck time per package
+- Bundle size analysis
+
+### Runtime
 - Server startup time
 - Session creation throughput
-- Message submission latency (10, 100, 1000 messages)
-- Concurrent session handling (10, 50 simulataneous)
+- Message submission latency
+
+### TUI
+- Render time for large timelines
+- Command palette search latency
 
-## TUI benchmarks (planned)
-- Render time for 100+ message timeline
-- Command palette search latency (1000 entries)
-- Panel switch latency
+### Permission Engine
+- Policy evaluation with many rules
+- Plan evaluation with many steps
 
-## SDK benchmarks (planned)
-- Session list with 100+ sessions
-- Stream throughput for long model responses
+## Adding Benchmarks
 
-## Permission engine benchmarks (planned)
-- Policy evaluation with 100+ rules
-- Plan evaluation with 50+ steps
+Add new benchmark suites in `benchmarks/tools/`. Each suite should export a `run()` function that returns a `BenchmarkResult`.
@@ -131,17 +131,17 @@ Two new E2E tests using mock providers, temp databases, and random ports:
 ## Validation Checklist
 
 ```text
-[ ] GitHub Actions CI pipeline configured and triggers on push to main.
-[ ] Pipeline runs bun test — all tests pass.
-[ ] Pipeline runs bash scripts/test-health.sh — all checks pass.
-[ ] Pipeline runs bun run typecheck in every workspace package.
-[ ] Pipeline runs git diff --check — no whitespace errors.
-[ ] Pipeline reports pass/fail status on PRs.
-[ ] Full-stack E2E test covers: server start → health → provider route → SDK session → model response → shutdown.
-[ ] Streaming E2E test validates: SSE subscription → stream_delta events → stream_complete → final persistence.
-[ ] All E2E tests use mock providers, temp databases, random ports.
-[ ] CI completes within 5 minutes for the full suite.
-[ ] CI requires no secrets, API keys, or network access.
+[x] GitHub Actions CI pipeline configured and triggers on push to main.
+[x] Pipeline runs bun test — all tests pass.
+[x] Pipeline runs bash scripts/test-health.sh — all checks pass.
+[x] Pipeline runs bun run typecheck in every workspace package.
+[x] Pipeline runs git diff --check — no whitespace errors.
+[x] Pipeline reports pass/fail status on PRs.
+[x] Full-stack E2E test covers: server start → health → provider route → SDK session → model response → shutdown.
+[x] Streaming E2E test validates: SSE subscription → stream_delta events → stream_complete → final persistence.
+[x] All E2E tests use mock providers, temp databases, random ports.
+[x] CI completes within 5 minutes for the full suite.
+[x] CI requires no secrets, API keys, or network access.
 ```
 
 ## Notes for Future Agents