Merged
1 change: 1 addition & 0 deletions .npmrc
@@ -0,0 +1 @@
legacy-peer-deps=true
2 changes: 1 addition & 1 deletion AGENTS.md
@@ -46,7 +46,7 @@ Current Phase

Phases 0–26 are complete.

Phase 27 (remote access & collaboration) is complete. Phase 29 (model experimentation & eval) is next. See docs/27_PROJECT_ROADMAP.md for the full roadmap through Phase 30.
Phase 27 (remote access & collaboration) is complete. Phase 29 (model experimentation & eval) is in progress. See docs/27_PROJECT_ROADMAP.md for the full roadmap through Phase 30.

Protocol Rules

10 changes: 5 additions & 5 deletions README.md
@@ -7,7 +7,7 @@
<a href="package.json"><img src="https://img.shields.io/badge/Bun-%3E%3D1.0-cc00ff?logo=bun" alt="Bun" /></a>
<a href="tsconfig.base.json"><img src="https://img.shields.io/badge/TypeScript-6.0-3178C6?logo=typescript" alt="TypeScript" /></a>
<a href="biome.json"><img src="https://img.shields.io/badge/code_style-Biome-60a5fa?logo=biome" alt="Code Style" /></a>
<a href="#"><img src="https://img.shields.io/badge/tests-602%20passing-brightgreen" alt="Tests" /></a>
<a href="#"><img src="https://img.shields.io/badge/tests-604%20passing-brightgreen" alt="Tests" /></a>
<a href="#"><img src="https://img.shields.io/badge/packages-20-blue" alt="Packages" /></a>
<a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue" alt="License" /></a>
<a href="CONTRIBUTING.md"><img src="https://img.shields.io/badge/PRs-welcome-brightgreen" alt="PRs Welcome" /></a>
@@ -16,7 +16,7 @@

---

> **Status:** Phases 0–27 complete · **602 tests, 0 failures** · Phase 29 (model eval) next
> **Status:** Phases 0–27 complete · **604 tests, 0 failures** · Phase 29 (model eval) in progress

---

@@ -71,7 +71,7 @@ bun install
# Build all workspace packages
bash scripts/build-all.sh

# Run the full test suite (523 tests, all passing)
# Run the full test suite (604 tests, all passing)
bun test

# Start the server (Terminal 1)
@@ -254,7 +254,7 @@ All core systems are implemented and tested:
- ✅ **Multi-session & workspaces** — side-by-side sessions, workspace management, bulk operations
- ✅ **Observability** (packages/telemetry) — OpenTelemetry tracing, Prometheus metrics, error reporting, audit log
- ✅ **Plugin system** (packages/plugin-sdk) — tool, provider, hook, and panel extension points; CLI management; sandbox permissions
- ✅ **Automated testing** — 523 tests (unit, integration, e2e)
- ✅ **Automated testing** — 604 tests (unit, integration, e2e)
- ✅ **CI/CD pipeline** — GitHub Actions with static check + typecheck + tests + E2E

---
@@ -319,7 +319,7 @@ When continuing this project via an AI agent:

```bash
# Full test suite
bun test # 523 tests, 0 failures, 1495 expect() calls
bun test # 604 tests, 0 failures, 1686 expect() calls

# Build everything
bash scripts/build-all.sh
2 changes: 1 addition & 1 deletion apps/cli/package.json
@@ -13,7 +13,7 @@
"typecheck": "tsc --noEmit"
},
"dependencies": {
"@agent-workbench/plugin-sdk": "workspace:*"
"@agent-workbench/plugin-sdk": "*"
},
"devDependencies": {
"@types/bun": "^1.3.14"
14 changes: 12 additions & 2 deletions apps/cli/templates/bun/README.md
@@ -1,6 +1,6 @@
# my-bun-project
# Bun Template

Scaffolded with `agent-workbench init bun`.
Scaffolded with `agent-workbench init bun`. This is a template for new Bun projects with TypeScript, testing, and watch mode pre-configured.

## Quick Start

@@ -10,3 +10,13 @@ bun run start
bun run dev # Watch mode
bun test # Run tests
```

## Template Structure

- `src/` — Application source code
- `src/hello.ts` — Entry point with sample code
- `src/hello.test.ts` — Sample test
- `tsconfig.json` — TypeScript configuration
- `package.json` — Project metadata with dev/watch/test scripts

Customize `package.json` with your project name and description after scaffolding.
12 changes: 10 additions & 2 deletions apps/cli/templates/typescript/README.md
@@ -1,6 +1,6 @@
# my-project
# TypeScript Template

Scaffolded with `agent-workbench init typescript`.
Scaffolded with `agent-workbench init typescript`. This is a template for new TypeScript projects.

## Quick Start

@@ -9,3 +9,11 @@ bun install
bun run build
bun run src/index.ts
```

## Template Structure

- `src/` — Application source code
- `tsconfig.json` — TypeScript configuration
- `package.json` — Project metadata and scripts

Customize `package.json` with your project name and description after scaffolding.
18 changes: 11 additions & 7 deletions apps/dashboard/README.md
@@ -1,4 +1,4 @@
# @agent-workbench/dashboard
# 📊 @agent-workbench/dashboard

Web-based monitoring dashboard for the agent-workbench server. Provides real-time visibility into agent sessions, system metrics, and provider status via SSE-driven live updates.

@@ -9,7 +9,6 @@ Web-based monitoring dashboard for the agent-workbench server. Provides real-tim
cd apps/server && bun run dev

# Dashboard available at http://localhost:8787/dashboard
# Point your browser to the /dashboard route after starting the server
```

## Features
@@ -19,11 +18,16 @@ cd apps/server && bun run dev
- **Provider status**: Health checks for all configured model providers
- **Live updates**: SSE-based real-time data streaming

## Scope
## Architecture

- Real-time session monitoring
- System metrics visualization
- Provider status dashboard
- SSE-based live updates
Built with SolidJS + Tailwind CSS. Consumes the typed SDK (`@agent-workbench/sdk`) to connect to the local server. All data flows through SSE event streams — no polling.
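
As a rough illustration of the SSE data flow described above, the sketch below parses a buffered event-stream payload into discrete events. It is not the dashboard's or the SDK's actual code: the `SseEvent` shape and the `parseSseFrames` helper are hypothetical names introduced here, and only the `stream_delta` / `stream_complete` event names come from elsewhere in this change set.

```typescript
// Hypothetical event shape; the SDK's real types are not shown in this diff.
interface SseEvent {
  event: string;
  data: string;
}

// Split a buffered SSE payload into events. Frames are separated by a blank
// line; each frame may carry an `event:` field and one or more `data:` fields.
function parseSseFrames(payload: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of payload.split(/\r?\n\r?\n/)) {
    let event = "message"; // default event type when no `event:` field is sent
    const data: string[] = [];
    for (const line of frame.split(/\r?\n/)) {
      if (line.startsWith("event:")) {
        event = line.slice("event:".length).trim();
      } else if (line.startsWith("data:")) {
        // A single leading space after the colon is not part of the value.
        data.push(line.slice("data:".length).replace(/^ /, ""));
      }
    }
    // Frames without any data (comments, keep-alives) produce no event.
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```

In a browser the built-in `EventSource` API does this parsing for you; the sketch only shows what arrives on the wire, which can help when debugging a stream by hand.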

## Development

```bash
cd apps/dashboard && bun run dev # Dev server with hot reload
cd apps/dashboard && bun run build # Production build
cd apps/dashboard && bun run typecheck
```

Part of **Phase 25** (observability & production readiness).
4 changes: 2 additions & 2 deletions apps/mobile-web/package.json
@@ -11,8 +11,8 @@
"typecheck": "tsc --noEmit"
},
"dependencies": {
"@agent-workbench/protocol": "workspace:*",
"@agent-workbench/sdk": "workspace:*",
"@agent-workbench/protocol": "*",
"@agent-workbench/sdk": "*",
"marked": "^18.0.5",
"solid-js": "^1.9.14"
},
28 changes: 14 additions & 14 deletions apps/server/package.json
@@ -23,20 +23,20 @@
"typecheck": "tsc --noEmit"
},
"dependencies": {
"@agent-workbench/cache": "workspace:*",
"@agent-workbench/core": "workspace:*",
"@agent-workbench/events": "workspace:*",
"@agent-workbench/models": "workspace:*",
"@agent-workbench/permissions": "workspace:*",
"@agent-workbench/protocol": "workspace:*",
"@agent-workbench/shell": "workspace:*",
"@agent-workbench/storage": "workspace:*",
"@agent-workbench/tokens": "workspace:*",
"@agent-workbench/tools": "workspace:*",
"@agent-workbench/telemetry": "workspace:*",
"@agent-workbench/plugin-sdk": "workspace:*",
"@agent-workbench/auth": "workspace:*",
"@agent-workbench/collab": "workspace:*",
"@agent-workbench/cache": "*",
"@agent-workbench/core": "*",
"@agent-workbench/events": "*",
"@agent-workbench/models": "*",
"@agent-workbench/permissions": "*",
"@agent-workbench/protocol": "*",
"@agent-workbench/shell": "*",
"@agent-workbench/storage": "*",
"@agent-workbench/tokens": "*",
"@agent-workbench/tools": "*",
"@agent-workbench/telemetry": "*",
"@agent-workbench/plugin-sdk": "*",
"@agent-workbench/auth": "*",
"@agent-workbench/collab": "*",
"hono": "^4.12.27",
"ulid": "^2.3.0",
"zod": "^4.4.3"
6 changes: 3 additions & 3 deletions apps/tui/package.json
@@ -10,9 +10,9 @@
"typecheck": "tsc --noEmit"
},
"dependencies": {
"@agent-workbench/eval": "workspace:*",
"@agent-workbench/protocol": "workspace:*",
"@agent-workbench/sdk": "workspace:*",
"@agent-workbench/eval": "*",
"@agent-workbench/protocol": "*",
"@agent-workbench/sdk": "*",
"@opentui/core": "0.4.2",
"@opentui/solid": "0.4.2",
"solid-js": "1.9.14"
45 changes: 28 additions & 17 deletions benchmarks/README.md
@@ -1,23 +1,34 @@
# Benchmarks for agent-workbench
#
# Run with: bun vitest bench --reporter=verbose
# (Requires vitest — not a dependency yet)
# Benchmarks

## Server benchmarks (planned)
Performance benchmarks for agent-workbench packages and apps.

## Running

```bash
# Run the benchmark suite
bun run benchmarks/benchmark-runner.ts
```

## What's Benchmarked

### Build & Type System
- Build time per package (`tsc`)
- Typecheck time per package
- Bundle size analysis

### Runtime
- Server startup time
- Session creation throughput
- Message submission latency (10, 100, 1000 messages)
- Concurrent session handling (10, 50 simultaneous)
- Message submission latency

### TUI
- Render time for large timelines
- Command palette search latency

## TUI benchmarks (planned)
- Render time for 100+ message timeline
- Command palette search latency (1000 entries)
- Panel switch latency
### Permission Engine
- Policy evaluation with many rules
- Plan evaluation with many steps

## SDK benchmarks (planned)
- Session list with 100+ sessions
- Stream throughput for long model responses
## Adding Benchmarks

## Permission engine benchmarks (planned)
- Policy evaluation with 100+ rules
- Plan evaluation with 50+ steps
Add new benchmark suites in `benchmarks/tools/`. Each suite should export a `run()` function that returns a `BenchmarkResult`.
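
A minimal sketch of what such a suite might look like follows. The `BenchmarkResult` fields, the `evaluatePolicy` stand-in workload, and the suite name are all assumptions made for illustration; the real type and the runner's expectations are not shown in this diff, so check `benchmarks/benchmark-runner.ts` before copying this.

```typescript
// Hypothetical result shape; the real `BenchmarkResult` type is not shown in this diff.
export interface BenchmarkResult {
  name: string;
  iterations: number;
  totalMs: number;
  meanMs: number;
}

// Stand-in workload: scan a list of rules for a match.
// A real suite would call into the package under test instead.
function evaluatePolicy(rules: string[], action: string): boolean {
  return rules.some((rule) => rule === action);
}

// Entry point a runner could call; times `iterations` evaluations of the workload.
export function run(iterations = 1_000): BenchmarkResult {
  const rules = Array.from({ length: 500 }, (_, i) => `tool:${i}`);
  let matches = 0;

  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    // Worst case: the matching rule is the last one in the list.
    if (evaluatePolicy(rules, "tool:499")) matches++;
  }
  const totalMs = performance.now() - start;

  // Guard against a workload that silently stopped doing real work.
  if (matches !== iterations) throw new Error("unexpected workload result");

  return {
    name: "policy-evaluation",
    iterations,
    totalMs,
    meanMs: totalMs / iterations,
  };
}
```

Keeping the timed loop free of setup work (the rule list is built before `start`) means the reported mean reflects only the operation being measured.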
2 changes: 1 addition & 1 deletion bun.lock

Some generated files are not rendered by default.

22 changes: 11 additions & 11 deletions decisions/0017-ci-pipeline-and-e2e-validation.md
@@ -131,17 +131,17 @@ Two new E2E tests using mock providers, temp databases, and random ports:
## Validation Checklist

```text
[ ] GitHub Actions CI pipeline configured and triggers on push to main.
[ ] Pipeline runs bun test — all tests pass.
[ ] Pipeline runs bash scripts/test-health.sh — all checks pass.
[ ] Pipeline runs bun run typecheck in every workspace package.
[ ] Pipeline runs git diff --check — no whitespace errors.
[ ] Pipeline reports pass/fail status on PRs.
[ ] Full-stack E2E test covers: server start → health → provider route → SDK session → model response → shutdown.
[ ] Streaming E2E test validates: SSE subscription → stream_delta events → stream_complete → final persistence.
[ ] All E2E tests use mock providers, temp databases, random ports.
[ ] CI completes within 5 minutes for the full suite.
[ ] CI requires no secrets, API keys, or network access.
[x] GitHub Actions CI pipeline configured and triggers on push to main.
[x] Pipeline runs bun test — all tests pass.
[x] Pipeline runs bash scripts/test-health.sh — all checks pass.
[x] Pipeline runs bun run typecheck in every workspace package.
[x] Pipeline runs git diff --check — no whitespace errors.
[x] Pipeline reports pass/fail status on PRs.
[x] Full-stack E2E test covers: server start → health → provider route → SDK session → model response → shutdown.
[x] Streaming E2E test validates: SSE subscription → stream_delta events → stream_complete → final persistence.
[x] All E2E tests use mock providers, temp databases, random ports.
[x] CI completes within 5 minutes for the full suite.
[x] CI requires no secrets, API keys, or network access.
```

## Notes for Future Agents