Progress

Live state of the repository. Update after every meaningful work increment (sub-task done, blocker hit, decision made). Entries dated YYYY-MM-DD. Newest first.

How to read this file

Each dated section is a single working day (or session).
Bullets are chronological inside a day.
Each bullet states what changed, why, and what's next where relevant.
After a session interruption, the last bullet of the latest day is the resume point.

2026-05-17

Started task/bootstrap-governance from main (commit c25dd4e = initial). Goal: stand up the process governance — branch strategy, validation loop, Copilot review automation, docs scaffolding, repo tooling — before any product code is written.
Task 0 closed. PR #1 merged (d9cb4b3). Tag v0.0.1-governance pushed. 8 Copilot review iterations; 52 actionable comments addressed.
Task 1 closed. PR #2 merged (3871cd1). @aqa/schemas — Zod source of truth + JSON Schema (Draft 2020-12) generated artifacts. Determinism contract from §3.1 codified in Finding. Hash-chained audit codified in Event. 4 Copilot review passes; 29 actionable comments addressed. Follow-up #3 tracks remaining JSON-Schema parity work.
Task 2 closed. PR #4 merged (895cec9). @aqa/kit — aqa CLI (init/doctor/validate) + project profiler. CI bun + Node 22 jobs aligned to per-package script runner; topological build added (run-workspace-script DFS sort) so downstream packages can resolve workspace imports through dist/.
Task 3 closed. PR #5 merged. @aqa/pack-loader + 5 baseline packs (core / api-core / web-ui / llm-agent / security). One Copilot review pass; 15 actionable comments addressed (slug placeholders, manifest descriptions, OWASP coverage scoped to v0.1.0 subset).
Task 4 closed. PR #6 merged. @aqa/adapters — Claude, Codex, Gemini, Copilot adapters with per-target capability profiles and deterministic render(ctx).
Task 5 closed. PR #7 merged. @aqa/runner — RunLifecycle state machine, hash-chained EventChainWriter (end-to-end verified), FindingsWriter (in-run dedup), built-in oracles, runScenario orchestrator.
Task 6 closed. PR #8 merged. @aqa/reporter — Markdown + JSON reporters + 3-level replay artifact generator (repro.sh, repro.curl, repro.playwright.ts).
Task 7 — admin panel bootstrap done. packages/admin (@aqa/admin, private) — Vite + React 19 + TS strict scaffold with a 12-route sidebar shell (Dashboard, Runs, Findings, Risk map, Profiles, Packs, Scenarios, Agents, Replay, Audit log, Cost, Settings). Each route renders a typed ScreenPlaceholder documenting what lands when. Vite build produces dist/ (197 KB JS, gzip 62 KB). Full Tailwind 4 wiring, TanStack Router migration, and per-screen data wiring are deferred to Task 17 (task/admin-editing). 4 node:test tests; 86 repo-wide.
Repo health snapshot: 9 packages (schemas, kit, pack-loader, adapters, runner, reporter, admin + 5 packs), 86 tests passing under both Bun and Node 22, biome + tsc strict zero errors, hash-chained audit verified, JSON Schemas Draft 2020-12 compliant.
Next: Task 8 — docs/getting-started.md (junior 15-min onboarding), docs/architecture/reference.md (real diagram + component map), docs/methodology/agentic-qa.md (Risk/Invariant/Probe/Oracle), ADR-001..ADR-009, examples/bun-api, examples/nextjs-saas, then v0.1.0 release tag. Task 9 (FINAL) — knowledge consolidation across LESSON.md / RULES.md / agent files.
Tasks 8 — 22 closed. v0.1.0 through v0.6.0 tags pushed (#9..#16). Stack grew to 18 packages: schemas, kit, pack-loader, pack-scanner, adapters, llm-adapters, runner, reporter, admin, admin-core, auth, sandbox, store, generator, server, clustering, methodology, + 5 packs. Deploy scaffolds (deploy/helm, deploy/terraform, scripts/air-gap-install.sh) shipped with explicit "v0.6 / v1.0" labels.
Task 23 — v1.0 readiness in progress. @aqa/compliance ships SOC2/ISO controls catalog (CONTROL_MAPPINGS, controlsCoverage) + hash-chain audit verifier (verifyEventChain, aqa-audit-verify CLI). docs/compliance/soc2-iso-mapping.md is the auditor-facing source of truth; docs/compliance/pen-test-scope.md is the engagement contract. 7 new tests; 165 repo-wide.
v1.1 polish shipped (PR #18, tag v1.1.0). README banner now points to a real PNG. deploy/helm is feature-complete (runner StatefulSet w/ per-pod PVC, optional Ingress + TLS, NetworkPolicy that confines runner egress, optional in-cluster Postgres subchart). Three examples: bun-api, nextjs-saas (session-cookie invariant), laravel-app (demonstrates language-agnostic targeting). docs/LESSON.md consolidated retrospective. GitHub Releases backfilled for every tag from v0.0.1-governance through v1.1.0. README pre-alpha badge replaced with GA + Release badges.

2026-05-18

v1.7 slices 1+2 shipped — pack authoring tutorial + aqa pack new CLI. PR #25 merged (6cc0013), prerelease tag v1.7.0-rc.1 published. 19 review iterations with Copilot + Codex; the convergence pattern hit a sharp tail (5→1→4→2→1→2→0 real items per round) after Copilot started re-flagging the same ~13 already-addressed comments. Real issues caught and fixed before merge: slug-length validation against derived-ID schema cap (52-char limit), in-memory schema validation of generated Scenario/RiskMap/PackManifest before writing, symlink rejection at both packs/ parent and packDir, non-directory parent rejection, atomic backup-rename --force (failed scaffolds restore the original pack), package.json#files matching reality, scoped publish guidance, schema-valid profile snippet, integration test asserts scn-pack-demo-starter actually executed (rejects false-positives via bundled packs), honest NO_NETWORK_PROBE documentation. 54 tests in @aqa/kit (12 pack-new + 42 run-cmd). Still pending in v1.7: slice 3 (admin Create-pack wizard) and slice 4 (audit + wire/implement 81 silent admin placeholder buttons, plan in docs/internal/admin-placeholder-audit.md). Final v1.7.0 tag after those slices ship.
v1.6 shipped — aqa run + bundled packs + ecosystem foundation. PR #24 merged (21d7b10), tag v1.6.0 pushed, GitHub release published. The CLI now has the missing aqa run command that closes the loop between aqa init and a real audit trail. 21 review iterations with Copilot + Codex, every one surfacing a real bug or coverage gap (zero false alarms). 42 TDD tests in packages/kit/test/run-cmd.test.ts cover every behavior. Highlights: SUT-aware init pack selection, three-tier pack discovery (project / node_modules / kit-bundled — all 5 baseline packs now ship inside @aqa/kit's tarball via bundle-packs.mjs), atomic run-dir creation (TOCTOU-safe for concurrent seeded runs), path-traversal + symlink-escape rejection, applies_when filtering, manifest-name dedup with priority, legacy bare-slug aliasing, agent-mode rejection until that driver lands, unrelated-broken-pack tolerance with structured warnings, capped error strings (MAX_DETAIL_PER_KIND), detail samples in run_finished audit event for auditors. Known scoped follow-ups: real HTTP probe runner (current is no-network stub → release-gate strict semantics deferred), EventChainWriter ↔ verifyEventChain canonical-form reconciliation, browser-driven ecosystem smoke.
Next macro task — v1.7 pack-authoring story. Per user confirmation: (a) docs/PACK-AUTHORING.md community tutorial, (b) aqa pack new <slug> CLI scaffolding, (c) Admin "Create pack" wizard wired over the new CLI. PLUS: a full audit pass on every placeholder button/interaction in the admin panel — no onClick={() => {}} or no-op silent clicks. Each placeholder either gets wired to a real endpoint, gets a client-side implementation, or gets an explicit "decorative" doc note.
v1.5 admin design integration shipped. PR #23 merged (f7b879f), tag v1.5.0 pushed, GitHub release created. The 30-screen hi-fi prototype from Claude Design is now the official admin web panel: bundled into packages/admin/src/app.tsx (8.9k LOC, @ts-nocheck), token-driven CSS, Vite production build. New E2E (Playwright, admin UI) CI job runs the full Playwright suite (*.e2e.ts) — per-screen smoke for all 19 nav routes + audit-chain verify (OK/tampered) + Findings views (Clusters/List/Kanban) + Replay tabs + risk-map matrix + theme + palette. Total 36 Playwright tests green in 1m27s. Known scoped tradeoffs (deferred): in-memory routing only (not URL-driven), live-mode still reads in-file mocks (no real fetch layer wired). Both intentional for the design port; will be picked up in v1.6.
v1.5 lessons captured. Documented in docs/LESSON.md: (a) bundled-prototype @ts-nocheck pattern with Biome ignore-list; (b) window.useTweaks fallback injection for design-tool-only hooks; (c) Playwright .e2e.ts extension to avoid Bun's test runner picking it up; (d) nav-item locator pattern (no $ anchor, escape regex metas, target prototype's actual .replay-tab/.seg-btn classes, not getByRole('button')).
Next macro task — v1.6 ecosystem end-to-end smoke. Full end-to-end ecosystem smoke via Playwright: boot server + runner pool + admin in a single command, drive a real aqa run against examples/bun-api, verify findings appear in the admin, verify audit chain remains valid. TDD: any broken path → failing test first, then fix. After that, the README/docs refresh closing step (see below).
v1.4 admin API surface (in flight). Expanded packages/server's makeApi() from 4 to 28 routes covering everything docs/design/admin-panel-spec-v2.md references: runs detail + events, finding status mutation, packs CRUD, profiles CRUD, risks CRUD, scenarios edit, audit query, cost summary, queue snapshot, notifications, saved views, API tokens, tenancy (orgs + projects). StoreProvider extended with matching methods; MemoryStore implements all of them (Postgres scaffold throws not implemented). New @aqa/schemas namespaces: Notification, SavedView, ApiToken, CostSummary, Tenancy. Multi-tenant via x-aqa-org / x-aqa-project headers. 8 new tests; 184 repo-wide.
Design brief for admin v2 shipped. docs/design/admin-panel-spec-v2.md — self-contained enterprise-grade spec (tokens, 30 screens, full component library, interaction patterns, a11y, perf budget, deliverables checklist) so an external designer (or Claude Design) can build the React template in parallel.
Next macro task (post-admin-design). After admin v2 design lands and integrates: full end-to-end ecosystem smoke via Playwright — boot server + runner pool + admin in a single command, drive a real aqa run against examples/bun-api, verify findings appear in the admin, verify audit chain remains valid. TDD: any broken path → failing test first, then fix.
Issue #3 closed. Mirrored 3 remaining Zod superRefines into JSON Schema (Finding status='duplicate' ⇒ duplicate_of, ReproLevel deterministic=true ⇒ attempts >= 1, ProfilesFile profile.name === key via $comment). Added Ajv 2020 round-trip test (packages/schemas/test/ajv-roundtrip.test.ts) that validates every fixture against the emitted schema — catches Zod/JSON-Schema divergence at build time. 204 tests repo-wide. Patches resolve #/definitions/<name> indirection emitted by zod-to-json-schema.
PR #22 local gates verified (2026-05-18). bun install ✅, bun run build ✅, bun run typecheck ✅, bun run lint ✅ (4 warnings, no errors), bun test ✅ 204/204.
BLOCKER — Copilot review request (PR #22). Both gh pr edit --add-reviewer copilot-pull-request-reviewer and GraphQL requestReviewsByLogin return HTTP 403 (DNS monitoring proxy blocks GitHub API). Action required: please open PR #22 in the GitHub UI and manually add copilot-pull-request-reviewer from the Reviewers sidebar panel.
Final closing step (after every macro task above is closed). README + docs refresh pass:
1. Audit every v0.x.x reference in README.md — replace stale ones with the current shipped surface or drop.
2. Quick-start section: remove the "preview of v0.1.0" disclaimer; write the definitive end-to-end junior flow that actually works today, including booting the web admin panel. No more "this will work in vX" hedging.
3. Architecture section in README.md: refresh diagram + component list to match the 18 packages shipped (schemas, kit, pack-loader, pack-scanner, adapters, llm-adapters, runner, reporter, admin, admin-core, auth, sandbox, store, generator, server, clustering, methodology, compliance).
4. docs/: audit every file, prune obsolete content, keep only current/good. Anything that says "stub" or "lands in vX" must either be filled in or removed.
5. After "The mental model in 7 words" section, add a new section titled "How you use it" — clean, concise, written in the same rhythm as "7 words" — describing the end-to-end junior workflow:
  - aqa init (detect repo, scaffold .aqa/)
  - edit risk-map.yaml (declare what matters)
  - install agent files for your coding agent
  - aqa run --profile smoke (skills + scenarios + runner + oracles)
  - open admin panel (bun --filter @aqa/admin dev)
  - inspect findings, replay deterministically, verify audit chain
  - iterate on risks + scenarios until release-gate green
6. Tag the README/docs refresh PR as the official closure of the agentic-qa-kit v1.x line.

2026-05-18 — earlier

v1.2 admin wired. @aqa/admin migrated from inline-style placeholder shell to a real SPA: Tailwind 4 + TanStack Router + TanStack Query + Zustand + lucide-react. 12 screens shipped end-to-end: Dashboard (KPIs), Runs (table), Findings (clustered via content-hash signature, async via Web Crypto), Risk map (grouped by category), Profiles, Packs (with signature badge), Scenarios (pack→scenario tree), Agents (per-agent instruction-file detection), Replay (per-finding repro.sh / repro.curl preview + verify button), Audit log (paste events.jsonl → re-walk the sha256 chain in-browser; "Load good chain" / "Load tampered chain" demo buttons), Cost (bar by profile), Settings (theme toggle).
Browser-side hash-chain verifier. node:crypto is not Vite-safe, so the admin re-implements verifyEventChain + signatureOf on top of crypto.subtle.digest. The CLI version in @aqa/compliance remains the SOC2 source of truth; the in-browser copy is a UX affordance only. Documented in docs/LESSON.md.
Build: 376 KB JS (116 KB gzip), Tailwind CSS 9.94 KB (2.92 KB gzip). 165 tests still pass.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Progress

How to read this file

2026-05-17

2026-05-18

2026-05-18 — earlier

FilesExpand file tree

PROGRESS.md

Latest commit

History

PROGRESS.md

File metadata and controls

Progress

How to read this file

2026-05-17

2026-05-18

2026-05-18 — earlier