Releases: padosoft/agentic-qa-kit

v1.7.0-rc.1 — pack authoring (slices 1+2)

18 May 20:57
6cc0013

First release candidate for v1.7. Delivers the pack-authoring story (slices 1 and 2 of 4 planned).

What's in

📖 docs/PACK-AUTHORING.md

End-to-end tutorial for community pack authors:

  • Directory layout, manifest schema, scenarios + risks structure
  • Three distribution patterns (workspace pack / vendored copy / npm scope alias)
  • Programmatic validation (aqa validate)
  • Honest about current limitations: the no-network NO_NETWORK_PROBE stub returns {probe_id, status: 200, body: null} for every probe kind; only the http_status, response_contains and response_not_contains oracles are wired; and there is no custom oracle/probe loader yet
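A minimal sketch of why a status-only oracle passes against the stub while content oracles cannot say anything meaningful yet. It evaluates the three wired oracle kinds against the documented stub result. The ProbeResult/Oracle shapes and the evaluateOracle helper are hypothetical, not the @aqa/runner API. The null-body behavior is also an assumption: response_contains fails and response_not_contains passes.

```typescript
// Hypothetical shapes; the real @aqa/runner types may differ.
interface ProbeResult {
  probe_id: string;
  status: number;
  body: string | null;
}

type Oracle =
  | { kind: "http_status"; expected: number }
  | { kind: "response_contains"; needle: string }
  | { kind: "response_not_contains"; needle: string };

// Mirrors the documented stub: every probe gets status 200 and a null body.
function noNetworkProbe(probeId: string): ProbeResult {
  return { probe_id: probeId, status: 200, body: null };
}

// Returns true when the oracle passes. Assumption: a null body never
// "contains" anything.
function evaluateOracle(oracle: Oracle, result: ProbeResult): boolean {
  switch (oracle.kind) {
    case "http_status":
      return result.status === oracle.expected;
    case "response_contains":
      return result.body !== null && result.body.includes(oracle.needle);
    case "response_not_contains":
      return result.body === null || !result.body.includes(oracle.needle);
  }
}

const stub = noNetworkProbe("p-1");
console.log(evaluateOracle({ kind: "http_status", expected: 200 }, stub)); // true
console.log(evaluateOracle({ kind: "response_contains", needle: "ok" }, stub)); // false
```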

🛠 aqa pack new

CLI to scaffold a runnable pack at <cwd>/packs/<slug>/:

aqa pack new pack-myapp --sut-type api
aqa pack new pack-frontend --sut-type web --description "Smoke tests for the marketing site"

The scaffold produces a starter scenario whose http_status: 200 oracle passes cleanly against the stub probe out of the box (avoiding the iter-17 footgun where bundled packs emitted synthetic findings). Supports:

  • --sut-type (api / web / cli / lib / agent / pipeline)
  • --force — atomic backup-rename overwrite (non-destructive on failure)
  • --description, --author, --license (SPDX)
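The --force behavior (backup-rename overwrite, restore on failure) can be sketched as follows. overwriteWithBackup is a hypothetical helper, and the backup naming scheme is an assumption. The lstat/symlink hardening listed below is omitted here for brevity.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, renameSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: regenerate `packDir` via `write`, keeping the old copy
// recoverable until the new one has been written successfully.
function overwriteWithBackup(packDir: string, write: (dir: string) => void): void {
  const backup = `${packDir}.bak-${process.pid}-${Date.now()}`;
  const hadExisting = existsSync(packDir);
  // The backup is a sibling path, so rename() stays on one filesystem and is
  // atomic: the old pack is either fully at packDir or fully at backup.
  if (hadExisting) renameSync(packDir, backup);
  let created = false;
  try {
    mkdirSync(packDir); // non-recursive: throws if something reappeared at packDir
    created = true;
    write(packDir);
  } catch (err) {
    // Remove only what this call created, then put the original back.
    if (created) rmSync(packDir, { recursive: true, force: true });
    if (hadExisting) renameSync(backup, packDir);
    throw err;
  }
  if (hadExisting) rmSync(backup, { recursive: true, force: true });
}

// Demo in a throwaway directory: a failing write leaves the old pack intact.
const root = mkdtempSync(join(tmpdir(), "aqa-pack-"));
const packDir = join(root, "pack-demo");
mkdirSync(packDir);
writeFileSync(join(packDir, "marker.txt"), "old");
try {
  overwriteWithBackup(packDir, () => {
    throw new Error("validation failed");
  });
} catch {
  // expected: the original pack was restored
}
console.log(readFileSync(join(packDir, "marker.txt"), "utf8")); // prints: old
```

Renaming aside instead of deleting means a crash between the two steps leaves a recoverable backup on disk rather than a half-deleted pack.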

Hardened against:

  • Symlinks at packs/ parent and packDir (lstatSync checks)
  • Non-directory parent (regular file at <root>/packs)
  • Over-length slugs (cap at 52 chars to keep every derived ID within the 64-char Slug schema cap)
  • TOCTOU on the existence check (uses non-recursive mkdir + explicit lstat)
  • Schema-invalid generated output (in-memory PackManifest/Scenario/RiskMap validation before writing)
  • Validation failures destroying the existing pack (backup-rename + restore-on-error)
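The 52-character cap leaves 12 characters of headroom under the 64-character Slug schema limit for any ID derived from the pack slug. A minimal sketch of that budget check is below. The suffixes and the kebab-case regex are hypothetical stand-ins, not the scaffold's actual derived IDs or the real Slug schema.

```typescript
const SLUG_MAX = 64; // Slug schema cap, per these notes
const PACK_SLUG_MAX = 52; // cap applied by `aqa pack new`, per these notes

// Hypothetical derived-ID suffixes; the longest is 12 chars (64 - 52).
const SUFFIXES = ["-scenario-01", "-risk-01"] as const;

// Assumption: slugs are lowercase kebab-case.
const SLUG_RE = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Returns a list of problems; an empty list means the slug is usable.
function validatePackSlug(slug: string): string[] {
  const errors: string[] = [];
  if (!SLUG_RE.test(slug)) errors.push(`"${slug}" is not a kebab-case slug`);
  if (slug.length > PACK_SLUG_MAX) errors.push(`slug is ${slug.length} chars; max is ${PACK_SLUG_MAX}`);
  for (const suffix of SUFFIXES) {
    const id = slug + suffix;
    if (id.length > SLUG_MAX) errors.push(`derived ID "${id}" exceeds ${SLUG_MAX} chars`);
  }
  return errors;
}
```

Checking the derived IDs in memory before anything touches disk matches the "validate before writing" approach listed above.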

🧪 Tests

54 in @aqa/kit (50 pack-new + 42 run-cmd subset). Lint + typecheck clean.

What's still pending in v1.7

  • Slice 3 — Admin "Create pack" wizard (future PR)
  • Slice 4 — Audit + wire/implement/document all 81 silent placeholder buttons across packages/admin/src/app.tsx (slices 4a–4f, future PRs; plan documented in docs/internal/admin-placeholder-audit.md)
  • Final v1.7.0 tag after slices 3 + 4 ship

Review

19 iterations on PR #25 with both the Copilot and Codex review bots. All real issues were addressed; the final iteration returned 0 new must-fix items.

v1.6.0 — aqa run CLI + bundled packs

18 May 16:36
21d7b10

v1.6 — aqa run + ecosystem foundation

The missing piece between aqa init and a real audit trail. After 21 review iterations with Copilot + Codex (every one surfaced a real bug or coverage gap, zero false alarms), the inner loop is end-to-end usable.

What lands

  • aqa run CLI command. Loads .aqa/project.yaml + .aqa/profiles.yaml via the canonical @aqa/schemas shapes, resolves packs from three discovery tiers (project's packs/*, node_modules/@aqa/*, kit-bundled dist/packs/*), filters scenarios by the selected profile's tags, and runs each one via @aqa/runner.runScenario. Streams events + findings into .aqa/runs/<run_id>/.
  • Flags: --profile <name> (defaults to smoke if present, else first profile) and --seed <string> (deterministic run_id for tests + replay).
  • Bundled packs. All 5 baseline packs (pack-core, pack-api-core, pack-web-ui, pack-llm-agent, pack-security) now ship inside @aqa/kit's npm tarball via a bundle-packs.mjs build step. A fresh aqa init + aqa run --profile smoke works with only @aqa/kit installed.
  • SUT-aware init. aqa init picks the right packs from the detected sut_type (api → pack-api-core, web → pack-web-ui, agent → pack-llm-agent, else → pack-core). The framework clause on pack-api-core was dropped so plain Node/Bun APIs without a recognized framework still get coverage.
  • Hardened orchestration: atomic run-dir creation (no TOCTOU on concurrent seeded runs), pack-manifest scenario discovery (no glob-scanning), path-traversal + symlink-escape rejection, applies_when filtering, manifest-name dedup with priority (project > node_modules > bundled), legacy bare-slug pack-name aliasing, agent-mode profile rejection until that driver lands, unrelated-broken-pack tolerance with structured warnings.
  • Structured RunResult with ok, runId, runDir, scenariosRun, findingsCount, capped error string (MAX_DETAIL_PER_KIND + "…+N more" truncation), and a warnings array for non-fatal diagnostics. Detail samples (pack_error_samples, scenario_error_samples, …) live in the run_finished audit event for auditors.
  • 42 TDD tests in packages/kit/test/run-cmd.test.ts — every behavior above is covered, written before the code existed.
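The manifest-name dedup across the three discovery tiers (project > node_modules > bundled) can be sketched as a single pass that keeps the highest-priority copy of each name. The DiscoveredPack shape and dedupePacks are hypothetical. Recording a warning for each shadowed copy is also an assumption about how the structured warnings might be used.

```typescript
// Discovery tiers, highest priority first (per the notes above).
type Tier = "project" | "node_modules" | "bundled";
const TIER_PRIORITY: Record<Tier, number> = { project: 0, node_modules: 1, bundled: 2 };

// Hypothetical shape for a discovered pack.
interface DiscoveredPack {
  name: string; // manifest name, used as the dedup key
  tier: Tier;
  dir: string;
}

// Keep one pack per manifest name, preferring the highest-priority tier.
function dedupePacks(found: DiscoveredPack[]): { packs: DiscoveredPack[]; warnings: string[] } {
  const byName = new Map<string, DiscoveredPack>();
  const warnings: string[] = [];
  for (const pack of found) {
    const current = byName.get(pack.name);
    if (!current) {
      byName.set(pack.name, pack);
      continue;
    }
    const [winner, loser]: [DiscoveredPack, DiscoveredPack] =
      TIER_PRIORITY[pack.tier] < TIER_PRIORITY[current.tier] ? [pack, current] : [current, pack];
    byName.set(pack.name, winner);
    warnings.push(`${loser.dir} (${loser.tier}) shadowed by ${winner.dir} (${winner.tier})`);
  }
  return { packs: [...byName.values()], warnings };
}
```

Because the priority comes from the tier, not from scan order, the result is the same whichever directory is scanned first.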

Known scoped follow-ups (v1.7)

  • Real HTTP probe runner. Today's runScenario still uses the no-network probe stub (@aqa/runner's NO_NETWORK_PROBE). The release-gate "fail on any finding" semantic (from require_deterministic_replay: true) is deferred until probes hit a real SUT — every finding the stub produces is synthetic.
  • EventChainWriter ↔ verifyEventChain reconciliation. The writer omits prev_hash from the canonical body and emits null for seq=0; @aqa/compliance.verifyEventChain includes prev_hash and expects "0…". Tests ship a local writer-matching verifier; reconciling the two implementations is a separate cleanup.
  • Pack authoring story. v1.7 will add docs/PACK-AUTHORING.md (community tutorial), aqa pack new <slug> (scaffolding CLI), and an admin "Create pack" wizard — plus a full audit pass on every placeholder button in the admin panel so nothing renders as a "muted click".
  • Browser-driven ecosystem smoke. Playwright test that starts admin + runs aqa run against examples/bun-api and asserts findings appear in the admin UI end-to-end.

v1.5.0 — admin design integration

18 May 11:34
f7b879f

v1.5 — Admin design integration

The hi-fi prototype shipped by Claude Design (30 screens) is now the official admin web panel. The bundled prototype was ported to Vite + React 19 + TypeScript strict, all 30 screens render in production, and a Playwright suite drives every screen.

What landed

  • 30 screens, real markup — 8.9k LOC ported to packages/admin/src/app.tsx (bundled, @ts-nocheck for the prototype's design-tool conventions). Dark-themed, token-driven CSS.
  • Vite production build — replaced design-tool CDN React/Babel scripts with a regular Vite SPA. bun run dev boots in <500ms; bun run build ships a static bundle.
  • Playwright suite: packages/admin/test/e2e/*.e2e.ts covers per-screen smoke, audit chain verify (OK + tampered), Findings views (Clusters/List/Kanban), Replay tabs, risk-map matrix, theme, palette. Real DOM, no mocks. bun run test:e2e runs it.
  • CI gating — new E2E (Playwright, admin UI) job in .github/workflows/ci.yml builds the admin and runs the Playwright suite against the dev server.
  • Quality — Biome ignores the bundled prototype to keep lint targeted; smoke filter tolerates the prototype's intentional console.error demo calls.

Known scoped tradeoffs

  • In-memory routing only (the prototype was never URL-driven); reading window.location on boot is deferred to a follow-up.
  • Live-mode currently animates time but still reads in-file mock data; wiring VITE_AQA_SERVER_URL to a real fetch layer is deferred to the next macro task.

What's next (v1.6)

Full end-to-end ecosystem smoke via Playwright: boot server + runner pool + admin in a single command, drive a real aqa run against examples/bun-api, verify findings appear in the admin and the audit chain stays valid end-to-end.

v1.4.0 — Admin API surface + issue #3 closed

18 May 09:21
bb0f7c6

Backend gap closure ahead of the parallel admin v2 design integration.

Server expansion

packages/server/src/api.ts makeApi() grows from 4 → 28 routes:

  • Runs: list, detail, events, create
  • Findings: list, detail, status mutation (with audit reason)
  • Packs: list, detail, install, uninstall
  • Profiles: list, detail, save, delete
  • Risks: list, detail, save, delete
  • Scenarios: list, detail, save
  • Audit: scoped event query
  • Cost: per-window summary aggregation
  • Queue: snapshot + runner tap
  • Notifications: list, mark-read
  • Saved views: list, save, delete
  • API tokens: list, create, revoke
  • Tenancy: list orgs, list projects, create org, create project

All routes are permission-gated via @aqa/auth, tenant-scoped via x-aqa-org / x-aqa-project headers, and return shape-compliant @aqa/schemas objects. Multi-tenancy fails closed: a missing scope returns 400, and a cross-tenant ID lookup returns 404, so probing for IDs in other projects gains no information.
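A minimal sketch of the fail-closed tenant check described above. The Scoped/Result shapes and getScoped are hypothetical, and the @aqa/auth permission check is left out. The key point is that "exists in another tenant" and "does not exist" produce the same 404.

```typescript
// Hypothetical shapes; the real @aqa/server types differ.
interface Scope { org: string; project: string }
interface Scoped { id: string; org: string; project: string }
type Result<T> = { status: 200; body: T } | { status: 400 | 404; body: { error: string } };

// Header names are lowercase, as Node delivers them.
function readScope(headers: Record<string, string | undefined>): Scope | null {
  const org = headers["x-aqa-org"];
  const project = headers["x-aqa-project"];
  return org && project ? { org, project } : null;
}

// Missing scope -> 400. Unknown ID and cross-tenant ID -> identical 404.
function getScoped<T extends Scoped>(
  headers: Record<string, string | undefined>,
  id: string,
  store: Map<string, T>,
): Result<T> {
  const scope = readScope(headers);
  if (!scope) return { status: 400, body: { error: "missing x-aqa-org / x-aqa-project" } };
  const record = store.get(id);
  if (!record || record.org !== scope.org || record.project !== scope.project) {
    return { status: 404, body: { error: "not found" } };
  }
  return { status: 200, body: record };
}
```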

Schemas

6 new @aqa/schemas namespaces with Draft 2020-12 JSON Schemas emitted (schemas/v1/ now ships 15 files):

  • Notification
  • SavedView
  • ApiToken
  • CostSummary
  • Tenancy.Org + Tenancy.ProjectRef

Store

StoreProvider is extended with 15+ methods covering the new endpoints. MemoryStore implements all of them; PostgresStore keeps the explicit "not implemented" pattern so a misconfigured production deployment fails loudly.

RunnerQueue gains snapshot(), requeue(id) and kill(id) for the admin queue ops screen.
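A minimal in-memory sketch of a FIFO queue with visibility leases (the v0.5 RunnerQueue design) plus the three admin operations named above. LeaseQueue is hypothetical. The names enqueue, claim and ack and the exact lease semantics are assumptions: a leased job becomes claimable again once its lease expires.

```typescript
type JobState = "queued" | "leased" | "killed";
interface Job { id: string; state: JobState; leaseUntil: number | null }

class LeaseQueue {
  private readonly jobs = new Map<string, Job>();
  private order: string[] = []; // FIFO order of job ids
  private readonly leaseMs: number;
  private readonly now: () => number;

  constructor(leaseMs: number, now: () => number = Date.now) {
    this.leaseMs = leaseMs;
    this.now = now;
  }

  enqueue(id: string): void {
    this.jobs.set(id, { id, state: "queued", leaseUntil: null });
    this.order.push(id);
  }

  // Lease the oldest job that is queued or whose lease has expired.
  claim(): Job | null {
    const t = this.now();
    for (const id of this.order) {
      const job = this.jobs.get(id)!;
      const expired = job.state === "leased" && job.leaseUntil !== null && job.leaseUntil <= t;
      if (job.state === "queued" || expired) {
        job.state = "leased";
        job.leaseUntil = t + this.leaseMs;
        return { ...job };
      }
    }
    return null;
  }

  // The runner finished the job: drop it entirely.
  ack(id: string): void {
    this.jobs.delete(id);
    this.order = this.order.filter((x) => x !== id);
  }

  // Make a leased job claimable again now; it keeps its FIFO position.
  requeue(id: string): void {
    const job = this.jobs.get(id);
    if (job && job.state === "leased") {
      job.state = "queued";
      job.leaseUntil = null;
    }
  }

  // Never hand this job out again; it stays visible in snapshots.
  kill(id: string): void {
    const job = this.jobs.get(id);
    if (job) {
      job.state = "killed";
      job.leaseUntil = null;
    }
  }

  snapshot(): Job[] {
    return this.order.map((id) => ({ ...this.jobs.get(id)! }));
  }
}
```

Injecting the clock keeps lease expiry deterministic in tests.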

Issue #3 closed

Three remaining Zod superRefines mirrored into JSON Schema:

  • Finding.status='duplicate' ⇒ duplicate_of required
  • ReproLevel.deterministic=true ⇒ attempts >= 1
  • ProfilesFile.profile.name === key (via $comment — cross-field)

Cross-field invariants that JSON Schema cannot express (duplicate_of !== id, successes === attempts, finished_at >= started_at, profile.name === key) are surfaced via $comment on the emitted schemas.
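The first two invariants in the list above are conditional requirements, which Draft 2020-12 expresses with if/then. The sketch below writes them as schema fragments and adds a tiny evaluator for the few keywords used, to show the semantics. The fragments are an illustration and may not match the emitted @aqa/schemas output. The evaluator is deliberately not a JSON Schema validator: it handles only required, and const/minimum on primitive values.

```typescript
type Json = Record<string, unknown>;
interface MiniSchema {
  properties?: Record<string, { const?: unknown; minimum?: number }>;
  required?: readonly string[];
}
interface Conditional { if: MiniSchema; then: MiniSchema }

// Finding.status = 'duplicate' => duplicate_of required
const FINDING_DUPLICATE: Conditional = {
  if: { properties: { status: { const: "duplicate" } }, required: ["status"] },
  then: { required: ["duplicate_of"] },
};

// ReproLevel.deterministic = true => attempts >= 1
const REPRO_DETERMINISTIC: Conditional = {
  if: { properties: { deterministic: { const: true } }, required: ["deterministic"] },
  then: { properties: { attempts: { minimum: 1 } }, required: ["attempts"] },
};

function matches(s: MiniSchema, v: Json): boolean {
  for (const key of s.required ?? []) if (!(key in v)) return false;
  for (const [key, rule] of Object.entries(s.properties ?? {})) {
    if (!(key in v)) continue; // `properties` only constrains keys that are present
    const x = v[key];
    if ("const" in rule && x !== rule.const) return false; // primitives only
    if (rule.minimum !== undefined && typeof x === "number" && x < rule.minimum) return false;
  }
  return true;
}

// if/then semantics: when `if` matches, `then` must match too.
function satisfies(c: Conditional, v: Json): boolean {
  return !matches(c.if, v) || matches(c.then, v);
}
```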

An Ajv 2020 round-trip test (packages/schemas/test/ajv-roundtrip.test.ts) validates every fixture against the emitted schema, catching Zod ↔ JSON-Schema divergence at build time.

All 6 emitter patches now resolve the #/definitions/<name> indirection that zod-to-json-schema emits.

Docs

  • docs/design/admin-panel-spec-v2.md: full enterprise design brief (tokens, 30 screens, component library, interactions, a11y, perf, deliverables) for the external designer who builds the React template in parallel.
  • docs/PROGRESS.md updated with the v1.4 entry, the post-design Playwright smoke roadmap, and the final closing step (README + docs refresh pass: audit v0.x references, finalise quick-start, write the "How you use it" workflow section, prune obsolete docs).

Review loop

Codex + Copilot iterated 2 times before merge, surfacing and addressing 5 must-fix items across schema enum alignment, tenant-scope enforcement on runs / findings detail, and MemoryStore audit-filter leniency on unstamped events. CI was 14/15 green throughout.

Numbers

  • 28 server routes (was 4)
  • 15 JSON Schemas (was 9)
  • 205 tests (was 165)
  • 19 packages (added @aqa/compliance previously; this release adds no new package)

PR: #22.

v1.3.0 — Quality batch

18 May 01:04
a1408f3

Six post-v1.2 polish items + an extended review-and-fix loop. No new packages; all quality / coverage / docs / correctness.

What landed

1. Admin server↔UI mapping

  • packages/admin/src/data/api.ts fetches from VITE_AQA_SERVER_URL (real @aqa/server shape) with explicit error surfacing — no silent mock fallback in live mode.
  • mapRun() / mapFinding() translate Run.Run (state, totals.findings, totals.llm_cost_usd) and Finding.Finding (status enum draft|verified|rejected|duplicate|fixed, verification_floor enum bug_level|scenario_level|agent_level, discovered_at) into the UI types. Screens stay source-agnostic.
  • live/mock badge + red error banner on Runs and Findings.
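A minimal sketch of the mapFinding() idea: translate the server enums into UI-facing values so screens stay source-agnostic. Only status, verification_floor and discovered_at come from these notes. The id field, the UiFinding type, the floor labels and the "open" rule are all hypothetical.

```typescript
// Server-side enums as listed above.
type FindingStatus = "draft" | "verified" | "rejected" | "duplicate" | "fixed";
type VerificationFloor = "bug_level" | "scenario_level" | "agent_level";
interface ServerFinding {
  id: string; // assumption
  status: FindingStatus;
  verification_floor: VerificationFloor;
  discovered_at: string; // ISO-8601
}

// Hypothetical UI type; the admin's real types may differ.
interface UiFinding { id: string; status: FindingStatus; floor: string; discoveredAt: Date; open: boolean }

// A Record keyed by the enum makes a missing label a compile-time error.
const FLOOR_LABEL: Record<VerificationFloor, string> = {
  bug_level: "Bug",
  scenario_level: "Scenario",
  agent_level: "Agent",
};

function mapFinding(f: ServerFinding): UiFinding {
  return {
    id: f.id,
    status: f.status,
    floor: FLOOR_LABEL[f.verification_floor],
    discoveredAt: new Date(f.discovered_at),
    // Assumption: only draft and verified findings still need action.
    open: f.status === "draft" || f.status === "verified",
  };
}
```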

2. Admin sub-screens (6 detail routes)

/runs/$runId, /findings/$findingId, /risk-map/$riskId, /profiles/$profileName, /packs/$packSlug, /scenarios/$scenarioId. Each with Breadcrumb + PageHeader. Runs table rows are clickable links.

3. Admin unit tests (11 new, 176 total)

  • test/audit.test.ts (5): parseEventLines ×2, verifyEventChain ×3 (good chain, tampered, vacuous truth).
  • test/cluster.test.ts (6): signatureOf (identity, normalisation, divergence), clusterFindings (grouping, worst-severity, sort).

4. CLI E2E smoke gate

scripts/e2e-cli.mjs runs against a fresh tmpdir sandbox (seeded with a minimal package.json + aqa init). All four checks (--version, --help, doctor, validate) must exit 0. Wired into CI as a new e2e-cli job in .github/workflows/ci.yml.
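The core of such a gate is "run each command, fail if any exit code is non-zero". The TypeScript sketch below shows that loop. runSmokeChecks is hypothetical and not taken from scripts/e2e-cli.mjs. The example wiring in the comment assumes an aqa binary on PATH and an already-initialised sandbox directory.

```typescript
import { spawnSync } from "node:child_process";

interface CheckResult { args: string[]; code: number | null; ok: boolean }

// Run each check in `cwd`; a check passes only on exit code 0. A missing
// binary or a signal-killed process yields code null and therefore ok: false.
function runSmokeChecks(bin: string, baseArgs: string[], checks: string[][], cwd: string): CheckResult[] {
  return checks.map((args) => {
    const r = spawnSync(bin, [...baseArgs, ...args], { cwd, encoding: "utf8" });
    return { args, code: r.status, ok: r.status === 0 };
  });
}

// Example wiring (assumes `aqa` on PATH and an `aqa init`-ed sandboxDir):
// const results = runSmokeChecks("aqa", [], [["--version"], ["--help"], ["doctor"], ["validate"]], sandboxDir);
// if (results.some((r) => !r.ok)) process.exit(1);
```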

5. Threat model expansion

docs/security/threat-model.md expanded from a 12-line stub to a full STRIDE catalog: trust-boundary diagram, 20 specific threats with current mitigation + status, and agentic-specific cross-cutting threats (tool-result poisoning, confirmation bypass, supply chain, cost-based DoS).

6. CHANGELOG.md backfill

Entries for v0.2.0 → v1.3.0 in Keep-a-Changelog format.

Review loop

Codex + Copilot review iterated 3 times before merge, surfacing and addressing 21 must-fix items across schema enum alignment, fake-live fallback, error-vs-not-found splitting, CLI E2E hardness, threat-model precision (S-03 narrower scope, D-01 / S-01 / I-03 downgraded from Mitigated to Partial / Unmitigated to reflect actual code). All inline comments addressed. CI 15/15 green.

PR: #20 + docs follow-up #21.

v1.2.0 — Admin wired

17 May 23:07
3ba9bef

v1.2.0 — Admin SPA wired

The admin panel goes from inline-style placeholder to a real SPA.

Stack

  • Tailwind 4 via @tailwindcss/vite, with @theme tokens + .dark variant
  • TanStack Router (code-based) — 12 typed routes
  • TanStack Query — async data + mutations
  • Zustand — theme store, persisted to localStorage
  • lucide-react — icons
  • date-fns — relative timestamps

12 screens

Dashboard (KPIs), Runs (table), Findings (clustered via content-hash signature), Risk map (grouped by category), Profiles, Packs (with signature badge), Scenarios (pack→scenario tree), Agents (per-agent instruction files), Replay (per-finding repro.sh / repro.curl preview + verify button), Audit log (paste events.jsonl → re-walk sha256 chain in-browser; "Load tampered chain" demo button), Cost (bar by profile), Settings (theme toggle).

Real, not placeholder

  • Findings are clustered via signatureOf(scenario × risk × normalised summary) using Web Crypto, mirroring @aqa/clustering.
  • Audit log re-walks the sha256 prev_hash chain in-browser; demo shows mechanical tamper detection. Mirrors @aqa/compliance's verifyEventChain on top of crypto.subtle.
  • Theme toggle persisted, applies .dark on <html>.

Browser-side hash verifier

node:crypto is not Vite-safe; the admin re-implements the verifier on Web Crypto. The Node CLI in @aqa/compliance remains the SOC2 source of truth — the in-browser copy is a UX affordance only.
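A Web Crypto chain walk needs only crypto.subtle.digest and TextEncoder, both of which exist in browsers and in recent Node/Bun runtimes. The ChainEvent shape and the hashing convention below are hypothetical and may not match the real writer; the notes above already record that the writer and verifier disagree on prev_hash handling. Here each hash is assumed to be sha256(prev_hash + JSON.stringify(body)), and the first event is assumed to carry prev_hash: null.

```typescript
// Hypothetical event shape and hashing convention (see lead-in).
interface ChainEvent { seq: number; prev_hash: string | null; hash: string; body: Record<string, unknown> }

async function sha256Hex(text: string): Promise<string> {
  const digest = await globalThis.crypto.subtle.digest("SHA-256", new TextEncoder().encode(text));
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, "0")).join("");
}

// Returns the seq of the first broken link, or null if the chain verifies.
// An empty chain verifies vacuously.
async function verifyChain(events: ChainEvent[]): Promise<number | null> {
  let prev: string | null = null;
  for (const ev of events) {
    if (ev.prev_hash !== prev) return ev.seq; // link to the previous event is broken
    // Simplification: JSON.stringify is key-order sensitive, not a true canonical form.
    const expected = await sha256Hex((prev ?? "") + JSON.stringify(ev.body));
    if (ev.hash !== expected) return ev.seq; // body was altered after hashing
    prev = ev.hash;
  }
  return null;
}
```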

Build

376 KB JS (116 KB gzip), 9.94 KB CSS (2.92 KB gzip). 165 tests still pass.

PR: #19.

v1.1.0 — Polish

17 May 22:33
b639095

Post-GA polish release. Drops the "pre-alpha" label, ships the operator-facing chart, demonstrates language-agnostic targeting.

What landed

  • README — banner image wired (docs/assets/banner.png), pre-alpha badge replaced with GA + Release badges, Status section reflects v1.0 GA / v1.1 current.
  • deploy/helm feature-complete — server Deployment + Service, runner StatefulSet with per-pod PVC (deterministic fixtures), optional Ingress + TLS, NetworkPolicy that confines runner egress to server + DNS + operator-provided CIDRs, optional in-cluster Postgres subchart for dev/PoC.
  • Examples: examples/bun-api (Hono+Bun, api-core + security), examples/nextjs-saas (Next.js 15 with session-cookie invariant, web-ui + api-core), examples/laravel-app (PHP/Laravel 11, api-core; demonstrates that AQA is target-language agnostic).
  • docs/LESSON.md — consolidated v1.0 → v1.1 retrospective: bundling strategy, exactOptionalPropertyTypes patterns, LongSlug pitfalls, deploy-scaffold self-labeling, audit-verifier package separation.

165 tests pass, biome + tsc strict zero errors.

PR: #18.

v1.0.0 — GA

17 May 22:27
76ebe33

v1.0.0 — GA: SOC2/ISO readiness

Task 23 — closes the 24-task roadmap.

What landed

  • @aqa/compliance — SOC2 TSC + ISO 27001:2022 Annex A controls catalog (CONTROL_MAPPINGS), controlsCoverage() summarizer.
  • verifyEventChain(events) — re-walks the sha256 prev_hash chain emitted by the runner; reports first mismatch.
  • aqa-audit-verify <path> CLI — non-zero exit on chain break; wire into CI to fail builds on tampered audit logs.
  • docs/compliance/soc2-iso-mapping.md — auditor-facing source of truth.
  • docs/compliance/pen-test-scope.md — pen-test engagement contract.

Roadmap closed

24 tasks, 18 packages, 165 tests across Bun 1.3.11 and Node 22 LTS.

PR: #17.

v0.6.0

17 May 22:27
5b6aeff

v0.6.0 — Methodology + deploy assets

Tasks 21, 22 in one bundle.

What landed

  • @aqa/methodology (Task 21) — STRIDE/FMEA/OWASP risk mapping. strideOf, fmeaScore (RPN = severity × occurrence × detection), owaspOf, methodologyCheck (flags risks without any framework anchor).
  • Deploy scaffolds (Task 22): deploy/helm chart skeleton, deploy/terraform namespace module, scripts/air-gap-install.sh (bundle + verify).
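The FMEA risk priority number above is just a product of three ratings, and the "no framework anchor" check is a filter. The sketch below shows both. The Risk shape and the names rpn and findUnanchored are hypothetical, not the @aqa/methodology API. Two further points are assumptions: the 1 to 10 rating scale (a common FMEA convention) and the rule that "anchored" means at least one STRIDE or OWASP tag.

```typescript
// Hypothetical risk shape.
interface Risk {
  id: string;
  severity: number; // assumption: integer ratings on a 1..10 scale
  occurrence: number;
  detection: number;
  stride?: string[];
  owasp?: string[];
}

// RPN = severity × occurrence × detection, rejecting out-of-scale ratings.
function rpn(r: Pick<Risk, "severity" | "occurrence" | "detection">): number {
  for (const v of [r.severity, r.occurrence, r.detection]) {
    if (!Number.isInteger(v) || v < 1 || v > 10) throw new RangeError(`FMEA rating out of range: ${v}`);
  }
  return r.severity * r.occurrence * r.detection;
}

// IDs of risks with no STRIDE or OWASP tag (empty arrays count as no tag).
function findUnanchored(risks: Risk[]): string[] {
  return risks.filter((r) => !(r.stride?.length || r.owasp?.length)).map((r) => r.id);
}
```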

PR: #16.

v0.5.0

17 May 22:27
078992e

v0.5.0 — Multi-team + clustering

Tasks 19, 20 in one bundle.

What landed

  • @aqa/server (Task 19) — framework-agnostic makeApi() routing table + RunnerQueue (FIFO with visibility leases).
  • @aqa/clustering (Task 20): signatureOf (sha256 of scenario / risk / normalised summary) + clusterFindings (representative = earliest, severity = worst).
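A minimal sketch of content-hash clustering as described above. The signature is a sha256 over scenario, risk and a normalised summary. Each cluster's representative is its earliest finding, and its severity is the worst member's. The Finding shape, the severity enum and the normalisation rules (lowercase, digit runs to "#", collapsed whitespace) are hypothetical and may differ from @aqa/clustering.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes.
type Severity = "low" | "medium" | "high" | "critical";
const RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2, critical: 3 };
interface Finding {
  id: string;
  scenario_id: string;
  risk_id: string;
  summary: string;
  severity: Severity;
  discovered_at: string; // UTC ISO-8601, same format for all findings
}
interface Cluster { signature: string; representative: Finding; severity: Severity; members: Finding[] }

// Assumed normalisation so "Timeout after 30 ms" and "timeout  after 45 ms" collide.
function normalise(summary: string): string {
  return summary.toLowerCase().replace(/\d+/g, "#").replace(/\s+/g, " ").trim();
}

// JSON-encoding the tuple avoids ambiguity if an ID contains a separator character.
function signatureOf(f: Finding): string {
  return createHash("sha256").update(JSON.stringify([f.scenario_id, f.risk_id, normalise(f.summary)])).digest("hex");
}

function clusterFindings(findings: Finding[]): Cluster[] {
  const clusters = new Map<string, Cluster>();
  for (const f of findings) {
    const sig = signatureOf(f);
    const c = clusters.get(sig);
    if (!c) {
      clusters.set(sig, { signature: sig, representative: f, severity: f.severity, members: [f] });
      continue;
    }
    c.members.push(f);
    // Same-format UTC ISO-8601 strings compare chronologically as plain strings.
    if (f.discovered_at < c.representative.discovered_at) c.representative = f;
    if (RANK[f.severity] > RANK[c.severity]) c.severity = f.severity;
  }
  return [...clusters.values()];
}
```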

PR: #15.