Skip to content

[kafka pr-1 · 06/6] Slice 06 — Auth scopes (HITL)#407

Open
zsculac wants to merge 7 commits intofeat/kafka-list-revoke-verifyfrom
feat/kafka-auth-scopes
Open

[kafka pr-1 · 06/6] Slice 06 — Auth scopes (HITL)#407
zsculac wants to merge 7 commits intofeat/kafka-list-revoke-verifyfrom
feat/kafka-auth-scopes

Conversation

@zsculac
Copy link
Copy Markdown
Contributor

@zsculac zsculac commented May 5, 2026

Stack position

LAST sub-PR for the kafka foundation (PR #390). Stacked on slice 05's PR #395.

main
 └── feat/kafka-walking-skeleton    (foundation rollup, PR #390 — draft)
      └── feat/kafka-list-revoke-verify    (slice 05, PR #395)
           └── feat/kafka-auth-scopes      ← THIS PR (HITL)

Base is slice 05's branch — diff is just slice 06's 6 commits / 19 files. Once slice 05 merges into the foundation, retarget this PR's base to feat/kafka-walking-skeleton.

⚠️ HITL — human inspection required before merge

This slice modifies packages/cli/src/auth.ts, the single source of truth for daemon auth across HTTP / MCP / SSE. SDD review passed all four stages (implementer → spec → code quality → polish), with one critical fix loop on the code-quality stage. The human MUST inspect the diff themselves before merging. Specifically verify:

  • The exact backward-compat code path: legacy lines (no TAB) → * scope set → grants all routes
  • The atomic-write implementation: race window between rename and crash
  • The mint response surface: full token returned ONCE; never in list responses
  • Per-route scope strings match the PRD table exactly: kafka:endpoint:write for write routes, kafka:endpoint:read for read routes

What slice 06 adds

Per-token scopes via extended bearer-token file format. Implements ADR-0003.

  • New token-store deep module extracted from auth.ts — pure parser/serializer for the token file format
  • File format extended: <token><TAB><scopes-comma-separated>[<TAB><name>[<TAB><createdAt>]]. Legacy lines (no TAB) continue to grant full access (* scope) — non-negotiable
  • New verifyTokenScope(token, requiredScope, store) helper with explicit fail-closed guard for empty/undefined required scope
  • All 5 kafka endpoint routes gated by their scope per PRD table; 403 (not 401) on scope mismatch; no WWW-Authenticate header on 403
  • Token mint flows: POST /api/auth/tokens (root-only), GET /api/auth/tokens (root-only, prefixes only), DELETE /api/auth/tokens/<prefix> (root-only)
  • CLI: dkg auth mint-token / list-tokens / revoke-token (under existing dkg auth group)
  • Concurrency-safe write to ~/.dkg/auth.token via shared writeFileAtomic helper (extended to accept { mode: 0o600 } so the temp file is created secure from the start, closing the umask window)
  • Disk-first write order in mint/revoke: persist before mutating in-memory state, with try/catch rollback and 5xx on write failure

Test plan

  • Backward compat: legacy token file (no TABs anywhere) grants full access on every kafka route — uses real on-disk legacy file, no mocks
  • Unit: token-store parse/serialize/round-trip across legacy/scoped/mixed/malformed (30 cases)
  • Unit: verifyTokenScope semantics including * grants all + fail-closed on empty/undefined required scope (6 cases)
  • Integration: token scoped kafka:endpoint:read rejected (403) on POST /api/kafka/endpoint, accepted on GET — asserts no WWW-Authenticate on 403
  • Integration: scoped token cannot mint another token via API (403); root token can; mint of * scope rejected; auth:tokens:* scope rejected
  • Integration: mint returns full token ONCE; subsequent list calls strip via toPublicRecord; assertion: full token string never appears in list response body
  • Persistence-failure injection: chmod 0o500 on dkg-home mid-test → 5xx returned, in-memory store unchanged, on-disk file byte-identical
  • Concurrency: parallel writes (10 Promise.all mints) produce well-formed file, no truncation, all 10 records survive
  • CLI smoke: full token printed exactly once on stdout (match(/.../g)?.length === 1 strict count); negative-control: secret never appears on stderr
  • e2e: mint scoped token via CLI, register endpoint with it, attempt list → 403; cleanup via revoke
  • Coverage ratchets unchanged: kafka package coverage unaffected (no kafka/ source touched); CLI uses package-level ratchet (not per-module), so the 1300+ LOC of new well-tested cli/ code can only raise it

SDD pipeline outcome

Stage Verdict
Implementer DONE
Spec compliance review ✅ SPEC COMPLIANT
Code quality review ❌ → ✅ APPROVED (after 1 critical + 4 important fixes in e5bd1512)
Polish review 📝 SUGGESTIONS (3 mechanical wins applied in 294b6182)

Critical fix in the loop: writeFileAtomic originally created the temp file with default umask (0o644 typically), opening a brief permissions window on ~/.dkg/auth.token between rename and the explicit chmod. Fixed at the helper level so slice 07 inherits the secure default by construction.

Pre-existing footguns surfaced (not fixed in this slice)

  • dkg auth show (pre-slice-06) prints all token strings including new scoped tokens. Description unchanged.
  • dkg auth rotate (pre-slice-06) silently overwrites scoped tokens. Description prose updated to warn operators (this slice); behavior unchanged. Full fix is slice 07 cleanup territory.
  • Multi-process write races: slice 06 routes all admin via the daemon's in-process mutex (CLI commands hit the API). Cross-daemon-process races aren't covered but aren't a normal config.

Risk notes

Highest-risk slice in the foundation:

  • auth.ts is daemon-wide; mistakes silently lock or unlock entire deployments
  • Concurrency-safe write must be correct under realistic contention
  • Token leakage paths (logs, list responses, error responses) are the worst possible bug

Related

  • ADR-0003 (auth scopes via extended bearer-token file)
  • Issue: .scratch/kafka-registry/issues/06-auth-scopes.md

Zvonimir and others added 6 commits May 5, 2026 14:23
… phase 1)

ADR-0003: extend the bearer-token model to per-token scopes without
breaking the legacy single-line file format. Backward-compat is
non-negotiable: every existing deployment carries a scope-less token
on disk; that token must continue to grant access to every API surface.

Changes:
- New `packages/cli/src/token-store.ts`: pure parser + serializer for the
  extended file format `<token>[\\t<scopes>[\\t<name>[\\t<createdAt>]]]`.
  Legacy lines (no TAB) parse to scopes='*' (full access). Comments and
  blank lines round-trip byte-identically. Skips malformed lines with a
  warning sink rather than crashing.
- `auth.ts` now exports `loadTokenStore` (structured map) and
  `verifyTokenScope` (pure scope checker; '*' grants any scope). Existing
  `loadTokens` is kept as a Set-returning wrapper so the 13 existing
  call sites (api-client, lifecycle, every route module) compile and
  behave identically for legacy tokens.
- `daemon/routes/kafka.ts`: per-route scope gates wired in:
    POST /api/kafka/endpoint           → kafka:endpoint:write
    POST /api/kafka/endpoint/verify    → kafka:endpoint:write
    DELETE /api/kafka/endpoint/<uri>   → kafka:endpoint:write
    GET  /api/kafka/endpoint           → kafka:endpoint:read
    GET  /api/kafka/endpoint/<uri>     → kafka:endpoint:read
  Scope mismatch returns 403 (NOT 401) and omits WWW-Authenticate.
- `RequestContext` carries a `tokenStore` alongside the legacy
  `validTokens` Set. Lifecycle wires both from a single
  `loadTokenStore()` call; per-agent tokens are added with scopes='*'.
- `kafka-route-verify` test ctx mock updated to include a wildcard
  `tokenStore` so existing route assertions are unaffected by the
  scope check (scope behavior is covered separately in
  `daemon-auth-scopes.test.ts`).

Tests:
- New `daemon-auth-scopes.test.ts` (THE backward-compat test):
    - legacy scope-less file accepted on every kafka route (200/500, never 401/403)
    - read-scoped token: 200 on GET, 403 on POST/DELETE
    - write-scoped token: 200 on POST, 403 on GET
- New `token-store.test.ts` (30 cases): legacy/scoped/mixed/malformed
  parsing, round-trip determinism, lookup helpers.
- New `auth-scopes.test.ts`: `verifyTokenScope` ` '*' grants any` semantics,
  exact-match for explicit lists, fail-closed on unknown.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…se 2)

Three new root-only verbs for token administration (ADR-0003):

  POST   /api/auth/tokens           mint a scoped token
  GET    /api/auth/tokens           list (prefix + scopes only — never the secret)
  DELETE /api/auth/tokens/<prefix>  revoke (idempotent: 204 hit / 404 miss)

Critical contracts pinned by tests:
- Root-only via `record.scopes === '*'`. Any explicitly-scoped caller
  gets 403 — privilege-escalation guard. ADR-0003 explicitly forbids
  an `auth:tokens:*` scope; both that prefix and a wildcard `*` mint
  attempt are rejected with 400.
- Mint response carries the full token EXACTLY ONCE (`{token, prefix,
  scopes, name?, createdAt}`). List/get responses go through
  `toPublicRecord` which strips `fullToken`. Tests assert the secret
  string never appears anywhere in the serialized list response.
- Read-modify-write critical section serialized by an in-process mutex.
  The on-disk write goes through `writeFileAtomic` from
  `daemon/fs-utils.ts` (POSIX-rename atomic). Concurrency test
  (`token-store-concurrency.test.ts`) runs Promise.all of 10 parallel
  mints and asserts the resulting file re-parses with all 10 records
  + zero malformed-line warnings.
- DELETE refuses to revoke the bearer token used to make the request
  (400) — guards against locking yourself out.

Wiring:
- New `daemon/routes/auth.ts` (~270 LOC) with a single `handleAuthRoutes`
  exporting the three handlers.
- `handle-request.ts` adds the dispatch after `handleKafkaRoutes`.
- The new routes go through the standard `httpAuthGuard` for auth
  presence (401 if no token), THEN do the root-only check (403 if
  scope ≠ '*'). They are NOT in `PUBLIC_PATHS`.

Tests:
- `daemon-token-mint.test.ts`: 14 cases — mint shapes (string / array
  / comma-separated), wildcard + auth:tokens:* rejection, scoped-can't-mint
  403, list-without-secrets, revoke 204/404, self-revoke 400,
  end-to-end mint→list→revoke.
- `token-store-concurrency.test.ts`: parallel mints invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e 06 phase 3)

CLI surface for the slice 06 token-admin API. Subcommands sit under the
existing `dkg auth` group (alongside `show`, `rotate`, `status`) so we
don't proliferate top-level groups.

  dkg auth mint-token --scope <comma-list> [--name <label>]
      Hits POST /api/auth/tokens. Prints the full token ONCE with a
      "save this — it won't be shown again" notice. Exits 1 on 403
      with a "root token required" message.

  dkg auth list-tokens
      Hits GET /api/auth/tokens. Renders a 3-column table:
      PREFIX | SCOPES | NAME / CREATED. Full secrets are never printed.

  dkg auth revoke-token <prefix>
      Hits DELETE /api/auth/tokens/<prefix>. Exits 0 on 204, 1 on 404.

Implementation notes:
- All three commands go through `ApiClient` so the daemon's serialization
  mutex covers the on-disk write. Bypassing the daemon to mutate the
  file directly would race with concurrent API mints.
- New `ApiClient.mintAuthToken / listAuthTokens / revokeAuthToken`
  methods. `delete<T>` private helper now returns `{}` on 204 (was
  unconditionally `res.json()`-ing an empty body and throwing).
- Smoke test (`auth-cli-smoke.test.ts`) spins up a stub HTTP server
  and asserts the full request/response wire shape: auth header,
  body, table rendering, "save this token" notice, error paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er+403 flow (slice 06)

Adds a slice 06 case to the existing e2e suite: mint a write-only
scoped token via the CLI's `dkg auth mint-token`, use it to register a
kafka endpoint via the HTTP API (200), then attempt to list endpoints
with the same token (must 403 — read scope is missing). Cleans up by
revoking the e2e token via `dkg auth revoke-token`.

Gated behind DKG_KAFKA_E2E=1 like the other e2e tests; skips cleanly
when devnet isn't reachable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…I4, P2)

C1 (CRITICAL) — writeFileAtomic now accepts an optional `mode`:
  - The temp file is CREATED with that mode (closes the umask-derived
    0o644 window between rename and the post-write chmod).
  - A defensive `chmod` runs after rename in case the destination
    pre-existed and source-mode transfer wasn't consistent.
  - mint + revoke routes pass `{ mode: 0o600 }`; their now-redundant
    explicit `chmod` calls are gone.

I1 — verifyTokenScope fails closed on empty/undefined requiredScope
BEFORE the wildcard shortcut. JS callers (or `as Scope` casts) can't
accidentally have a root token grant a "no scope" check. JSDoc
documents the contract explicitly so slice 07 inherits it.

I2 — disk-first ordering in mint + revoke. The in-memory store and
`validTokens` Set are now mutated AFTER `writeFileAtomic` resolves;
on disk-write failure the route returns 500 and the in-memory state is
unchanged. Without this, a failed mint would leave the daemon
accepting a token that's absent from disk; a failed revoke would
leave the daemon rejecting a token that's still in the file —
restart resurrects either footgun.

I3 — `dkg auth rotate` description now warns that scoped tokens
minted via `dkg auth mint-token` will be deleted by the rotate.
Description-only change; behavior is unchanged (full fix deferred).

I4 — `auth-cli-smoke.test.ts` "prints token once" assertion is now an
exact-match-count (`...?.length === 1`) instead of `toContain`.
A regression that re-prints the secret in a debug/error path now fails
the test loudly. Added a negative-control: secret must NOT appear on
stderr either.

P2 (polish, in path of C1/I2 edits) — three sites that hand-rolled
`t.length >= 8 ? t.slice(0, 8) : t` now use the existing `tokenPrefix`
helper from `token-store.ts`: `auth.ts` config-token + first-run
generator, `lifecycle.ts` per-agent token loop.

P1 (drop unused exports) — SKIPPED. ParseTokenFileOptions is the
second-arg type of the exported `parseTokenFile`, and PreservedLine is
referenced from the exported `ParsedTokenFile.preserved`. Dropping
their `export` keyword would trigger TS4063 on the consuming exports.

New tests:
- `auth-scopes.test.ts`: + 1 case asserting `verifyTokenScope` returns
  false for empty / undefined / null requiredScope on both wildcard
  and scoped tokens.
- `daemon-token-mint-failures.test.ts` (NEW, 2 cases): real-fs
  failure injection by chmod'ing the dkg-home dir read-only mid-test;
  asserts mint and revoke both surface a 5xx, leave the in-memory
  store + Set untouched, and leave the on-disk file byte-identical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P1: parseMintBody (routes/auth.ts) returns the result object directly
via spread, eliminating the mutable intermediate. Inferred return type
matches the existing function signature.

P2: ApiClient.mintAuthToken (api-client.ts) builds the request body
inline with the same spread pattern as P1, also matching the
construction in cli.ts:318-319 so both call sites read alike.

P3: routes/auth.ts file-header docblock — DELETE description was
"idempotent: 204 on hit, 404 on miss", which is technically wrong
because the response code changes between the first and subsequent
deletes. Replaced with "204 on first delete; 404 if already gone".
The per-handler comment at lines 357-361 was already precise; this
just brings the header in line.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
github-actions[bot]

This comment was marked as outdated.

@zsculac zsculac changed the base branch from feat/kafka-walking-skeleton to feat/kafka-list-revoke-verify May 5, 2026 13:39
@zsculac zsculac closed this May 5, 2026
@zsculac zsculac reopened this May 5, 2026
Comment thread packages/cli/src/daemon/routes/kafka.ts
Comment thread packages/cli/src/daemon/lifecycle.ts
Comment thread packages/cli/src/daemon/lifecycle.ts Outdated
Comment thread packages/cli/src/daemon/routes/auth.ts
Bug 1 (CRITICAL — auth.enabled=false regression): the per-route
`requireScope` and `requireRoot` gates now short-circuit to true when
auth is disabled, matching `httpAuthGuard`'s bypass semantics. Without
this, every request that previously worked unauthenticated against
kafka or auth-admin routes would 403 because `verifyTokenScope(undefined,
...)` and `lookupTokenRecord(undefined)` both fail closed.
- New `RequestContext.authEnabled` field, derived once in
  `handle-request.ts` from `config.auth?.enabled !== false` (single
  source of truth — no extra parameter).
- Both `requireScope` (kafka.ts) and `requireRoot` (routes/auth.ts) now
  early-return true when `ctx.authEnabled === false`.
- New integration tests in `daemon-auth-scopes.test.ts` exercise the
  bypass for every kafka route AND for `/api/auth/tokens` POST/GET
  with NO Authorization header → 200/201/2xx.

Bug 2 (CRITICAL — privilege escalation): per-agent tokens issued by
`/api/agent/register` carried `scopes: '*'` for backward-compat on every
other API surface. After slice 06 added admin routes, that wildcard
implicitly let any registered agent mint/list/revoke node auth tokens
— a privilege-escalation path.
- Added `TokenSource = 'file' | 'config' | 'agent'` to `TokenRecord`.
  In-memory only; the parser stamps every disk line as `'file'`; the
  serializer never writes it; round-trip determinism preserved.
- `requireRoot` now requires `scopes === '*'` AND
  `source === 'file' || source === 'config'`. `'agent'` tokens keep
  full kafka access (backward-compat) but are walled off from auth admin.
- `'config'` accepted as root: putting a token in dkg.config.yaml is
  itself an operator action — documented in `requireRoot` JSDoc.
- Test in `daemon-token-mint.test.ts`: agent-source token → 403 from
  POST/GET/DELETE `/api/auth/tokens`. Sibling test: config-source token
  → 201 from POST.

Bug 3 (CRITICAL — runtime drift): the `/api/agent/register` route at
`agent-chat.ts:376` mutated only `validTokens` (the legacy Set), not
`tokenStore`. Freshly registered agents passed `httpAuthGuard` but got
403 from every scope-gated route until daemon restart.
- New `addTokenToStore(store, validTokens, record)` and
  `removeTokenFromStore(store, validTokens, prefix)` helpers in
  `token-store.ts` mutate BOTH structures in lockstep.
- All four mutation sites — startup loader (lifecycle.ts), runtime
  agent register (agent-chat.ts), mint route (auth.ts), revoke route
  (auth.ts) — now go through these helpers.
- Test in `daemon-auth-scopes.test.ts` exercises the runtime add path:
  call `addTokenToStore` → use the new token on a kafka GET → 200.

Bug 4 (operator UX): `list-tokens` returned every store entry but
DELETE only knew about disk-sourced rows, leaving operators confused
about which entries are revocable.
- `toPublicRecord` now includes `source`. The wire shape and the
  `ApiClient.listAuthTokens` return type expose it.
- CLI table is now 4-column: PREFIX | SOURCE | SCOPES | NAME / CREATED.
  Operator can see at a glance which rows are file-sourced (revocable
  via DELETE), config-sourced (requires editing dkg.config.yaml), or
  agent-sourced (lives in daemon memory only).
- Smoke test asserts the SOURCE column renders, including the
  agent-source row that's flagged but not revocable here.

Test deltas:
- token-store.test.ts: 30 → 32 (parser stamps `source: 'file'`;
  addTokenToStore + removeTokenFromStore unit tests).
- daemon-auth-scopes.test.ts: 3 → 6 (auth-disabled bypass on kafka,
  auth-disabled bypass on /api/auth/tokens, runtime addTokenToStore).
- daemon-token-mint.test.ts: 14 → 16 (agent-source rejected;
  config-source accepted).
- All other slice 06 test files updated to pass `authEnabled: true`
  in their ctx mocks (default flips the gate's bypass; the
  auth-disabled bypass has its own dedicated tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zsculac zsculac closed this May 5, 2026
@zsculac zsculac reopened this May 5, 2026
@zsculac zsculac closed this May 5, 2026
@zsculac zsculac reopened this May 5, 2026
Comment thread packages/cli/src/auth.ts
Comment thread packages/cli/src/daemon/routes/auth.ts
@zsculac
Copy link
Copy Markdown
Contributor Author

zsculac commented May 5, 2026

Codex round 2 — known limitations (deferred for v1)

Codex flagged two prefix-collision edge cases that are deliberately deferred to a future slice:

  • packages/cli/src/auth.ts — config-token dedupe uses full-token equality but the store is prefix-keyed. A config token sharing an 8-char prefix with a file token would silently overwrite it in memory.
  • packages/cli/src/daemon/routes/auth.ts — mint does not check prefix uniqueness before setTokenRecord(). A prefix collision would silently revoke the older token.

Why deferred: prefix is 8 chars of base64url (64⁸ ≈ 281 trillion combinations). The birthday-problem collision threshold is ~75M tokens for >1% probability. Real deployments are nowhere near this, and triggering the collision deliberately requires operator-root control of token generation (i.e. the attacker already has the keys).

Future fix (one-liner): bump prefix length to 16 chars in tokenPrefix() — drops collision probability to ~1 in 10²⁸. The full collision-detect-and-retry path is unnecessary.

Tracking for slice 07 cleanup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant