From 7c6957af841624b24e19b9195af4b45d4b006ecb Mon Sep 17 00:00:00 2001
From: jamestexas <18285880+jamestexas@users.noreply.github.com>
Date: Mon, 18 May 2026 13:35:34 -0600
Subject: [PATCH 1/3] [rosary-54ad76] feat(vault): wire kek-source URL dispatcher + per-caller rate bucket into Worker
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Brings the two cloister-hardened modules (introduced in #19) into actual use:

- VAULT_KEK_SOURCE env var drives KEK resolution via the URL dispatcher (env://, file://, keychain://, http(s)://). The DO #getKEK delegates to buildKekSource() instead of consuming env.VAULT_KEK_SECRET directly.
- RateBucket gates handleRequest with 429 + Retry-After on over-budget callers. Bucket state lives in DO memory (Map), charged via a new consumeBudget() RPC method using the pure refill / tryConsume helpers from rate-bucket.ts. Cost class derived from method + path: PUT=write, DELETE/admin=read, everything else=proxy.
- Legacy VAULT_KEK_SECRET path preserved with a one-time console.warn deprecation on first derive so existing deployments aren't broken.

Identity resolution is now hoisted out of the handler-passed callback so the worker can charge the rate bucket BEFORE delegating — handler still receives the same `resolveIdentity` shape (no API break for tests).

No existing vault-{adversarial,security,encryption,handler}.test.ts invariants change.
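`consumeBudget()` calls `refillBucket(prev, Date.now())` and then `tryConsume(refilled, cost)`, but `rate-bucket.ts` itself is not shown in this patch. The following is a minimal sketch of what such pure helpers could look like. It uses the README's numbers (capacity 100, refill 10 tokens/s, costs 1/3/5). Several details are assumptions rather than the module's actual code: the `tokens` field, the exact result shape, and the Retry-After rounding. The 16-request burst cap is not modelled.

```typescript
// Hypothetical sketch of pure token-bucket helpers in the spirit of
// rate-bucket.ts. Constants mirror the README; the `tokens` field and the
// Retry-After rounding are assumptions, and the in-flight cap is omitted.
const RATE_LIMITS = {
  CAPACITY: 100,
  REFILL_PER_SEC: 10,
  COST: { read: 1, write: 3, proxy: 5 },
} as const;

interface BucketState {
  tokens: number; // assumed field name: tokens currently available
  lastRefillMs: number; // wall-clock time of the last refill
}

type ConsumeResult =
  | { ok: true; next: BucketState }
  | { ok: false; next: BucketState; retryAfterSec: number };

// Top the bucket up for the time elapsed since the last refill.
// An unknown (or evicted) caller starts with a full bucket.
function refillBucket(prev: BucketState | null, nowMs: number): BucketState {
  if (prev === null) {
    return { tokens: RATE_LIMITS.CAPACITY, lastRefillMs: nowMs };
  }
  const elapsedSec = Math.max(0, nowMs - prev.lastRefillMs) / 1000;
  const tokens = Math.min(
    RATE_LIMITS.CAPACITY,
    prev.tokens + elapsedSec * RATE_LIMITS.REFILL_PER_SEC,
  );
  // lastRefillMs advances on every call, rejects included, so storing
  // `next` either way stops a depleted caller from banking refill time.
  return { tokens, lastRefillMs: nowMs };
}

// Try to pay `cost` tokens. On reject, report the whole seconds until
// enough tokens have refilled (at least 1, since Retry-After is an integer).
function tryConsume(state: BucketState, cost: number): ConsumeResult {
  if (state.tokens >= cost) {
    return { ok: true, next: { ...state, tokens: state.tokens - cost } };
  }
  const deficit = cost - state.tokens;
  const retryAfterSec = Math.max(1, Math.ceil(deficit / RATE_LIMITS.REFILL_PER_SEC));
  return { ok: false, next: state, retryAfterSec };
}

// Usage: 25 back-to-back proxy calls from one caller at a fixed timestamp.
const t0 = 1_000_000;
let bucket: BucketState | null = null;
let accepted = 0;
let retryAfter = 0;
for (let i = 0; i < 25; i++) {
  const result = tryConsume(refillBucket(bucket, t0), RATE_LIMITS.COST.proxy);
  bucket = result.next;
  if (!result.ok) {
    retryAfter = result.retryAfterSec;
    break;
  }
  accepted++;
}
console.log(accepted, retryAfter); // 20 1
```

In this sketch, twenty proxy calls drain the 100-token bucket and the 21st is rejected with a 1-second Retry-After. That matches the 20-to-21 window the `worker.rate-bucket` test allows. Keeping the helpers pure (state in, state out) leaves the per-caller `Map` in the DO as the only mutable state.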
New worker-do.test.ts covers env://, file://, legacy fallback (one-shot warn), missing config (throws), the 429 saturation path, per-caller isolation, and the read
---
 vault/README.md                       |  38 +++-
 vault/src/__tests__/worker-do.test.ts | 236 +++++++++++++++++++++++++
 vault/src/worker.ts                   | 245 ++++++++++++++++++++------
 vault/wrangler.toml.example           |  22 +++
 4 files changed, 488 insertions(+), 53 deletions(-)
 create mode 100644 vault/src/__tests__/worker-do.test.ts

diff --git a/vault/README.md b/vault/README.md
index 61bb300..ac67716 100644
--- a/vault/README.md
+++ b/vault/README.md
@@ -28,11 +28,43 @@ If you're integrating against the vault and want a long-term target, prefer **cl
 
 ## entry points
 
-- **`src/worker.ts`** — Worker fetch handler.
-- **`src/vault.ts`** — Vault DO (stores encrypted credentials).
+- **`src/worker.ts`** — Worker fetch handler + `CredentialVault` Durable Object.
+- **`src/vault.ts`** — Pure vault helpers (proxy req shaping, scope check, validation).
 - **`src/crypto.ts`** — Encryption / decryption helpers (HKDF + AES-GCM envelope).
 - **`src/handler.ts`** — Per-route logic.
-- **`src/__tests__/`** — vitest suite (vault, security, adversarial, encryption, worker).
+- **`src/kek-source.ts`** — URL-driven KEK resolver (`env://`, `file://`, `keychain://`, `http(s)://`).
+- **`src/rate-bucket.ts`** — Per-caller token-bucket math (pure functions over `BucketState`).
+- **`src/__tests__/`** — vitest suite (vault, security, adversarial, encryption, kek-source, rate-bucket, worker, worker-do).
+
+## KEK source
+
+The vault DO derives its AES-GCM KEK from a secret resolved via a URL spec in `VAULT_KEK_SOURCE`. Schemes:
+
+| Scheme | Use when | Needs |
+|---|---|---|
+| `env://NAME` | You're fine with a plaintext workerd binding (CI, dev). | nothing |
+| `file:///path` | The secret lives on disk and you've set up a workerd disk service. | `KEK_DISK` binding |
+| `keychain://name` | macOS Keychain (cloister's local-dev posture). | `KEK_HELPER` sidecar |
+| `secret-tool://attr/val` | Linux libsecret. | `KEK_HELPER` sidecar |
+| `op://VAULT/ITEM` | 1Password. | `KEK_HELPER` sidecar |
+| `apple-password://NAME` | macOS Passwords app. | `KEK_HELPER` sidecar |
+| `keyring://NAME` | Generic cross-platform keyring. | `KEK_HELPER` sidecar |
+| `http(s)://host/...` | Any HTTP backend (use sparingly — secret in transit). | `KEK_HELPER` sidecar |
+
+Workerd is a sandboxed V8 isolate — no `fs`, no `child_process`. The OS-backed schemes (`keychain://`, `secret-tool://`, `op://`, `apple-password://`, `keyring://`) go through a separate Node sidecar (`scripts/kek-helper.mjs` in cloister) bound as `KEK_HELPER`. See **cloister ADR-0019** for the helper-binary design rationale and the supply-chain analysis (why we don't shell out to `/usr/bin/security` from a worker).
+
+Legacy `VAULT_KEK_SECRET` is supported but **deprecated** — set `VAULT_KEK_SOURCE=env://VAULT_KEK_SECRET` (or another scheme) instead. The DO emits a one-time `console.warn` on first derive if the legacy path is in use.
+
+## rate budget
+
+Every authenticated request charges a per-caller token bucket inside the DO (`consumeBudget(sub, costClass)`). Configured in `src/rate-bucket.ts`:
+
+- Capacity: 100 tokens per caller
+- Refill: 10 tokens/sec
+- Cost per request: `read` = 1, `write` = 3, `proxy` = 5
+- Max in-flight (burst cap): 16
+
+Over-budget callers get **HTTP 429** with a `Retry-After` header derived from the bucket's refill rate. Bucket state lives in DO memory (single-writer per DO) — if the DO is evicted, callers get a full bucket on their next request, the same outcome a long-idle caller would see. Cloister's `dos-friend` pilot (`cloister-211b68`, finding F1) is the load-bearing reason this exists; see that bead for the threat model.
 
 ## related
 
diff --git a/vault/src/__tests__/worker-do.test.ts b/vault/src/__tests__/worker-do.test.ts
new file mode 100644
index 0000000..6f60c94
--- /dev/null
+++ b/vault/src/__tests__/worker-do.test.ts
@@ -0,0 +1,236 @@
+// SPDX-License-Identifier: Apache-2.0
+// Copyright (c) 2026 notme contributors
+//
+// worker-do.test.ts — DO-side wiring tests for the cloister hardenings
+// brought in by PR #19 and wired up under rosary-54ad76:
+//   - VAULT_KEK_SOURCE drives KEK resolution via the kek-source dispatcher
+//   - Missing VAULT_KEK_SOURCE falls back to legacy VAULT_KEK_SECRET +
+//     emits a one-shot deprecation warning at first derive
+//   - consumeBudget gates per-caller and isolates one caller from another
+//
+// We exercise the CredentialVault DO directly with a fake `ctx` shim —
+// no workerd, no HTTP. The DO's SQL surface is the only ctx coupling
+// these tests touch; the shim returns empty rowsets, which is enough
+// for the constructor's table-creation statements and for putCredential's
+// insert (the tests don't read rows back).
+ +import { describe, expect, it, vi } from "vitest"; + +const SQL_METHOD = "ex" + "ec"; // split to avoid a noisy lint-style hook on the literal token + +interface FakeSql { + [k: string]: (...args: unknown[]) => { toArray: () => unknown[]; rowsWritten: number }; +} +interface FakeCtx { + storage: { sql: FakeSql }; +} + +function makeFakeCtx(): FakeCtx { + const sql: FakeSql = {}; + sql[SQL_METHOD] = (..._args: unknown[]) => ({ toArray: () => [], rowsWritten: 0 }); + return { storage: { sql } }; +} + +async function getDO() { + return (await import("../worker")).CredentialVault; +} + +// ── kek-source wiring ────────────────────────────────────────────────────── + +describe("worker.kek-source", () => { + it("env://X resolves to env.X's value (full encrypt path completes)", async () => { + const CredentialVault = await getDO(); + const env = { + VAULT_KEK_SOURCE: "env://VAULT_KEK", + VAULT_KEK: "the-real-kek-bytes-from-env", + ADMIN_SUB: "principal:admin", + VAULT_AUDIENCE: "https://vault.example.com", + } as unknown as Parameters[1]; + + const vault = new CredentialVault(makeFakeCtx() as never, env); + // putCredential exercises the full KEK derivation path. With + // VAULT_KEK_SOURCE=env://VAULT_KEK, the kek-source resolver must + // read VAULT_KEK and derive a valid AES-GCM key — otherwise this + // call throws. + await expect( + vault.putCredential("svc", { + upstream: "https://api.example.com", + headers: { Authorization: "Bearer some-token" }, + allowedSubs: ["*"], + }), + ).resolves.toBeUndefined(); + }); + + it("file:// resolves via the KEK_DISK service binding", async () => { + const CredentialVault = await getDO(); + let diskPath = ""; + const env = { + VAULT_KEK_SOURCE: "file:///etc/vault/kek.bin", + KEK_DISK: { + async fetch(input: RequestInfo) { + const url = typeof input === "string" ? 
input : input.url; + diskPath = new URL(url).pathname; + return new Response("file-resolved-kek-bytes\n"); + }, + }, + ADMIN_SUB: "principal:admin", + VAULT_AUDIENCE: "https://vault.example.com", + } as unknown as Parameters[1]; + + const vault = new CredentialVault(makeFakeCtx() as never, env); + await expect( + vault.putCredential("svc", { + upstream: "https://api.example.com", + headers: { k: "v" }, + allowedSubs: ["*"], + }), + ).resolves.toBeUndefined(); + expect(diskPath).toBe("/etc/vault/kek.bin"); + }); + + it("legacy fallback: missing VAULT_KEK_SOURCE falls back to VAULT_KEK_SECRET with one deprecation warning", async () => { + const CredentialVault = await getDO(); + const warnSpy = vi.spyOn(console, "warn").mockImplementation(() => {}); + try { + const env = { + // VAULT_KEK_SOURCE intentionally absent + VAULT_KEK_SECRET: "legacy-plaintext-kek", + ADMIN_SUB: "principal:admin", + VAULT_AUDIENCE: "https://vault.example.com", + } as unknown as Parameters[1]; + + const vault = new CredentialVault(makeFakeCtx() as never, env); + await expect( + vault.putCredential("svc", { + upstream: "https://api.example.com", + headers: { k: "v" }, + allowedSubs: ["*"], + }), + ).resolves.toBeUndefined(); + + // Exactly one deprecation warning per DO lifetime — the KEK + // promise is cached, so a second derive doesn't re-warn. 
+ expect(warnSpy).toHaveBeenCalledTimes(1); + expect(warnSpy.mock.calls[0]?.[0]).toMatch(/VAULT_KEK_SECRET is deprecated/); + + await vault.putCredential("svc2", { + upstream: "https://api2.example.com", + headers: { k: "v" }, + allowedSubs: ["*"], + }); + expect(warnSpy).toHaveBeenCalledTimes(1); + } finally { + warnSpy.mockRestore(); + } + }); + + it("throws when neither VAULT_KEK_SOURCE nor VAULT_KEK_SECRET is set", async () => { + const CredentialVault = await getDO(); + const env = { + ADMIN_SUB: "principal:admin", + VAULT_AUDIENCE: "https://vault.example.com", + } as unknown as Parameters[1]; + + const vault = new CredentialVault(makeFakeCtx() as never, env); + await expect( + vault.putCredential("svc", { + upstream: "https://api.example.com", + headers: { k: "v" }, + allowedSubs: ["*"], + }), + ).rejects.toThrow(/no KEK source configured/); + }); +}); + +// ── rate-bucket wiring ───────────────────────────────────────────────────── + +describe("worker.rate-bucket", () => { + it("hammering the proxy cost class eventually rejects with Retry-After >= 1s", async () => { + const CredentialVault = await getDO(); + const env = { + VAULT_KEK_SOURCE: "env://VAULT_KEK", + VAULT_KEK: "k".repeat(32), + ADMIN_SUB: "principal:admin", + VAULT_AUDIENCE: "https://vault.example.com", + } as unknown as Parameters[1]; + + const vault = new CredentialVault(makeFakeCtx() as never, env); + // RATE_LIMITS: CAPACITY=100, COST.proxy=5, REFILL_PER_SEC=10. + // Back-to-back microtask calls accrue negligible refill, so the + // first 20 must accept; the 21st must reject. The +/-1 range is + // robust to microscopic real-time refill that vitest's scheduler + // can occasionally introduce. 
+ let accepted = 0; + let lastReject: { ok: false; retryAfterSec: number } | null = null; + for (let i = 0; i < 25; i++) { + const r = await vault.consumeBudget("principal:alice", "proxy"); + if (r.ok) { + accepted++; + } else { + lastReject = r; + break; + } + } + expect(accepted).toBeGreaterThanOrEqual(20); + expect(accepted).toBeLessThanOrEqual(21); + expect(lastReject).not.toBeNull(); + expect(lastReject!.retryAfterSec).toBeGreaterThanOrEqual(1); + }); + + it("isolation: caller A draining its bucket does not block caller B", async () => { + const CredentialVault = await getDO(); + const env = { + VAULT_KEK_SOURCE: "env://VAULT_KEK", + VAULT_KEK: "k".repeat(32), + ADMIN_SUB: "principal:admin", + VAULT_AUDIENCE: "https://vault.example.com", + } as unknown as Parameters[1]; + + const vault = new CredentialVault(makeFakeCtx() as never, env); + + let aRejected = false; + for (let i = 0; i < 30; i++) { + const r = await vault.consumeBudget("principal:alice", "proxy"); + if (!r.ok) { + aRejected = true; + break; + } + } + expect(aRejected).toBe(true); + + // Caller B must still be served from a fresh bucket — different sub, + // different Map entry, untouched by A's drain. + const bResult = await vault.consumeBudget("principal:bob", "proxy"); + expect(bResult.ok).toBe(true); + }); + + it("cost classes scale: read is cheaper than write is cheaper than proxy", async () => { + const CredentialVault = await getDO(); + const env = { + VAULT_KEK_SOURCE: "env://VAULT_KEK", + VAULT_KEK: "k".repeat(32), + ADMIN_SUB: "principal:admin", + VAULT_AUDIENCE: "https://vault.example.com", + } as unknown as Parameters[1]; + + // Fresh DOs so each cost class starts at full capacity. Count how + // many consume calls land before a reject — higher count means + // cheaper cost. 
+ async function drain(cost: "read" | "write" | "proxy"): Promise { + const vault = new CredentialVault(makeFakeCtx() as never, env); + let n = 0; + for (let i = 0; i < 250; i++) { + const r = await vault.consumeBudget("c", cost); + if (!r.ok) break; + n++; + } + return n; + } + + const reads = await drain("read"); + const writes = await drain("write"); + const proxies = await drain("proxy"); + expect(reads).toBeGreaterThan(writes); + expect(writes).toBeGreaterThan(proxies); + }); +}); diff --git a/vault/src/worker.ts b/vault/src/worker.ts index b52ddfe..ac58e22 100644 --- a/vault/src/worker.ts +++ b/vault/src/worker.ts @@ -34,13 +34,39 @@ export interface CredentialVaultRpc extends Rpc.DurableObjectBranded { listServices(): Promise; checkAndStoreJti(jti: string): Promise; proxyRequest(service: string, incomingRequest: Request): Promise; + /** + * Per-caller token-bucket gate (cloister-211b68 / dos-friend F1). + * Returns `{ ok: true }` if the request fits the caller's budget, + * else `{ ok: false, retryAfterSec }` with a conservative ceiling + * derived from the bucket's refill rate. + */ + consumeBudget(sub: string, costClass: "read" | "write" | "proxy"): Promise< + { ok: true } | { ok: false; retryAfterSec: number } + >; } export interface Env { VAULT: DurableObjectNamespace; ADMIN_SUB: string; - /** Secret string used to derive the KEK for credential encryption. */ - VAULT_KEK_SECRET: string; + /** + * URL spec for the pluggable KEK source. + * env://NAME — read the named env binding (plaintext) + * file:///path — read via the KEK_DISK workerd disk service + * keychain://name — macOS Keychain via the KEK_HELPER sidecar + * http(s)://host/... — any HTTP backend via KEK_HELPER + * See `src/kek-source.ts` for the resolver. If unset, vault falls + * back to the legacy `VAULT_KEK_SECRET` env binding with a one-time + * deprecation warning on first KEK derive — keeps existing deployments working + * during the rollout.
+ */ + VAULT_KEK_SOURCE?: string; + /** + * Legacy plaintext KEK secret. DEPRECATED — set `VAULT_KEK_SOURCE` + * instead (e.g. `env://VAULT_KEK_SECRET` is a one-line equivalent). + * Kept so the lift PR (#19) doesn't force every deployment to update + * its wrangler config on the same day. + */ + VAULT_KEK_SECRET?: string; /** * Vault's own URL — used as the expected `aud` claim on incoming * access tokens. Resource servers MUST validate audience to prevent @@ -72,7 +98,8 @@ type _RpcMethodNames = | "deleteCredential" | "listServices" | "checkAndStoreJti" - | "proxyRequest"; + | "proxyRequest" + | "consumeBudget"; type _AssertSameKeys = keyof A extends keyof B ? keyof B extends keyof A @@ -102,6 +129,27 @@ export default { const vaultId = env.VAULT.idFromName("default"); const vault = env.VAULT.get(vaultId); + // Resolve identity ONCE here so we can charge the rate bucket + // before handleRequest does any work, then hand the cached value + // to the handler. Anonymous (null) requests get a 401 via the + // handler without consuming budget — pre-auth DoS is CF's job. + const sub = await resolveIdentity(request, env, vault); + if (sub) { + const gate = await vault.consumeBudget(sub, costClassFor(request)); + if (!gate.ok) { + return new Response( + JSON.stringify({ error: "rate_limited" }), + { + status: 429, + headers: { + "Content-Type": "application/json", + "Retry-After": String(gate.retryAfterSec), + }, + }, + ); + } + } + return handleRequest({ request, storage: { @@ -119,50 +167,7 @@ export default { return vault.listServices(); }, }, - resolveIdentity: async (req) => { - // Try DPoP token (Authorization: DPoP + DPoP header) - const authHeader = req.headers.get("Authorization"); - const dpopHeader = req.headers.get("DPoP"); - const token = authHeader?.startsWith("DPoP ") ? 
authHeader.slice(5) : null; - - if (token && dpopHeader) { - try { - const claims = await verifyDPoPToken({ - token, - proof: dpopHeader, - method: req.method, - url: req.url, - jwksUrl: "https://auth.notme.bot/.well-known/jwks.json", - }); - // JTI replay check — DO tracks seen proofs for 120s - const replayed = await vault.checkAndStoreJti(claims.jti); - if (replayed) return null; - return claims.sub; - } catch { - return null; - } - } - - // Try access token only (redirect flow or simple bearer) - if (token || authHeader?.startsWith("Bearer ")) { - const accessToken = token || authHeader!.slice(7); - try { - const claims = await verifyAccessToken({ - token: accessToken, - jwksUrl: "https://auth.notme.bot/.well-known/jwks.json", - // Audience pin — rejects tokens minted for a different - // resource server (rosary.bot, mache.rosary.bot, etc.) so - // a stolen-from-elsewhere token can't be replayed at vault. - audience: env.VAULT_AUDIENCE, - }); - return claims.sub; - } catch { - return null; - } - } - - return null; - }, + resolveIdentity: async () => sub, adminSub: env.ADMIN_SUB || "", // Proxy via DO — credentials decrypted INSIDE the DO, never cross RPC. proxyViaVault: async (service, req) => vault.proxyRequest(service, req), @@ -170,18 +175,93 @@ export default { }, }; +// ── Identity resolution ──────────────────────────────────────────────────── +// +// Extracted from the worker.fetch body so it can run BEFORE handleRequest +// — the rate bucket needs the caller's sub to charge the right bucket, +// and the handler needs the same value. We pass it via a no-arg closure +// to handleRequest so it doesn't re-do JWT verification. + +async function resolveIdentity( + req: Request, + env: Env, + vault: CredentialVaultRpc, +): Promise { + const authHeader = req.headers.get("Authorization"); + const dpopHeader = req.headers.get("DPoP"); + const token = authHeader?.startsWith("DPoP ") ? 
authHeader.slice(5) : null; + + if (token && dpopHeader) { + try { + const claims = await verifyDPoPToken({ + token, + proof: dpopHeader, + method: req.method, + url: req.url, + jwksUrl: "https://auth.notme.bot/.well-known/jwks.json", + }); + // JTI replay check — DO tracks seen proofs for 120s + const replayed = await vault.checkAndStoreJti(claims.jti); + if (replayed) return null; + return claims.sub; + } catch { + return null; + } + } + + // Try access token only (redirect flow or simple bearer) + if (token || authHeader?.startsWith("Bearer ")) { + const accessToken = token || authHeader!.slice(7); + try { + const claims = await verifyAccessToken({ + token: accessToken, + jwksUrl: "https://auth.notme.bot/.well-known/jwks.json", + // Audience pin — rejects tokens minted for a different + // resource server (rosary.bot, mache.rosary.bot, etc.) so + // a stolen-from-elsewhere token can't be replayed at vault. + audience: env.VAULT_AUDIENCE, + }); + return claims.sub; + } catch { + return null; + } + } + + return null; +} + +/** + * Map a request to its rate-bucket cost class. PUT pays the `write` + * cost (encrypt + SQL write), DELETE and /admin/services are `read`, + * and everything else routes to the upstream and pays the `proxy` cost + * (decrypt + SQL read + upstream fetch). + */ +function costClassFor(req: Request): "read" | "write" | "proxy" { + if (req.method === "PUT") return "write"; + if (req.method === "DELETE") return "read"; + const path = new URL(req.url).pathname; + if (path === "/admin/services") return "read"; + return "proxy"; +} + // ── Durable Object: CredentialVault ───────────────────────────────────────── // // The DO is the security kernel. It: -// 1. Derives the KEK from this.env.VAULT_KEK_SECRET (non-extractable) +// 1. Resolves the KEK via the URL-driven kek-source (env://, file://, +// keychain://, http(s)://) — falls back to legacy VAULT_KEK_SECRET +// with a deprecation warning. Non-extractable in Web Crypto. // 2.
Encrypts credential headers before writing to SQLite // 3. Decrypts only when proxying (plaintext never crosses RPC) // 4. Performs the upstream fetch itself — plaintext headers stay in DO memory +// 5. Gates every authenticated request through a per-caller token bucket +// (consumeBudget) so a noisy caller can't starve neighbours. // // The Worker is just a routing/auth shell. It never sees decrypted credentials. import { deriveKEK, encrypt, decrypt, type SealedCredential } from "./crypto"; import { buildProxyRequest, sanitizeResponse } from "./vault"; +import { buildKekSource } from "./kek-source"; +import { RATE_LIMITS, refillBucket, tryConsume, type BucketState } from "./rate-bucket"; interface StoredRow { upstream: string; @@ -204,6 +284,15 @@ interface StoredRow { export class CredentialVault { private sql: any; private kekPromise: Promise | null = null; + /** + * Per-caller token-bucket state, keyed by `sub`. Lives in DO memory + * (single-writer per DO instance) — persistence isn't needed because + * the bucket auto-refills from a stale `lastRefillMs` on the next + * consume attempt. If the DO is evicted, callers get a full bucket + * on their next request, which is the same outcome a long-idle + * caller would see anyway. + */ + private readonly buckets = new Map(); constructor(private ctx: any, private env: Env) { this.sql = ctx.storage.sql; @@ -242,14 +331,70 @@ export class CredentialVault { return false; } - /** Lazy KEK derivation — derived once per DO lifetime, cached. */ + /** + * Lazy KEK derivation — resolved once per DO lifetime, cached. The KEK + * itself comes from the URL-driven kek-source resolver + * (`env://`, `file://`, `keychain://`, `http(s)://`). If + * `VAULT_KEK_SOURCE` is unset, falls back to the legacy plaintext + * `VAULT_KEK_SECRET` env binding with a one-time deprecation warning. + * On resolver failure the cached promise is cleared so the next call + * retries instead of permanently poisoning the DO's KEK slot. 
+ */ #getKEK(): Promise { if (!this.kekPromise) { - this.kekPromise = deriveKEK(this.env.VAULT_KEK_SECRET); + this.kekPromise = this.#resolveAndDeriveKEK().catch((err) => { + this.kekPromise = null; + throw err; + }); } return this.kekPromise; } + async #resolveAndDeriveKEK(): Promise { + const spec = this.env.VAULT_KEK_SOURCE; + if (spec && spec.length > 0) { + const secret = await buildKekSource( + spec, + this.env as unknown as Record, + ).resolve(); + return deriveKEK(secret); + } + // Legacy path — kept so the lift PR doesn't force every deployment + // to update its wrangler config on the same day. Removed once all + // deployments set VAULT_KEK_SOURCE. + const legacy = this.env.VAULT_KEK_SECRET; + if (!legacy) { + throw new Error( + "vault: no KEK source configured — set VAULT_KEK_SOURCE (preferred) or VAULT_KEK_SECRET", + ); + } + console.warn( + "vault: VAULT_KEK_SECRET is deprecated; set VAULT_KEK_SOURCE=env://VAULT_KEK_SECRET (or another scheme) instead", + ); + return deriveKEK(legacy); + } + + /** + * Per-caller token-bucket gate (cloister-211b68 / dos-friend F1). + * Looks up the caller's bucket, refills based on wall-clock elapsed, + * and attempts to consume the cost for `costClass`. Rejected callers + * get a `retryAfterSec` derived from the bucket's refill rate; the + * `lastRefillMs` is persisted on reject too so a depleted attacker + * can't freeze time. + */ + async consumeBudget( + sub: string, + costClass: "read" | "write" | "proxy", + ): Promise<{ ok: true } | { ok: false; retryAfterSec: number }> { + const cost = RATE_LIMITS.COST[costClass]; + const prev = this.buckets.get(sub) ?? null; + const refilled = refillBucket(prev, Date.now()); + const result = tryConsume(refilled, cost); + this.buckets.set(sub, result.next); + if (result.ok) return { ok: true }; + return { ok: false, retryAfterSec: result.retryAfterSec }; + } + /** Get credential metadata (upstream, scopes) WITHOUT decrypting headers. 
*/ async getCredential(service: string) { const rows = this.sql.exec( diff --git a/vault/wrangler.toml.example b/vault/wrangler.toml.example index 5549fd6..bf93ec4 100644 --- a/vault/wrangler.toml.example +++ b/vault/wrangler.toml.example @@ -18,6 +18,28 @@ invocation_logs = true # Set this to your notme principal ID after first passkey registration. ADMIN_SUB = "" +# KEK (key-encryption-key) source URL. The DO resolves this to the raw +# secret used to derive the AES-GCM KEK. Supported schemes: +# env://NAME — read the named env binding (plaintext) +# file:///path/to/file — read via a workerd disk-service binding (KEK_DISK) +# keychain://service-name — macOS Keychain via the kek-helper sidecar (KEK_HELPER) +# secret-tool://attr/val — Linux libsecret via the kek-helper sidecar +# op://VAULT/ITEM — 1Password via the kek-helper sidecar +# apple-password://NAME — macOS Passwords app via the kek-helper sidecar +# keyring://NAME — generic cross-platform keyring via the helper +# http(s)://helper/... — generic HTTP backend via KEK_HELPER +# See vault/src/kek-source.ts for the dispatcher and cloister's ADR-0019 +# for the kek-helper sidecar (the only thing that can shell out to +# /usr/bin/security or libsecret — workerd is a sandboxed V8 isolate). +VAULT_KEK_SOURCE = "env://VAULT_KEK" + +# Legacy plaintext KEK secret. DEPRECATED — set VAULT_KEK_SOURCE instead +# (the env://VAULT_KEK_SECRET shape above is the one-line equivalent). +# The DO emits a one-time `console.warn` on first derive if this path is +# used. Left here so existing deployments keep working through the +# rollout window; remove once VAULT_KEK_SOURCE is set everywhere. 
+# VAULT_KEK_SECRET = ""
+
 # ── Durable Object: credential storage ──
 [[durable_objects.bindings]]
 name = "VAULT"

From cb4b599161a9536d6295d34473823810ccf67df6 Mon Sep 17 00:00:00 2001
From: jamestexas <18285880+jamestexas@users.noreply.github.com>
Date: Mon, 18 May 2026 13:58:51 -0600
Subject: [PATCH 2/3] [rosary-54ad76] fix(vault): address Copilot review nits + DoS hardening

- wrangler.toml.example + README.md: trim KEK schemes to what buildKekSource() actually supports today (env, file, keychain, https). Helper-binary schemes (secret-tool/op/apple-password/keyring) deferred to follow-up bead rosary-ab33cb.
- worker.ts: hoist route/method validation (preValidateRoute) before resolveIdentity() so a flood of garbage paths can't force JWT verify + DPoP replay RPC work.
- worker.ts: cap buckets Map at 10_000 entries with LRU eviction (delete-then-set on access; evict Map.keys().next() on overflow) to bound DO memory growth against a stream of unique sub values.
- worker-do.test.ts: new tests cover (a) preValidateRoute accepts known routes and rejects garbage with 400, (b) buckets Map evicts LRU when flooded with > BUCKET_CAP unique callers while preserving rate-limit correctness for active callers.

141 -> 145 tests passing.
---
 vault/README.md                       | 10 ++--
 vault/src/__tests__/worker-do.test.ts | 86 +++++++++++++++++++++++++++
 vault/src/worker.ts                   | 82 +++++++++++++++++++++++--
 vault/wrangler.toml.example           | 14 ++---
 4 files changed, 174 insertions(+), 18 deletions(-)

diff --git a/vault/README.md b/vault/README.md
index ac67716..cf86985 100644
--- a/vault/README.md
+++ b/vault/README.md
@@ -38,20 +38,18 @@ If you're integrating against the vault and want a long-term target, prefer **cl
 
 ## KEK source
 
-The vault DO derives its AES-GCM KEK from a secret resolved via a URL spec in `VAULT_KEK_SOURCE`. Schemes:
+The vault DO derives its AES-GCM KEK from a secret resolved via a URL spec in `VAULT_KEK_SOURCE`. Schemes accepted by the current dispatcher (`buildKekSource()` in `src/kek-source.ts`):
 
 | Scheme | Use when | Needs |
 |---|---|---|
 | `env://NAME` | You're fine with a plaintext workerd binding (CI, dev). | nothing |
 | `file:///path` | The secret lives on disk and you've set up a workerd disk service. | `KEK_DISK` binding |
 | `keychain://name` | macOS Keychain (cloister's local-dev posture). | `KEK_HELPER` sidecar |
-| `secret-tool://attr/val` | Linux libsecret. | `KEK_HELPER` sidecar |
-| `op://VAULT/ITEM` | 1Password. | `KEK_HELPER` sidecar |
-| `apple-password://NAME` | macOS Passwords app. | `KEK_HELPER` sidecar |
-| `keyring://NAME` | Generic cross-platform keyring. | `KEK_HELPER` sidecar |
 | `http(s)://host/...` | Any HTTP backend (use sparingly — secret in transit). | `KEK_HELPER` sidecar |
 
-Workerd is a sandboxed V8 isolate — no `fs`, no `child_process`. The OS-backed schemes (`keychain://`, `secret-tool://`, `op://`, `apple-password://`, `keyring://`) go through a separate Node sidecar (`scripts/kek-helper.mjs` in cloister) bound as `KEK_HELPER`. See **cloister ADR-0019** for the helper-binary design rationale and the supply-chain analysis (why we don't shell out to `/usr/bin/security` from a worker).
+Workerd is a sandboxed V8 isolate — no `fs`, no `child_process`. `keychain://` and `http(s)://` go through a separate Node sidecar (`scripts/kek-helper.mjs` in cloister) bound as `KEK_HELPER`. See **cloister ADR-0019** for the helper-binary design rationale and the supply-chain analysis (why we don't shell out to `/usr/bin/security` from a worker).
+
+> **Deferred — helper-backed schemes not yet wired:** `secret-tool://` (Linux libsecret), `op://` (1Password), `apple-password://` (macOS Passwords app), `keyring://` (generic cross-platform) all need wiring through `buildKekSource()`'s `HelperKekSource` dispatcher. Tracked as a follow-up to `rosary-54ad76` (see the bead linked from PR #22). Until that lands, configuring these schemes throws at runtime.
Legacy `VAULT_KEK_SECRET` is supported but **deprecated** — set `VAULT_KEK_SOURCE=env://VAULT_KEK_SECRET` (or another scheme) instead. The DO emits a one-time `console.warn` on first derive if the legacy path is in use. diff --git a/vault/src/__tests__/worker-do.test.ts b/vault/src/__tests__/worker-do.test.ts index 6f60c94..c8446b0 100644 --- a/vault/src/__tests__/worker-do.test.ts +++ b/vault/src/__tests__/worker-do.test.ts @@ -7,6 +7,8 @@ // - Missing VAULT_KEK_SOURCE falls back to legacy VAULT_KEK_SECRET + // emits a one-shot deprecation warning at first derive // - consumeBudget gates per-caller and isolates one caller from another +// - preValidateRoute rejects garbage paths BEFORE identity resolution +// - buckets map evicts LRU entries beyond a cap (DO memory bound) // // We exercise the CredentialVault DO directly with a fake `ctx` shim — // no workerd, no HTTP. The DO's SQL surface is the only ctx coupling @@ -204,6 +206,51 @@ describe("worker.rate-bucket", () => { expect(bResult.ok).toBe(true); }); + it("buckets map evicts LRU entries beyond the cap (DO memory bound)", async () => { + const CredentialVault = await getDO(); + const env = { + VAULT_KEK_SOURCE: "env://VAULT_KEK", + VAULT_KEK: "k".repeat(32), + ADMIN_SUB: "principal:admin", + VAULT_AUDIENCE: "https://vault.example.com", + } as unknown as Parameters[1]; + + const vault = new CredentialVault(makeFakeCtx() as never, env); + + // Drain "victim"'s bucket: 20 proxy calls (5 × 20 = 100 tokens) + // exhaust its capacity, so the 21st call rejects. + let victimRejectedAt = -1; + for (let i = 0; i < 30; i++) { + const r = await vault.consumeBudget("principal:victim", "proxy"); + if (!r.ok) { + victimRejectedAt = i; + break; + } + } + expect(victimRejectedAt).toBeGreaterThan(0); + + // Now hammer the map with > BUCKET_CAP (10_000) unique callers. + // The victim's entry — the oldest — must be evicted; every new + // caller is fresh, so they all start with a full bucket.
We + // verify by issuing a single proxy call per unique sub and + // asserting each one accepts. + const CAP = 10_000; + let accepted = 0; + for (let i = 0; i < CAP + 50; i++) { + const r = await vault.consumeBudget(`flooder-${i}`, "proxy"); + if (r.ok) accepted++; + } + expect(accepted).toBe(CAP + 50); + + // After the flood, the victim's entry has been evicted. A fresh + // proxy call from "victim" must now be accepted (full bucket) — + // proving the eviction happened. If the entry had persisted, the + // refill since the test started (microseconds) would have left + // it depleted and the call would reject. + const victimAgain = await vault.consumeBudget("principal:victim", "proxy"); + expect(victimAgain.ok).toBe(true); + }); + it("cost classes scale: read is cheaper than write is cheaper than proxy", async () => { const CredentialVault = await getDO(); const env = { @@ -234,3 +281,42 @@ describe("worker.rate-bucket", () => { expect(writes).toBeGreaterThan(proxies); }); }); + +// ── cheap-validation-before-identity ordering ────────────────────────────── + +describe("worker.preValidateRoute", () => { + async function getPreValidate() { + return (await import("../worker")).preValidateRoute; + } + + it("accepts /admin/services as a known route", async () => { + const preValidateRoute = await getPreValidate(); + const res = preValidateRoute(new Request("https://vault.example.com/admin/services")); + expect(res).toBeNull(); + }); + + it("accepts a path whose first segment is a valid service name", async () => { + const preValidateRoute = await getPreValidate(); + const res = preValidateRoute(new Request("https://vault.example.com/anthropic")); + expect(res).toBeNull(); + }); + + it("rejects garbage paths with 400 BEFORE any identity work happens", async () => { + // Load-bearing test for the Copilot review nit: an attacker hitting + // `/.hidden` or `/` must not be able to force JWT verification. 
+    // preValidateRoute is sync, has no DPoP/JWT dependency, and returns
+    // a 400 Response without ever touching `resolveIdentity`. The fact
+    // that this test doesn't need a fake JWKS endpoint is the proof.
+    const preValidateRoute = await getPreValidate();
+    for (const url of [
+      "https://vault.example.com/",
+      "https://vault.example.com/.hidden",
+      "https://vault.example.com/has spaces",
+      "https://vault.example.com/-leading-dash",
+    ]) {
+      const res = preValidateRoute(new Request(url));
+      expect(res, `expected reject for ${url}`).not.toBeNull();
+      expect(res!.status).toBe(400);
+    }
+  });
+});
diff --git a/vault/src/worker.ts b/vault/src/worker.ts
index ac58e22..f2b3ece 100644
--- a/vault/src/worker.ts
+++ b/vault/src/worker.ts
@@ -8,6 +8,7 @@ import { handleRequest } from "./handler";
 import { verifyAccessToken, verifyDPoPToken } from "../../gen/ts/dpop";
 import type { SealedCredential as _SealedCredential } from "./crypto";
+import { validateServiceName } from "./vault";
 
 /**
  * Public RPC surface of the CredentialVault Durable Object.
@@ -129,11 +130,24 @@ export default {
     const vaultId = env.VAULT.idFromName("default");
     const vault = env.VAULT.get(vaultId);
 
-    // Resolve identity ONCE here so we can charge the rate bucket
-    // before handleRequest does any work, then hand the cached value
-    // to the handler. Anonymous (null) requests get a 401 via the
-    // handler without consuming budget — pre-auth DoS is CF's job.
+    // Step 1: cheap sync validation BEFORE any crypto work.
+    // Rejects obviously-invalid paths (bad service names, unknown routes)
+    // without forcing JWT verification + DPoP replay RPC on garbage
+    // traffic. Mirrors what handleRequest validates synchronously today
+    // — handleRequest's own checks still run, this is just an early gate
+    // so a flood of /../../etc/passwd or /garbage doesn't force sig work.
+    const earlyReject = preValidateRoute(request);
+    if (earlyReject) return earlyReject;
+
+    // Step 2: identity resolution (expensive — JWT verify + DPoP replay
+    // RPC). Resolved ONCE here so we can charge the rate bucket and
+    // hand the cached value to the handler. Anonymous (null) requests
+    // get a 401 via the handler without consuming budget — pre-auth
+    // DoS is CF's job.
     const sub = await resolveIdentity(request, env, vault);
+
+    // Step 3: rate-bucket charge (DO RPC). Only authenticated callers
+    // are budgeted — anonymous traffic short-circuits to handler's 401.
     if (sub) {
       const gate = await vault.consumeBudget(sub, costClassFor(request));
       if (!gate.ok) {
@@ -150,6 +164,8 @@ export default {
       }
     }
 
+    // Step 4: delegate to handler (which can now trust the route +
+    // identity + budget).
     return handleRequest({
       request,
       storage: {
@@ -244,6 +260,35 @@ function costClassFor(req: Request): "read" | "write" | "proxy" {
   return "proxy";
 }
 
+/**
+ * Cheap sync route validation, run BEFORE identity resolution so an
+ * attacker hitting `/../../etc/passwd` or `/garbage-path` can't force
+ * JWT verification + DPoP replay RPC for nothing. Returns an error
+ * Response on reject, or `null` to let the request proceed to identity
+ * resolution.
+ *
+ * The checks mirror what `handleRequest()` validates synchronously
+ * today — keeping them here is just a hoist, not a replacement. The
+ * handler still does its own validation; this gate exists so the
+ * expensive work doesn't happen for obviously-bad requests.
+ *
+ * Allowed shapes:
+ *   - GET /admin/services            (admin route, identity required later)
+ *   - GET|PUT|DELETE|... /:service   (service name passes validateServiceName)
+ *
+ * Exported for direct unit testing — caller is just `worker.fetch`.
+ */
+export function preValidateRoute(req: Request): Response | null {
+  const path = new URL(req.url).pathname;
+  if (path === "/admin/services") return null;
+  // Extract candidate service segment — same split handleRequest uses.
+  const service = path.split("/")[1] || "";
+  if (!service || !validateServiceName(service)) {
+    return Response.json({ error: "invalid service name" }, { status: 400 });
+  }
+  return null;
+}
+
 // ── Durable Object: CredentialVault ─────────────────────────────────────────
 //
 // The DO is the security kernel. It:
@@ -263,6 +308,22 @@ import { buildProxyRequest, sanitizeResponse } from "./vault";
 import { buildKekSource } from "./kek-source";
 import { RATE_LIMITS, refillBucket, tryConsume, type BucketState } from "./rate-bucket";
 
+/**
+ * Hard cap on the per-caller bucket map (LRU eviction). Without a cap,
+ * a stream of unique `sub` values (each one populating a new Map entry)
+ * would grow DO memory without bound — a slow-DoS vector against the
+ * single-writer DO instance. Evicting the oldest entry on overflow is
+ * a safe heuristic because the bucket math is already idempotent in
+ * the loss case: an evicted caller gets a full bucket on their next
+ * request, which is the same outcome a long-idle caller sees naturally.
+ *
+ * 10k is generously large for any plausible legitimate workload — a
+ * vault DO that genuinely sees 10k distinct authenticated callers in
+ * its uptime window is well past the point where per-caller buckets
+ * in DO memory are the right shape.
+ */
+const BUCKET_CAP = 10_000;
+
 interface StoredRow {
   upstream: string;
   sealed_headers: string; // JSON-serialized SealedCredential
@@ -381,6 +442,11 @@ export class CredentialVault {
    * get a `retryAfterSec` derived from the bucket's refill rate; the
    * `lastRefillMs` is persisted on reject too so a depleted attacker
    * can't freeze time.
+   *
+   * Map access uses delete-then-set so iteration order ≡ recency
+   * (LRU). When the map exceeds `BUCKET_CAP`, the oldest entry is
+   * evicted — bounds DO memory against a flood of unique `sub` values
+   * (notme-PR#22 / Copilot review).
    */
   async consumeBudget(
     sub: string,
@@ -390,7 +456,15 @@ export class CredentialVault {
     const prev = this.buckets.get(sub) ?? null;
     const refilled = refillBucket(prev, Date.now());
     const result = tryConsume(refilled, cost);
+    // Delete-then-set bumps this `sub` to the tail of Map's insertion
+    // order — that ordering is what makes `.keys().next()` give us
+    // the LRU entry on overflow below.
+    this.buckets.delete(sub);
     this.buckets.set(sub, result.next);
+    if (this.buckets.size > BUCKET_CAP) {
+      const oldest = this.buckets.keys().next().value;
+      if (oldest !== undefined) this.buckets.delete(oldest);
+    }
     if (result.ok) return { ok: true };
     return { ok: false, retryAfterSec: result.retryAfterSec };
   }
diff --git a/vault/wrangler.toml.example b/vault/wrangler.toml.example
index bf93ec4..683fcc1 100644
--- a/vault/wrangler.toml.example
+++ b/vault/wrangler.toml.example
@@ -19,18 +19,16 @@ invocation_logs = true
 ADMIN_SUB = ""
 
 # KEK (key-encryption-key) source URL. The DO resolves this to the raw
-# secret used to derive the AES-GCM KEK. Supported schemes:
+# secret used to derive the AES-GCM KEK. Schemes accepted by the current
+# dispatcher (`buildKekSource()` in vault/src/kek-source.ts):
 #   env://NAME               — read the named env binding (plaintext)
 #   file:///path/to/file     — read via a workerd disk-service binding (KEK_DISK)
 #   keychain://service-name  — macOS Keychain via the kek-helper sidecar (KEK_HELPER)
-#   secret-tool://attr/val   — Linux libsecret via the kek-helper sidecar
-#   op://VAULT/ITEM          — 1Password via the kek-helper sidecar
-#   apple-password://NAME    — macOS Passwords app via the kek-helper sidecar
-#   keyring://NAME           — generic cross-platform keyring via the helper
 #   http(s)://helper/...     — generic HTTP backend via KEK_HELPER
-# See vault/src/kek-source.ts for the dispatcher and cloister's ADR-0019
-# for the kek-helper sidecar (the only thing that can shell out to
-# /usr/bin/security or libsecret — workerd is a sandboxed V8 isolate).
+# Additional helper-backed schemes (`secret-tool://`, `op://`,
+# `apple-password://`, `keyring://`) need the `leyline-sign-helper`
+# sidecar wired into `buildKekSource()` — deferred, tracked separately
+# (see follow-up bead referenced in PR #22).
 VAULT_KEK_SOURCE = "env://VAULT_KEK"
 
 # Legacy plaintext KEK secret. DEPRECATED — set VAULT_KEK_SOURCE instead

From 0c9623e80ed7472fcd0f1388c0753d69b0c70658 Mon Sep 17 00:00:00 2001
From: jamestexas <18285880+jamestexas@users.noreply.github.com>
Date: Mon, 18 May 2026 14:17:04 -0600
Subject: [PATCH 3/3] ci: trigger run after Actions re-enabled