From 9020ad64de06d4c1085d39b32811e7d0ed4c834f Mon Sep 17 00:00:00 2001 From: Cooper Maruyama Date: Wed, 29 Apr 2026 05:15:48 -0700 Subject: [PATCH] ci: detect drift between source SOPS YAMLs and embedded @gen/env payloads MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit PR #15/#17/#18 chased an Unauthorized: Authentication error in CI for roughly half a day. The proximate fix was a fresh Cloudflare token, but the real bug was that rotating that token in shared.sops.yaml never propagated into packages/gen/env/src/runtime/generated-payloads/_envs/deploy.ts, which is what loaders.deploy() actually decrypts at runtime. Codegen only runs on devshell entry; CI deploys never run codegen, so they happily shipped the old cfat_KJ57… value into every Worker request. This workflow closes the drift class with a hard CI gate: - On PR/push touching any source-of-truth path - enter the devshell and re-run stackpanel codegen build - git diff --quiet against the embedded runtime payloads under packages/gen/env/data/_envs/ and packages/gen/env/src/runtime/generated-payloads/_envs/ - on drift, print the affected files + a remediation command Verified locally that a simulated sops set edit triggers the failure and that a clean tree passes. Closes stackpanel-04d. --- .beads/issues.jsonl | 4 +- .github/workflows/secrets-codegen-check.yml | 135 ++++++++++++++++++++ 2 files changed, 138 insertions(+), 1 deletion(-) create mode 100644 .github/workflows/secrets-codegen-check.yml diff --git a/.beads/issues.jsonl b/.beads/issues.jsonl index e3eeda0c..2fff653c 100644 --- a/.beads/issues.jsonl +++ b/.beads/issues.jsonl @@ -1,3 +1,6 @@ +{"id":"stackpanel-8yl","title":"Bootstrap alchemy v2 Cloudflare state store (one-time interactive)","description":"With the alchemy@2 migration + working CF API token, main's production deploys now fail with:\n\n AuthError: State store not found for script alchemy-state-store. Deploy the state store first.\n at node_modules/alchemy/src/Cloudflare/StateStore/State.ts:101:17\n\nThis is a new blocker class, exposed only after the auth fix landed (previously masked by 401s from the under-scoped token). It's a deliberate guard in alchemy v2:\n\n // TODO(sam): do we want to support bootstrapping the state store from CI?\n // for now - just die here\n\nThe state store provisioning flow (in alchemy/src/Cloudflare/StateStore/State.ts):\n\n 1. Read profile credentials cache (~/.alchemy/\u003cprofile\u003e/cloudflare-state-store)\n → if present, use it.\n 2. Else, query Cloudflare for the alchemy-state-store worker.\n 3. If the worker exists → loginWithCloudflare() (works in CI; uses the API\n token to read the secrets-store auth token via an edge-preview probe).\n 4. If the worker does NOT exist AND CI=true → die with the error above.\n 5. If the worker does NOT exist AND CI=false → interactive prompt; deploys\n the state store + secrets store + auth token.\n\nThe recommended remediation is a one-time interactive bootstrap from a maintainer's devshell:\n\n cd /path/to/stackpanel\n nix develop --impure\n bunx alchemy deploy --stage staging --yes # or production\n # When prompted \"Cloudflare State Store not found. Do you want to deploy it?\" → y\n # alchemy creates:\n # - Cloudflare Worker: alchemy-state-store\n # - Cloudflare Secrets Store: \u003csingle per-account\u003e\n # - Auth token in the secrets store\n # After this, every CI deploy on every branch can use Cloudflare.state()\n # because step (3) above succeeds.\n\nThe CF API token already provisioned (cfut_A8wV…) has all the scopes needed (verified\nvia curl probe: workers/scripts read+write, workers/subdomain, kv/namespaces,\nzones/.../workers/routes, workers/domains).\n\nAcceptance:\n- One-time deploy of alchemy-state-store completed (verified via\n GET /accounts/:id/workers/scripts/alchemy-state-store returning 200).\n- A subsequent CI Deploy Web run on main against --stage production succeeds\n through the Cloudflare.state() initialization step.\n- README/AGENTS.md updated with the bootstrap procedure so future maintainers\n don't repeat this discovery.\n\nFollow-up (longer term): file an upstream alchemy issue to support CI\nbootstrapping (maybe via an explicit `bunx alchemy state-store deploy`\nsubcommand), so projects don't need a one-shot human interaction.\n\nRefs:\n- main HEAD: 0f95da6f (Deploy Web Run 25107110360 — failed with this error)\n- comment in packages/infra/src/lib/deploy.ts:140-141 already documents this\n expectation: \"deployed on first interactive use; CI relies on it existing\"","status":"open","priority":0,"issue_type":"bug","owner":"me@cooperm.com","created_at":"2026-04-29T11:53:57Z","created_by":"Cooper Maruyama","updated_at":"2026-04-29T11:53:57Z","dependencies":[{"issue_id":"stackpanel-8yl","depends_on_id":"stackpanel-04d","type":"discovered-from","created_at":"2026-04-29T04:53:57Z","created_by":"Cooper Maruyama","metadata":"{}"}],"dependency_count":0,"dependent_count":0,"comment_count":0} +{"id":"stackpanel-04d","title":"codegen: regenerate embedded SOPS payloads on .sops.yaml edit + rekey","description":"The runtime alchemy deploy reads from the codegen-emitted module\n\\`packages/gen/env/src/runtime/generated-payloads/_envs/deploy.ts\\` (and its\ncompanion \\`data/_envs/deploy.sops.json\\`), not from\n\\`.stack/secrets/vars/shared.sops.yaml\\` directly.\n\nToday, those embedded payloads only get regenerated when the devshell hook\nruns codegen. That means edits to the source SOPS YAML — including the\ncommon \\`chore: rekey\\` flow and ad-hoc \\`sops set\\` rotations — silently\ndrift from what production deploys actually use. We hit this in PR-15 and\nPR-17: the Cloudflare API token rotation went into the source YAML and was\neven merged to main, but the embedded payload at HEAD continued decrypting\nto the old, under-scoped token, so every CI deploy after the rotation kept\n401-ing on \\`Cloudflare.Worker\\` create.\n\nFix options to consider (pick one in design):\n\n- Make the SOPS edit path (custom \\`sops\\` wrapper / hook) auto-run codegen\n for any \\`packages/gen/env/data/**.sops.*\\` or \\`.stack/secrets/**.sops.*\\`\n edit, and refuse to commit if the embedded payloads are stale.\n- Add a pre-commit hook that re-runs codegen and stages the resulting\n embedded files (mirrors the existing oxlint/format hook surface).\n- Or move the runtime to read the source SOPS YAML directly at deploy time\n instead of an embedded snapshot — eliminates the drift class entirely\n but is a much bigger change.\n\nAcceptance:\n- After any edit to \\`.stack/secrets/vars/*.sops.yaml\\` or rekey, the\n embedded deploy/app payloads under \\`packages/gen/env/\\` are guaranteed\n to be regenerated before the change can land on main.\n- A regression test (or CI check) fails the build if the source-derived\n plaintext for any secret diverges from what the embedded payload\n decrypts to with the same recipients.","status":"open","priority":1,"issue_type":"bug","owner":"me@cooperm.com","created_at":"2026-04-29T11:35:28Z","created_by":"Cooper Maruyama","updated_at":"2026-04-29T11:35:28Z","dependencies":[{"issue_id":"stackpanel-04d","depends_on_id":"stackpanel-49t","type":"discovered-from","created_at":"2026-04-29T04:35:28Z","created_by":"Cooper Maruyama","metadata":"{}"}],"dependency_count":0,"dependent_count":0,"comment_count":0} +{"id":"stackpanel-49t","title":"Restore .open-next/cache asset overlay once alchemy@2 supports AssetsProps.sources","description":"In the alchemy-effect → alchemy@2 migration (PR #16), we deleted the vendored OpenNext asset overlay (vendor/alchemy-effect-opennext-overlay/, scripts/apply-alchemy-effect-opennext-assets.ts, root postinstall hook) because:\n\n1. It was tied to alchemy-effect@0.12.x's file structure and is incompatible with v2's restructured Worker.ts/Assets.ts\n2. The script's own self-disable path explicitly instructs maintainers to delete the hook when no alchemy-effect@0.12.x installs are found\n\napps/docs/alchemy.run.ts had its assets.sources field commented out with a TODO referencing this issue:\n\n assets: {\n directory: '.open-next/assets',\n // TODO(stackpanel): re-enable the .open-next/cache overlay once\n // alchemy@2 natively supports AssetsProps.sources …\n config: { … },\n }\n\nImpact: OpenNext incremental cache misses for cdn-cgi/_next_cache paths fall back to ISR revalidation. Cache hit-rate regression, not a hard breakage.\n\nscripts/ALCHEMY_EFFECT_OPENNEXT_UPSTREAM.md (also deleted) tracked the upstream PR for AssetsProps.sources support — verify whether it landed in alchemy@2's main branch and is just pending a release, or whether it needs to be re-pitched.\n\nResolution path:\n- Option A: wait for upstream alchemy to ship native AssetsProps.sources, then uncomment the field in apps/docs/alchemy.run.ts\n- Option B: re-vendor a v2-compatible overlay (risky — alchemy@2 has restructured Worker.ts/Assets.ts internals)\n- Option C: switch to a different OpenNext cache strategy that doesn't require the overlay\n\nAcceptance: docs deploy serves cdn-cgi/_next_cache assets from Workers Assets directly (Option A or B), or this issue is closed as won't-fix with a documented alternative.","status":"open","priority":1,"issue_type":"feature","owner":"me@cooperm.com","created_at":"2026-04-29T09:17:00Z","created_by":"Cooper Maruyama","updated_at":"2026-04-29T11:39:57Z","dependencies":[{"issue_id":"stackpanel-49t","depends_on_id":"stackpanel-r7g","type":"discovered-from","created_at":"2026-04-29T02:17:00Z","created_by":"Cooper Maruyama","metadata":"{}"}],"comments":[{"id":"019dd909-e48a-7687-9a8f-1aac7f21fe0a","issue_id":"stackpanel-49t","author":"Cooper Maruyama","text":"Severity bump from regression-only to hard breakage. With current state, every Docs deploy on every branch (including main) fails Worker create with:\n\n UnknownCloudflareError: Uncaught TypeError: Cannot destructure property 'name' of '(intermediate value)' as it is undefined.\n at worker.js:1:23445 in createGenericHandler\n\nLatest reproductions:\n- main @ d97359fb (chore: rekey) — failed\n- main @ 66b3e57b (Merge PR #16 alchemy@2) — failed\n- claude/demo-via-project-swap @ 7b62e2c2 — failed (Run 25106523090)\n\nWeb deploys are unaffected and pass cleanly with the rotated CF token + regenerated embedded payload (see PR #15). Only Docs is broken, and the trace points at the Worker bundle itself, not at alchemy/CF auth — supporting Option B/C over A: even if upstream alchemy ships AssetsProps.sources, the createGenericHandler/DO-name destructure error is a separate failure to chase. Recommend bisecting the Docs Worker build (.open-next/worker.js) between the last-green Docs deploy and the alchemy@2 migration to find the exact regression site.","created_at":"2026-04-29T11:39:57Z"}],"dependency_count":0,"dependent_count":0,"comment_count":1} {"id":"stackpanel-r7g","title":"Fix broken bun install on main: alchemy-effect catalog reference","description":"Five packages reference `alchemy-effect: catalog:` (apps/api, apps/docs, apps/web, packages/db, packages/infra) but the root package.json#workspaces.catalog has no alchemy-effect entry. Result: bun install --frozen-lockfile fails with 'alchemy-effect@catalog: failed to resolve' on a clean clone of main.\n\nReproduction:\n rm -rf node_modules\n bun install --frozen-lockfile\n # error: alchemy-effect@catalog: failed to resolve (x5)\n\nRoot cause: introduced in commit dda9c459 'refactor: replace AWS EC2 infra with Cloudflare Workers + Neon, add agenix module' — the dep was added to packages but the catalog entry was never added.\n\nFix: add \"alchemy-effect\": \"^0.12.0\" to root package.json#workspaces.catalog (bun.lock already resolves alchemy-effect@0.12.0).\n\nSurfaced while working on PR #15 — could not run vite to regen routeTree.gen.ts after route deletions, had to hand-edit. Worktree node_modules from before the regression still work, masking the issue locally.","status":"closed","priority":1,"issue_type":"bug","owner":"me@cooperm.com","created_at":"2026-04-29T08:31:32Z","created_by":"Cooper Maruyama","updated_at":"2026-04-29T09:16:42Z","closed_at":"2026-04-29T09:16:42Z","close_reason":"Fixed by PR #16 — migrated workspace to alchemy@2.0.0-beta.20 (alchemy-effect rebrand). bun install now resolves cleanly and lockfile regenerates without the catalog miss.","dependency_count":0,"dependent_count":0,"comment_count":0} {"id":"stackpanel-os2.8","title":"Add Hetzner provision regression test with ephemeral instances","description":"Add a reproducible regression test for stackpanel provision using ephemeral Hetzner Cloud instances created on-demand via the hcloud API. The token already exists in SOPS as hetzner_api_key. The implementation should add the hcloud CLI to the devshell, create a disposable-machine test script, load hetzner_api_key from SOPS into HCLOUD_TOKEN, inject a temporary machine via .stack/config.local.nix, run stackpanel provision against it, verify the resulting NixOS host, and always clean up the instance.","design":"Prefer a real end-to-end infrastructure regression test over mocks for the final provision path, but keep verification safe and deterministic where possible. Use existing shell smoke test patterns for script structure and use .stack/config.local.nix for the highest-priority temporary machine override.","acceptance_criteria":"- hcloud is available in the devshell\n- tests/provision-hetzner-e2e.sh provisions an ephemeral CX22 in fsn1 from Debian 12\n- The script exports HCLOUD_TOKEN from the SOPS key hetzner_api_key\n- The script injects machine config via .stack/config.local.nix and cleans up in a trap\n- Justfile exposes a command to run the regression test and a dry-run mode","status":"closed","priority":1,"issue_type":"task","assignee":"Cooper Maruyama","owner":"me@cooperm.com","created_at":"2026-03-29T08:03:28Z","created_by":"Cooper Maruyama","updated_at":"2026-03-29T08:13:29Z","closed_at":"2026-03-29T08:13:29Z","close_reason":"Implemented: added hcloud to devshell, tests/provision-hetzner-e2e.sh, and Justfile entries. Commit d54bdbc7.","labels":["deployment","hetzner","testing"],"dependencies":[{"issue_id":"stackpanel-os2.8","depends_on_id":"stackpanel-os2","type":"parent-child","created_at":"2026-03-29T01:03:27Z","created_by":"Cooper Maruyama","metadata":"{}"}],"dependency_count":0,"dependent_count":0,"comment_count":0} {"id":"stackpanel-foe.4","title":"P4: Green test matrix on stackpanel","description":"Real deploys (not dry-run) of {docs,web} x {colmena,nixos-rebuild,fly} on stackpanel infra. Ensure docs and web have Nix packages that build to deployable artifacts. Deploy to ovh-usw-1 (direct) for NixOS backends. Add nix flake check validation for deployment outputs. All 6 cells must go green.","status":"open","priority":1,"issue_type":"task","owner":"me@cooperm.com","created_at":"2026-03-28T20:39:32Z","created_by":"Cooper Maruyama","updated_at":"2026-03-28T20:39:32Z","dependencies":[{"issue_id":"stackpanel-foe.4","depends_on_id":"stackpanel-foe","type":"parent-child","created_at":"2026-03-28T13:39:32Z","created_by":"Cooper Maruyama","metadata":"{}"},{"issue_id":"stackpanel-foe.4","depends_on_id":"stackpanel-foe.2","type":"blocks","created_at":"2026-03-28T13:47:41Z","created_by":"Cooper Maruyama","metadata":"{}"},{"issue_id":"stackpanel-foe.4","depends_on_id":"stackpanel-foe.3","type":"blocks","created_at":"2026-03-28T13:47:42Z","created_by":"Cooper Maruyama","metadata":"{}"}],"dependency_count":2,"dependent_count":2,"comment_count":0} @@ -44,7 +47,6 @@ {"id":"stackpanel-os2.5","title":"Add stackpanel provision --new and config round-trip machine authoring","description":"apps/stackpanel-go/cmd/cli/provision.go handles provisioning for machines that already exist in config, but the provisioning design also calls for a --new workflow that can author a minimal machine entry and preserve Nix path literals for hardwareConfig/diskLayout updates. Add that machine-authoring path so new-machine setup is not a manual edit-before-provision step.","design":"Reuse the repo's existing config-writing/serialization patterns instead of inventing a new config mutator; add tagged path handling if necessary to preserve Nix path types.","acceptance_criteria":"- stackpanel provision --new \u003cname\u003e --host \u003ctarget\u003e creates a minimal machine entry in the canonical Stackpanel config\n- hardwareConfig and diskLayout paths round-trip as Nix path literals instead of quoted absolute strings\n- The provision flow can update the new machine entry after generating hardware config\n- Add tests for config edit / serialization behavior","status":"closed","priority":2,"issue_type":"task","owner":"me@cooperm.com","created_at":"2026-03-28T15:02:37Z","created_by":"Cooper Maruyama","updated_at":"2026-03-28T20:19:21Z","closed_at":"2026-03-28T20:19:21Z","close_reason":"Dropped: manual config editing is acceptable, provision --new deferred indefinitely","external_ref":"https://linear.app/darkmatterlabs/issue/ENG-382","labels":["deployment"],"dependencies":[{"issue_id":"stackpanel-os2.5","depends_on_id":"stackpanel-os2","type":"parent-child","created_at":"2026-03-28T08:02:36Z","created_by":"Cooper Maruyama","metadata":"{}"},{"issue_id":"stackpanel-os2.5","depends_on_id":"stackpanel-os2.1","type":"blocks","created_at":"2026-03-28T08:02:40Z","created_by":"Cooper Maruyama","metadata":"{}"}],"dependency_count":1,"dependent_count":2,"comment_count":0} {"id":"stackpanel-os2.6","title":"Wire deploy/provision state into the Studio Deploy panel","description":"apps/web/src/components/studio/panels/deploy/deploy-panel.tsx is still Colmena-centric and does not appear to consume the CLI state tracked in .stack/state/deployments.json and .stack/state/machines.json. Update the Studio deploy experience so it reflects the same deploy/provision model and status that the CLI writes.","design":"Expose deploy/provision state through the agent/web API rather than teaching the browser to read local state files directly.","acceptance_criteria":"- The Deploy panel shows machine provisioning state and last deploy state from the supported agent/CLI APIs\n- Users can trigger deploy/provision actions from the panel with clear loading, success, and error states\n- Unsupported or partially configured backends degrade gracefully in the UI\n- Add frontend or integration coverage for the key panel states","status":"closed","priority":2,"issue_type":"task","owner":"me@cooperm.com","created_at":"2026-03-28T15:02:37Z","created_by":"Cooper Maruyama","updated_at":"2026-03-28T20:19:29Z","closed_at":"2026-03-28T20:19:29Z","close_reason":"Superseded by pluggable-deploy-backends restructure. Work absorbed into new phase-based tasks. See openspec/changes/pluggable-deploy-backends/","external_ref":"https://linear.app/darkmatterlabs/issue/ENG-383","labels":["deployment"],"dependencies":[{"issue_id":"stackpanel-os2.6","depends_on_id":"stackpanel-os2","type":"parent-child","created_at":"2026-03-28T08:02:37Z","created_by":"Cooper Maruyama","metadata":"{}"},{"issue_id":"stackpanel-os2.6","depends_on_id":"stackpanel-os2.3","type":"blocks","created_at":"2026-03-28T08:02:40Z","created_by":"Cooper Maruyama","metadata":"{}"},{"issue_id":"stackpanel-os2.6","depends_on_id":"stackpanel-os2.4","type":"blocks","created_at":"2026-03-28T08:02:41Z","created_by":"Cooper Maruyama","metadata":"{}"},{"issue_id":"stackpanel-os2.6","depends_on_id":"stackpanel-os2.5","type":"blocks","created_at":"2026-03-28T08:02:41Z","created_by":"Cooper Maruyama","metadata":"{}"}],"dependency_count":3,"dependent_count":1,"comment_count":0} {"id":"stackpanel-zhq","title":"Remove now-obsolete actions/cache@v4 of apps/{web,docs}/.alchemy from deploy workflows","description":"In the alchemy-effect → alchemy@2 migration (PR #16), all 5 deploy stacks switched from filesystem-based LocalState to Cloudflare-hosted state via Cloudflare.state(). The .alchemy/state/ directory is no longer used at deploy time.\n\nThe deploy workflows still cache it as a no-op:\n\n .github/workflows/deploy-web.yaml — Restore alchemy state (actions/cache@v4 on apps/web/.alchemy)\n .github/workflows/deploy-docs.yaml — Restore alchemy state (actions/cache@v4 on apps/docs/.alchemy)\n destroy job — actions/cache/restore@v4 of the same paths\n destroy job — Delete cached alchemy state (gh cache delete) cleanup\n\nPlus the explanatory comment block above each cache step describing the LocalState pattern is now misleading.\n\nCleanup:\n- Drop the cache@v4 + cache/restore@v4 steps from both workflows\n- Drop the gh cache delete cleanup step in the destroy jobs\n- Update or remove the now-misleading 'Persist alchemy's LocalState' comment blocks\n- Verify deploy still works without the cache (the Cloudflare state store is the new source of truth and is self-bootstrapping per Cloudflare.state())\n\nShould land after Cloudflare.state() is verified working in CI (depends on stackpanel-r7g / PR #16).","status":"open","priority":3,"issue_type":"chore","owner":"me@cooperm.com","created_at":"2026-04-29T09:17:13Z","created_by":"Cooper Maruyama","updated_at":"2026-04-29T09:17:13Z","dependencies":[{"issue_id":"stackpanel-zhq","depends_on_id":"stackpanel-r7g","type":"discovered-from","created_at":"2026-04-29T02:17:13Z","created_by":"Cooper Maruyama","metadata":"{}"}],"dependency_count":0,"dependent_count":0,"comment_count":0} -{"id":"stackpanel-49t","title":"Restore .open-next/cache asset overlay once alchemy@2 supports AssetsProps.sources","description":"In the alchemy-effect → alchemy@2 migration (PR #16), we deleted the vendored OpenNext asset overlay (vendor/alchemy-effect-opennext-overlay/, scripts/apply-alchemy-effect-opennext-assets.ts, root postinstall hook) because:\n\n1. It was tied to alchemy-effect@0.12.x's file structure and is incompatible with v2's restructured Worker.ts/Assets.ts\n2. The script's own self-disable path explicitly instructs maintainers to delete the hook when no alchemy-effect@0.12.x installs are found\n\napps/docs/alchemy.run.ts had its assets.sources field commented out with a TODO referencing this issue:\n\n assets: {\n directory: '.open-next/assets',\n // TODO(stackpanel): re-enable the .open-next/cache overlay once\n // alchemy@2 natively supports AssetsProps.sources …\n config: { … },\n }\n\nImpact: OpenNext incremental cache misses for cdn-cgi/_next_cache paths fall back to ISR revalidation. Cache hit-rate regression, not a hard breakage.\n\nscripts/ALCHEMY_EFFECT_OPENNEXT_UPSTREAM.md (also deleted) tracked the upstream PR for AssetsProps.sources support — verify whether it landed in alchemy@2's main branch and is just pending a release, or whether it needs to be re-pitched.\n\nResolution path:\n- Option A: wait for upstream alchemy to ship native AssetsProps.sources, then uncomment the field in apps/docs/alchemy.run.ts\n- Option B: re-vendor a v2-compatible overlay (risky — alchemy@2 has restructured Worker.ts/Assets.ts internals)\n- Option C: switch to a different OpenNext cache strategy that doesn't require the overlay\n\nAcceptance: docs deploy serves cdn-cgi/_next_cache assets from Workers Assets directly (Option A or B), or this issue is closed as won't-fix with a documented alternative.","status":"open","priority":3,"issue_type":"feature","owner":"me@cooperm.com","created_at":"2026-04-29T09:17:00Z","created_by":"Cooper Maruyama","updated_at":"2026-04-29T09:17:00Z","dependencies":[{"issue_id":"stackpanel-49t","depends_on_id":"stackpanel-r7g","type":"discovered-from","created_at":"2026-04-29T02:17:00Z","created_by":"Cooper Maruyama","metadata":"{}"}],"dependency_count":0,"dependent_count":0,"comment_count":0} {"id":"stackpanel-3vi","title":"Docs: module author guide + marketplace policies","description":"Docs that make it obvious how to build, test, price, and publish a module — plus the policies that keep the marketplace trustworthy.\n\n## Scope\n\n### Author guide (apps/docs/content/docs/modules/)\n- 'Build your first module' — scaffolding, module.nix structure, meta.nix fields, ui.nix if applicable\n- 'Test a module locally' — stackpanel link (local dev), running against sample .stack/config.nix\n- 'Package for publication' — tarball layout, signing, manifest requirements\n- 'Price and publish' — free vs paid tradeoffs, pricing UX tips\n- 'Get paid' — Polar Connect onboarding, tax docs, payout schedule\n- 'Versioning + updates' — semver discipline, deprecation policy\n\n### Policies\n- Acceptable use: no crypto miners, no telemetry without disclosure, no license keys hardcoded\n- Refund policy: 14-day no-questions-asked (author can opt into stricter)\n- Takedown policy: security issues → emergency delist within 24h\n- Revenue share + fee structure (the 15% sticker, transparent)\n- Intellectual property: developer retains ownership, grants distribution license","acceptance_criteria":"- Author guide builds with apps/docs\n- Policies are linked from dev portal's publish flow\n- Sample module repo referenced from the 'first module' page","status":"open","priority":3,"issue_type":"task","owner":"me@cooperm.com","created_at":"2026-04-24T03:45:46Z","created_by":"Cooper Maruyama","updated_at":"2026-04-24T03:45:46Z","dependencies":[{"issue_id":"stackpanel-3vi","depends_on_id":"stackpanel-02c","type":"blocks","created_at":"2026-04-23T20:46:16Z","created_by":"Cooper Maruyama","metadata":"{}"},{"issue_id":"stackpanel-3vi","depends_on_id":"stackpanel-c7t","type":"blocks","created_at":"2026-04-23T20:46:15Z","created_by":"Cooper Maruyama","metadata":"{}"},{"issue_id":"stackpanel-3vi","depends_on_id":"stackpanel-w3r","type":"blocks","created_at":"2026-04-23T20:46:17Z","created_by":"Cooper Maruyama","metadata":"{}"}],"dependency_count":3,"dependent_count":1,"comment_count":0} {"id":"stackpanel-l1q","title":"Module review workflow + automated Nix static analysis","description":"Prevent malicious or broken modules from reaching users. MVP manual, Phase 2 automated.\n\n## Scope\n\n### MVP: manual review\n- Admin tool (packages/api route + studio admin panel) showing pending listings\n- Reviewer sees: uploaded tarball contents, diff from previous version (if any), links to GitHub repo, automated scan results\n- Approve → listing goes live; Reject → listing status updated with reason visible to author\n- SLA target: 3 business days for initial review\n\n### Phase 2: automated scans\n- Static-analysis pass over module.nix + meta.nix:\n - Flag: import-from-derivation without explicit opt-in\n - Flag: builtins.fetchurl with non-allowlisted host\n - Flag: arbitrary path reads outside module dir\n - Flag: network calls during eval\n- Feed findings into review UI; author sees them pre-submit\n- Optionally: automatic 'verified pure' badge for modules with zero findings\n\n## Why not AI review\n\nPattern-match is more reliable for this than an LLM for the boring 'did they try to phone home during eval' checks. LLM review can come later for README/security claims.","acceptance_criteria":"- Reviewer can approve/reject pending listings\n- Rejected listings show reason to author with re-submit path\n- Static analysis surfaces known-bad patterns in a handful of test cases","status":"open","priority":3,"issue_type":"task","owner":"me@cooperm.com","created_at":"2026-04-24T03:45:37Z","created_by":"Cooper Maruyama","updated_at":"2026-04-24T03:45:37Z","dependencies":[{"issue_id":"stackpanel-l1q","depends_on_id":"stackpanel-c7t","type":"blocks","created_at":"2026-04-23T20:46:15Z","created_by":"Cooper Maruyama","metadata":"{}"}],"dependency_count":1,"dependent_count":1,"comment_count":0} {"id":"stackpanel-02c","title":"Developer payout: Polar Connect + KYC onboarding","description":"Pay developers their accrued balance via Polar Connect (Stripe Connect underneath), with KYC + tax form collection at onboarding.\n\n## Scope\n\n- Onboarding flow: first time creating a paid listing → prompt to connect Polar Connect account (redirect OAuth flow)\n- Collect tax info (W-9 US / W-8BEN international) via Polar's Connect UI\n- Payout job (scheduled): once per month, for each developer with balance \u003e= $50, trigger Polar payout; record payout_event(developer_id, amount_cents, polar_transfer_id, status)\n- Emails: onboarding done, first sale, monthly statement\n- Admin tool for manual payout holds (fraud, chargeback disputes)\n\n## Phase 1 fallback\n\nIf Polar Connect isn't ready: accumulate balances, issue manual Wise transfers quarterly while we collect via email. Works for ~20 developers, not for scale.","acceptance_criteria":"- Developer can connect payout account end-to-end\n- Monthly payout runs successfully against test Polar env\n- Balance decrements match transferred amount\n- Tax forms captured before first payout","status":"open","priority":3,"issue_type":"task","owner":"me@cooperm.com","created_at":"2026-04-24T03:45:28Z","created_by":"Cooper Maruyama","updated_at":"2026-04-24T03:45:28Z","dependencies":[{"issue_id":"stackpanel-02c","depends_on_id":"stackpanel-24e","type":"blocks","created_at":"2026-04-23T20:46:13Z","created_by":"Cooper Maruyama","metadata":"{}"},{"issue_id":"stackpanel-02c","depends_on_id":"stackpanel-c7t","type":"blocks","created_at":"2026-04-23T20:46:14Z","created_by":"Cooper Maruyama","metadata":"{}"}],"dependency_count":2,"dependent_count":2,"comment_count":0} diff --git a/.github/workflows/secrets-codegen-check.yml b/.github/workflows/secrets-codegen-check.yml new file mode 100644 index 00000000..c50f19e3 --- /dev/null +++ b/.github/workflows/secrets-codegen-check.yml @@ -0,0 +1,135 @@ +name: Secrets codegen drift check + +# Verifies that the SOPS-encrypted runtime payloads embedded in @gen/env are +# in sync with the source-of-truth SOPS YAMLs under .stack/secrets/vars/. +# +# Why this exists (see beads stackpanel-04d for full context): +# The runtime alchemy deploy reads the embedded payload at +# packages/gen/env/src/runtime/generated-payloads/_envs/.ts +# and the encrypted JSON at +# packages/gen/env/data/_envs/.sops.json +# NOT the source SOPS YAML directly. Those embedded files only get +# regenerated by `stackpanel codegen build env`, which runs as part of +# the devshell shell-hook. If a contributor edits a SOPS source (via +# `sops`, `chore: rekey`, `himitsu set`, etc.) and commits without +# re-entering the devshell, the embedded payload silently keeps shipping +# the old plaintext. This was the bug behind PR #15/#17: a Cloudflare +# API token rotation merged to main but the embedded payload still +# carried the under-scoped previous token, so every deploy 401'd. +# +# This workflow re-runs codegen in CI and fails if it produced any change +# under the embedded-payload tree. The remediation is printed inline so +# contributors don't need to dig through docs. +on: + pull_request: + types: [opened, reopened, synchronize] + paths: + # Source SOPS YAMLs (everything in .stack/secrets/ except generated/cached state) + - ".stack/secrets/vars/**" + - ".stack/secrets/apps/**" + - ".stack/secrets/**.yaml" + # Nix-side env declarations (apps..env / stackpanel.envs.) + - ".stack/config.nix" + - ".stack/data/**.nix" + - "nix/stackpanel/db/schemas/secrets**" + - "nix/stackpanel/lib/codegen/**" + - "nix/stackpanel/modules/env-codegen/**" + # The codegen implementation itself + - "apps/stackpanel-go/internal/codegen/**" + # The output tree (catches manual edits / accidental rollbacks) + - "packages/gen/env/data/**" + - "packages/gen/env/src/runtime/generated-payloads/**" + # The workflow file + - ".github/workflows/secrets-codegen-check.yml" + push: + branches: [main] + paths: + - ".stack/secrets/vars/**" + - ".stack/secrets/apps/**" + - ".stack/secrets/**.yaml" + - ".stack/config.nix" + - ".stack/data/**.nix" + - "nix/stackpanel/db/schemas/secrets**" + - "nix/stackpanel/lib/codegen/**" + - "nix/stackpanel/modules/env-codegen/**" + - "apps/stackpanel-go/internal/codegen/**" + - "packages/gen/env/data/**" + - "packages/gen/env/src/runtime/generated-payloads/**" + - ".github/workflows/secrets-codegen-check.yml" + workflow_dispatch: + +concurrency: + group: secrets-codegen-check-${{ github.ref }} + cancel-in-progress: true + +jobs: + verify: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Install Nix + uses: DeterminateSystems/nix-installer-action@main + with: + extra-conf: | + accept-flake-config = true + extra-substituters = https://devenv.cachix.org https://darkmatter.cachix.org + extra-trusted-public-keys = devenv.cachix.org-1:w1cLUi8dv3hnoSPGAuibQv+f9TZLr6cv/Hm9XgU50cw= darkmatter.cachix.org-1:7R5qAiOVHxDpFy7yguECfC1JqVDgMdckGc+CDKk2pWA= + + - name: Setup Cachix + uses: cachix/cachix-action@v16 + with: + name: darkmatter + authToken: ${{ secrets.CACHIX_AUTH_TOKEN }} + extraPullNames: devenv + + - name: Run codegen + env: + # Same key the deploy workflows use — it has decrypt access for every + # SOPS file under .stack/secrets/vars/ and packages/gen/env/data/. + # The codegen needs to *decrypt* the source YAMLs (to resolve env vars) + # and then re-*encrypt* the resolved plaintext back into the embedded + # payload, both of which require this key. + SOPS_AGE_KEY: ${{ secrets.SECRETS_AGE_KEY_DEV }} + run: | + set -euo pipefail + # Devshell entry runs the shell-hook which: + # 1. writes .stack/gen/codegen/env-manifest.json (the codegen input) + # 2. invokes `stackpanel codegen build` itself + # We re-invoke `stackpanel codegen build` explicitly afterwards as a + # belt-and-braces guarantee that the build was run with the same SOPS + # key the embedded payloads were encrypted with. + nix develop --impure --command bash -lc ' + set -euo pipefail + stackpanel codegen build + ' + + - name: Verify no drift + run: | + set -euo pipefail + # Limit the diff to the files that actually matter for runtime drift. + # We deliberately do NOT diff `packages/gen/env/src//...` because + # those are the typed env wrappers — codegen rewrites them on every + # devshell entry (timestamp/comment churn) and that churn is + # cosmetically noisy without affecting deploy behaviour. See + # stackpanel-04d for the rationale: the ONLY drift class that broke + # production was the embedded encrypted payload + its TS wrapper. + target_paths=( + packages/gen/env/data/_envs + packages/gen/env/src/runtime/generated-payloads/_envs + ) + + if git diff --quiet -- "${target_paths[@]}"; then + echo "OK: embedded SOPS payloads are in sync with source schemas." + exit 0 + fi + + echo "::error title=Embedded SOPS payloads are stale::Run \`nix develop --impure --command stackpanel codegen build\` and commit the resulting changes under packages/gen/env/." + echo + echo "===== drift detected in =====" >&2 + git diff --name-only -- "${target_paths[@]}" >&2 + echo + echo "===== diff (truncated to 200 lines) =====" >&2 + git diff -- "${target_paths[@]}" | head -200 >&2 + + exit 1