Add GitHub-hosted deploy fallback #682

Open
jaeyunha wants to merge 1 commit into staging from issue-645-deploy-runner-spof

jaeyunha commented Jul 5, 2026

Member

Summary

  • Adds a manual github-hosted-oidc deploy path to the existing Deploy workflow while keeping the Mac mini as the push/default path.
  • Runs the existing deploy fallback preflight before the GitHub-hosted path mutates ECR/ECS.
  • Updates the deploy fallback runbook, static deployment coverage, and operational learning for issue #645 ([P2] Deploy pipeline single point of failure: Mac mini runner).

Verification

  • python3 YAML parse/assertions for .github/workflows/deploy.yml
  • bun run test tests/deploy.test.ts
  • make check
  • make test

Follow-up to fully close #645

Refs #645

github-actions bot added the size/M label Jul 5, 2026

jaeyunha commented Jul 5, 2026

Member Author

Controller classification after worker handoff:

  • Current head bb4b1f08a65f63542443c7f4d454c463861abbb2 is non-draft, mergeStateStatus=CLEAN, and exact-head CI is green: Typecheck, lint, Vitest, migration parity, PHP/JVM SDK tests, and onboarding acceptance all succeeded.
  • Diff touches production deploy workflow/runbook coverage (.github/workflows/deploy.yml, deploy fallback runbook, tests/deploy.test.ts) and adds a GitHub-hosted OIDC deploy path capable of mutating ECR/ECS when manually dispatched.
  • Worker validation did not run a real no-op fallback deploy; the PR body correctly leaves that as the remaining evidence step for #645 ([P2] Deploy pipeline single point of failure: Mac mini runner), to be done after the OIDC role/repo variables are configured.

Classification: human-risk-gate, not merge-now. The implementation is delivered, but the deploy-path mutation plus the missing real fallback exercise needs Jaeyun/operator review before the staging merge. A formal GitHub review was skipped because this controller account is the PR author, so the same evidence is posted as a normal comment.

jaeyunha commented Jul 5, 2026

Member Author

Controller review after CI green on current head bb4b1f08a65f63542443c7f4d454c463861abbb2:

  • Checks: 7/7 required jobs passed (Typecheck, Lint, Unit tests, migration parity, onboarding, PHP SDK, JVM SDK).
  • Mergeability: PR is non-draft and mergeStateStatus=CLEAN.
  • Scope reviewed: .github/workflows/deploy.yml, deploy fallback runbook, deploy static tests, and the learning note for issue #645 ([P2] Deploy pipeline single point of failure: Mac mini runner).
  • Deploy-safety notes: GitHub-hosted path is manual-only (workflow_dispatch + deploy_path=github-hosted-oidc), keeps Mac mini as push/default path, scopes id-token: write to the OIDC job, runs bun run deploy:fallback:preflight before scripts/deploy.sh all, and documents repo vars / no GetSecretValue expectation.
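
For reference, a minimal sketch of the job shape these deploy-safety notes describe. It is not the actual diff: the `mac-mini` option name, the `us-east-1` region value, and the `actions/checkout` / `oven-sh/setup-bun` setup steps are assumptions for illustration. The job name, `if:` condition, OIDC step, `deploy-prod` concurrency group, 75-minute runtime, and the two commands are taken from this thread.

```yaml
# Sketch only; the real .github/workflows/deploy.yml is not reproduced here.
name: Deploy

on:
  workflow_dispatch:
    inputs:
      deploy_path:
        description: Which deploy path to run
        type: choice
        default: mac-mini          # hypothetical name for the default path
        options:
          - mac-mini               # hypothetical
          - github-hosted-oidc     # value named in this PR

concurrency: deploy-prod           # existing workflow-level group, per the review

env:
  AWS_REGION: us-east-1            # assumption; the real region is not stated

jobs:
  deploy-github-hosted-oidc:
    name: Deploy via GitHub-hosted OIDC fallback
    # Manual-only: never runs on push.
    if: ${{ github.event_name == 'workflow_dispatch' && inputs.deploy_path == 'github-hosted-oidc' }}
    runs-on: ubuntu-latest
    timeout-minutes: 75
    permissions:
      id-token: write              # scoped to this job only
      contents: read
    steps:
      - uses: actions/checkout@v4          # assumed setup step
      - uses: oven-sh/setup-bun@v2         # assumed setup step
      - name: Configure AWS credentials from OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ vars.AWS_DEPLOY_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}
      - name: Fallback preflight (must pass before any ECR/ECS mutation)
        run: bun run deploy:fallback:preflight
      - name: Deploy
        run: scripts/deploy.sh all
```

Keeping `id-token: write` under the job rather than at the workflow level means the Mac mini path never receives an OIDC token.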

Verdict: implementation is CI-green and review-ready, but I am holding this as a human/infra merge gate because it changes the production deploy workflow and still needs the real no-op fallback deploy exercise recorded before issue #645 can be fully closed.

Note: authenticated GitHub user is the PR author, so I cannot leave a formal approving review from this account; recording the maintainer classification here as a normal PR comment instead.

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bb4b1f08a6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +84 to +88
- name: Configure AWS credentials from OIDC
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: ${{ vars.AWS_DEPLOY_ROLE_ARN }}
    aws-region: ${{ env.AWS_REGION }}

[P2] Set role-duration-seconds for long fallback deploys

This fallback job can run for 75 minutes, but aws-actions/configure-aws-credentials defaults assumed-role credentials to 1 hour (docs). If Docker builds, migrations, or ECS waits pass 60 minutes, later aws calls in scripts/deploy.sh all can fail with expired credentials after ECR/ECS mutations have already started; set role-duration-seconds and the IAM role max session to cover the job timeout.
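
A minimal sketch of the suggested change, assuming a 15-minute margin on top of the 75-minute job timeout (the margin is an illustrative choice, not something the PR specifies). `role-duration-seconds` is an input of `aws-actions/configure-aws-credentials`; the IAM role's maximum session duration has to be raised separately to at least the same value, otherwise the assume-role call is rejected.

```yaml
- name: Configure AWS credentials from OIDC
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: ${{ vars.AWS_DEPLOY_ROLE_ARN }}
    aws-region: ${{ env.AWS_REGION }}
    # 75 min job timeout + 15 min assumed margin = 90 min = 5400 s.
    # The IAM role's max session duration must be >= 5400 s as well.
    role-duration-seconds: 5400
```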

Comment on lines +67 to +70
deploy-github-hosted-oidc:
  name: Deploy via GitHub-hosted OIDC fallback
  if: ${{ github.event_name == 'workflow_dispatch' && inputs.deploy_path == 'github-hosted-oidc' }}
  runs-on: ubuntu-latest

[P2] Let the fallback supersede a stuck Mac deploy

Because this new fallback job remains under the existing workflow-level concurrency: deploy-prod, it can be blocked when a push deploy is already waiting on the offline [self-hosted, opensend-deploy] runner; GitHub concurrency allows only one running workflow and one pending workflow per group by default (docs). In the exact Mac-runner-outage scenario this is meant to handle, the manual OIDC run may just sit pending unless an operator first cancels the stuck run, so the fallback needs an explicit safe supersede/cancel policy or that cancellation step should be codified.
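
One way the cancellation step could be codified is a small separate workflow that sits outside the `deploy-prod` concurrency group, sketched below. The file name and workflow name are hypothetical, and the sketch assumes that a Deploy run whose job is still waiting for the offline self-hosted runner reports the `queued` status. It deliberately leaves `in_progress` runs alone, since cancelling a deploy that has already started mutating ECR/ECS is a separate policy decision.

```yaml
# Hypothetical file: .github/workflows/cancel-stuck-deploy.yml
name: Cancel stuck deploy runs

on:
  workflow_dispatch:

permissions:
  actions: write                   # needed to cancel workflow runs

jobs:
  cancel-queued-deploys:
    runs-on: ubuntu-latest
    steps:
      - name: Cancel Deploy runs still waiting for a runner
        env:
          GH_TOKEN: ${{ github.token }}
          GH_REPO: ${{ github.repository }}
        run: |
          # Only runs in the "queued" state are cancelled (assumption: a run
          # blocked on the offline Mac mini runner shows up as queued).
          gh run list --workflow deploy.yml --status queued \
            --json databaseId --jq '.[].databaseId' |
            while read -r run_id; do
              echo "Cancelling stuck deploy run ${run_id}"
              gh run cancel "${run_id}"
            done
```

After the stuck run is cancelled, the manually dispatched `github-hosted-oidc` run should be able to acquire the `deploy-prod` group.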

jaeyunha commented Jul 5, 2026

Member Author

Controller reclassification after fresh Codex review on current head bb4b1f08a65f63542443c7f4d454c463861abbb2:

  • Status: fix-needed, not merge-ready/human-risk-only anymore.
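  • Fresh current-head P2 review blockers: "Set role-duration-seconds for long fallback deploys" and "Let the fallback supersede a stuck Mac deploy".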
  • Source-of-truth still shows non-draft, mergeStateStatus=CLEAN, exact-head CI green, but green CI does not clear current-head P2 deploy/preflight safety comments.
  • Controller attempted to route the normal PR follow-up lane, but this host is missing the interactive GJC runtime (gjc: command not found), so I cannot validly launch/repair an OpenSend Clawhip/GJC fix lane from here. Next action is a valid GJC-equipped lane on branch issue-645-deploy-runner-spof to address only those two review comments, then push fresh head and wait for exact-head CI.

jaeyunha commented Jul 5, 2026

Member Author

Controller refresh after current supervisor tick:

  • PR #682 (Add GitHub-hosted deploy fallback) is still fix-needed: head bb4b1f08a65f63542443c7f4d454c463861abbb2 remains non-draft, CLEAN, and exact-head CI is green, but the two unresolved current-head P2 review threads in .github/workflows/deploy.yml are still live: "Set role-duration-seconds for long fallback deploys" and "Let the fallback supersede a stuck Mac deploy".
  • Launch preflight update: gjc is now available on this host (/home/jaeyunha/.bun/bin/gjc, gjc/0.8.1), so runtime is no longer the blocker.
  • Fresh valid PR-fix lane spawn is still blocked by the lane-thread hard gate: the profile Discord helper returns success:false / DISCORD_BOT_TOKEN is missing from environment/profile .env, and OpenSend policy forbids degrading a fresh lane to the broad parent channel/origin thread.

Next safe action: restore Discord thread creation credentials, then launch the narrow opensend-pr-682-fallback-review-fix Clawhip/GJC lane on branch issue-645-deploy-runner-spof to address only those two review comments and wait for fresh exact-head CI.
