Skip to content

feat: add self-upgrade subcommand (agent-init upgrade)#73

Open
Lillevang wants to merge 8 commits into
mainfrom
feat/self-upgrade
Open

feat: add self-upgrade subcommand (agent-init upgrade)#73
Lillevang wants to merge 8 commits into
mainfrom
feat/self-upgrade

Conversation

@Lillevang

Copy link
Copy Markdown
Owner

Why

A stale local binary looked like missing devcontainer features (it wasn't — just is installed in every Dockerfile and anthropic.claude-code is already in all four flavor devcontainers). The real gap is that there's no way to update the CLI. This adds an in-place self-upgrade.

What

agent-init upgrade — update the binary in place from the latest GitHub release.

  • --check — report whether a newer release exists; no download.
  • --dry-run — download and verify, but don't replace the binary.
  • --force — reinstall when already current, or upgrade a dev build (which has no release version to compare against).

Design decisions (confirmed up front):

  • Manual-only check — no per-invocation network call. Normal commands stay offline; you opt in by running upgrade (optionally --check).
  • SHA-256 verification before swap — the downloaded archive is checked against the release's checksums.txt; a mismatch aborts and leaves the existing binary untouched.

How

New internal/selfupdate package:

  • GitHub releases client (github.go) — reads GITHUB_TOKEN/GH_TOKEN to lift the anonymous rate limit; capped download size; overridable base URL for tests.
  • Engine (selfupdate.go) — semver compare, OS/arch asset selection matching the release-workflow names, checksum verification, tar.gz/zip extraction, and an atomic write-temp-then-rename swap with a move-aside fallback for platforms that can't rename over a running executable.
  • Install path resolved via os.Executable() (following symlinks); a non-writable install dir fails with a hint rather than attempting privilege escalation.

Wired into internal/cli/cli.go (dispatch + commands help table). Documented in docs/cli.md and the README.

Tests

  • internal/selfupdate: version compare, asset/binary naming, checksum verify (match / * prefix / mismatch / missing entry), archive extraction (tar.gz + zip), and full flows via an in-memory Source — upgrade, already-up-to-date, --force, --dry-run, checksum-mismatch-leaves-binary, no-asset-for-platform, missing-checksums — plus an httptest-backed GitHub client.
  • internal/cli: upgrade --help, dev-build refusal without --force, positional-arg rejection (all hermetic — no network).

Gate

./.agent/scripts/check.sh passes. The only advisory finding is a pre-existing Go 1.26.3 stdlib vuln (GO-2026-5037/5039) fixed by a 1.26.4 toolchain bump — surfaced more by the new net/http call path but not introduced by this change.

Follow-ups (not in this PR)

  • Bump the Go toolchain to 1.26.4 to clear the advisory vulns.
  • CLAUDE.md's "CLI surface" section still lists five subcommands; left untouched since it's under .agent/, but it could mention upgrade.

🤖 Generated with Claude Code

Add `agent-init upgrade` to update the binary in place from the latest
GitHub release. The check is manual-only — no per-invocation network
call. `--check` reports whether a newer version exists; `--dry-run`
downloads and verifies without replacing; `--force` reinstalls or
upgrades a dev build.

The new internal/selfupdate package fetches the latest release, selects
the asset matching the running OS/arch, verifies its SHA-256 against the
release checksums.txt before installing, and swaps the binary atomically
(write-temp-then-rename with a move-aside fallback for platforms that
can't rename over a running executable). The install path is resolved
via os.Executable (following symlinks); a non-writable install dir fails
with a hint rather than attempting privilege escalation.

Covers the package with unit tests (version compare, asset/checksum,
archive extraction, full upgrade/dry-run/mismatch/force flows via an
in-memory Source, and an httptest-backed GitHub client) plus CLI tests
for help, the dev-build refusal, and arg validation. Documents the
subcommand in docs/cli.md and the README.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Lillevang

Copy link
Copy Markdown
Owner Author

Posting observations from a simplicity review pass for the record. Acknowledging up front that the missing linked issue is on me — I asked for this PR without filing one first, which is the soft gate CLAUDE.md describes. These are not blocking objections; treat them as a punch list to consider before merge or to convert into follow-up issues.

Scope vs. stated motivation

The body's stated goal is "there's no way to update the CLI." A minimum viable shape that delivers that is agent-init upgrade (no flags): fetch latest release → verify SHA-256 → atomic swap. Roughly 200 LoC. The PR ships ~960 LoC + 3 flags + a new internal package. Several pieces aren't load-bearing for the stated need:

  • --check and --dry-run aren't implied by the motivation. --check is a network call that prints whether one number is bigger than another; gh release view --repo Lillevang/agent-init already does that. --dry-run downloads + verifies and exits without installing — useful on release day for the maintainer, not the user. Consider shipping upgrade + upgrade --force only and letting future flag asks come from real user reports.

In-package semver reimplementation

internal/selfupdate/selfupdate.go:823-877 reimplements semver parsing/comparison (~55 LoC) with an 11-row TestCompareVersions table. golang.org/x/mod/semver.Compare already does this. The v prefix is what semver.Compare expects; the dev-is-older branch becomes a one-liner. Deleting parseSemver/compareVersions/cmpInt and that test table is a net subtraction.

Source seam — partial trim

The Source interface (selfupdate.go:533-540) has one production impl and one test fake; the seam pays for itself in test ergonomics. But GitHubSource.APIBaseURL/HTTPClient exist only because github_test.go drives them via httptest. Once the in-memory Source fake covers end-to-end paths, the httptest-backed tests in internal/selfupdate/github_test.go (3 tests, ~71 LoC) and those knobs can both go.

Download-size cap

maxDownloadBytes = 200 MiB (selfupdate.go:312) for release archives that are under 5 MB. The "generous ceiling" comment is honest about intent, but in practice it just means a streamed 199 MB blob still succeeds. 32 MiB (10× the largest plausible archive) is more honest.

Untested portability branch

replaceBinary has Windows rename-fallback logic (selfupdate.go:783-820) that no test forces. Either add a test that pre-creates target.old (or makes the temp-rename fail) to exercise the fallback path, or punt on Windows for now and exit with a "download from " message until someone reports a real Windows use case.

Docs drift into rationale

The docs/cli.md "Behavior" section for upgrade includes six bullets of implementation rationale (atomic-rename details, os.Executable() symlink resolution). Most of that is noise the user doesn't need — the code comments already carry it. Trim to: "downloads + verifies SHA-256 + swaps in place; fails closed on checksum mismatch; cannot upgrade through a root-owned directory without elevated access."

Stale CLI surface in .agent/CLAUDE.md

.agent/CLAUDE.md still lists five subcommands; this PR adds upgrade as the sixth (and #74 is concurrently adding status as a seventh). The "left untouched since it's under .agent/" reasoning is the wrong reason — .agent/CLAUDE.md is the canonical source and feature work that ships a new subcommand is itself the authorization to update it. Worth folding into this PR.

Latent unrelated finding

gen-codemap.sh doesn't honor .gitignore — the locally-built agent-init binary leaks into .agent/CODEBASE.md:18 despite being gitignored (.gitignore:28). Separate issue territory; not a blocker here.

Test surface (positive)

The end-to-end upgrade tests (replaces / already-up-to-date / force / dry-run / mismatch-leaves-binary / no-asset / missing-checksums) are exactly right coverage for a binary-swap path. Not bloat.

@Lillevang

Copy link
Copy Markdown
Owner Author

Pre-merge checklist for this PR

A review (see retroactive spec #76) found this feature complete, well-tested, and appropriately simple. Two items must be resolved before merge, listed here so a follow-up agent can pick them up:

1. Reconcile .agent/CODEBASE.md drift. The regenerated codemap in this PR removes three internal/scaffold/color_test.go test entries — TestColorDisabledForNonTTYOutputs, TestColorDisabledByEnvironment, TestColorEnabledForTerminalFile — with no corresponding source change in the diff. Either those tests were deleted elsewhere and the codemap is catching up, or the generator drifted. Regenerate cleanly (or confirm the deletion is intentional) so the codemap stays trustworthy. This is unrelated to the self-upgrade feature.

2. Confirm the CI gate runs green on the PR head. ./.agent/scripts/check.sh reportedly passes locally with one pre-existing advisory (Go 1.26.x stdlib vuln, GO-2026-5037/5039, surfaced by the new net/http path). That advisory is tracked separately in #77 and does not need to block this merge. Just confirm the checks pass on the PR.

Optional loose end (non-blocking): the move-aside fallback in replaceBinary can leave a stale agent-init.old if the final cleanup os.Remove fails (error is ignored). Reasonable cross-platform tradeoff; flagging only.

Follow-ups split out of this work (do not block merge)

The two design decisions made here (manual-only / no auto-update; in-place binary replacement) are recorded in the project decision log.

@Lillevang

Copy link
Copy Markdown
Owner Author

LOWinternal/selfupdate/selfupdate.go:258 (extractFromTarGz) and :280 (extractFromZip): the archive download is capped at 200 MiB (maxDownloadBytes), but the uncompressed per-entry read is unbounded (io.ReadAll(tr) / io.ReadAll(rc)). A 200 MiB archive of compressible bytes can decompress to many GBs — a release-pipeline-compromise scenario could OOM/disk-DoS the upgrading user, on top of whatever the malicious binary itself would do. Trust root (verified checksum from same release) bounds exploitability; this is defense-in-depth.

Fix: replace io.ReadAll(tr) / io.ReadAll(rc) with io.ReadAll(io.LimitReader(tr, maxBinaryBytes)) (and the same for zip), where maxBinaryBytes is a generous ceiling (e.g. 150 MiB — the current binary is tens of MB). Surface an explicit error if the limit is hit instead of silently truncating.

Lillevang and others added 7 commits June 21, 2026 12:15
The archive download was capped at 200 MiB, but the per-entry read of the
tar.gz/zip payload was unbounded. A release-pipeline-compromise (gzip/zip
bomb) could decompress to many GB and OOM the upgrading user. Trust root
(verified SHA-256 checksum from the same release) bounds exploitability;
this is defense-in-depth.

Wraps the per-entry io.ReadAll with io.LimitReader against a 64 MiB
ceiling (~10x the shipped binary). Truncation is treated as an explicit
error rather than a silent partial extract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ge impl

Replaces the hand-rolled parseSemver/compareVersions/cmpInt (and the
11-row table that tested them) with golang.org/x/mod/semver.Compare,
which already implements the full semver ordering and is the standard
Go module for this. A small normalizeVersion wrapper preserves the
existing caller contract (leading v optional, missing minor/patch
fill to .0, "dev" treated as older than any release per semver's
invalid-less-than-valid rule).

Net subtraction of ~55 LoC. Trims the table test to the cases that
exercise our normalization seam — semver-internal ordering is
upstream's job.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Release archives are single-digit MiB; the previous 200 MiB ceiling
let a misbehaving or hostile endpoint stream up to 199 MiB before
the cap fired. 32 MiB is ~10x the largest plausible archive and a
more honest ceiling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous six bullets carried implementation rationale (atomic
rename details, os.Executable symlink resolution) that already lives
as code comments and is noise from a user's perspective. Cut to the
facts a user needs: network call to GitHub, SHA-256 verification with
fail-closed semantics, write permission required on the install path,
dev-build needs --force.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
.agent/AGENTS.md still listed five subcommands; this PR adds upgrade
as the sixth. The canonical agent-facing CLI reference has to track
the binary's actual surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The fallback that handles the Windows "can't rename over a running
executable" case had no test driving it. Force the primary
os.Rename(file, dir) to fail with EISDIR on POSIX by pre-creating
target as a directory; the fallback then renames it aside and installs
the new binary. Asserts the final binary contents match the new
release, which the previous tests only confirmed on the happy path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Picks up go.sum (added by the semver dep) and the file-count tick.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Lillevang

Copy link
Copy Markdown
Owner Author

Review-comment follow-ups landed on this branch:

  1. MUST — security finding (gzip/zip-bomb). Per-entry archive read is now io.LimitReader-wrapped against a 64 MiB ceiling with an explicit error on overflow; test in TestExtractBinaryRejectsOversizedEntry. Addressed in 553926e.
  2. MUST — stale .agent/CLAUDE.md subcommand list. Added upgrade to the canonical list at .agent/AGENTS.md (the file .agent/CLAUDE.md symlinks to). Addressed in 39848e1.
  3. SHOULD — in-package semver. Replaced parseSemver/compareVersions/cmpInt with golang.org/x/mod/semver.Compare via a small normalizeVersion wrapper (preserves the v-optional / missing-component / dev semantics). Trimmed the table test to the cases that exercise our seam. Addressed in bb8e28e.
  4. SHOULD — 200 MiB download cap. Lowered to 32 MiB. Addressed in 95a27e9.
  5. SHOULD — docs/cli.md upgrade Behavior trim. Cut six implementation-rationale bullets to a tight user-facing four. Addressed in a2e77a0.
  6. CONSIDER — drop --check / --dry-run. Deferred — surfacing to you as a CLI-surface judgment call. Both flags have non-zero utility (--check for a quick "is there a release?" without download; --dry-run for verifying a release without swapping the binary), but neither is required by the stated motivation. Happy to cut in a follow-up if you'd rather.
  7. CONSIDER — Windows rename-fallback test. Added TestUpgradeReplaceBinaryFallback: forces the primary os.Rename(file, dir) to fail with EISDIR on POSIX by pre-creating target as a directory, exercising the fallback path. Addressed in 07448ca.
  8. CONSIDER — httptest-backed github_test.go trim. Deferred — same kind of CLI-surface judgment as (6). The three tests are small and exercise real HTTP plumbing the in-memory Source fake can't.

Plus 4e1def4 picks up the regenerated codemap (adds go.sum, drops the gitignored agent-init binary leak from a separate codemap-bug issue).

Out of scope per the original PR / your guidance: missing linked issue, stdlib vulncheck advisory (#77), gen-codemap.sh .gitignore leak.

./.agent/scripts/check.sh is green (vulncheck advisory only).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant