feat: add repo source browser API #581

Merged
mariusvniekerk merged 24 commits into main from repo-browser-api
Jun 25, 2026

Conversation

@mariusvniekerk

@mariusvniekerk mariusvniekerk commented Jun 23, 2026

Collaborator

The UI needs a repo-code API that preserves provider identity all the way to clone and cache selection; owner/name route placeholders are not enough for nested repos, self-hosted hosts, or default-host routing. This PR adds the read-only backend foundation so later UI branches can depend on generated, bounded contracts instead of inventing filesystem access in the frontend.

The API keeps risky behavior out of request hot paths: reads come from middleman-owned local clones, refs resolve with explicit stale-token metadata, large/binary/unsupported assets become typed states, and tag pruning is left to separate maintenance rather than source-browser refreshes.

@roborev-ci

roborev-ci Bot commented Jun 23, 2026

roborev: Combined Review (26b31c0)

High-risk issue found: the repo browser asset endpoint can serve untrusted repository content as same-origin executable code.

High

  • internal/server/repo_browser.go:265
    The /browser/asset endpoint serves arbitrary repo-controlled blobs using executable content types such as text/html, application/javascript, and image/svg+xml. Malicious repository content opened through this route can execute script in the middleman origin and call local console APIs, including mutation endpoints.
    Fix: Restrict raw assets to inert image types, or serve unsafe types as downloads/application/octet-stream with appropriate headers. Add tests for HTML/SVG/script rejection or forced download behavior.
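
The suggested fix can be sketched roughly as follows. This is a minimal illustration, not the PR's handler: the `inertImageTypes` allowlist, the `assetHeaders` helper, and the choice to classify by file extension are all assumptions made for the example.

```go
package main

import (
	"net/http"
	"path"
	"strings"
)

// inertImageTypes is an illustrative allowlist (not the PR's actual list) of
// media types that browsers render without running script. SVG and HTML are
// deliberately absent.
var inertImageTypes = map[string]string{
	".png":  "image/png",
	".jpg":  "image/jpeg",
	".jpeg": "image/jpeg",
	".gif":  "image/gif",
	".webp": "image/webp",
}

// assetHeaders returns the response headers for a raw asset at repoPath.
// Paths outside the allowlist are downgraded to an opaque download.
func assetHeaders(repoPath string) http.Header {
	h := http.Header{}
	// Stop browsers from sniffing a more dangerous type out of the bytes.
	h.Set("X-Content-Type-Options", "nosniff")
	ext := strings.ToLower(path.Ext(repoPath))
	if ct, ok := inertImageTypes[ext]; ok {
		h.Set("Content-Type", ct)
		return h
	}
	h.Set("Content-Type", "application/octet-stream")
	h.Set("Content-Disposition", "attachment")
	return h
}
```

With this shape, a request for `evil.svg` or `index.html` would get `application/octet-stream` plus `Content-Disposition: attachment`, so the browser downloads the bytes instead of rendering them in the middleman origin.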

Medium

  • internal/server/repo_browser.go:388
    Browser routes ignore {owner} and {name} path parameters and only look up repositories by repo_path, so canonical routes like /api/v1/repo/github/acme/widgets/browser/refs fail unless clients also pass the optional repo_path query.
    Fix: Pass owner/name into clone lookup and fall back to provider-route lookup when repo_path is empty. Add a server test without repo_path.

  • internal/gitclone/repo_browser.go:511
    Branch/tag browser requests ignore the supplied ref_sha and resolve the current remote ref after each fetch. The stale flag is discarded, so clients can request an old SHA but receive newer branch-tip content while the response echoes the old ref.
    Fix: Pin reads to ref.SHA when provided or reject stale branch/tag refs, and return the actual resolved ref SHA.
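
One way to picture the stale-ref fix is a response value that carries both SHAs. The sketch below is hypothetical: `ResolvedRef` and `resolveRef` are names invented for illustration, and the real API may shape this metadata differently.

```go
package main

// ResolvedRef is a hypothetical response shape: it reports both what the
// client asked for and what the clone currently resolves the ref to.
type ResolvedRef struct {
	Name         string
	RequestedSHA string
	ResolvedSHA  string
	Stale        bool
}

// resolveRef compares the client's ref_sha against the SHA the ref points to
// now. Reads should use ResolvedSHA; Stale tells the client its token is old.
func resolveRef(name, requestedSHA, currentSHA string) ResolvedRef {
	return ResolvedRef{
		Name:         name,
		RequestedSHA: requestedSHA,
		ResolvedSHA:  currentSHA,
		// An empty requested SHA means the client had no prior token.
		Stale: requestedSHA != "" && requestedSHA != currentSHA,
	}
}
```

Resolving once per request and passing only `ResolvedSHA` into later reads keeps the content and the echoed SHA consistent with each other.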


Panel: ci_default_security | Synthesis: codex, 12s | Members: codex_default (codex/default, done, 5m55s), codex_security (codex/security, done, 3m8s) | Total: 9m15s

@roborev-ci

roborev-ci Bot commented Jun 23, 2026

roborev: Combined Review (8c79567)

Summary verdict: changes need follow-up on medium-risk performance and API contract issues; no exploitable security issue was reported.

Medium

  • internal/gitclone/clone.go:328
    --prune-tags now runs on every shared clone fetch, so normal sync, diff, and workspace refreshes fetch and prune the full tag namespace even when the repo browser is not used. This can slow tag-heavy repositories and may fail on rewritten remote tags because tag updates are not forced.
    Fix: Keep the general fetch path branch/PR-focused, and add a repo-browser-specific tag refresh before listing refs, using an explicit forced tag refspec if moved tags must be reflected.

  • internal/gitclone/repo_browser.go:158 and internal/gitclone/repo_browser.go:364
    The tree and last-changed caps do not actually bound Git work or memory use. ls-tree -r is read fully before truncating, and RepoBrowserLastChanged runs an unbounded git log --name-only, so large repos or old/missing paths can still produce huge outputs.
    Fix: Stream and stop tree reads after RepoBrowserTreeEntryLimit+1, and bound or stream/cancel the last-changed history walk instead of collecting the full log output.

  • frontend/openapi/openapi.yaml:11149
    The asset byte endpoint is documented as an application/json string, while the handler serves raw image bytes with an image content type. Generated TypeScript and Go clients therefore model the 200 response as JSON text instead of binary data.
    Fix: Adjust the Huma output/OpenAPI declaration to describe a binary response content type and regenerate the API artifacts.
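
The "stream and stop after limit+1" idea for tree reads might look like the sketch below. It assumes the producer emits NUL-terminated records (the framing of `git ls-tree -z`); `splitNUL`, `readBounded`, and `readBoundedBytes` are illustrative names, not functions from the PR.

```go
package main

import (
	"bufio"
	"bytes"
	"io"
)

// splitNUL is a bufio.SplitFunc that yields NUL-terminated records.
func splitNUL(data []byte, atEOF bool) (int, []byte, error) {
	if i := bytes.IndexByte(data, 0); i >= 0 {
		return i + 1, data[:i], nil
	}
	if atEOF && len(data) > 0 {
		// Final record without a trailing NUL.
		return len(data), data, nil
	}
	return 0, nil, nil
}

// readBounded keeps at most limit records from r and reports whether more
// were available. It returns as soon as record limit+1 is seen, so a caller
// can cancel the producing process instead of draining its output.
func readBounded(r io.Reader, limit int) (entries []string, truncated bool, err error) {
	sc := bufio.NewScanner(r)
	sc.Split(splitNUL)
	for sc.Scan() {
		if len(entries) == limit {
			return entries, true, nil
		}
		entries = append(entries, sc.Text())
	}
	return entries, false, sc.Err()
}

// readBoundedBytes is a convenience wrapper for in-memory input.
func readBoundedBytes(b []byte, limit int) ([]string, bool, error) {
	return readBounded(bytes.NewReader(b), limit)
}
```

In a real handler the reader would presumably be the stdout pipe of the git subprocess, with the command's context cancelled once `truncated` is true. Note that `bufio.Scanner` rejects single records larger than its buffer (64 KiB by default), which acts as a second bound.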


Panel: ci_default_security | Synthesis: codex, 10s | Members: codex_default (codex/default, done, 7m36s), codex_security (codex/security, done, 7m38s) | Total: 15m24s

@roborev-ci

roborev-ci Bot commented Jun 23, 2026

roborev: Combined Review (2a26169)

Medium-risk issues remain in the repo browser implementation.

Medium

  • internal/gitclone/repo_browser.go:469: User-supplied paths are passed to Git as pathspecs rather than literals. Inputs such as :(glob)** can pass cleanRepoBrowserPath and make history or commit-scope checks operate on a glob instead of the selected file.

    • Fix: Use literal pathspec handling for all user paths, such as --literal-pathspecs or a safe :(literal) prefix, and add regression tests for pathspec-magic inputs.
  • internal/gitclone/repo_browser.go:98: /browser/refs reads and returns every branch/tag with no cap, unlike bounded tree/blob/history reads. Repositories with very large ref sets can produce unbounded stdout, memory use, and response size.

    • Fix: Add a ref limit with truncation metadata or stream parsing with an over-limit error, plus coverage for the cap.
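
For the pathspec finding, a minimal sketch of literal handling is to build the git argument list with the global `--literal-pathspecs` option, which makes git treat every pathspec as a plain path with no globbing or magic. The `historyArgs` helper and the exact `log` flags below are assumptions for illustration.

```go
package main

import "strconv"

// historyArgs builds arguments for a bounded, literal-path history query.
// --literal-pathspecs is a global git option, so it precedes the subcommand.
func historyArgs(commitSHA, filePath string, limit int) []string {
	return []string{
		"--literal-pathspecs",
		"log",
		"--max-count=" + strconv.Itoa(limit),
		"--format=%H",
		commitSHA,
		"--", // everything after this is a path, never a revision
		filePath,
	}
}
```

Passing the slice to `exec.CommandContext(ctx, "git", args...)` avoids shell interpolation, and an input such as `:(glob)**` is then looked up as a file literally named `:(glob)**`.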

Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 5m38s), codex_security (codex/security, done, 3m45s) | Total: 9m30s

@roborev-ci

roborev-ci Bot commented Jun 23, 2026

roborev: Combined Review (f678969)

Medium issue found; security review found no additional Medium-or-higher findings.

Medium

  • internal/gitclone/repo_browser.go:585 - RepoBrowserCommit.Body is never populated even though it is part of the API schema. The git format only captures %s, and parseRepoBrowserCommitLine never assigns Body, so commits with multi-line descriptions return an empty body from history and commit-detail endpoints.
    • Suggested fix: Use a commit format with an unambiguous record delimiter that includes %b or %B, parse it safely, and add coverage for a commit with a body.
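
A sketch of the delimiter idea: terminate every field, including the body, with NUL, which cannot occur inside a commit message. The `commitFormat` string, the `Commit` type, and `parseCommits` are hypothetical; the parser also assumes git writes a newline after each formatted record, which is why it trims one from the start of each SHA.

```go
package main

import (
	"errors"
	"strings"
)

// commitFormat ends each of the five fields (SHA, author, ISO date, subject,
// body) with a NUL byte. Illustrative only; the PR's format may differ.
const commitFormat = "--format=%H%x00%an%x00%aI%x00%s%x00%b%x00"

type Commit struct {
	SHA, Author, AuthoredAt, Subject, Body string
}

// parseCommits splits NUL-terminated fields into records of five.
func parseCommits(out string) ([]Commit, error) {
	fields := strings.Split(out, "\x00")
	// Whatever follows the last NUL must be empty or a record newline.
	if strings.TrimSpace(fields[len(fields)-1]) != "" {
		return nil, errors.New("trailing data after last commit record")
	}
	fields = fields[:len(fields)-1]
	if len(fields)%5 != 0 {
		return nil, errors.New("incomplete commit record")
	}
	commits := make([]Commit, 0, len(fields)/5)
	for i := 0; i < len(fields); i += 5 {
		commits = append(commits, Commit{
			SHA:        strings.TrimPrefix(fields[i], "\n"),
			Author:     fields[i+1],
			AuthoredAt: fields[i+2],
			Subject:    fields[i+3],
			Body:       strings.TrimRight(fields[i+4], "\n"),
		})
	}
	return commits, nil
}
```

Because field boundaries are counted rather than pattern-matched, a body containing blank lines or text that looks like a header does not confuse the parser.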

Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 5m7s), codex_security (codex/security, done, 4m7s) | Total: 9m22s

@roborev-ci

roborev-ci Bot commented Jun 23, 2026

roborev: Combined Review (7eb7196)

Summary verdict: PR needs changes for two medium-severity repo browser correctness issues; no medium-or-higher security findings were reported.

Medium

  • internal/gitclone/repo_browser.go:705: Repo browser clone identity still resolves to Host/Owner/Name, ignoring both provider and RepoPath. Supported DB identity allows the same host/repo path under different providers, so those repos can share a bare clone and return source from the wrong configured repository.
    Fix: Add provider-aware repo-browser clone identity keyed by (provider, platform_host, repo_path), use it for clone paths and singleflight keys, and add a server/API test with two provider rows that read distinct repo contents.

  • internal/gitclone/repo_browser.go:465: RepoBrowserLastChanged parses line-oriented git log --name-only output with commit: as an in-band marker. A valid tracked path like commit:notes.md is treated as a commit marker and causes parsing to fail instead of returning metadata.
    Fix: Use an unambiguous NUL-delimited log format, for example git log -z --name-only with NUL-separated commit records and paths, and cover marker-like/newline path names.
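
The provider-aware identity from the first finding could be sketched like this. `CloneIdentity`, `Key`, and `DirName` are illustrative names; lower-casing the host and hashing the key into a directory name are assumptions of this example rather than behavior confirmed by the PR.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// CloneIdentity is a hypothetical key for repo-browser clones.
type CloneIdentity struct {
	Provider     string
	PlatformHost string
	RepoPath     string
}

// Key joins the fields with NUL, which is assumed never to appear in a
// provider name, host, or repo path, so distinct tuples cannot collide.
// The same string can serve as the singleflight key.
func (id CloneIdentity) Key() string {
	return strings.Join([]string{
		id.Provider,
		strings.ToLower(id.PlatformHost),
		id.RepoPath,
	}, "\x00")
}

// DirName hashes the key into a fixed-length, filesystem-safe name, which
// also sidesteps nested repo paths containing slashes.
func (id CloneIdentity) DirName() string {
	sum := sha256.Sum256([]byte(id.Key()))
	return hex.EncodeToString(sum[:])
}
```

Two rows that share `git.example.com` and `acme/widgets` but differ in provider then map to different clone directories and different singleflight slots.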


Panel: ci_default_security | Synthesis: codex, 10s | Members: codex_default (codex/default, done, 5m9s), codex_security (codex/security, done, 7m21s) | Total: 12m40s

@roborev-ci

roborev-ci Bot commented Jun 23, 2026

roborev: Combined Review (c0e55e2)

Summary verdict: 2 medium findings remain; no high or critical issues were reported.

Medium

  • internal/gitclone/repo_browser.go:684
    Branch and tag names are passed into rev-parse as revision expressions, so Git suffixes like ~1, ^, or ^{} in ref_name can resolve commits that are not literal branch/tag refs.
    Fix: Validate branch/tag names as literal ref names and resolve them with exact-ref APIs such as show-ref --verify or for-each-ref; only pass resolved object IDs into later reads.

  • internal/server/repo_browser.go:432
    Repo browser clone identity uses only (platform_host, owner, name), omitting provider and repo_path, so two configured providers with the same host/repo path can share a clone and serve or fetch the wrong repository contents.
    Fix: Carry provider into RepoBrowserRepoRef and use a repo-browser clone namespace/key derived from (provider, platform_host, repo_path) for ensure, path lookup, and singleflight.
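
For the literal-ref finding, a conservative sketch is to reject revision syntax up front and only ever hand git a fully qualified ref. `exactRefName` is a hypothetical helper, and its checks are a subset of git's ref-name rules chosen for illustration, not a complete reimplementation of `git check-ref-format`.

```go
package main

import (
	"errors"
	"strings"
)

var errInvalidRefName = errors.New("invalid ref name")

// exactRefName maps a caller-supplied branch or tag name to a full ref such
// as refs/heads/main, refusing anything that git could read as a revision
// expression (~1, ^, ^{}, @{...}, ranges) or as an option.
func exactRefName(refType, name string) (string, error) {
	var prefix string
	switch refType {
	case "branch":
		prefix = "refs/heads/"
	case "tag":
		prefix = "refs/tags/"
	default:
		return "", errInvalidRefName
	}
	if name == "" || strings.HasPrefix(name, "-") || strings.HasPrefix(name, "/") ||
		strings.HasSuffix(name, "/") || strings.HasSuffix(name, ".lock") ||
		strings.Contains(name, "..") || strings.Contains(name, "@{") ||
		strings.ContainsAny(name, " ~^:?*[\\") {
		return "", errInvalidRefName
	}
	for _, r := range name {
		if r < 0x20 || r == 0x7f { // control characters
			return "", errInvalidRefName
		}
	}
	return prefix + name, nil
}
```

The returned ref can then be resolved with `git show-ref --verify <ref>`, which matches the exact ref path only, and later reads would use the resulting object ID rather than the name.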


Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 5m20s), codex_security (codex/security, done, 6m54s) | Total: 12m22s

@roborev-ci

roborev-ci Bot commented Jun 23, 2026

roborev: Combined Review (8c7f768)

Medium-risk issues remain in the repo browser API; no Critical or High findings were reported.

Medium

  • internal/gitclone/repo_browser.go:668 - Repo-browser commit timestamps preserve the Git author timezone and are serialized directly, so authored_at can be emitted with non-UTC offsets despite the project’s UTC API boundary convention. Convert to authoredAt.UTC() before returning API data, and cover a non-UTC author date in tests.

  • internal/gitclone/repo_browser.go:468 - The last-changed parser treats any output line beginning with commit: as a commit marker, so a valid repo file named like commit:notes.md can make the batch fail with a parse error or wrong metadata. Use an unambiguous NUL-delimited git log -z format or another marker scheme that cannot collide with path names.
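
The UTC normalization is small enough to show in full. This sketch assumes the author date arrives in strict ISO 8601 form (what git's `%aI` placeholder produces); `authoredAtUTC` is a name invented for the example.

```go
package main

import "time"

// authoredAtUTC parses an author date that carries its own offset and
// re-serializes the same instant in UTC.
func authoredAtUTC(raw string) (string, error) {
	t, err := time.Parse(time.RFC3339, raw)
	if err != nil {
		return "", err
	}
	return t.UTC().Format(time.RFC3339), nil
}
```

For example, `2026-06-23T10:00:00+02:00` becomes `2026-06-23T08:00:00Z`: the instant is unchanged, only the offset in the serialized form is normalized.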


Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 7m35s), codex_security (codex/security, done, 5m33s) | Total: 13m16s

@roborev-ci

roborev-ci Bot commented Jun 23, 2026

roborev: Combined Review (262f01f)

Summary verdict: changes need fixes for three medium-severity correctness issues; no high or critical findings reported.

Medium

  • internal/gitclone/clone.go:328
    git fetch --prune --tags origin can reject a moved remote tag with a clobber error, causing the shared clone refresh to fail and blocking normal branch/pull updates for that repo.
    Fix: Fetch tags separately with an explicit non-pruning force tag refspec, or otherwise tolerate tag update rejection without failing the branch/pull fetch; add coverage for retagged remotes.

  • internal/server/repo_browser.go:274
    The raw asset endpoint accepts branch/tag refs and serves cacheable bytes even though the response contains no resolved-ref metadata, so mutable ref URLs can render stale or wrong assets after a branch/tag moves.
    Fix: Require ref_type=commit with a full SHA for asset bytes, return a validation error such as mutable_ref_not_allowed for mutable refs, and generate byte URLs from asset metadata using the resolved commit SHA.

  • internal/gitclone/repo_browser.go:539
    File history never verifies that the requested path exists at the selected commit; git log -- path succeeds for absent or deleted paths, so the API can return 200 with empty or old history instead of a missing-path problem.
    Fix: Check the selected tree for the path before returning history, return a typed missing-path error, and cover the behavior with an API test.
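
The immutable-ref requirement for asset bytes can be sketched as a validation step ahead of any git read. `validateAssetRef` and `isFullSHA` are hypothetical helpers, the error text reuses the `mutable_ref_not_allowed` code suggested above, and the 40-character check assumes SHA-1 object IDs.

```go
package main

import "errors"

var errMutableRef = errors.New("mutable_ref_not_allowed")

// isFullSHA reports whether s is a 40-character lowercase hex object ID.
func isFullSHA(s string) bool {
	if len(s) != 40 {
		return false
	}
	for i := 0; i < len(s); i++ {
		c := s[i]
		if !(c >= '0' && c <= '9' || c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

// validateAssetRef accepts only ref_type=commit with a full SHA, so every
// asset URL names immutable content and is safe to cache.
func validateAssetRef(refType, ref string) error {
	if refType != "commit" || !isFullSHA(ref) {
		return errMutableRef
	}
	return nil
}
```

Asset metadata responses would then build byte URLs from the resolved commit SHA, so a client never needs to request raw bytes through a branch or tag name.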


Panel: ci_default_security | Synthesis: codex, 9s | Members: codex_default (codex/default, done, 7m29s), codex_security (codex/security, done, 5m41s) | Total: 13m19s

@roborev-ci

roborev-ci Bot commented Jun 24, 2026

roborev: Combined Review (0c79ef1)

Medium concern found: tag fetching appears incorrectly scoped to the shared clone sync path; security review found no issues.

Medium

  • internal/gitclone/clone.go:333-341 — The new force-fetch of all tags (fetch origin +refs/tags/*:refs/tags/*) was added to the shared fetch() helper, which is used by ensureCloneNowInNamespace and cloneBare for all clone/sync paths, not just repo-browser clones. This adds an extra network round trip per repo per sync cycle and introduces a new hard-failure point after branch/pull-ref fetches already succeeded. Scope this tag fetch to repo-browser clone refreshes, or document and accept the broader sync cost/failure coupling if intentional.

Panel: ci_default_security | Synthesis: codex, 9s | Members: codex_default (claude-code/default, done, 24m50s), codex_security (claude-code/security, done, 15m38s) | Total: 40m37s

@roborev-ci

roborev-ci Bot commented Jun 24, 2026

roborev: Combined Review (a2e11eb)

Medium-risk issues remain around tag refresh behavior in gitclone; no security vulnerabilities were identified.

Medium

  • internal/gitclone/clone.go:328 - Adding --no-tags to the shared clone fetch means newly-created provider tags/releases are not fetched into the clone used by CommitTimelineSinceTag, so repo overview commit counts/timelines can fail or stay stale until a fresh clone is made. Fetch the selected release/tag refs explicitly before building the overview timeline, or compute the timeline from the provider-supplied tag SHA instead of relying on locally refreshed tag names.

  • internal/gitclone/repo_browser.go:737 - EnsureRepoBrowserClone runs the tag fetch outside the existing clone singleflight, so concurrent browser requests for the same repo can launch overlapping git fetch operations against the same bare clone and fail on ref locks; asset/blob reads also pay this tag-refresh cost even when they only use a commit SHA. Deduplicate the branch+tag refresh as one repo-browser ensure operation, and avoid tag refresh for endpoints that do not need tag resolution.


Panel: ci_default_security | Synthesis: codex, 13s | Members: codex_default (codex/default, done, 5m9s), codex_security (codex/security, done, 8m51s) | Total: 14m13s

@roborev-ci

roborev-ci Bot commented Jun 24, 2026

roborev: Combined Review (85f2874)

Summary verdict: one Medium issue should be addressed before merge; no Critical or High findings.

Medium

  • internal/gitclone/repo_browser.go:802: Existing repo-browser clones are treated as ready without fetching. Since the refresh registry is only in memory, the first browser request after a server restart can serve stale branch/tag refs and file contents from the on-disk clone until the next refresh tick.
    • Fix: Refresh existing repo-browser clones before serving reads, or persist/seed the refresh registry so startup refreshes existing clones before they are used.

Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 8m46s), codex_security (codex/security, done, 5m19s) | Total: 14m12s

@roborev-ci

roborev-ci Bot commented Jun 24, 2026

roborev: Combined Review (374950b)

Summary verdict: Changes have medium-risk correctness issues in tag freshness and commit scope validation; no security issues were found.

Medium

  • internal/gitclone/clone.go:337
    Shared clone fetches now use --no-tags, but repo overview timelines still resolve provider release/tag names through the local clone. New or moved release tags after the initial clone will be missing or stale, causing commits_since_release and commit timelines to fail or use old tag targets.
    Fix: Fetch only the bounded release/tag refs needed for overview calculation, or pass resolved provider tag SHAs/targets into the timeline code instead of relying on local tag refs.

  • internal/gitclone/repo_browser.go:608
    Commit detail scope checks only the first RepoBrowserHistoryLimit commits for a path, so valid commits older than the latest 50 file changes are reported as commit_out_of_scope. This can reject commits that the last-changed fallback can legitimately return.
    Fix: Replace the page-limited log scan with an exact check that the commit is reachable from the selected root and touches the requested path.


Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 9m54s), codex_security (codex/security, done, 6m51s) | Total: 16m53s

@roborev-ci

roborev-ci Bot commented Jun 24, 2026

roborev: Combined Review (9041097)

Summary verdict: one medium lifecycle issue remains; no high or critical findings were reported.

Medium

  • internal/gitclone/repo_browser.go:752: Repo browser refresh detaches clone/fetch work from the caller with context.WithoutCancel(ctx). The scheduled refresh loop passes the server background context here, so shutdown cancellation can make RefreshRepoBrowserClone return while the singleflight worker and git subprocess continue running for up to 15 minutes, after the server believes background work has drained.
    • Fix: Keep scheduled refresh work tied to the server context, or explicitly track and wait for the detached singleflight worker before background refresh/shutdown returns.

Panel: ci_default_security | Synthesis: codex, 22s | Members: codex_default (codex/default, done, 9m48s), codex_security (codex/security, done, 3m57s) | Total: 14m7s

@roborev-ci

roborev-ci Bot commented Jun 24, 2026

roborev: Combined Review (a1da6ef)

Repo browser changes have one medium issue to address before merge.

Medium

  • internal/gitclone/repo_browser.go:606 - RepoBrowserCommitDetail accepts any 40-character hex SHA, then passes it to git merge-base --is-ancestor. If the SHA is missing or not a commit object, Git returns an operational error that is surfaced through repoBrowserProblem as a 500 instead of a typed not_found client error.

    Suggested fix: Resolve the requested commit with the existing commit resolver before the ancestry check, or normalize missing/invalid commit errors from the ancestry check to ErrNotFound. Add an HTTP test for an unknown full-length SHA.


Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 7m14s), codex_security (codex/security, done, 8m20s) | Total: 15m42s

@roborev-ci

roborev-ci Bot commented Jun 24, 2026

roborev: Combined Review (756c000)

No Medium, High, or Critical findings were reported.

All reported findings were Low severity and omitted per instructions.


Panel: ci_default_security | Synthesis: codex, 4s | Members: codex_default (codex/default, done, 13m26s), codex_security (codex/security, done, 4m57s) | Total: 18m27s

@roborev-ci

roborev-ci Bot commented Jun 24, 2026

roborev: Combined Review (3e03f54)

Medium issue found; security review reported no additional findings.

Medium

  • Location: File and line number not provided
  • Problem: RepoBrowserCommit.Body is exposed in the API schema, but the git format only emits SHA, author, date, and subject. As a result, commit bodies are always returned as "", including from the commit detail endpoint.
  • Fix: Include the commit body in the git format and parser for detail responses, or remove the field from the API type if commit bodies are intentionally unsupported.

Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 6m51s), codex_security (codex/security, done, 6m15s) | Total: 13m13s

@roborev-ci

roborev-ci Bot commented Jun 24, 2026

roborev: Combined Review (33be681)

Medium issue found; no Critical or High findings.

Medium

  • internal/gitclone/repo_browser.go:787 - Scheduled repo-browser refreshes pass a cancellable context, but EnsureCloneInNamespace detaches its git work with context.WithoutCancel, so shutdown can return while the underlying clone/fetch subprocess continues for up to ensureCloneTimeout.

    Suggested fix: Add a non-detached ensure/refresh path for scheduled refreshes, or parameterize EnsureCloneInNamespace so background refresh can respect shutdown cancellation end to end.


Panel: ci_default_security | Synthesis: codex, 6s | Members: codex_default (codex/default, done, 10m4s), codex_security (codex/security, done, 5m39s) | Total: 15m49s

@roborev-ci

roborev-ci Bot commented Jun 24, 2026

roborev: Combined Review (5ded08d)

Medium confidence with one actionable Medium finding; no Critical or High issues reported.

Medium

  • internal/gitclone/repo_browser.go:784 - Shared repo-browser refresh work is created with the first caller’s ctx. If that caller is an HTTP request that disconnects, cancellation aborts the singleflight operation for every concurrent waiter, including scheduled refreshes that joined the same key.
    • Fix: Avoid using a request-cancelable context for shared singleflight work. Detach request-triggered refresh work with a bounded timeout, or plumb a server shutdown context separately from per-request cancellation.
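
One possible shape for the fix, which also covers the earlier shutdown findings, is a context that ignores the first caller's cancellation but still ends on server shutdown or a timeout. `sharedWorkContext` and `demo` are illustrative names; the sketch relies on `context.WithoutCancel` and `context.AfterFunc`, both available since Go 1.21.

```go
package main

import (
	"context"
	"time"
)

// sharedWorkContext derives a context for singleflight work. It keeps the
// request's values but not its cancellation, is bounded by timeout, and is
// cancelled when the shutdown context ends.
func sharedWorkContext(request, shutdown context.Context, timeout time.Duration) (context.Context, context.CancelFunc) {
	ctx, cancel := context.WithTimeout(context.WithoutCancel(request), timeout)
	// Propagate server shutdown into the detached work.
	stop := context.AfterFunc(shutdown, cancel)
	return ctx, func() {
		stop()
		cancel()
	}
}

// demo reports whether the work context is done after the request is
// cancelled, and again after shutdown is cancelled.
func demo() (afterRequestCancel, afterShutdown bool) {
	request, cancelRequest := context.WithCancel(context.Background())
	shutdown, cancelShutdown := context.WithCancel(context.Background())
	work, release := sharedWorkContext(request, shutdown, time.Minute)
	defer release()

	cancelRequest() // a disconnecting HTTP client
	afterRequestCancel = work.Err() != nil

	cancelShutdown() // the server draining background work
	<-work.Done()
	afterShutdown = work.Err() != nil
	return afterRequestCancel, afterShutdown
}
```

A disconnecting client therefore cannot abort the fetch for other waiters on the same key, while the git subprocess started with this context still stops when the server shuts down.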

Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 9m34s), codex_security (codex/security, done, 8m48s) | Total: 18m29s

@roborev-ci

roborev-ci Bot commented Jun 24, 2026

roborev: Combined Review (e4c044c)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 12m53s), codex_security (codex/security, done, 6m7s) | Total: 19m0s

Base automatically changed from repo-file-browser to main June 25, 2026 15:49
mariusvniekerk and others added 24 commits June 25, 2026 11:49
Maintainers need the repo browser UI to read files from the local clone through provider-aware repository identity, including nested repo paths and non-default hosts. This creates the backend contract first so the later UI branches can reuse generated clients and shared route helpers instead of inventing ad hoc fetch paths.

The API intentionally stays read-only and stateless: refs resolve from the refreshed clone, tree and history work is bounded, blob responses preserve binary and too-large states, and raw assets are served with explicit content headers for markdown previews.

Validation: make test-short; go test ./internal/gitclone ./internal/server -run 'TestRepoBrowser' -shuffle=on; go test ./internal/github -run 'TestSyncMRDiffPreservesCloneContextCancellation' -shuffle=on; node node_modules/vite-plus/bin/vp lint packages/ui/src/api/provider-routes.ts --no-error-on-unmatched-pattern --threads=1; git diff --check.

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
Resolve mutable browser refs once per request and return stale metadata instead of echoing the requested SHA. Route repo browser requests through canonical owner/name identity when repo_path is absent, and restrict raw asset bytes to inert image media types so repository HTML, SVG, and scripts are not served as same-origin executable content.
Commit detail reads now prove the requested SHA appears in the selected path history at the resolved ref before returning details. This prevents arbitrary clone commits from rendering in a file-history detail view and documents resolved ref metadata in the generated API schema.
The repo browser API should not make shared clone fetches prune tags or let large repositories force unbounded tree and history reads from request handlers. This keeps the hot path bounded while preserving the middleman-owned clone state outside explicit maintenance work.

Asset responses are raw image bytes, so the OpenAPI contract and generated clients need to model them as binary image content instead of JSON strings.

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
Repo browser ref enumeration and file history queries run on request paths, so they need bounded output and literal handling for caller-selected file names. This keeps large ref sets from bloating the refs response and prevents Git pathspec magic in file names from widening history or commit-detail scope.

The hot clone fetch path still avoids tag pruning; deleted-tag cleanup remains outside this request-time flow.

Validation: go test ./internal/gitclone -run 'TestRepoBrowser' -shuffle=on; go test ./internal/server -run 'TestRepoBrowser|TestDocsBlobOpenAPIResponseIsBinary' -shuffle=on; go test ./internal/apiclient/generated -shuffle=on; node node_modules/vite-plus/bin/vp run ui-package-check; git diff --check.

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
Repo browser metadata needs to stay bounded without becoming misleading. Ref enumeration now excludes non-display refs before applying the cap, last-changed falls back per missing path after the bounded batch scan, and tag refresh uses non-pruning tag fetch semantics so new release tags appear without deleting cached tags during request-time refresh.

The plan text now documents literal pathspec handling and partial-result semantics so future UI work does not infer stronger guarantees than the API provides.

Validation: go test ./internal/gitclone -run 'TestRepoBrowser' -shuffle=on; go test ./internal/server -run 'TestRepoBrowser|TestDocsBlobOpenAPIResponseIsBinary' -shuffle=on; go test ./internal/github -run 'TestSyncMRDiffPreservesCloneContextCancellation' -shuffle=on; go test ./internal/gitclone -run 'TestRepoBrowserLastChangedFallsBackPastBatchLogLimit|TestRepoBrowserFetchDoesNotPruneTagsOnHotPath|TestRepoBrowserHistoryTreatsPathspecMagicAsLiteral' -shuffle=on; git diff --check.

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
The repo browser plan needs to state where expensive or stale metadata behavior is intentionally bounded. This documents the last-changed fallback process cap, the remaining deep-history cost tradeoff, visible-row caller expectation, and explicit ownership for deleted-tag cleanup outside hot fetch paths.

Validation: git diff --check.

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
The last-changed fallback changes the API-observable result for files older than the bounded batch scan. Covering it through the server route keeps clone fetch, ref resolution, repeated path query parsing, and JSON response shape tied to the contract.

Validation: go test ./internal/server -run 'TestRepoBrowserLastChangedFallsBackPastBatchLogLimit|TestRepoBrowserTreeAssetLastChangedAndHistory' -shuffle=on; go test ./internal/gitclone -run 'TestRepoBrowserLastChangedFallsBackPastBatchLogLimit' -shuffle=on; git diff --check.

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
Repo browser reads must treat branch and tag names as exact refs, not revision expressions supplied by the caller. The clone cache also needs provider and repo_path in its namespace so repositories sharing host/owner/name cannot leak content across provider identities.

Validation: go test ./internal/gitclone -run 'TestRepoBrowser' -shuffle=on; go test ./internal/server -run 'TestRepoBrowser' -shuffle=on; go test ./internal/ptyowner -run TestOwnerQuickExitRemainsAttachable -short -shuffle=on; git diff --check.

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
Repo browser API metadata is serialized across an API boundary, so commit author timestamps need to be normalized to UTC. Last-changed batch parsing also cannot use a textual commit marker that can collide with valid repository paths.

The batch parser now consumes NUL-delimited git output and the regression test covers a commit:prefixed path plus a non-UTC author date.

Validation: go test ./internal/gitclone -run 'TestRepoBrowser' -shuffle=on; go test ./internal/server -run 'TestRepoBrowser' -shuffle=on; git diff --check.

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
Repo browser reads are served from mutable local clones, so request-time refresh and raw asset endpoints need stricter invariants than a normal file browser route. Moved remote tags should not poison the shared clone refresh, raw bytes should only be served for immutable commit refs, and file history should fail when the selected tree does not contain the requested file.

This keeps the hot path from pruning tags while still tolerating retags, and makes the API return explicit errors instead of cacheable bytes or misleading empty history for mutable or missing inputs.

Validation: go test ./internal/gitclone -run 'TestRepoBrowser' -shuffle=on; go test ./internal/server -run 'TestRepoBrowser' -shuffle=on; go test ./internal/github -run 'TestSyncMRDiffPreservesCloneContextCancellation' -shuffle=on; go test -tags integration ./internal/gitclone -run 'TestEnsureCloneToleratesMovedRemoteTags' -shuffle=on; git diff --check.

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
Repo browser API hardening left a few contract edges that reviewers could still trip over: last-changed parsing needed an unambiguous commit-record delimiter, exact ref lookup needed to distinguish user ref misses from operational git errors, and the raw asset endpoint needed its immutable-ref requirement documented in the generated API contract.

This also records the intended deleted-tag behavior: hot fetches may force-update moved tags, but they do not prune tags from middleman-owned clones. Stale tag cleanup belongs in explicit cache maintenance, not normal sync, diff, or repo-browser refresh paths.

Validation: make api-generate; go test ./internal/gitclone -run 'TestRepoBrowser|TestEnsureCloneToleratesMovedRemoteTags' -shuffle=on; go test ./internal/server -run 'TestRepoBrowser' -shuffle=on; git diff --check.

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
Shared clone refreshes run on normal sync and diff paths, so force-fetching every tag there couples hot-path work to remote tag namespace size and moved-tag failures. That is broader than the repo-browser requirement.

Fetch shared clones with --no-tags and refresh tags only for the repo-browser namespaced clone. Repo-browser refs still see moved tags, while middleman-owned sync/diff clones avoid tag refresh as part of ordinary fetch.

Validation: go test ./internal/github -run TestSyncMRDiffPreservesCloneContextCancellation -shuffle=on; go test ./internal/gitclone -shuffle=on; go test ./internal/server -run RepoBrowser -shuffle=on.

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
Repo-browser requests were still paying for network fetch work whenever an existing local clone was opened. That made tag refresh part of a UI hot path and kept the same risk profile as the earlier shared-clone tag fetch problem.

Keep request-path ensure local-only for existing repo-browser clones, register opened repos, and let the server background loop refresh registered clones on the normal sync interval. The explicit refresh path still force-updates tags without pruning deleted tags, so moved tags update while stale deleted tags remain outside the hot path.

Validation: go test ./internal/gitclone -shuffle=on; go test ./internal/server -run 'TestRepoBrowser|TestAPI.*RepoBrowser|TestNonExistent' -shuffle=on; git diff --check.

Generated with Codex

Co-authored-by: Codex <codex@openai.com>
Scheduled repo-browser refreshes moved tag fetches out of the request path, but explicit refresh still needs the same stampede protection as clone ensure. Concurrent refresh callers for the same namespaced clone should share one branch-and-tag refresh rather than racing on FETCH_HEAD or ref locks.

Wrap repo-browser refresh in its own singleflight slot while preserving caller cancellation and the bounded detached operation context used by clone ensure.

Validation: go test ./internal/gitclone -shuffle=on; go test ./internal/server -run 'TestRepoBrowser|TestAPI.*RepoBrowser|TestNonExistent' -shuffle=on.

Generated with Codex

Co-authored-by: Codex <codex@openai.com>
The repo-browser refresh loop is background network work, so controlled test and dev server runs that disable background monitors should not start it implicitly.

Keep the scheduled refresh enabled for normal servers, but gate it with the same background-disable option already used for other server-owned loops.

Validation: go test ./internal/server -run 'TestRepoBrowser|TestAPI.*RepoBrowser|TestNonExistent' -shuffle=on.

Generated with Codex

Co-authored-by: Codex <codex@openai.com>
Existing repo-browser clones could remain stale after a server restart because the scheduled refresh loop only knew about repos opened in the current process. Seeding only clones already present on disk lets the immediate background refresh update those repos without cloning every configured repository or moving tag fetch back into the request path.

The refresh interval is also captured at server construction so the background loop does not read mutable config while reload tests rewrite the in-memory config. That closes the race reported by the CI go test -race lane.
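A rough sketch of the two server-side choices above: copying the interval at construction and gating the loop on the background-disable option. All names here (`Config`, `Server`, `NewServer`, `StartRefreshLoop`, `demo`) are hypothetical stand-ins, and the sketch assumes a positive interval because `time.NewTicker` panics otherwise.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Config stands in for the mutable, reloadable server config (hypothetical).
type Config struct{ SyncInterval time.Duration }

// Server keeps its own copy of the interval so the loop never reads Config.
type Server struct {
	refreshInterval    time.Duration
	backgroundDisabled bool
}

// NewServer captures the interval once, at construction time.
func NewServer(cfg *Config, disableBackground bool) *Server {
	return &Server{refreshInterval: cfg.SyncInterval, backgroundDisabled: disableBackground}
}

// StartRefreshLoop reports whether a loop was started. It refreshes once
// immediately, then on every tick, until ctx is canceled.
func (s *Server) StartRefreshLoop(ctx context.Context, refresh func(context.Context)) bool {
	if s.backgroundDisabled {
		return false
	}
	go func() {
		refresh(ctx)
		t := time.NewTicker(s.refreshInterval)
		defer t.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-t.C:
				refresh(ctx)
			}
		}
	}()
	return true
}

// demo rewrites the config after construction and tries to start the loop
// on a server with background work disabled.
func demo() (captured time.Duration, started bool) {
	cfg := &Config{SyncInterval: 5 * time.Minute}
	s := NewServer(cfg, true)
	cfg.SyncInterval = time.Second // later reloads do not affect the server copy
	started = s.StartRefreshLoop(context.Background(), func(context.Context) {})
	return s.refreshInterval, started
}

func main() {
	captured, started := demo()
	fmt.Println(captured, started)
}
```

Because the loop only reads a field that is written once before the goroutine starts, there is no shared mutable state for the race detector to flag.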

Validation: go test ./internal/gitclone -run 'TestEnsureRepoBrowserCloneDoesNotFetchTagsForExistingClone|TestRefreshRepoBrowserClonesRefreshesRegisteredRepos|TestRefreshRepoBrowserClonesUsesSeededExistingClones|TestRepoBrowserRefreshFetchesTagsWithoutPruning' -shuffle=on; go test ./internal/server -run 'TestRepoBrowser|TestAPI.*RepoBrowser|TestNonExistent' -shuffle=on; go test -race ./internal/server -run 'TestAPISharedHostCloneFetchFollowsReloadedHostToken|TestSSHFleetWebSocketTerminalUsesAttachSpecCommand|TestSSHFleetWebSocketTerminalHonorsResizeActive' -shuffle=on; go test ./internal/server -run TestWorkspaceResponseProbesStoredRuntimeTmuxSessionWithoutBaseE2E -short -shuffle=on. A broader local go test -race ./... run reached an internal/server timeout/goroutine dump under local runner pressure, so it was not a clean validation signal.

Generated with Codex

Co-authored-by: Codex <codex@openai.com>
Repo overview timelines still need fresh release tag targets even though shared clone fetches no longer pull the full tag namespace. Fetching only the requested release tag keeps the normal clone path bounded while allowing moved provider release tags to update before timeline calculation.

Repo browser commit detail also should not reject valid older file commits just because they fall outside the paginated history response. Checking ancestry and the exact commit diff preserves the selected-root constraint without tying commit detail correctness to the UI history limit.

Validation: go test ./internal/gitclone -run 'TestCommitTimelineSinceTag|TestRepoBrowserCommitDetail' -shuffle=on; go test ./internal/server -run 'TestRepoBrowser|TestAPI.*RepoBrowser|TestNonExistent' -shuffle=on; go test ./internal/gitclone -shuffle=on.

Generated with Codex

Co-authored-by: Codex <codex@openai.com>
Roborev found that the restart refresh path was only covered at the clone-manager level, and commit detail still rejected merge commits that changed the selected file. These are user-visible repo-browser paths, so keep merge commits in scope and pin the startup refresh behavior through the real server/API harness.

The release timeline test now also proves moved latest and timeline tags are refreshed by targeted tag fetches, without putting eager tag fetching back into the normal hot path.

Validation: go test ./internal/gitclone -run 'TestRepoBrowserCommitDetailAcceptsMergeCommitTouchingPath|TestRepoBrowserCommitDetailAcceptsOlderFileHistory|TestCommitTimelineSinceTagFetchesMovedTag' -shuffle=on; go test ./internal/server -run 'TestRepoBrowserCommitAcceptsOlderFileHistoryThroughHTTP|TestRepoBrowserStartupRefreshSeedsExistingClone|TestRepoBrowserStartupRefreshHonorsDisabledBackgroundMonitors|TestAPIListRepoSummariesIncludesSyncedReleaseTimeline' -shuffle=on; go test ./internal/gitclone -shuffle=on; go test ./internal/server -run 'TestRepoBrowser|TestAPIListRepoSummariesIncludesSyncedReleaseTimeline' -shuffle=on; go test -race ./internal/server -run 'TestRepoBrowser|TestAPIListRepoSummariesIncludesSyncedReleaseTimeline' -shuffle=on.

Generated with Codex

Co-authored-by: Codex <codex@openai.com>
Scheduled repo-browser refreshes run under the server background context, so their git work needs to observe shutdown cancellation. Detaching every refresh made the waiter return while the singleflight worker could keep running until the clone timeout.

Keep explicit one-off refreshes protected from individual caller cancellation, but make scheduled refresh operations inherit their caller context so server shutdown can actually drain them.

Validation: go test ./internal/gitclone -run 'TestRepoBrowserScheduledRefreshContextStaysCancelable|TestRefreshRepoBrowserClones|TestRepoBrowserRefreshFetchesTagsWithoutPruning|TestRepoBrowserCommitDetailAcceptsMergeCommitTouchingPath' -shuffle=on; go test ./internal/gitclone -shuffle=on.

Generated with Codex

Co-authored-by: Codex <codex@openai.com>
Resolve commit-detail SHAs before ancestry checks so missing full-length object IDs map to the repo-browser not_found response instead of surfacing git merge-base failures as internal errors.
The repo browser commit detail schema exposes body text, but the detail path reused the history-list format that only captured subjects. That made every selected commit detail appear bodyless even when Git had a multi-line description.

Use a detail-only Git format and parser so body text is available without changing the line-oriented history and last-changed parsing paths.

Validation: go test ./internal/gitclone -run 'TestRepoBrowserCommitDetail' -shuffle=on.

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
Repo browser read handlers should only create a clone when one is missing. Existing clone freshness now belongs to the scheduled refresh path, which avoids hot-path fetches and prevents disabled-monitor tests from racing clone cleanup.

Refresh operations now run the cancellable clone/fetch implementation directly instead of delegating through the detached generic clone singleflight. HTTP tests cover merge commits and commit bodies through the repo browser API.
Repo-browser refresh singleflight should not let a canceled HTTP request abort shared clone work that another waiter, such as scheduled refresh, joined. Request-triggered missing-clone refreshes now detach the worker with the existing bounded timeout.

Scheduled refreshes still use the caller context so server shutdown cancellation remains respected, preserving the earlier lifecycle fix while avoiding request-cancellation poisoning.
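The two context policies can be put side by side in a small sketch. `refreshContext` and `demo` are hypothetical names and the one-minute timeout is a placeholder; the PR's actual helper and timeout are not shown in this log. `context.WithoutCancel` requires Go 1.21 or later.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// refreshContext picks the context a refresh worker runs under.
// Hypothetical helper illustrating the policy described above.
func refreshContext(parent context.Context, scheduled bool, timeout time.Duration) (context.Context, context.CancelFunc) {
	if scheduled {
		// Scheduled refresh: inherit cancellation so server shutdown drains it.
		return context.WithTimeout(parent, timeout)
	}
	// Request-triggered refresh: drop the caller's cancellation but keep
	// its values, and bound the work with a timeout instead.
	return context.WithTimeout(context.WithoutCancel(parent), timeout)
}

// demo cancels the parent and reports what each worker context observes.
func demo() (scheduledErr, requestErr error) {
	parent, cancelParent := context.WithCancel(context.Background())

	scheduledCtx, cancelScheduled := refreshContext(parent, true, time.Minute)
	defer cancelScheduled()
	requestCtx, cancelRequest := refreshContext(parent, false, time.Minute)
	defer cancelRequest()

	cancelParent()
	return scheduledCtx.Err(), requestCtx.Err()
}

func main() {
	scheduledErr, requestErr := demo()
	fmt.Println("scheduled:", scheduledErr)
	fmt.Println("request:", requestErr)
}
```

After the parent is canceled, the scheduled context reports an error while the request-triggered one keeps running until its own timeout, which matches the split between shutdown draining and protection from a single canceled HTTP caller.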

Validation: go test ./internal/gitclone -run 'TestRepoBrowser(RequestRefreshWorkDetachesCallerCancellation|ScheduledRefreshContextStaysCancelable|RefreshFetchesTagsWithoutPruning|EnsureRepoBrowserCloneDoesNotRefreshExistingClone)' -shuffle=on; go test ./internal/gitclone -shuffle=on.

Generated with Codex
Co-authored-by: Codex <codex@openai.com>
@mariusvniekerk mariusvniekerk merged commit dc638a6 into main Jun 25, 2026
11 of 21 checks passed
@mariusvniekerk mariusvniekerk deleted the repo-browser-api branch June 25, 2026 15:51