Skip to content

Input::getAccessor(): Try substitution before fetching#380

Merged
edolstra merged 2 commits intomainfrom
eelcodolstra/nix-333-restore-substitution-before-fetching
Mar 11, 2026
Merged

Input::getAccessor(): Try substitution before fetching#380
edolstra merged 2 commits intomainfrom
eelcodolstra/nix-333-restore-substitution-before-fetching

Conversation

@edolstra
Copy link
Collaborator

@edolstra edolstra commented Mar 10, 2026

Motivation

Substitution is generally faster than fetching. However, in case the underlying fetcher already has the input locally (e.g. in the tarball/Git cache, or if it's a local Git repo), we don't want to substitute. So InputScheme::getAccessor() now has a fastOnly argument to return an accessor only if it doesn't involve any slow, remote fetching.

So Input::getAccessor() now does the following:

  1. InputScheme::getAccessor(fastOnly=true)
  2. Try substitution
  3. InputScheme::getAccessor(fastOnly=false)

Context

Summary by CodeRabbit

  • Improvements

    • Cache-first retrieval for faster hits and fewer full fetches
    • New fast-only early exits to avoid unnecessary network/disk work
    • More graceful substitution when cached data isn't available; fewer abrupt failures
  • Refactor

    • Unified accessor behavior to support conditional fast-paths
    • Streamlined control flow for accessor finalization, fingerprinting, and provenance
    • Reduced error propagation in favor of fallback/substitution flows

@coderabbitai
Copy link

coderabbitai bot commented Mar 10, 2026

📝 Walkthrough

Walkthrough

Adds a cache-first fast path and a fastOnly flag across input schemes. Many getAccessor signatures now return std::optional<std::pair<...>> and accept bool fastOnly. Core fetch flow tries scheme fast-accessor, then store substitution, and falls back to full fetch when needed.

Changes

Cohort / File(s) Summary
Fetcher core
src/libfetchers/fetchers.cc
Implements cache-first fast-path via scheme getAccessor(..., fastOnly=true), consolidates finalization in a fixupAccessor, attempts store substitution when storePath exists, and falls back to normal fetch without swapping exception/control-flow semantics.
InputScheme interface
src/libfetchers/include/nix/fetchers/fetchers.hh
Adds virtual overload getAccessor(..., bool fastOnly) returning std::optional<std::pair<...>>; existing 3-arg getAccessor delegates to the new overload with fastOnly=false for compatibility.
Git scheme
src/libfetchers/git.cc
getAccessorFromCommit and getAccessor now accept fastOnly and return std::optional<std::pair<...>>; introduces fast-only early exits to avoid network/disk work when requested.
GitHub archive scheme
src/libfetchers/github.cc
downloadArchive and getAccessor accept fastOnly and return optionals; short-circuits tarball download/streaming on fast-only cache misses.
Tarball scheme
src/libfetchers/tarball.cc
downloadTarball_ returns std::optional<DownloadTarballResult> and accepts fastOnly; TarballInputScheme::getAccessor now returns optional and abandons full processing on fast-only miss.
Path scheme
src/libfetchers/path.cc
getAccessor updated to accept fastOnly and return optional; flag is ignored (path is always fast) and returns are wrapped in optionals.

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Caller
    participant Fetchers as Fetcher System
    participant Scheme as Input Scheme (git/github/path/tarball)
    participant Store as Store/Cache

    Caller->>Fetchers: getAccessor(input, fastOnly=true)
    Fetchers->>Scheme: getAccessor(settings, store, input, fastOnly=true)

    alt Scheme returns accessor (fast/cache hit)
        Scheme-->>Fetchers: {{accessor, input}}
        Fetchers->>Fetchers: fixupAccessor (fingerprint/provenance)
        Fetchers-->>Caller: accessor (fast result)
    else Scheme returns nullopt (fast miss)
        Scheme-->>Fetchers: nullopt
        Fetchers->>Store: attempt substitution (makeStoreAccessor)
        alt substitution succeeds
            Store-->>Fetchers: substituted accessor
            Fetchers->>Fetchers: fixupAccessor
            Fetchers-->>Caller: substituted accessor
        else substitution fails
            Fetchers->>Scheme: getAccessor(settings, store, input, fastOnly=false)
            Scheme-->>Fetchers: {{accessor, input}} (full fetch)
            Fetchers->>Fetchers: fixupAccessor
            Fetchers-->>Caller: fetched accessor
        end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

flake-regression-test

Suggested reviewers

  • grahamc
  • cole-h

Poem

🐇 I searched the cache with nimble toes,
fastOnly set — no needless rows.
If stash won't show, I try to slide,
then fetch the rest with gentle pride.
Fingerprints stamped, provenance sewn — a rabbit's hop, the change is known.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 5.26% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely describes the main change: attempting substitution before fetching in Input::getAccessor().

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch eelcodolstra/nix-333-restore-substitution-before-fetching

Comment @coderabbitai help to get the list of available commands and usage tips.

@edolstra edolstra force-pushed the eelcodolstra/nix-333-restore-substitution-before-fetching branch from 0c1d355 to 83c7503 Compare March 10, 2026 17:40
Substitution is generally faster. However, in case the underlying
fetcher already has the input locally (e.g. in the tarball/Git cache,
or if it's a local Git repo), we don't want to substitute. So
InputScheme::getAccessor() now has a `fastOnly` argument to return an
accessor only if it doesn't involve any slow, remote fetching.

So Input::getAccessor() now does the following:

1. InputScheme::getAccessor(fastOnly=true)
2. Try substitution
3. InputScheme::getAccessor(fastOnly=false)
@edolstra edolstra force-pushed the eelcodolstra/nix-333-restore-substitution-before-fetching branch from 83c7503 to a6af221 Compare March 10, 2026 17:44
@github-actions
Copy link

github-actions bot commented Mar 10, 2026

@github-actions github-actions bot temporarily deployed to pull request March 10, 2026 17:51 Inactive
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/libfetchers/github.cc (1)

282-287: ⚠️ Potential issue | 🟠 Major

fastOnly still performs remote ref resolution.

When rev is absent, getRevFromRef() runs before the fastOnly return at Line 310. That means the "fast" probe can still hit the forge API and then repeat the same lookup on the slow path. Return std::nullopt before resolving floating refs here, or consult only a local ref-to-rev cache.

Also applies to: 310-311

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/libfetchers/github.cc` around lines 282 - 287, The fast-only probe is
doing remote ref resolution because when input.getRev() is empty you call
getRevFromRef(settings, store, input) (setting rev and upstreamTreeHash) before
checking the fastOnly flag; move the fastOnly early-return so you return
std::nullopt (or consult a local ref-to-rev cache) immediately when fastOnly is
set and rev is not provided, instead of invoking getRevFromRef(); update the
logic around input.getRev(), getRevFromRef(), and the fastOnly check so the fast
path never calls the forge API (also apply the same change to the similar code
at the 310-311 site).
src/libfetchers/git.cc (1)

859-862: ⚠️ Potential issue | 🟠 Major

Defer default-HEAD resolution in fastOnly mode.

For remote inputs without an explicit ref, getDefaultRef() can invoke readHeadCached(), which falls back to git ls-remote, before the fastOnly exit at Line 921. That reintroduces network I/O into the local-only phase and can repeat the lookup on the slow path. In fastOnly mode this should use cached HEAD metadata only, or return std::nullopt and let the slow path refresh it.

Also applies to: 920-923

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/libfetchers/git.cc` around lines 859 - 862, When running in fastOnly mode
and the input has no explicit ref (input.getRef() is empty), avoid calling
getDefaultRef(repoInfo, shallow) because it may call readHeadCached() which can
fall back to network I/O; instead, if fastOnly is true and the repo is remote,
do not compute/insert a default "ref" (leave input.attrs without "ref" or set it
to nullopt) so the slow path can resolve HEAD later. Update the block that
currently does auto ref = originalRef ? *originalRef : getDefaultRef(...) and
the similar logic at the later site (around the fastOnly exit) to short-circuit:
if fastOnly && !originalRef then skip getDefaultRef/readHeadCached and preserve
the absence of ref. Ensure you reference input.getRef(), canDoShallow(),
getDefaultRef(...), readHeadCached(), input.attrs.insert_or_assign("ref", ...)
and the fastOnly flag when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/libfetchers/fetchers.cc`:
- Around line 371-375: The try/catch around the fast-path accessor lookup in
fetchers.cc is swallowing all exceptions from scheme->getAccessor and hiding
real failures; remove the catch-all so exceptions propagate, or replace it with
a narrow catch for the specific cache-miss/optional-empty signal returned by
getAccessor (i.e., only handle the explicit "no accessor" case via the
std::nullopt/return path), and otherwise let errors from scheme->getAccessor
propagate so fixupAccessor(res->first, std::move(res->second)) is only reached
when a valid accessor is returned.

In `@src/libfetchers/include/nix/fetchers/fetchers.hh`:
- Around line 243-255: The two getAccessor overloads are mutually recursive and
can recurse forever if neither is overridden; change the non-fast overload to
fail explicitly instead of calling the bool overload. Replace the body of
std::pair<ref<SourceAccessor>, Input> getAccessor(const Settings&, Store&, const
Input&) to throw a clear exception (e.g. std::runtime_error("getAccessor not
implemented for this scheme")) or call std::terminate, so missing
implementations are detected deterministically; keep the bool fastOnly overload
returning std::nullopt as the default behavior. Ensure the thrown message
references getAccessor so it's easy to locate in logs.

---

Outside diff comments:
In `@src/libfetchers/git.cc`:
- Around line 859-862: When running in fastOnly mode and the input has no
explicit ref (input.getRef() is empty), avoid calling getDefaultRef(repoInfo,
shallow) because it may call readHeadCached() which can fall back to network
I/O; instead, if fastOnly is true and the repo is remote, do not compute/insert
a default "ref" (leave input.attrs without "ref" or set it to nullopt) so the
slow path can resolve HEAD later. Update the block that currently does auto ref
= originalRef ? *originalRef : getDefaultRef(...) and the similar logic at the
later site (around the fastOnly exit) to short-circuit: if fastOnly &&
!originalRef then skip getDefaultRef/readHeadCached and preserve the absence of
ref. Ensure you reference input.getRef(), canDoShallow(), getDefaultRef(...),
readHeadCached(), input.attrs.insert_or_assign("ref", ...) and the fastOnly flag
when making the change.

In `@src/libfetchers/github.cc`:
- Around line 282-287: The fast-only probe is doing remote ref resolution
because when input.getRev() is empty you call getRevFromRef(settings, store,
input) (setting rev and upstreamTreeHash) before checking the fastOnly flag;
move the fastOnly early-return so you return std::nullopt (or consult a local
ref-to-rev cache) immediately when fastOnly is set and rev is not provided,
instead of invoking getRevFromRef(); update the logic around input.getRev(),
getRevFromRef(), and the fastOnly check so the fast path never calls the forge
API (also apply the same change to the similar code at the 310-311 site).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a52ffb75-7edb-4994-b956-b4677ddc3ea2

📥 Commits

Reviewing files that changed from the base of the PR and between 6362dda and 83c7503.

📒 Files selected for processing (4)
  • src/libfetchers/fetchers.cc
  • src/libfetchers/git.cc
  • src/libfetchers/github.cc
  • src/libfetchers/include/nix/fetchers/fetchers.hh

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (3)
src/libfetchers/include/nix/fetchers/fetchers.hh (1)

243-255: ⚠️ Potential issue | 🟠 Major

Mutual recursion between default overloads can cause infinite recursion.

If a derived InputScheme overrides neither getAccessor overload:

  • getAccessor(settings, store, input, false) calls getAccessor(settings, store, input) (line 248)
  • getAccessor(settings, store, input) calls getAccessor(settings, store, input, false).value() (line 254)

This results in infinite recursion and a stack overflow. Break the cycle by having the 3-parameter overload throw an explicit error instead of delegating.

,

🔧 Proposed fix
     virtual std::pair<ref<SourceAccessor>, Input>
     getAccessor(const Settings & settings, Store & store, const Input & input) const
     {
-        return getAccessor(settings, store, input, false).value();
+        throw Error("getAccessor not implemented for input scheme '%s'", schemeName());
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/libfetchers/include/nix/fetchers/fetchers.hh` around lines 243 - 255, The
two default overloads of getAccessor are mutually recursive and can
infinite-loop if a derived InputScheme overrides neither; change the 3-parameter
overload getAccessor(const Settings&, Store&, const Input&) to not call the
4-parameter version but instead throw an explicit exception (e.g.,
std::runtime_error) with a clear message that this overload must be implemented
by the derived class (or that fastOnly behavior is unsupported), so callers get
a deterministic failure instead of stack overflow; update the function body of
getAccessor(const Settings&, Store&, const Input&) accordingly and keep the
existing 4-parameter default that returns std::nullopt for fastOnly.
src/libfetchers/fetchers.cc (2)

370-375: ⚠️ Potential issue | 🟠 Major

Don't swallow all exceptions from the fast-path.

The catch (...) {} block silently hides any exception from the fast-path accessor lookup. Since std::nullopt already signals "no cached accessor available," real errors (permission failures, corrupt cache, OOM, etc.) should propagate rather than being silently ignored.

,

🔧 Proposed fix — remove the try/catch or catch only expected signals
     /* See if the input is in the fetcher cache. */
-    try {
-        if (auto res = scheme->getAccessor(settings, store, *this, true))
-            return fixupAccessor(res->first, std::move(res->second));
-    } catch (...) {
-    }
+    if (auto res = scheme->getAccessor(settings, store, *this, true))
+        return fixupAccessor(res->first, std::move(res->second));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/libfetchers/fetchers.cc` around lines 370 - 375, The empty catch-all is
hiding real errors from the fast-path accessor lookup; remove the surrounding
try { ... } catch (...) { } around the call to scheme->getAccessor(settings,
store, *this, true) so that genuine exceptions propagate, or replace it with a
narrow catch that only handles the specific expected, non-fatal signal (e.g., a
custom "not found" exception) if one exists; ensure the code still returns
std::nullopt when getAccessor returns no cached accessor and that
fixupAccessor(res->first, std::move(res->second)) is unchanged.

384-387: ⚠️ Potential issue | 🟡 Minor

Bare catch (...) also swallows unexpected exceptions.

While catching Error & for substitution failures is reasonable, the subsequent catch (...) {} can hide non-Error exceptions (e.g., std::bad_alloc). Consider removing or logging it.

🔧 Proposed fix
         catch (Error & e2) {
             debug("substitution of input '%s' failed: %s", to_string(), e2.info().msg);
-        } catch (...) {
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/libfetchers/fetchers.cc` around lines 384 - 387, The bare catch(...)
after the Error & e2 handler swallows non-Error exceptions (e.g.,
std::bad_alloc); in the function containing the substitution code that calls
to_string() and uses debug(...) with Error & e2, remove the empty catch(...) or
replace it with a handler that logs the unexpected exception (using debug or
another logger) and then rethrows (or std::terminate) so fatal errors aren't
silently ignored; ensure the Error & e2 block remains to handle expected
substitution failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/libfetchers/fetchers.cc`:
- Around line 370-375: The empty catch-all is hiding real errors from the
fast-path accessor lookup; remove the surrounding try { ... } catch (...) { }
around the call to scheme->getAccessor(settings, store, *this, true) so that
genuine exceptions propagate, or replace it with a narrow catch that only
handles the specific expected, non-fatal signal (e.g., a custom "not found"
exception) if one exists; ensure the code still returns std::nullopt when
getAccessor returns no cached accessor and that fixupAccessor(res->first,
std::move(res->second)) is unchanged.
- Around line 384-387: The bare catch(...) after the Error & e2 handler swallows
non-Error exceptions (e.g., std::bad_alloc); in the function containing the
substitution code that calls to_string() and uses debug(...) with Error & e2,
remove the empty catch(...) or replace it with a handler that logs the
unexpected exception (using debug or another logger) and then rethrows (or
std::terminate) so fatal errors aren't silently ignored; ensure the Error & e2
block remains to handle expected substitution failures.

In `@src/libfetchers/include/nix/fetchers/fetchers.hh`:
- Around line 243-255: The two default overloads of getAccessor are mutually
recursive and can infinite-loop if a derived InputScheme overrides neither;
change the 3-parameter overload getAccessor(const Settings&, Store&, const
Input&) to not call the 4-parameter version but instead throw an explicit
exception (e.g., std::runtime_error) with a clear message that this overload
must be implemented by the derived class (or that fastOnly behavior is
unsupported), so callers get a deterministic failure instead of stack overflow;
update the function body of getAccessor(const Settings&, Store&, const Input&)
accordingly and keep the existing 4-parameter default that returns std::nullopt
for fastOnly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 234a975e-7aa4-4d44-a8c8-28281172b695

📥 Commits

Reviewing files that changed from the base of the PR and between 83c7503 and a6af221.

📒 Files selected for processing (5)
  • src/libfetchers/fetchers.cc
  • src/libfetchers/git.cc
  • src/libfetchers/github.cc
  • src/libfetchers/include/nix/fetchers/fetchers.hh
  • src/libfetchers/path.cc

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/libfetchers/fetchers.cc (1)

371-375: ⚠️ Potential issue | 🟠 Major

Do not hide real fast-path failures.

std::nullopt already covers the "no local accessor" case. Catching ... here also suppresses genuine fast-path errors and can silently fall through to substitution or a remote fetch instead of surfacing the real failure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/libfetchers/fetchers.cc` around lines 371 - 375, The catch-all block
around the fast-path call masks real errors: remove the try { ... } catch(...) {
} wrapper so exceptions from scheme->getAccessor(...) and fixupAccessor(...)
propagate instead of being swallowed; keep the existing optional handling
(std::nullopt) to represent "no local accessor" and only add targeted exception
handling if you intend to handle a specific, documented exception type from
getAccessor or fixupAccessor.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/libfetchers/fetchers.cc`:
- Around line 371-375: The catch-all block around the fast-path call masks real
errors: remove the try { ... } catch(...) { } wrapper so exceptions from
scheme->getAccessor(...) and fixupAccessor(...) propagate instead of being
swallowed; keep the existing optional handling (std::nullopt) to represent "no
local accessor" and only add targeted exception handling if you intend to handle
a specific, documented exception type from getAccessor or fixupAccessor.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f8fc8252-99be-4c44-8015-b49bdaa00cb4

📥 Commits

Reviewing files that changed from the base of the PR and between a6af221 and 41746d0.

📒 Files selected for processing (2)
  • src/libfetchers/fetchers.cc
  • src/libfetchers/tarball.cc

@github-actions github-actions bot temporarily deployed to pull request March 11, 2026 15:54 Inactive
@edolstra edolstra added this pull request to the merge queue Mar 11, 2026
Merged via the queue into main with commit 2581135 Mar 11, 2026
28 checks passed
@edolstra edolstra deleted the eelcodolstra/nix-333-restore-substitution-before-fetching branch March 11, 2026 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants