`mode=max` cache export silently drops build-stage layers with a lazy snapshotter + multi-stage `COPY --from`

### Contributing guidelines and issue reporting guide

- [x] I've read the [contributing guidelines](https://github.com/moby/buildkit/blob/master/.github/CONTRIBUTING.md) and wholeheartedly agree. I've also read the [issue reporting guide](https://github.com/moby/buildkit/blob/master/.github/issue_reporting_guide.md).

### Well-formed report checklist

- [x] I have found a bug that the documentation does not mention anything about my problem
- [x] I have found a bug that there are no open or closed issues that are related to my problem
- [x] I have provided version/information about my environment and done my best to provide a reproducer

### Description of bug

**Disclaimer**: this bug report was written via Claude, but the behavior matches my experience. I have a custom snapshotter (among other things), and this build https://github.com/clipper-registry/blog-buildkit-benchmark/actions/runs/27998059001/job/82864320244 _should_ have many cached layers, but does not.

## Bug description

With a lazy/remote snapshotter (stargz/eStargz, overlaybd, …) and a multi-stage
build where a later stage consumes an earlier, **different-base** stage via
`COPY --from`, `--cache-to mode=max` **silently drops the consumed stage's cache
layers**. On a later build, a step in that stage that should be restored from cache
**re-runs** instead. No error is reported — the export "succeeds" with a degraded
cache.

## Reproduction

Requires a lazy snapshotter; this uses the upstream stargz snapshotter and the
public `ghcr.io/stargz-containers` eStargz images.

```bash
# 1. buildkit with the stargz snapshotter
docker buildx create --name stargz --driver docker-container \
  --driver-opt image=moby/buildkit:latest \
  --buildkitd-flags "--oci-worker-snapshotter=stargz"

# 2. Multi-stage: build stage on one esgz base, final stage on a DIFFERENT esgz
#    base, COPY --from, plus a downstream step (ARG BUST) that forces a re-run.
cat > Dockerfile <<'EOF'
FROM ghcr.io/stargz-containers/ubuntu:22.04-esgz AS build
RUN echo stable > /stable
ARG BUST=0
RUN echo "$BUST" > /bust && cat /stable > /combined
FROM ghcr.io/stargz-containers/alpine:3.15.3-esgz
COPY --from=build /combined /combined
EOF

# 3. First build — exports the cache
docker buildx build --builder stargz --platform linux/amd64 --build-arg BUST=1 \
  --cache-to type=local,dest=./cache,mode=max --output type=cacheonly .

# 4. Drop local state, then rebuild changing only BUST. The second RUN must
#    re-run, which requires the first RUN's filesystem state to be restored.
docker buildx prune -af --builder stargz
docker buildx build --builder stargz --platform linux/amd64 --build-arg BUST=2 \
  --cache-from type=local,src=./cache --output type=cacheonly .
```

### Expected

On the second build, `RUN echo stable > /stable` (unchanged) is `CACHED`.

### Actual

```
#8 [build 2/3] RUN echo stable > /stable
#8 DONE 0.4s          <-- re-runs instead of CACHED
```

Its cache layer was dropped during the first build's export, so it cannot be
restored when the downstream `BUST` step re-runs. (Inspecting `./cache`'s
`application/vnd.buildkit.cacheconfig.v0` config confirms the build-stage record
has an empty `layers` field; only the final `COPY` record has a layer.)
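
For anyone who wants to repeat that inspection, here is a minimal sketch in Go. It assumes the config blob is JSON with a top-level `records` array whose entries carry `digest` and `layers` fields, as described above; I have not checked it against every BuildKit version. The path to the config blob under `./cache` is passed on the command line, and finding that blob is left to the reader. `cacheConfig` and `emptyRecords` are names made up for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// cacheConfig models only the fields this sketch reads. The JSON names
// "records", "digest" and "layers" are assumptions based on the
// description above, not a verified copy of BuildKit's schema.
type cacheConfig struct {
	Records []struct {
		Digest string            `json:"digest"`
		Layers []json.RawMessage `json:"layers"`
	} `json:"records"`
}

// emptyRecords returns the digests of records whose "layers" field is
// missing or empty, plus the total number of records.
func emptyRecords(data []byte) ([]string, int, error) {
	var cfg cacheConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, 0, err
	}
	var empty []string
	for _, r := range cfg.Records {
		if len(r.Layers) == 0 {
			empty = append(empty, r.Digest)
		}
	}
	return empty, len(cfg.Records), nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: inspect <path-to-cacheconfig-blob>")
		os.Exit(2)
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	empty, total, err := emptyRecords(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%d of %d records have no layers\n", len(empty), total)
	for _, d := range empty {
		fmt.Println("  ", d)
	}
}
```

A record without layers is not necessarily a bug on its own; the signal here is that the build-stage records lose their layers only with the lazy snapshotter.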

The same Dockerfile with an eager snapshotter, or with the same base image in both
stages, caches correctly.

### Root cause

1. The cache exporter recurses **per record** into cross-stage deps
   (`solver/exporter.go` `ExportTo`).
2. Loading a record's result (`worker/cacheresult.go` `LoadRemotes` →
   `Worker.LoadRef` → `CacheManager.Get`) runs `checkLazyProviders`
   (`cache/manager.go`), which returns `NeedsRemoteProviderError` for any **lazy**
   ancestor blob lacking a `DescHandler`.
3. `LoadRef` recovers those handlers from `CacheOptGetterOf(ctx)` — the
   `withAncestorCacheOpts` getter, which resolves handlers from the **ancestor
   states** of whatever state it was rooted at.
4. **The regression (`051818cf3`)**: the exporter now sets that getter **once, at
   the outermost export**, and all nested records inherit it
   (`if CacheOptGetterOf(ctx) == nil && e.recordCtxOpts != nil`). Since
   `withAncestorCacheOpts` walks only one record's ancestry, the getter rooted at
   the *final* stage's result does not reach the *build* stage's source op, so the
   build-stage result's lazy base-layer handlers are unresolvable.
5. `LoadRef` therefore still fails with `NeedsRemoteProviderError`; `ExportTo`
   returns it; the deps loop swallows it (`if err != nil { continue }`, added in
   the same PR series to avoid failing export on subbranch errors). The
   build-stage result is dropped from the cache.

Eager snapshotters are unaffected (base blobs are materialized, `isLazy == false`,
so no handler is needed).
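
To make the chain above easier to follow, here is a toy model in Go. It is not BuildKit code: `record`, `getter`, `rootedAt`, and `export` are hypothetical stand-ins, and the model simply encodes the behavior as described in the steps above (a getter resolves handlers only for the same-stage ancestry of the record it was rooted at, and sub-branch failures are swallowed).

```go
package main

import "fmt"

// record is a hypothetical stand-in for one exported cache record.
type record struct {
	name      string
	lazyBlob  string    // lazy base blob introduced by this record ("" if none)
	parent    *record   // same-stage ancestor
	crossDeps []*record // cross-stage deps, e.g. COPY --from
}

// getter reports whether a handler for the given lazy blob can be resolved.
type getter func(blob string) bool

// lazyBlobs collects the lazy blobs in r's own same-stage ancestry.
func lazyBlobs(r *record) []string {
	var blobs []string
	for cur := r; cur != nil; cur = cur.parent {
		if cur.lazyBlob != "" {
			blobs = append(blobs, cur.lazyBlob)
		}
	}
	return blobs
}

// rootedAt builds a getter that only knows r's own ancestry.
func rootedAt(r *record) getter {
	known := map[string]bool{}
	for _, b := range lazyBlobs(r) {
		known[b] = true
	}
	return func(blob string) bool { return known[blob] }
}

// export walks deps first, then tries to load r. With perRecord=false the
// getter is set once at the outermost call and inherited by nested records.
func export(r *record, g getter, perRecord bool, exported map[string]bool) {
	if perRecord || g == nil {
		g = rootedAt(r)
	}
	deps := append([]*record{}, r.crossDeps...)
	if r.parent != nil {
		deps = append(deps, r.parent)
	}
	for _, d := range deps {
		export(d, g, perRecord, exported) // sub-branch failures are swallowed
	}
	for _, blob := range lazyBlobs(r) {
		if !g(blob) {
			return // stands in for NeedsRemoteProviderError
		}
	}
	exported[r.name] = true
}

// exportAll builds the reproduction's graph and exports from the final stage.
func exportAll(perRecord bool) map[string]bool {
	ubuntu := &record{name: "FROM ubuntu", lazyBlob: "ubuntu"}
	run1 := &record{name: "RUN stable", parent: ubuntu}
	run2 := &record{name: "RUN bust", parent: run1}
	alpine := &record{name: "FROM alpine", lazyBlob: "alpine"}
	final := &record{name: "COPY", parent: alpine, crossDeps: []*record{run2}}
	exported := map[string]bool{}
	export(final, nil, perRecord, exported)
	return exported
}

func main() {
	fmt.Println("set once:  ", exportAll(false))
	fmt.Println("per record:", exportAll(true))
}
```

In this model the "set once" export keeps only the final-stage records, while the per-record export keeps all five. A record with no lazy blob in its ancestry always loads, which corresponds to the eager-snapshotter case.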

### Suggested fix

Re-root the opt-getter at each record's own state (the pre-`051818cf3` behavior),
matching the exporter's per-record recursion:

```diff
 	mainCtx := ctx
-	if CacheOptGetterOf(ctx) == nil && e.recordCtxOpts != nil {
+	if e.recordCtxOpts != nil {
 		ctx = e.recordCtxOpts(ctx)
 	}
```

A getter is only ever queried for a record's own ancestor blobs, so per-record
re-rooting resolves exactly what each record needs. (If the perf intent of
`051818cf3` matters, an alternative is to keep "set once" but make
`withAncestorCacheOpts` traverse cross-stage edges.) It would also help to log,
rather than silently swallow, the subbranch export error so this failure mode is
visible.

With the above change, the reproduction's second build reports
`RUN echo stable > /stable ... CACHED`.

