Skip to content

chunked: use reflinks for chunk deduplication#892

Open
giuseppe wants to merge 3 commits into
podman-container-tools:mainfrom
giuseppe:chunked-preopen-files
Open

chunked: use reflinks for chunk deduplication#892
giuseppe wants to merge 3 commits into
podman-container-tools:mainfrom
giuseppe:chunked-preopen-files

Conversation

@giuseppe
Copy link
Copy Markdown
Contributor

@giuseppe giuseppe commented Jun 5, 2026

When reusing chunks from other layers, reflink the source file into a temporary directory under the staging dir. The reflinked copy shares data blocks (CoW, O(1)) but is a separate inode, so itsurvives concurrent deletion of the source layer.

If the filesystem does not support reflinks, chunk deduplication is skipped and all chunks are fetched from the network.

Follow-up for #890

@github-actions github-actions Bot added the storage Related to "storage" package label Jun 5, 2026
Copy link
Copy Markdown
Contributor

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would work individually and cleanly resolve the race, but I don’t think it scales: a layer can be thousands or millions of files (i.e. possibly even an order of magnitude more roll-sum chunks), and the process has a file descriptor limit.

(I didn’t review the details of the implementation.)

Comment thread storage/pkg/chunked/storage_linux.go Outdated
srcDirfd, err := unix.Open(source, unix.O_RDONLY|unix.O_CLOEXEC, 0)
if err != nil {
// The source layer may have been deleted concurrently.
if errors.Is(err, unix.ENOENT) || errors.Is(err, unix.ENOTDIR) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why ENOTDIR?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong assumption that it could happen when a parent directory is deleted. That still maps to ENOENT.

Honny1 and others added 3 commits June 5, 2026 18:52
When `copyFileFromOtherLayer` opens a source layer directory that was
removed by a concurrent `rmi`, return "not found" instead of a fatal
error so the differ fetches content from the remote blob instead.

Fixes podman flake: "quadlet - kube build from unavailable image with no tag

Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Add a Reflink() function that attempts a CoW file clone without
falling back to io.Copy.  Callers that need to know whether the
filesystem supports reflinks can use this instead of ReflinkOrCopy.

ReflinkOrCopy is refactored to call Reflink internally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
When reusing chunks from other layers, reflink the source file into
a temporary directory under the staging dir.  The reflinked copy
shares data blocks (CoW, O(1)) but is a separate inode, so it
survives concurrent deletion of the source layer.

If the filesystem does not support reflinks, chunk deduplication is
skipped and all chunks are fetched from the network.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe giuseppe force-pushed the chunked-preopen-files branch from 89cac1b to a09e9f6 Compare June 5, 2026 16:52
@giuseppe giuseppe marked this pull request as ready for review June 5, 2026 17:24
@giuseppe giuseppe changed the title [WIP] chunked: handle concurrent layer deletion during dedup chunked: follow-up for https://github.com/podman-container-tools/container-libs/pull/890 Jun 5, 2026
Copy link
Copy Markdown
Contributor

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a literally one-minute skim, but looks reasonable — and actually much smaller than I expected. Nice!


wg.Wait()

chunkRefsDir := filepath.Join(filepath.Dir(dest), "chunk-refs")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this ever deleted?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is under the staging directory, it will all be cleaned up

@giuseppe giuseppe changed the title chunked: follow-up for https://github.com/podman-container-tools/container-libs/pull/890 chunked: use reflinks for chunk deduplication Jun 5, 2026
@giuseppe
Copy link
Copy Markdown
Contributor Author

giuseppe commented Jun 5, 2026

I've tested this manually and seems to work fine

@cgwalters
Copy link
Copy Markdown
Contributor

Code changes offhand look sane to me.

One thing I don't understand - why can't we lock the source layer?


Side note: this is very different in a composefs-rs, because we don't have unpacked files per layer at all, just an object store.

@giuseppe
Copy link
Copy Markdown
Contributor Author

giuseppe commented Jun 6, 2026

One thing I don't understand - why can't we lock the source layer?

the network code calls into containers/image that in turn can call back into c/storage so we might end up with some deadlocks.

On top of that, the network calls could be slow and they would block any write access to the store in the meanwhile (possibly this could be abused)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

storage Related to "storage" package

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants