chunked: use reflinks for chunk deduplication#892
Conversation
mtrmac
left a comment
There was a problem hiding this comment.
I think this would work individually and cleanly resolve the race, but I don’t think it scales: a layer can be thousands or millions of files (i.e. possibly even an order of magnitude more roll-sum chunks), and the process has a file descriptor limit.
(I didn’t review the details of the implementation.)
| srcDirfd, err := unix.Open(source, unix.O_RDONLY|unix.O_CLOEXEC, 0) | ||
| if err != nil { | ||
| // The source layer may have been deleted concurrently. | ||
| if errors.Is(err, unix.ENOENT) || errors.Is(err, unix.ENOTDIR) { |
There was a problem hiding this comment.
wrong assumption that it could happen when a parent directory is deleted. That still maps to ENOENT.
When `copyFileFromOtherLayer` opens a source layer directory that was removed by a concurrent `rmi`, return "not found" instead of a fatal error so the differ fetches content from the remote blob instead. Fixes podman flake: "quadlet - kube build from unavailable image with no tag Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Add a Reflink() function that attempts a CoW file clone without falling back to io.Copy. Callers that need to know whether the filesystem supports reflinks can use this instead of ReflinkOrCopy. ReflinkOrCopy is refactored to call Reflink internally. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
When reusing chunks from other layers, reflink the source file into a temporary directory under the staging dir. The reflinked copy shares data blocks (CoW, O(1)) but is a separate inode, so it survives concurrent deletion of the source layer. If the filesystem does not support reflinks, chunk deduplication is skipped and all chunks are fetched from the network. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
89cac1b to
a09e9f6
Compare
mtrmac
left a comment
There was a problem hiding this comment.
This is a literally one-minute skim, but looks reasonable — and actually much smaller than I expected. Nice!
|
|
||
| wg.Wait() | ||
|
|
||
| chunkRefsDir := filepath.Join(filepath.Dir(dest), "chunk-refs") |
There was a problem hiding this comment.
that is under the staging directory, it will all be cleaned up
|
I've tested this manually and seems to work fine |
|
Code changes offhand look sane to me. One thing I don't understand - why can't we lock the source layer? Side note: this is very different in a composefs-rs, because we don't have unpacked files per layer at all, just an object store. |
the network code calls into containers/image that in turn can call back into c/storage so we might end up with some deadlocks. On top of that, the network calls could be slow and they would block any write access to the store in the meanwhile (possibly this could be abused) |
When reusing chunks from other layers, reflink the source file into a temporary directory under the staging dir. The reflinked copy shares data blocks (CoW, O(1)) but is a separate inode, so itsurvives concurrent deletion of the source layer.
If the filesystem does not support reflinks, chunk deduplication is skipped and all chunks are fetched from the network.
Follow-up for #890