Skip to content

storage: new sync option for overlay/vfs#622

Open
giuseppe wants to merge 6 commits intocontainers:mainfrom
giuseppe:sync-policy
Open

storage: new sync option for overlay/vfs#622
giuseppe wants to merge 6 commits intocontainers:mainfrom
giuseppe:sync-policy

Conversation

@giuseppe
Copy link
Member

@giuseppe giuseppe commented Feb 2, 2026

Add a new driver-specific sync configuration option that controls filesystem synchronization during layer operations. When set to "filesystem", the driver ensures all pending writes are flushed to the file system before marking a layer as complete.

This helps prevent data corruption in scenarios where the system crashes or loses power before the filesystem has finished writing layer data.

@github-actions github-actions bot added the storage Related to "storage" package label Feb 2, 2026
@giuseppe
Copy link
Member Author

giuseppe commented Feb 2, 2026

@mrunalp PTAL

Copy link
Contributor

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just an initial skim.

doptions = append(doptions, fmt.Sprintf("%s.use_composefs=%s", driverName, options.Overlay.UseComposefs))
}
if options.Overlay.SyncPolicy != "" {
doptions = append(doptions, fmt.Sprintf("%s.sync_policy=%s", driverName, options.Overlay.SyncPolicy))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I seriously dislike this kind of turning of structured data back into strings.

I realize that’s not the fault of this PR, but can we stop doing that some time? There will never be a convenient time.

I suppose OptionsConfig, or something, could be stored in StoreOptions, and perhaps added to drivers.Options.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it would be nice to parse the value into an enum already here.

@giuseppe giuseppe changed the title [RFC] new sync_policy option for overlay/vf [RFC] new sync option for overlay/vf Feb 2, 2026
@giuseppe giuseppe changed the title [RFC] new sync option for overlay/vf [RFC] new sync option for overlay/vfs Feb 2, 2026
@giuseppe giuseppe marked this pull request as ready for review February 3, 2026 11:58
@mrunalp
Copy link
Contributor

mrunalp commented Feb 3, 2026

👍 to adding this option.

@giuseppe giuseppe changed the title [RFC] new sync option for overlay/vfs storage: new sync option for overlay/vfs Feb 3, 2026
@mrunalp
Copy link
Contributor

mrunalp commented Feb 3, 2026

@nalind thoughts?

bitoku added a commit to bitoku/storage that referenced this pull request Feb 4, 2026
bitoku added a commit to bitoku/storage that referenced this pull request Feb 4, 2026
Signed-off-by: Ayato Tokubi <atokubi@redhat.com>
Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really fully review just some drive by comments I might be missing the context of the origin of this.

Comment on lines +222 to +223
**sync**=""
Filesystem synchronization mode for layer creation. (default: "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default: "" does not say what the default actually is, it should say none or filesystem.

I think it default to none? Is that the right default? I mean practically speaking should we not do the safe thing as default. I guess the concern would be performance? But the people needing that could opt. Without sync we need to worry about more store corruption which seems worse for us?

logrus.Debugf("Error Syncfs(%s) - %v", mountpoint, err)
}
unix.Close(fd)
if err := system.Syncfs(mountpoint); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this correct to not use the syncmode setting here and always sync?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this was already done before, I've only reused the new function now.

If you kill a FUSE file system, all the data in memory will be lost, even if the system didn't crash. This is a fallback path taken when doing it correctly through fusermount failed.

The syncfs here prevents that when fusermount failed (or it is not present), we do not lose data on fuse-overlayfs when we unmount it.

@cgwalters
Copy link
Contributor

Of reference too composefs/composefs-rs#185 (comment) Basically ostree keeps track of the boot id optionally - another option here is to have a periodic "walk layer dirs and fsync()" which is syscall expensive (without io-uring) but avoids flushing unrelated I/O.

This of course relates to "containers-storage on top of composefs(rs)" in that this option would today always be on by default...

@giuseppe giuseppe force-pushed the sync-policy branch 2 times, most recently from 747478c to 2bc826e Compare February 16, 2026 09:41
@giuseppe
Copy link
Member Author

@mrunalp @bitoku got any feedback on this feature?

@bitoku
Copy link
Contributor

bitoku commented Feb 16, 2026

Can we keep it open for a while until we get any feedback?

@bitoku
Copy link
Contributor

bitoku commented Feb 19, 2026

we haven't got any feedback from the customer, but they'd like us not to wait for them to get it released.
If it doesn't hurt, can we proceed with this change?

@giuseppe
Copy link
Member Author

@containers/storage-maintainers PTAL

@giuseppe
Copy link
Member Author

giuseppe commented Mar 4, 2026

@mtrmac @kolyshkin @Luap99 PTAL

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK overall.

@giuseppe giuseppe force-pushed the sync-policy branch 2 times, most recently from 63ba009 to 31330a9 Compare March 6, 2026 09:29
t.Fatalf("Expected to find overlay sync option in %v", doptions)
}

// Make sure no sync option when not configured
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests already start with “empty should mean 0 options”, this is unnecessary. Same with VFS.

modifiedLocations |= layer.location
}
if err := r.saveLayers(modifiedLocations); err != nil {
if err := r.saveLayers(modifiedLocations, true); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is after deleting layers and BigData… I am really unsure whether we need to sync.

I suppose any future operation will sync that?!

At least, it is not “metadata only” so we might need a more precise naming of the option/situation.


// saveFor saves the contents of the store relevant for modifiedLayer to disk.
// If metadataOnly is true, only the layer metadata (db) is being changed (e.g. setting a flag,
// changing names), and the sync of the storage filesystem is skipped.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(After the API settles) please document this also in the other save… methods.

}
}

// Write the tar-split file before UpdateLayerIDMap.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? This hints at the order being important but it doesn’t say what to pay attention to.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is unnecessary now as the sync happens later.

On the other hand, the AtomicWriteFile is not needed here. I think we need to drop it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the general principle has been “writes of our metadata always atomic, writes of layer data are atomic only with the new opt-in”. In that sense, tar-split probably counts as “metadata” — OTOH, if either the layer data or the tar-split is corrupted, the outcome is ~the same, so I don’t feel strongly about this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I agree with the "always write medata atomically". I am only proposing tar-split as an exception because it still needs the data to be meaningful

}
// If the underlying storage driver is using sync, make sure we sync everything before
// saving the layer data, this ensures that all files/directories are properly created and written.
if r.driver.SyncMode() != drivers.SyncModeNone {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK. For safety, I think I’d prefer a private enum over just a bool. (And I didn’t now carefully review every call site, assuming that it would be easier with the enum.)

giuseppe added 2 commits March 6, 2026 16:47
there is no reason to sync the tarsplit data if the rest of the layer
was not sync'ed.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
save the layer info only after the tarsplit file and big data was
written.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
giuseppe added 4 commits March 6, 2026 16:47
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
it is a preparatory commit.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Add a new driver-specific `sync` configuration option that controls
filesystem synchronization during layer operations.  When set to
"filesystem", the driver ensures all pending writes are flushed to the
file system before marking a layer as complete.

This helps prevent data corruption in scenarios where the system crashes
or loses power before the filesystem has finished writing layer data.

Only the vfs and overlay backends support the new flag.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe
Copy link
Member Author

giuseppe commented Mar 6, 2026

I've left a bool but renamed it to needsSyncfs so the intention is clearer

@giuseppe
Copy link
Member Author

@mtrmac is there anything more left or can we move forward with this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

storage Related to "storage" package

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants