vfs-7.0.misc-bpffs-delegatable#10970

Closed
mihalicyn wants to merge 17 commits into kernel-patches:bpf-next_base from
mihalicyn:vfs-7.0.misc-bpffs-delegatable

@mihalicyn mihalicyn commented Feb 5, 2026

realwujing and others added 17 commits January 23, 2026 11:31
In close_range(), the kernel traditionally performs a linear scan over the
[fd, max_fd] range, resulting in O(N) complexity where N is the range size.
For processes with sparse FD tables, this is inefficient as it checks many
unallocated slots.

This patch optimizes __range_close() by using find_next_bit() on the
open_fds bitmap to skip holes. This shifts the algorithmic complexity from
O(Range Size) to O(Active FDs), providing a significant performance boost
for large-range close operations on sparse file descriptor tables.
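A minimal userspace sketch of the idea described above, assuming a plain `unsigned long` bitmap in place of the kernel's fd table. `next_set_bit()` and `range_close_sparse()` are hypothetical names standing in for find_next_bit() and __range_close(); they are not the patch's code, and `__builtin_ctzl` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Stand-in for find_next_bit(): index of the first set bit at or after
 * 'start', or 'nbits' if there is none. Empty words are skipped whole. */
static size_t next_set_bit(const unsigned long *map, size_t nbits, size_t start)
{
    while (start < nbits) {
        unsigned long word = map[start / BITS_PER_WORD] >> (start % BITS_PER_WORD);

        if (word) {
            size_t bit = start + (size_t)__builtin_ctzl(word);
            return bit < nbits ? bit : nbits;
        }
        /* Nothing left in this word: jump to the start of the next one. */
        start = (start / BITS_PER_WORD + 1) * BITS_PER_WORD;
    }
    return nbits;
}

/* Clears every open fd in [fd, max_fd] and returns how many were closed.
 * Only set bits are visited, so the cost follows the number of open fds
 * (plus one step per empty word), not the size of the range. */
static size_t range_close_sparse(unsigned long *open_fds, size_t nbits,
                                 size_t fd, size_t max_fd)
{
    size_t end = max_fd < nbits ? max_fd + 1 : nbits;
    size_t closed = 0;

    assert(fd <= max_fd); /* close_range() rejects an inverted range */

    for (fd = next_set_bit(open_fds, end, fd); fd < end;
         fd = next_set_bit(open_fds, end, fd + 1)) {
        open_fds[fd / BITS_PER_WORD] &= ~(1UL << (fd % BITS_PER_WORD));
        closed++;
    }
    return closed;
}
```

With three open descriptors scattered over a table of a few hundred slots, the loop body runs three times instead of once per slot.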

Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn>
Link: https://patch.msgid.link/20260123081221.659125-1-realwujing@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
While setting file attributes, the read-only flags are reset
for ->xflags, but not for ->flags if a flag is shared between both. This
is fine for now, as no read-only xflag overlaps with flags.
However, any read-only shared flag would create an inconsistency
between xflags and flags: the non-shared flag is reset in
vfs_fileattr_set() to the current value, but the shared one is passed on
to ->fileattr_set.

Reported-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Link: https://patch.msgid.link/20260121193645.3611716-1-aalbersh@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Syzbot reported a KMSAN uninit-value issue in ovl_fill_real.

This issue's call chain is:
__do_sys_getdents64()
    -> iterate_dir()
        ...
            -> ext4_readdir()
                -> fscrypt_fname_alloc_buffer() // alloc
                -> fscrypt_fname_disk_to_usr // write without tail '\0'
                -> dir_emit()
                    -> ovl_fill_real() // read by strcmp()

The string is used to store the decrypted directory entry name for an
encrypted inode. As shown in the call chain, fscrypt_fname_disk_to_usr()
writes it without null termination. However, ovl_fill_real() uses strcmp()
to compare the name against "..", which assumes a null-terminated string
and may trigger a KMSAN uninit-value warning when the buffer tail contains
uninitialized data.
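The safe pattern here is a comparison bounded by the entry's length rather than by a terminator. The sketch below is a userspace illustration under that assumption; `entry_is_dotdot()` is a hypothetical name and is not taken from the actual fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length-aware check: reads exactly 'len' bytes of 'name' and never goes
 * looking for a '\0', so whatever follows the name in the buffer is not
 * touched. */
static bool entry_is_dotdot(const char *name, size_t len)
{
    assert(name != NULL);
    return len == 2 && memcmp(name, "..", 2) == 0;
}
```

A buffer such as `{ '.', '.', 'x', 'y' }` with a reported length of 2 is matched correctly, whereas strcmp() on the same buffer would read past the name.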

Reported-by: syzbot+d130f98b2c265fae5297@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d130f98b2c265fae5297
Fixes: 4edb83b ("ovl: constant d_ino for non-merge dirs")
Signed-off-by: Qing Wang <wangqing7171@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://patch.msgid.link/20260128132406.23768-2-amir73il@gmail.com
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Rename the helper is_dot_dotdot() into the name_ namespace
and add complementary helpers to check for dot and dotdot
names individually.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://patch.msgid.link/20260128132406.23768-3-amir73il@gmail.com
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Use the helpers in place of all the different open coded variants.
This makes the code more readable and robust.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://patch.msgid.link/20260128132406.23768-4-amir73il@gmail.com
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Amir Goldstein <amir73il@gmail.com> says:

Following the syzbot ovl bug report and a fix by Qing Wang,
I decided to follow up with a small vfs cleanup of some
open coded version of checking "." and ".." name in readdir.

The fix patch is applied at the start of this cleanup series to allow
for easy backporting, but it is not an urgent fix so I don't think
there is a need to fast track it.

* patches from https://patch.msgid.link/20260128132406.23768-1-amir73il@gmail.com:
  ovl: use name_is_dot* helpers in readdir code
  fs: add helpers name_is_dot{,dot,_dotdot}
  ovl: Fix uninit-value in ovl_fill_real

Link: https://patch.msgid.link/20260128132406.23768-1-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Fix minor spelling and indentation errors in the
documentation comments.

Signed-off-by: Chelsy Ratnawat <chelsyratnawat2001@gmail.com>
Link: https://patch.msgid.link/20260128143150.3674284-1-chelsyratnawat2001@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
C-String literals were added in Rust 1.77. Replace instances of
`kernel::c_str!` with C-String literals where possible.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Benno Lossin <lossin@kernel.org>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20251222-cstr-vfs-v1-1-18e3d327cbd7@gmail.com
Acked-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
fs-verity introduced an inode flag for inodes with fs-verity enabled on
them. This patch adds the FS_XFLAG_VERITY file attribute, which can be
retrieved with the FS_IOC_FSGETXATTR ioctl() and the file_getattr() syscall.

This flag is read-only and cannot be set with the corresponding set ioctl()
or file_setattr(). FS_IOC_SETFLAGS requires the file to be opened for
writing, which is not allowed for verity files. FS_IOC_FSSETXATTR and
file_setattr() clear this flag from the user input.

As this is now a common flag for both flag interfaces (flags/xflags), add
it to the overlapping flags list to exclude it from overwrite.
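The read-only behaviour can be pictured as a mask applied to the user's request before it reaches the filesystem. This is a sketch only: the `DEMO_*` bit values and `apply_user_xflags()` are hypothetical and do not reflect the kernel's UAPI values or its actual code path.

```c
#include <stdint.h>

/* Hypothetical bit values for illustration; NOT the kernel's UAPI. */
#define DEMO_XFLAG_APPEND   0x1u
#define DEMO_XFLAG_VERITY   0x2u /* read-only: reported, never settable */
#define DEMO_XFLAGS_RDONLY  (DEMO_XFLAG_VERITY)

/* Returns the xflags that would be stored: read-only bits are dropped from
 * the user's request and carried over from the current state instead. */
static uint32_t apply_user_xflags(uint32_t current, uint32_t requested)
{
    requested &= ~DEMO_XFLAGS_RDONLY;
    return requested | (current & DEMO_XFLAGS_RDONLY);
}
```

Under this model a caller can neither set the verity bit on a plain file nor clear it on a verity file, while ordinary flags still follow the request.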

Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Link: https://patch.msgid.link/20260126115658.27656-2-aalbersh@kernel.org
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
fs-verity previously had debug printk calls, but they were removed. This
patch adds tracepoints in similar places as a better alternative.

Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: fix formatting]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://patch.msgid.link/20260126115658.27656-3-aalbersh@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Andrey Albershteyn <aalbersh@kernel.org> says:

These two small patches grew out of the fs-verity XFS patchset. I think
they're self-contained improvements which could go in without the XFS
implementation.

* patches from https://patch.msgid.link/20260126115658.27656-1-aalbersh@kernel.org:
  fsverity: add tracepoints
  fs: add FS_XFLAG_VERITY for fs-verity files

Link: https://patch.msgid.link/20260126115658.27656-1-aalbersh@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
pidfs and nsfs recently gained support for encode/decode of file handles
via name_to_handle_at(2)/open_by_handle_at(2).

These special kernel filesystems have custom ->open() and ->permission()
export methods, which nfsd does not respect; these filesystems were never
meant to be exported by nfsd.

Update kernel-doc comments to express the fact that those methods are for
the open_by_handle_at(2) system call only and are not compatible with nfsd.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://patch.msgid.link/20260129100212.49727-2-amir73il@gmail.com
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
pidfs and nsfs recently gained support for encode/decode of file handles
via name_to_handle_at(2)/open_by_handle_at(2).

These special kernel filesystems have custom ->open() and ->permission()
export methods, which nfsd does not respect; these filesystems were never
meant to be exported by nfsd.

Therefore, do not allow nfsd to export filesystems with custom ->open()
or ->permission() methods.
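The rule amounts to a check on the export operations table. The sketch below uses a hypothetical, trimmed-down `struct demo_export_ops` and helper names; the real structure has more members and different signatures, so read this as an illustration of the rule rather than the patch itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an export operations table. */
struct demo_export_ops {
    int (*open)(void *path, unsigned int oflags);
    int (*permission)(void *handle, unsigned int oflags);
};

/* Example of a custom method, as a pidfs-like filesystem might provide. */
static int demo_custom_open(void *path, unsigned int oflags)
{
    (void)path;
    (void)oflags;
    return 0;
}

static const struct demo_export_ops demo_pidfs_like_ops = {
    .open = demo_custom_open,
};

static const struct demo_export_ops demo_plain_ops = { 0 };

/* nfsd would ignore custom ->open()/->permission(), so refuse the export
 * whenever either one is present. */
static bool demo_nfsd_can_export(const struct demo_export_ops *ops)
{
    if (!ops)
        return false;
    return !ops->open && !ops->permission;
}
```

Here `demo_plain_ops` stays exportable while `demo_pidfs_like_ops` is rejected.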

Fixes: b3caba8 ("pidfs: implement file handle support")
Fixes: 5222470 ("nsfs: support file handles")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://patch.msgid.link/20260129100212.49727-3-amir73il@gmail.com
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Currently it is not possible to distinguish between the case where a
process has already exited and the case where a process is in a
different namespace, as both return -ESRCH.
glibc's pidfd_getpid() procfs-based implementation returns -EREMOTE
in the latter, so that distinguishing the two is possible, as the
fdinfo in procfs will list '0' as the PID in that case:

https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/pidfd_getpid.c;h=860829cf07da2267484299ccb02861822c0d07b4;hb=HEAD#l121

Change the error code so that the kernel also returns -EREMOTE in
that case.
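From a caller's point of view, the change makes the two cases separable by error code alone. A minimal sketch, assuming the caller holds a kernel-style negative error value; the enum and `demo_classify()` are hypothetical names.

```c
#include <errno.h>

enum demo_pid_state {
    DEMO_PID_OK,       /* call succeeded */
    DEMO_PID_EXITED,   /* -ESRCH: the process has already exited */
    DEMO_PID_OTHER_NS, /* -EREMOTE: the process lives in another namespace */
    DEMO_PID_ERROR     /* anything else */
};

/* Maps a negative-errno return value to one of the cases above. */
static enum demo_pid_state demo_classify(int ret)
{
    if (ret >= 0)
        return DEMO_PID_OK;

    switch (-ret) {
    case ESRCH:
        return DEMO_PID_EXITED;
    case EREMOTE:
        return DEMO_PID_OTHER_NS;
    default:
        return DEMO_PID_ERROR;
    }
}
```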

Fixes: 7477d7d ("pidfs: allow to retrieve exit information")

Signed-off-by: Luca Boccassi <luca.boccassi@gmail.com>
Link: https://patch.msgid.link/20260127225209.2293342-1-luca.boccassi@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Calling convention has changed in ea38219 ("vfs: support caching symlink lengths in inodes")

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20260203130032.315177-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Commit e1c5ae5 ("fs: don't allow non-init s_user_ns for filesystems
without FS_USERNS_MOUNT") prevents mounting, inside a container, any
filesystem that doesn't have FS_USERNS_MOUNT set.

This broke NFS mounts in our containerized environment. We have a daemon
somewhat like systemd-mountfsd running in the init_ns. A process does a
fsopen() inside the container and passes it to the daemon via unix
socket.

The daemon then vets that the request is for an allowed NFS server and
performs the mount. This now fails because fc->user_ns is set to the
container's user namespace and NFS doesn't set FS_USERNS_MOUNT. We don't
want to add FS_USERNS_MOUNT to NFS since that would allow the container
to mount any NFS server (even malicious ones).

Add a new FS_USERNS_DELEGATABLE flag, and enable it on NFS.
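One way to read the intended policy is as a small decision function. This is a simplified model inferred from the description, not the kernel's logic: the `DEMO_*` flag values, `struct demo_mount_req`, and `demo_mount_allowed()` are all hypothetical, and capability checks inside the container's own namespace are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for illustration only. */
#define DEMO_FS_USERNS_MOUNT        0x1u
#define DEMO_FS_USERNS_DELEGATABLE  0x2u

struct demo_mount_req {
    bool fc_user_ns_is_init;     /* fs context created in the initial user ns? */
    bool mounter_has_init_admin; /* mounting task privileged in the initial ns? */
};

static bool demo_mount_allowed(unsigned int fs_flags,
                               const struct demo_mount_req *req)
{
    assert(req != NULL);

    /* Ordinary host mount: only the mounter's privilege matters. */
    if (req->fc_user_ns_is_init)
        return req->mounter_has_init_admin;

    /* The container may mount this filesystem on its own (assumption:
     * in-namespace capability checks are omitted from this model). */
    if (fs_flags & DEMO_FS_USERNS_MOUNT)
        return true;

    /* Delegated case: the context comes from the container, but a
     * privileged task outside it has to perform the mount. */
    if (fs_flags & DEMO_FS_USERNS_DELEGATABLE)
        return req->mounter_has_init_admin;

    return false;
}
```

In this model, the NFS scenario above (container-created context, privileged daemon) is refused with no flags set and accepted once the delegatable flag is present, while the container still cannot complete the mount by itself.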

Fixes: e1c5ae5 ("fs: don't allow non-init s_user_ns for filesystems without FS_USERNS_MOUNT")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260129-twmount-v1-1-4874ed2a15c4@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Instead of FS_USERNS_MOUNT, we should use the recently introduced
FS_USERNS_DELEGATABLE because it better expresses what we
really want here. The filesystem should not be mountable
by an unprivileged user, but we still want sb->s_user_ns
to point to the container's user namespace, and the
superblock can only be created if the capable(CAP_SYS_ADMIN)
check succeeds.

Tested and no regressions noticed.

No functional change intended.

Link: https://lore.kernel.org/linux-fsdevel/6dd181bf9f6371339a6c31f58f582a9aac3bc36a.camel@kernel.org [1]
Fixes: 6fe01d3 ("bpf: Add BPF token delegation mount options to BPF FS")
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
- RWB-tag from Jeff [1]
Reviewed-by: Jeff Layton <jlayton@kernel.org>
@kernel-patches-daemon-bpf kernel-patches-daemon-bpf Bot force-pushed the bpf-next_base branch 11 times, most recently from 254af9f to 25c770c Compare February 12, 2026 01:01
@kernel-patches-daemon-bpf

Automatically cleaning up stale PR; feel free to reopen if needed

10 participants