vfs-7.0.misc-bpffs-delegatable#10970
Closed
mihalicyn wants to merge 17 commits into kernel-patches:bpf-next_base from
In close_range(), the kernel traditionally performs a linear scan over the [fd, max_fd] range, resulting in O(N) complexity where N is the range size. For processes with sparse FD tables, this is inefficient as it checks many unallocated slots.
This patch optimizes __range_close() by using find_next_bit() on the open_fds bitmap to skip holes. This shifts the algorithmic complexity from O(Range Size) to O(Active FDs), providing a significant performance boost for large-range close operations on sparse file descriptor tables.
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn>
Link: https://patch.msgid.link/20260123081221.659125-1-realwujing@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
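The hole-skipping idea can be sketched in userspace C. This is only an illustration under stated assumptions: `next_set_bit()` is a hand-written stand-in for the kernel's find_next_bit(), and `range_close_sparse()` is a hypothetical name that clears bits instead of closing real files; neither is taken from the patch.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Userspace stand-in for find_next_bit(): index of the first set bit at or
 * after 'start', or 'size' if there is none. Whole zero words are skipped. */
static size_t next_set_bit(const unsigned long *map, size_t size, size_t start)
{
    while (start < size) {
        size_t word = start / BITS_PER_WORD;
        unsigned long w = map[word] >> (start % BITS_PER_WORD);

        if (w) {
            /* Walk to the lowest set bit of the remaining word. */
            while (!(w & 1UL)) {
                w >>= 1;
                start++;
            }
            return start < size ? start : size;
        }
        start = (word + 1) * BITS_PER_WORD; /* jump over the hole */
    }
    return size;
}

/* Hypothetical sketch: "close" every open descriptor in [fd, max_fd] by
 * clearing its bit, visiting only set bits. Returns the number closed. */
static size_t range_close_sparse(unsigned long *open_fds, size_t nbits,
                                 size_t fd, size_t max_fd)
{
    size_t end = (max_fd < nbits) ? max_fd + 1 : nbits;
    size_t closed = 0;

    for (fd = next_set_bit(open_fds, end, fd); fd < end;
         fd = next_set_bit(open_fds, end, fd + 1)) {
        open_fds[fd / BITS_PER_WORD] &= ~(1UL << (fd % BITS_PER_WORD));
        closed++;
    }
    return closed;
}
```

With only a handful of bits set in a large bitmap, the loop body runs once per open descriptor, and empty words are skipped a word at a time rather than slot by slot.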
While setting file attributes, the read-only flags are reset for ->xflags, but not for ->flags if a flag is shared between both. This is fine for now, as no read-only xflags overlap with flags. However, any read-only shared flag would create an inconsistency between xflags and flags: the non-shared flag is reset in vfs_fileattr_set() to the current value, but the shared one is passed on to ->fileattr_set.
Reported-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Link: https://patch.msgid.link/20260121193645.3611716-1-aalbersh@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Syzbot reported a KMSAN uninit-value issue in ovl_fill_real.
The call chain for this issue is:
__do_sys_getdents64()
-> iterate_dir()
...
-> ext4_readdir()
-> fscrypt_fname_alloc_buffer() // alloc
-> fscrypt_fname_disk_to_usr() // write without trailing '\0'
-> dir_emit()
-> ovl_fill_real() // read by strcmp()
The string stores the decrypted directory entry name for an
encrypted inode. As shown in the call chain, fscrypt_fname_disk_to_usr()
writes it without null termination. However, ovl_fill_real() uses strcmp() to
compare the name against "..", which assumes a null-terminated string and
may trigger a KMSAN uninit-value warning when the buffer tail contains
uninitialized data.
Reported-by: syzbot+d130f98b2c265fae5297@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d130f98b2c265fae5297
Fixes: 4edb83b ("ovl: constant d_ino for non-merge dirs")
Signed-off-by: Qing Wang <wangqing7171@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://patch.msgid.link/20260128132406.23768-2-amir73il@gmail.com
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Rename the helper is_dot_dotdot() into the name_ namespace and add complementary helpers to check for dot and dotdot names individually.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://patch.msgid.link/20260128132406.23768-3-amir73il@gmail.com
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
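A rough userspace sketch of what such a helper family could look like follows. The names name_is_dot, name_is_dotdot and name_is_dot_dotdot come from the series; the (name, len) signatures and the bodies are assumptions made for illustration, not the kernel's definitions. Because each helper takes an explicit length, it never reads past the bytes it was given, which is what matters for names that are not null-terminated.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed signature: true if the name is exactly ".". */
static inline bool name_is_dot(const char *name, size_t len)
{
    return len == 1 && name[0] == '.';
}

/* Assumed signature: true if the name is exactly "..". */
static inline bool name_is_dotdot(const char *name, size_t len)
{
    return len == 2 && name[0] == '.' && name[1] == '.';
}

/* True for either of the two special directory entries. */
static inline bool name_is_dot_dotdot(const char *name, size_t len)
{
    return name_is_dot(name, len) || name_is_dotdot(name, len);
}
```

A readdir-style callback that receives a name buffer and its length could call `name_is_dotdot(name, namelen)` instead of `strcmp(name, "..")`, so that no terminating byte is required.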
Use the helpers in place of all the different open coded variants. This makes the code more readable and robust.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://patch.msgid.link/20260128132406.23768-4-amir73il@gmail.com
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Amir Goldstein <amir73il@gmail.com> says:
Following the syzbot ovl bug report and a fix by Qing Wang, I decided to follow up with a small vfs cleanup of some open coded version of checking "." and ".." name in readdir.
The fix patch is applied at the start of this cleanup series to allow for easy backporting, but it is not an urgent fix so I don't think there is a need to fast track it.
* patches from https://patch.msgid.link/20260128132406.23768-1-amir73il@gmail.com:
  ovl: use name_is_dot* helpers in readdir code
  fs: add helpers name_is_dot{,dot,_dotdot}
  ovl: Fix uninit-value in ovl_fill_real
Link: https://patch.msgid.link/20260128132406.23768-1-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Fix minor spelling and indentation errors in the documentation comments.
Signed-off-by: Chelsy Ratnawat <chelsyratnawat2001@gmail.com>
Link: https://patch.msgid.link/20260128143150.3674284-1-chelsyratnawat2001@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
C-String literals were added in Rust 1.77. Replace instances of `kernel::c_str!` with C-String literals where possible.
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Benno Lossin <lossin@kernel.org>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20251222-cstr-vfs-v1-1-18e3d327cbd7@gmail.com
Acked-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
fs-verity introduced an inode flag for inodes that have fs-verity enabled. This patch adds the FS_XFLAG_VERITY file attribute, which can be retrieved with the FS_IOC_FSGETXATTR ioctl() and the file_getattr() syscall. This flag is read-only and cannot be set with the corresponding set ioctl() or file_setattr(). FS_IOC_SETFLAGS requires the file to be opened for writing, which is not allowed for verity files. FS_IOC_FSSETXATTR and file_setattr() clear this flag from the user input.
As this is now a common flag for both flag interfaces (flags/xflags), add it to the overlapping flags list to exclude it from overwrite.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Link: https://patch.msgid.link/20260126115658.27656-2-aalbersh@kernel.org
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
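The "read-only flag is dropped from user input" behaviour can be illustrated with a small masking sketch. Everything named `EX_*` or `ex_*` below is hypothetical: the bit value is a placeholder (the real FS_XFLAG_VERITY value lives in the kernel's uapi headers), and the function is an assumed model of the idea, not the code in vfs_fileattr_set().

```c
#include <stdint.h>

/* Placeholder bit for illustration only; NOT the real FS_XFLAG_VERITY value. */
#define EX_XFLAG_VERITY      (1u << 0)
/* Hypothetical mask of flags that userspace may read but not change. */
#define EX_XFLAG_RDONLY_MASK (EX_XFLAG_VERITY)

/* Take writable bits from the request and keep read-only bits at their
 * current value, so a set request can neither set nor clear them. */
static uint32_t ex_apply_xflags(uint32_t current, uint32_t requested)
{
    return (requested & ~EX_XFLAG_RDONLY_MASK) |
           (current & EX_XFLAG_RDONLY_MASK);
}
```

Under this model, a request that includes the verity bit on a non-verity file leaves the bit clear, and a request that omits it on a verity file leaves the bit set.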
fs-verity previously had debug printk calls, but they were removed. This patch adds tracepoints in similar places as a better alternative.
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: fix formatting]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://patch.msgid.link/20260126115658.27656-3-aalbersh@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Andrey Albershteyn <aalbersh@kernel.org> says:
These two small patches grew out of the fs-verity XFS patchset. I think they're self-contained improvements which could go in without the XFS implementation.
* patches from https://patch.msgid.link/20260126115658.27656-1-aalbersh@kernel.org:
  fsverity: add tracepoints
  fs: add FS_XFLAG_VERITY for fs-verity files
Link: https://patch.msgid.link/20260126115658.27656-1-aalbersh@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
pidfs and nsfs recently gained support for encode/decode of file handles via name_to_handle_at(2)/open_by_handle_at(2).
These special kernel filesystems have custom ->open() and ->permission() export methods, which nfsd does not respect, and they were never meant to be exported by nfsd.
Update kernel-doc comments to express the fact that those methods are for the open_by_handle_at(2) system call only and are not compatible with nfsd.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://patch.msgid.link/20260129100212.49727-2-amir73il@gmail.com
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
pidfs and nsfs recently gained support for encode/decode of file handles via name_to_handle_at(2)/open_by_handle_at(2).
These special kernel filesystems have custom ->open() and ->permission() export methods, which nfsd does not respect, and they were never meant to be exported by nfsd.
Therefore, do not allow nfsd to export filesystems with custom ->open() or ->permission() methods.
Fixes: b3caba8 ("pidfs: implement file handle support")
Fixes: 5222470 ("nsfs: support file handles")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://patch.msgid.link/20260129100212.49727-3-amir73il@gmail.com
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
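The rule stated above (refuse to export when custom ->open() or ->permission() export methods exist) can be modelled in a few lines. The struct and function names here are hypothetical, trimmed-down stand-ins written for this sketch; they are not the kernel's export_operations or nfsd's actual check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for a filesystem's export methods. */
struct ex_export_ops {
    int (*open)(void);       /* custom handle-open hook, if any */
    int (*permission)(void); /* custom permission hook, if any */
};

/* Example custom hook, as a pidfs-like filesystem might provide. */
static int ex_custom_open(void)
{
    return 0;
}

/* Assumed model of the check: a filesystem is exportable by an NFS server
 * only if it has export ops and overrides neither hook. */
static bool ex_nfsd_exportable(const struct ex_export_ops *ops)
{
    return ops != NULL && ops->open == NULL && ops->permission == NULL;
}
```

In this model a filesystem with plain export methods passes, while one that installs `ex_custom_open` is rejected before any handle is decoded.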
Currently it is not possible to distinguish between the case where a process has already exited and the case where a process is in a different namespace, as both return -ESRCH. glibc's pidfd_getpid() procfs-based implementation returns -EREMOTE in the latter, so that distinguishing the two is possible, as the fdinfo in procfs will list '0' as the PID in that case:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/pidfd_getpid.c;h=860829cf07da2267484299ccb02861822c0d07b4;hb=HEAD#l121
Change the error code so that the kernel also returns -EREMOTE in that case.
Fixes: 7477d7d ("pidfs: allow to retrieve exit information")
Signed-off-by: Luca Boccassi <luca.boccassi@gmail.com>
Link: https://patch.msgid.link/20260127225209.2293342-1-luca.boccassi@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
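From a caller's point of view, the change means the two failure modes can be told apart by error code. The sketch below assumes a negative errno-style return value; the enum and the function `ex_classify_pid_result()` are hypothetical names for illustration and do not correspond to any kernel or glibc interface.

```c
#include <errno.h>

/* Hypothetical classification of a pidfd PID lookup result. */
enum ex_pid_state {
    EX_PID_OK,          /* lookup succeeded */
    EX_PID_EXITED,      /* -ESRCH: the process has already exited */
    EX_PID_FOREIGN_NS,  /* -EREMOTE: the process is in a different namespace */
    EX_PID_OTHER_ERROR  /* any other failure */
};

/* 'ret' is >= 0 on success or a negative errno value on failure. */
static enum ex_pid_state ex_classify_pid_result(int ret)
{
    if (ret >= 0)
        return EX_PID_OK;

    switch (-ret) {
    case ESRCH:
        return EX_PID_EXITED;
    case EREMOTE:
        return EX_PID_FOREIGN_NS;
    default:
        return EX_PID_OTHER_ERROR;
    }
}
```

Before the change described above, both failure cases would land in the ESRCH branch and the caller could not tell them apart.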
Calling convention has changed in ea38219 ("vfs: support caching symlink lengths in inodes")
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20260203130032.315177-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Commit e1c5ae5 ("fs: don't allow non-init s_user_ns for filesystems without FS_USERNS_MOUNT") prevents the mount of any filesystem inside a container that doesn't have FS_USERNS_MOUNT set.
This broke NFS mounts in our containerized environment. We have a daemon somewhat like systemd-mountfsd running in the init_ns. A process does a fsopen() inside the container and passes it to the daemon via unix socket.
The daemon then vets that the request is for an allowed NFS server and performs the mount. This now fails because the fc->user_ns is set to the value in the container and NFS doesn't set FS_USERNS_MOUNT. We don't want to add FS_USERNS_MOUNT to NFS since that would allow the container to mount any NFS server (even malicious ones).
Add a new FS_USERNS_DELEGATABLE flag, and enable it on NFS.
Fixes: e1c5ae5 ("fs: don't allow non-init s_user_ns for filesystems without FS_USERNS_MOUNT")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260129-twmount-v1-1-4874ed2a15c4@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Instead of FS_USERNS_MOUNT we should use the recently introduced FS_USERNS_DELEGATABLE because it better expresses what we really want here. The filesystem should not be mountable by an unprivileged user, but at the same time we want sb->s_user_ns to point to the container's user namespace, and the superblock can only be created if the capable(CAP_SYS_ADMIN) check is successful.
Tested and no regressions noticed.
No functional change intended.
Link: https://lore.kernel.org/linux-fsdevel/6dd181bf9f6371339a6c31f58f582a9aac3bc36a.camel@kernel.org [1]
Fixes: 6fe01d3 ("bpf: Add BPF token delegation mount options to BPF FS")
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
- RWB-tag from Jeff [1]
Reviewed-by: Jeff Layton <jlayton@kernel.org>
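The difference between the two flags, as described in these commit messages, can be written down as a small decision function. This is one reading of the prose, not the patch's code: the `EX_*` bit values, the struct, and `ex_may_create_sb()` are all hypothetical, and the real checks in the kernel may be structured differently.

```c
#include <stdbool.h>

/* Placeholder bits for illustration; not the kernel's flag values. */
#define EX_FS_USERNS_MOUNT       (1u << 0)
#define EX_FS_USERNS_DELEGATABLE (1u << 1)

/* Hypothetical summary of a superblock-creation request. */
struct ex_mount_req {
    unsigned int fs_flags;  /* flags of the filesystem type */
    bool init_user_ns;      /* the context's user_ns is the initial one */
    bool capable_sys_admin; /* the mounter passes capable(CAP_SYS_ADMIN) */
};

/* Assumed model: a non-init user namespace is accepted either because the
 * filesystem is mountable by unprivileged users (USERNS_MOUNT), or because
 * it is delegatable and a privileged process performs the mount. */
static bool ex_may_create_sb(const struct ex_mount_req *req)
{
    if (req->init_user_ns)
        return req->capable_sys_admin;
    if (req->fs_flags & EX_FS_USERNS_MOUNT)
        return true;
    if (req->fs_flags & EX_FS_USERNS_DELEGATABLE)
        return req->capable_sys_admin;
    return false;
}
```

In this model the delegated case (a privileged daemon mounting on behalf of a container) succeeds, while the same filesystem mounted directly by an unprivileged container process is refused.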
254af9f to 25c770c
Automatically cleaning up stale PR; feel free to reopen if needed
Test for https://lore.kernel.org/linux-fsdevel/20260205104541.171034-1-alexander@mihalicyn.com
Thanks to Daniel @borkmann for the suggestion.