summaryrefslogtreecommitdiff
path: root/include/linux/filelock.h
AgeCommit message (Collapse)Author
2026-02-09Merge tag 'vfs-7.0-rc1.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains a mix of VFS cleanups, performance improvements, API fixes, documentation, and a deprecation notice. Scalability and performance: - Rework pid allocation to only take pidmap_lock once instead of twice during alloc_pid(), improving thread creation/teardown throughput by 10-16% depending on false-sharing luck. Pad the namespace refcount to reduce false-sharing - Track file lock presence via a flag in ->i_opflags instead of reading ->i_flctx, avoiding false-sharing with ->i_readcount on open/close hot paths. Measured 4-16% improvement on 24-core open-in-a-loop benchmarks - Use a consume fence in locks_inode_context() to match the store-release/load-consume idiom, eliminating a hardware fence on some architectures - Annotate cdev_lock with __cacheline_aligned_in_smp to prevent false-sharing - Remove a redundant DCACHE_MANAGED_DENTRY check in __follow_mount_rcu() that never fires since the caller already verifies it, eliminating a 100% mispredicted branch - Fix a 100% mispredicted likely() in devcgroup_inode_permission() that became wrong after a prior code reorder Bug fixes and correctness: - Make insert_inode_locked() wait for inode destruction instead of skipping, fixing a corner case where two matching inodes could exist in the hash - Move f_mode initialization before file_ref_init() in alloc_file() to respect the SLAB_TYPESAFE_BY_RCU ordering contract - Add a WARN_ON_ONCE guard in try_to_free_buffers() for folios with no buffers attached, preventing a null pointer dereference when AS_RELEASE_ALWAYS is set but no release_folio op exists - Fix select restart_block to store end_time as timespec64, avoiding truncation of tv_sec on 32-bit architectures - Make dump_inode() use get_kernel_nofault() to safely access inode and superblock fields, matching the dump_mapping() pattern API modernization: - Make posix_acl_to_xattr() allocate the buffer internally since every single caller was doing it anyway. Reduces boilerplate and unnecessary error checking across ~15 filesystems - Replace deprecated simple_strtoul() with kstrtoul() for the ihash_entries, dhash_entries, mhash_entries, and mphash_entries boot parameters, adding proper error handling - Convert chardev code to use guard(mutex) and __free(kfree) cleanup patterns - Replace min_t() with min() or umin() in VFS code to avoid silently truncating unsigned long to unsigned int - Gate LOOKUP_RCU assertions behind CONFIG_DEBUG_VFS since callers already check the flag Deprecation: - Begin deprecating legacy BSD process accounting (acct(2)). The interface has numerous footguns and better alternatives exist (eBPF) Documentation: - Fix and complete kernel-doc for struct export_operations, removing duplicated documentation between ReST and source - Fix kernel-doc warnings for __start_dirop() and ilookup5_nowait() Testing: - Add a kunit test for initramfs cpio handling of entries with filesize > PATH_MAX Misc: - Add missing <linux/init_task.h> include in fs_struct.c" * tag 'vfs-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits) posix_acl: make posix_acl_to_xattr() alloc the buffer fs: make insert_inode_locked() wait for inode destruction initramfs_test: kunit test for cpio.filesize > PATH_MAX fs: improve dump_inode() to safely access inode fields fs: add <linux/init_task.h> for 'init_fs' docs: exportfs: Use source code struct documentation fs: move initializing f_mode before file_ref_init() exportfs: Complete kernel-doc for struct export_operations exportfs: Mark struct export_operations functions at kernel-doc exportfs: Fix kernel-doc output for get_name() acct(2): begin the deprecation of legacy BSD process accounting device_cgroup: remove branch hint after code refactor VFS: fix __start_dirop() kernel-doc warnings fs: Describe @isnew parameter in ilookup5_nowait() fs/namei: Remove redundant DCACHE_MANAGED_DENTRY check in __follow_mount_rcu fs: only assert on LOOKUP_RCU when built with CONFIG_DEBUG_VFS select: store end_time as timespec64 in restart block chardev: Switch to guard(mutex) and __free(kfree) namespace: Replace simple_strtoul with kstrtoul to parse boot params dcache: Replace simple_strtoul with kstrtoul in set_dhash_entries ...
2025-12-15filelock: allow lease_managers to dictate what qualifies as a conflictJeff Layton
Requesting a delegation on a file from the userland fcntl() interface currently succeeds when there are conflicting opens present. This is because the lease handling code ignores conflicting opens for FL_LAYOUT and FL_DELEG leases. This was a hack put in place long ago, because nfsd already checks for conflicts in its own way. The kernel needs to perform this check for userland delegations the same way it is done for leases, however. Make this dependent on the lease_manager by adding a new ->lm_open_conflict() lease_manager operation and have generic_add_lease() call that instead of check_conflicting_open(). Morph check_conflicting_open() into a ->lm_open_conflict() op that is only called for userland leases/delegations. Set the ->lm_open_conflict() operations for nfsd to trivial functions that always return 0. Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20251204-dir-deleg-ro-v2-2-22d37f92ce2c@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-15fs: track the inode having file locks with a flag in ->i_opflagsMateusz Guzik
Opening and closing an inode dirties the ->i_readcount field. Depending on the alignment of the inode, it may happen to false-share with other fields loaded both for both operations to various extent. This notably concerns the ->i_flctx field. Since most inodes don't have the field populated, this bit can be managed with a flag in ->i_opflags instead which bypasses the problem. Here are results I obtained while opening a file read-only in a loop with 24 cores doing the work on Sapphire Rapids. Utilizing the flag as opposed to reading ->i_flctx field was toggled at runtime as the benchmark was running, to make sure both results come from the same alignment. before: 3233740 after: 3373346 (+4%) before: 3284313 after: 3518711 (+7%) before: 3505545 after: 4092806 (+16%) Or to put it differently, this varies wildly depending on how (un)lucky you get. The primary bottleneck before and after is the avoidable lockref trip in do_dentry_open(). Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251203094837.290654-2-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-15filelock: use a consume fence in locks_inode_context()Mateusz Guzik
Matches the idiom of storing a pointer with a release fence and safely getting the content with a consume fence after. Eliminates an actual fence on some archs. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251203094837.290654-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12vfs: expose delegation support to userlandJeff Layton
Now that support for recallable directory delegations is available, expose this functionality to userland with new F_SETDELEG and F_GETDELEG commands for fcntl(). Note that this also allows userland to request a FL_DELEG type lease on files too. Userland applications that do will get signalled when there are metadata changes in addition to just data changes (which is a limitation of FL_LEASE leases). These commands accept a new "struct delegation" argument that contains a flags field for future expansion. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-17-52f3feebb2f2@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12filelock: add struct delegated_inodeJeff Layton
The current API requires a pointer to an inode pointer. It's easy for callers to get this wrong. Add a new delegated_inode structure and use that to pass back any inode that needs to be waited on. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-3-52f3feebb2f2@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-12filelock: rework the __break_lease API to use flagsJeff Layton
Currently __break_lease takes both a type and an openmode. With the addition of directory leases, that makes less sense. Declare a set of LEASE_BREAK_* flags that can be used to control how lease breaks work instead of requiring a type and an openmode. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-2-52f3feebb2f2@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-10filelock: add new locks_wake_up_waiter() helperJeff Layton
Currently the function that does this takes a struct file_lock, but __locks_wake_up_blocks() deals with both locks and leases. Currently this works because both file_lock and file_lease have the file_lock_core at the beginning of the struct, but it's fragile to rely on that. Add a new locks_wake_up_waiter() function and call that from __locks_wake_up_blocks(). Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/20250602-filelock-6-16-v1-1-7da5b2c930fd@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-02Merge patch series "Fixup NLM and kNFSD file lock callbacks"Christian Brauner
Benjamin Coddington <bcodding@redhat.com> says: Last year both GFS2 and OCFS2 had some work done to make their locking more robust when exported over NFS. Unfortunately, part of that work caused both NLM (for NFS v3 exports) and kNFSD (for NFSv4.1+ exports) to no longer send lock notifications to clients. This in itself is not a huge problem because most NFS clients will still poll the server in order to acquire a conflicted lock, but now that I've noticed it I can't help but try to fix it because there are big advantages for setups that might depend on timely lock notifications, and we've supported that as a feature for a long time. Its important for NLM and kNFSD that they do not block their kernel threads inside filesystem's file_lock implementations because that can produce deadlocks. We used to make sure of this by only trusting that posix_lock_file() can correctly handle blocking lock calls asynchronously, so the lock managers would only setup their file_lock requests for async callbacks if the filesystem did not define its own lock() file operation. However, when GFS2 and OCFS2 grew the capability to correctly handle blocking lock requests asynchronously, they started signalling this behavior with EXPORT_OP_ASYNC_LOCK, and the check for also trusting posix_lock_file() was inadvertently dropped, so now most filesystems no longer produce lock notifications when exported over NFS. I tried to fix this by simply including the old check for lock(), but the resulting include mess and layering violations was more than I could accept. There's a much cleaner way presented here using an fop_flag, which while potentially flag-greedy, greatly simplifies the problem and grooms the way for future uses by both filesystems and lock managers alike. * patches from https://lore.kernel.org/r/cover.1726083391.git.bcodding@redhat.com: exportfs: Remove EXPORT_OP_ASYNC_LOCK NLM/NFSD: Fix lock notifications for async-capable filesystems gfs2/ocfs2: set FOP_ASYNC_LOCK fs: Introduce FOP_ASYNC_LOCK NFS: trace: show TIMEDOUT instead of 0x6e nfsd: use system_unbound_wq for nfsd_file_gc_worker() nfsd: count nfsd_file allocations nfsd: fix refcount leak when file is unhashed after being found nfsd: remove unneeded EEXIST error check in nfsd_do_file_acquire nfsd: add list_head nf_gc to struct nfsd_file Link: https://lore.kernel.org/r/cover.1726083391.git.bcodding@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12fs: Introduce FOP_ASYNC_LOCKBenjamin Coddington
Some lock managers (NLM, kNFSD) fastidiously avoid blocking their kernel threads while servicing blocking locks. If a filesystem supports asynchronous lock requests those lock managers can use notifications to quickly inform clients they have acquired a file lock. Historically, only posix_lock_file() was capable of supporting asynchronous locks so the check for support was simply file_operations->lock(), but with recent changes in DLM, both GFS2 and OCFS2 also support asynchronous locks and have started signalling their support with EXPORT_OP_ASYNC_LOCK. We recently noticed that those changes dropped the checks for whether a filesystem simply defaults to posix_lock_file(), so async lock notifications have not been attempted for NLM and NFSv4.1+ for most filesystems. While trying to fix this it has become clear that testing both the export flag combined with testing ->lock() creates quite a layering mess. It seems appropriate to signal support with a fop_flag. Add FOP_ASYNC_LOCK so that filesystems with ->lock() can signal their capability to handle lock requests asynchronously. Add a helper for lock managers to properly test that support. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Link: https://lore.kernel.org/r/3330d5a324abe2ce9c1dafe89cacdc6db41945d1.1726083391.git.bcodding@redhat.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30vfs: dodge smp_mb in break_lease and break_deleg in the common caseMateusz Guzik
These inlines show up in the fast path (e.g., in do_dentry_open()) and induce said full barrier regarding i_flctx access when in most cases the pointer is NULL. The pointer can be safely checked before issuing the barrier, dodging it in most cases as a result. It is plausible the consume fence would be sufficient, but I don't want to go audit all callers regarding what they before calling here. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240806172846.886570-1-mjguzik@gmail.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-12filelock: always define for_each_file_lock()Jeff Layton
...and eliminate the stub version when CONFIG_FILE_LOCKING is disabled. This silences the following warning that crept in recently: fs/ceph/locks.c: In function 'ceph_count_locks': fs/ceph/locks.c:380:27: error: unused variable 'lock' [-Werror=unused-variable] 380 | struct file_lock *lock; Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202402062210.3YyBVGF1-lkp@intel.com/ Fixes: 75cabec0111b ("filelock: add some new helper functions") Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240212-flsplit3-v1-1-019f0ad6bf69@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: don't do security checks on nfsd setlease callsJeff Layton
Zdenek reported seeing some AVC denials due to nfsd trying to set delegations: type=AVC msg=audit(09.11.2023 09:03:46.411:496) : avc: denied { lease } for pid=5127 comm=rpc.nfsd capability=lease scontext=system_u:system_r:nfsd_t:s0 tcontext=system_u:system_r:nfsd_t:s0 tclass=capability permissive=0 When setting delegations on behalf of nfsd, we don't want to do all of the normal capabilty and LSM checks. nfsd is a kernel thread and runs with CAP_LEASE set, so the uid checks end up being a no-op in most cases anyway. Some nfsd functions can end up running in normal process context when tearing down the server. At that point, the CAP_LEASE check can fail and cause the client to not tear down delegations when expected. Also, the way the per-fs ->setlease handlers work today is a little convoluted. The non-trivial ones are wrappers around generic_setlease, so when they fail due to permission problems they usually they end up doing a little extra work only to determine that they can't set the lease anyway. It would be more efficient to do those checks earlier. Transplant the permission checking from generic_setlease to vfs_setlease, which will make the permission checking happen earlier on filesystems that have a ->setlease operation. Add a new kernel_setlease function that bypasses these checks, and switch nfsd to use that instead of vfs_setlease. There is one behavioral change here: prior this patch the setlease_notifier would fire even if the lease attempt was going to fail the security checks later. With this change, it doesn't fire until the caller has passed them. I think this is a desirable change overall. nfsd is the only user of the setlease_notifier and it doesn't benefit from being notified about failed attempts. Cc: Ondrej Mosnáček <omosnacek@gmail.com> Reported-by: Zdenek Pytela <zpytela@redhat.com> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2248830 Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240205-bz2248830-v1-1-d0ec0daecba1@kernel.org Acked-by: Tom Talpey <tom@talpey.com> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: split leases out of struct file_lockJeff Layton
Add a new struct file_lease and move the lease-specific fields from struct file_lock to it. Convert the appropriate API calls to take struct file_lease instead, and convert the callers to use them. There is zero overlap between the lock manager operations for file locks and the ones for file leases, so split the lease-related operations off into a new lease_manager_operations struct. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-47-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: remove temporary compatibility macrosJeff Layton
Everything has been converted to access fl_core fields directly, so we can now drop these. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-46-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: convert fl_blocker to file_lock_coreJeff Layton
Both locks and leases deal with fl_blocker. Switch the fl_blocker pointer in struct file_lock_core to point to the file_lock_core of the blocker instead of a file_lock structure. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-26-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: split common fields into struct file_lock_coreJeff Layton
In a future patch, we're going to split file leases into their own structure. Since a lot of the underlying machinery uses the same fields move those into a new file_lock_core, and embed that inside struct file_lock. For now, add some macros to ensure that we can continue to build while the conversion is in progress. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-17-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: add some new helper functionsJeff Layton
In later patches we're going to embed some common fields into a new structure inside struct file_lock. Smooth the transition by adding some new helper functions, and converting the core file locking code to use them. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-4-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-02filelock: fl_pid field should be signed intJeff Layton
This field has been unsigned for a very long time, but most users of the struct file_lock and the file locking internals themselves treat it as a signed value. Change it to be pid_t (which is a signed int). Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-1-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-07-10fs: Pass argument to fcntl_setlease as intLuca Vizzarro
The interface for fcntl expects the argument passed for the command F_SETLEASE to be of type int. The current code wrongly treats it as a long. In order to avoid access to undefined bits, we should explicitly cast the argument to int. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Kevin Brodsky <Kevin.Brodsky@arm.com> Cc: Vincenzo Frascino <Vincenzo.Frascino@arm.com> Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: David Laight <David.Laight@ACULAB.com> Cc: Mark Rutland <Mark.Rutland@arm.com> Cc: linux-fsdevel@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: linux-nfs@vger.kernel.org Cc: linux-morello@op-lists.linaro.org Signed-off-by: Luca Vizzarro <Luca.Vizzarro@arm.com> Message-Id: <20230414152459.816046-3-Luca.Vizzarro@arm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-01-11fs: remove locks_inodeJeff Layton
locks_inode was turned into a wrapper around file_inode in de2a4a501e71 (Partially revert "locks: fix file locking on overlayfs"). Finish replacing locks_inode invocations everywhere with file_inode. Acked-by: Miklos Szeredi <mszeredi@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2023-01-11filelock: move file locking definitions to separate header fileJeff Layton
The file locking definitions have lived in fs.h since the dawn of time, but they are only used by a small subset of the source files that include it. Move the file locking definitions to a new header file, and add the appropriate #include directives to the source files that need them. By doing this we trim down fs.h a bit and limit the amount of rebuilding that has to be done when we make changes to the file locking APIs. Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Steve French <stfrench@microsoft.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org>