path: root/include/uapi/linux
Age | Commit message | Author
2026-01-27 | wifi: nl80211/cfg80211: add new FTM capabilities | Avraham Stern
Add new capabilities to the PMSR FTM capabilities list. The new capabilities include 6 GHz support, supported number of spatial streams and supported number of LTF repetitions.

Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Tested-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260111190221.bf43785c18f6.Ic98cf9790ddee84bf88e5720b93c46c23af3c96c@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-01-26 | mm/mempolicy: fix mpol_rebind_nodemask() for MPOL_F_NUMA_BALANCING | Jinjiang Tu
Commit bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes") added the new flag MPOL_F_NUMA_BALANCING to enable NUMA balancing for the MPOL_BIND memory policy.

When the cpuset of a task changes, the mempolicy of the task is rebound by mpol_rebind_nodemask(). When neither MPOL_F_STATIC_NODES nor MPOL_F_RELATIVE_NODES is set, the rebinding behaviour should be the same whether or not MPOL_F_NUMA_BALANCING is set. So, when an application calls set_mempolicy() with MPOL_F_NUMA_BALANCING set but both MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES cleared, mempolicy.w.cpuset_mems_allowed should be set to the cpuset_current_mems_allowed nodemask. However, in the current implementation, mpol_store_user_nodemask() wrongly returns true, causing mempolicy->w.user_nodemask to be incorrectly set to the user-specified nodemask. Later, when the cpuset of the application changes, mpol_rebind_nodemask() ends up rebinding based on the user-specified nodemask rather than the cpuset_mems_allowed nodemask as intended.

I can reproduce with the following steps in qemu with 4 NUMA nodes:

1. echo '+cpuset' > /sys/fs/cgroup/cgroup.subtree_control
2. mkdir /sys/fs/cgroup/test
3. ./reproducer &
4. cat /proc/$pid/numa_maps, the task is bound to NUMA 1
5. echo $pid > /sys/fs/cgroup/test/cgroup.procs
6. cat /proc/$pid/numa_maps, the task is bound to NUMA 0 now.

The reproducer code:

    /* Build against libnuma: link with -lnuma. */
    #include <numa.h>
    #include <numaif.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main()
    {
        struct bitmask *bmp;
        int ret;

        /* Bind to NUMA node 1 with NUMA balancing enabled. */
        bmp = numa_parse_nodestring("1");
        ret = set_mempolicy(MPOL_BIND | MPOL_F_NUMA_BALANCING,
                            bmp->maskp, bmp->size + 1);
        if (ret < 0) {
            perror("Failed to call set_mempolicy");
            exit(-1);
        }

        /* Spin so that numa_maps can be inspected. */
        while (1);
        return 0;
    }

If I call set_mempolicy() without MPOL_F_NUMA_BALANCING in the reproducer code, the task is still bound to NUMA 1 after step 5.

To fix this, only set mempolicy->w.user_nodemask to the user-specified nodemask if MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES is present.

Link: https://lkml.kernel.org/r/20260120011018.1256654-1-tujinjiang@huawei.com
Link: https://lkml.kernel.org/r/20251223110523.1161421-1-tujinjiang@huawei.com
Fixes: bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Huang Ying <ying.huang@linux.alibaba.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Mathew Brost <matthew.brost@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
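The decision described in the mempolicy fix can be illustrated with a small predicate. This is only a sketch, not the kernel's code: the two helper names are hypothetical, and the "old" variant is an assumption about the pre-fix behaviour inferred from the commit message. The flag values are, to the best of my knowledge, the ones in include/uapi/linux/mempolicy.h.

```c
#include <stdbool.h>

/* Flag values believed to match include/uapi/linux/mempolicy.h. */
#define MPOL_F_NUMA_BALANCING  (1 << 13)
#define MPOL_F_RELATIVE_NODES  (1 << 14)
#define MPOL_F_STATIC_NODES    (1 << 15)

/*
 * Hypothetical model of the pre-fix behaviour: any mode flag, including
 * MPOL_F_NUMA_BALANCING, makes the policy keep the user nodemask.
 */
static inline bool store_user_nodemask_old(unsigned int flags)
{
	return flags & (MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES |
			MPOL_F_NUMA_BALANCING);
}

/*
 * Hypothetical model of the fixed behaviour: only the two flags that
 * give the user nodemask a meaning across rebinds are considered.
 */
static inline bool store_user_nodemask_fixed(unsigned int flags)
{
	return flags & (MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES);
}
```

With MPOL_F_NUMA_BALANCING alone, the old predicate is true and the fixed one is false, which matches the difference seen between steps 4 and 6 of the reproducer.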
2026-01-26 | ipc/shm: uapi: remove dependency on libc | Thomas Weißschuh
Using libc types and headers from the UAPI headers is problematic as it introduces a dependency on a full C toolchain.

shm.h does not even use any symbols from the libc header as the usage of getpagesize() was removed a decade ago in commit 060028bac94b ("ipc/shm.c: increase the defaults for SHMALL, SHMMAX").

Drop the unnecessary inclusion.

Link: https://lkml.kernel.org/r/20251222-uapi-shm-v1-1-270bb7f75d97@linutronix.de
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26 | NFS: NFSERR_INVAL is not defined by NFSv2 | Chuck Lever
A documenting comment in include/uapi/linux/nfs.h claims incorrectly that NFSv2 defines NFSERR_INVAL. There is no such definition in either RFC 1094 or https://pubs.opengroup.org/onlinepubs/9629799/chap7.htm

NFS3ERR_INVAL is introduced in RFC 1813.

NFSD returns NFSERR_INVAL for PROC_GETACL, which has no specification (yet).

However, nfsd_map_status() maps nfserr_symlink and nfserr_wrong_type to nfserr_inval, which does not align with RFC 1094. This logic was introduced only recently by commit 438f81e0e92a ("nfsd: move error choice for incorrect object types to version-specific code."). Given that we have no INVAL or SERVERFAULT status in NFSv2, probably the only choice is NFSERR_IO.

Fixes: 438f81e0e92a ("nfsd: move error choice for incorrect object types to version-specific code.")
Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
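The status mapping discussed in the NFS commit can be pictured roughly as follows. This is a hedged sketch: the `sketch_` enum and function are hypothetical stand-ins for NFSD's internal status values and nfsd_map_status(), and the choice of NFSERR_IO follows the commit message's "probably the only choice" wording rather than the actual diff. The numeric NFSv2 codes shown are the ones listed in RFC 1094.

```c
/* Subset of NFSv2 status codes as listed in RFC 1094; note there is no 22. */
enum nfs2_stat {
	NFS_OK        = 0,
	NFSERR_IO     = 5,
	NFSERR_NOTDIR = 20,
	NFSERR_ISDIR  = 21,
};

/* Hypothetical internal statuses that have no direct NFSv2 equivalent. */
enum sketch_internal_status {
	SKETCH_OK,
	SKETCH_SYMLINK,     /* object is a symlink, something else expected */
	SKETCH_WRONG_TYPE,  /* object has an unexpected type */
	SKETCH_NOTDIR,
};

/* Map an internal status to something NFSv2 can actually express. */
static inline enum nfs2_stat sketch_map_status_v2(enum sketch_internal_status s)
{
	switch (s) {
	case SKETCH_OK:
		return NFS_OK;
	case SKETCH_NOTDIR:
		return NFSERR_NOTDIR;
	case SKETCH_SYMLINK:
	case SKETCH_WRONG_TYPE:
	default:
		/* No INVAL or SERVERFAULT in NFSv2, so fall back to IO. */
		return NFSERR_IO;
	}
}
```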
2026-01-26 | Merge 6.19-rc7 into char-misc-next | Greg Kroah-Hartman
We need the char/misc/iio fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-25 | Merge tag 'char-misc-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc | Linus Torvalds
Pull char/misc/iio driver fixes from Greg KH:
 "Here are some small char/misc/iio and some other minor driver subsystem fixes for 6.19-rc7. Nothing huge here, just some fixes for reported issues including:
   - lots of little iio driver fixes
   - comedi driver fixes
   - mux driver fix
   - w1 driver fixes
   - uio driver fix
   - slimbus driver fixes
   - hwtracing bugfix
   - other tiny bugfixes

  All of these have been in linux-next for a while with no reported issues"

* tag 'char-misc-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (36 commits)
  comedi: dmm32at: serialize use of paged registers
  mei: trace: treat reg parameter as string
  uio: pci_sva: correct '-ENODEV' check logic
  uacce: ensure safe queue release with state management
  uacce: implement mremap in uacce_vm_ops to return -EPERM
  uacce: fix isolate sysfs check condition
  uacce: fix cdev handling in the cleanup path
  slimbus: core: clean up of_slim_get_device()
  slimbus: core: fix of_slim_get_device() kernel doc
  slimbus: core: amend slim_get_device() kernel doc
  slimbus: core: fix device reference leak on report present
  slimbus: core: fix runtime PM imbalance on report present
  slimbus: core: fix OF node leak on registration failure
  intel_th: rename error label
  intel_th: fix device leak on output open()
  comedi: Fix getting range information for subdevices 16 to 255
  mux: mmio: Fix IS_ERR() vs NULL check in probe()
  interconnect: debugfs: initialize src_node and dst_node to empty strings
  iio: dac: ad3552r-hs: fix out-of-bound write in ad3552r_hs_write_data_source
  iio: accel: iis328dq: fix gain values
  ...
2026-01-24 | bpf: add fsession support | Menglong Dong
The fsession is similar to the kprobe session. It allows attaching a single BPF program to both the entry and the exit of the target functions.

Introduce struct bpf_fsession_link, which allows adding the link to both the fentry and fexit progs_hlist of the trampoline.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Co-developed-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260124062008.8657-2-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-24 | io_uring/zcrx: implement large rx buffer support | Pavel Begunkov
There are network cards that support receive buffers larger than 4K, which can be vastly beneficial for performance; benchmarks for this patch showed up to 30% CPU util improvement for 32K vs 4K buffers.

Allow zcrx users to specify the size in struct io_uring_zcrx_ifq_reg::rx_buf_len. If set to zero, zcrx will use a default value. zcrx will check and fail if the memory backing the area can't be split into physically contiguous chunks of the required size. This is more restrictive than strictly necessary, since only the DMA addresses need to be contiguous, but relaxing it is beyond this series.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
[axboe: kill duplicate netdev_queues.h include]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
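The "can the area be split into contiguous chunks of the required size" check can be sketched as below. Everything here is an illustration under assumptions: `struct sketch_run` and `area_splits_into()` are hypothetical names, the area is modelled as a list of physically contiguous runs, and the real zcrx code may apply additional constraints (alignment, default resolution for a zero rx_buf_len) that are not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one physically contiguous run of the registered area. */
struct sketch_run {
	uint64_t phys_addr;
	uint64_t len;
};

/*
 * Return true if every contiguous run can be cut into whole buffers of
 * buf_len bytes. The caller is assumed to have already replaced a zero
 * rx_buf_len with the default value.
 */
static inline bool area_splits_into(const struct sketch_run *runs, size_t nr,
				    uint64_t buf_len)
{
	size_t i;

	if (buf_len == 0)
		return false;

	for (i = 0; i < nr; i++) {
		/* A run shorter than, or not a multiple of, buf_len fails. */
		if (runs[i].len == 0 || runs[i].len % buf_len != 0)
			return false;
	}
	return true;
}
```

For example, an area made of two 64K runs passes for 32K buffers, while an area containing a single stray 4K run does not.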
2026-01-23 | Merge tag 'block-6.19-20260122' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux | Linus Torvalds
Pull block fixes from Jens Axboe:
 - A set of selftest fixes for ublk
 - Fix for a pid mismatch in ublk, comparing PIDs in different namespaces if run inside a namespace
 - Fix for a regression added in this release with polling, where the nvme tcp connect code would spin forever
 - Zoned device error path fix
 - Tweak the blkzoned uapi additions from this kernel release, making them more easily discoverable
 - Fix for a regression in bcache with bio endio handling added in this release

* tag 'block-6.19-20260122' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  bcache: use bio cloning for detached device requests
  blk-mq: use BLK_POLL_ONESHOT for synchronous poll completion
  selftests/ublk: fix garbage output in foreground mode
  selftests/ublk: fix error handling for starting device
  selftests/ublk: fix IO thread idle check
  block: make the new blkzoned UAPI constants discoverable
  ublk: fix ublksrv pid handling for pid namespaces
  block: Fix an error path in disk_update_zone_resources()
2026-01-23 | geneve: add netlink support for GRO hint | Paolo Abeni
Allow configuring and dumping the new device option, and cache its value in the geneve socket itself. The new option is not tied to any code yet.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/2295d4e4d1e919a3189425141bbc71c7850a2de0.1769011015.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-23 | KVM: Introduce KVM_EXIT_SNP_REQ_CERTS for SNP certificate-fetching | Michael Roth
For SEV-SNP, the host can optionally provide a certificate table to the guest when it issues an attestation request to firmware (see GHCB 2.0 specification regarding "SNP Extended Guest Requests"). This certificate table can then be used to verify the endorsement key used by firmware to sign the attestation report.

While it is possible for guests to obtain the certificates through other means, handling it via the host provides more flexibility in being able to keep the certificate data in sync with the endorsement key throughout host-side operations that might result in the endorsement key changing.

In the case of KVM, userspace will be responsible for fetching the certificate table and keeping it in sync with any modifications to the endorsement key by other userspace management tools. Define a new KVM_EXIT_SNP_REQ_CERTS event where userspace is provided with the GPA of the buffer the guest has provided as part of the attestation request so that userspace can write the certificate data into it while relying on filesystem-based locking to keep the certificates up-to-date relative to the endorsement keys installed/utilized by firmware at the time the certificates are fetched.

[Melody: Update the documentation scheme about how file locking is expected to happen.]

Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Tested-by: Liam Merwick <liam.merwick@oracle.com>
Tested-by: Dionna Glaze <dionnaglaze@google.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Melody Wang <huibo.wang@amd.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Link: https://patch.msgid.link/20260109231732.1160759-2-michael.roth@amd.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-23 | Merge branch arm64/for-next/cpufeature into kvmarm-master/next | Marc Zyngier
Merge arm64/for-next/cpufeature in to resolve conflicts resulting from the removal of CONFIG_PAN.

* arm64/for-next/cpufeature:
  arm64: Add support for FEAT_{LS64, LS64_V}
  KVM: arm64: Enable FEAT_{LS64, LS64_V} in the supported guest
  arm64: Provide basic EL2 setup for FEAT_{LS64, LS64_V} usage at EL0/1
  KVM: arm64: Handle DABT caused by LS64* instructions on unsupported memory
  KVM: arm64: Add documentation for KVM_EXIT_ARM_LDST64B
  KVM: arm64: Add exit to userspace on {LD,ST}64B* outside of memslots
  arm64: Unconditionally enable PAN support
  arm64: Unconditionally enable LSE support
  arm64: Add support for TSV110 Spectre-BHB mitigation

Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-22 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.19-rc7).

Conflicts:

drivers/net/ethernet/huawei/hinic3/hinic3_irq.c
  b35a6fd37a00 ("hinic3: Add adaptive IRQ coalescing with DIM")
  fb2bb2a1ebf7 ("hinic3: Fix netif_queue_set_napi queue_index input parameter error")
https://lore.kernel.org/fc0a7fdf08789a52653e8ad05281a0a849e79206.1768915707.git.zhuyikai1@h-partners.com

drivers/net/wireless/ath/ath12k/mac.c
drivers/net/wireless/ath/ath12k/wifi7/hw.c
  31707572108d ("wifi: ath12k: Fix wrong P2P device link id issue")
  c26f294fef2a ("wifi: ath12k: Move ieee80211_ops callback to the arch specific module")
https://lore.kernel.org/20260114123751.6a208818@canb.auug.org.au

Adjacent changes:

drivers/net/wireless/ath/ath12k/mac.c
  8b8d6ee53dfd ("wifi: ath12k: Fix scan state stuck in ABORTING after cancel_remain_on_channel")
  914c890d3b90 ("wifi: ath12k: Add framework for hardware specific ieee80211_ops registration")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-22 | ublk: add new feature UBLK_F_BATCH_IO | Ming Lei
Add new feature UBLK_F_BATCH_IO, which replaces the following two per-io commands:

- UBLK_U_IO_FETCH_REQ
- UBLK_U_IO_COMMIT_AND_FETCH_REQ

with three per-queue batch io uring_cmd:

- UBLK_U_IO_PREP_IO_CMDS
- UBLK_U_IO_COMMIT_IO_CMDS
- UBLK_U_IO_FETCH_IO_CMDS

ublk can then deliver batch io commands to the ublk server in a single multishot uring_cmd, and also allows preparing and committing multiple commands in batch style via a single uring_cmd, so communication cost is reduced a lot.

This feature also no longer limits the task context for any supported command, so any allowed uring_cmd can be issued in any task context. The ublk server implementation becomes much easier.

Meanwhile, load balancing becomes much easier to support with this feature. The command `UBLK_U_IO_FETCH_IO_CMDS` can be issued from multiple task contexts, so each task can adjust this command's buffer length or number of inflight commands to control how much load is handled by the current task.

Later, a priority parameter will be added to the command `UBLK_U_IO_FETCH_IO_CMDS` to improve load balance support.

UBLK_U_IO_NEED_GET_DATA isn't supported in batch io yet, but it may be enabled in future via its batch pair.

Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-22 | ublk: add UBLK_U_IO_FETCH_IO_CMDS for batch I/O processing | Ming Lei
Add UBLK_U_IO_FETCH_IO_CMDS command to enable efficient batch processing of I/O requests. This multishot uring_cmd allows the ublk server to fetch multiple I/O commands in a single operation, significantly reducing submission overhead compared to individual FETCH_REQ* commands.

Key Design Features:

1. Multishot Operation: One UBLK_U_IO_FETCH_IO_CMDS can fetch many I/O commands, with the batch size limited by the provided buffer length.

2. Dynamic Load Balancing: Multiple fetch commands can be submitted simultaneously, but only one is active at any time. This enables efficient load distribution across multiple server task contexts.

3. Implicit State Management: The implementation uses three key variables to track state:
   - evts_fifo: Queue of request tags awaiting processing
   - fcmd_head: List of available fetch commands
   - active_fcmd: Currently active fetch command (NULL = none active)

   States are derived implicitly:
   - IDLE: No fetch commands available
   - READY: Fetch commands available, none active
   - ACTIVE: One fetch command processing events

4. Lockless Reader Optimization: The active fetch command can read from evts_fifo without locking (single reader guarantee), while writers (ublk_queue_rq/ublk_queue_rqs) use evts_lock protection. The memory barrier pairing plays a key role in the single lockless reader optimization.

Implementation Details:

- ublk_queue_rq() and ublk_queue_rqs() save request tags to evts_fifo
- __ublk_acquire_fcmd() selects an available fetch command when events arrive and no command is currently active
- ublk_batch_dispatch() moves tags from evts_fifo to the fetch command's buffer and posts completion via io_uring_mshot_cmd_post_cqe()
- State transitions are coordinated via evts_lock to maintain consistency

Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
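The implicit IDLE/READY/ACTIVE states described in this commit can be modelled in a few lines. This is a simplified, single-threaded sketch with hypothetical types (`struct sketch_fcmd`, `struct sketch_queue`): it borrows the field names fcmd_head and active_fcmd from the commit message but leaves out evts_fifo, evts_lock and the memory barriers, and it is not the driver's actual data layout.

```c
#include <stdbool.h>
#include <stddef.h>

enum sketch_state { SKETCH_IDLE, SKETCH_READY, SKETCH_ACTIVE };

/* Hypothetical fetch command, linked into a singly linked list. */
struct sketch_fcmd {
	struct sketch_fcmd *next;
};

struct sketch_queue {
	struct sketch_fcmd *fcmd_head;    /* available fetch commands */
	struct sketch_fcmd *active_fcmd;  /* NULL means none active */
};

/* No state variable is stored; the state is derived from the pointers. */
static inline enum sketch_state sketch_derive_state(const struct sketch_queue *q)
{
	if (q->active_fcmd)
		return SKETCH_ACTIVE;
	if (q->fcmd_head)
		return SKETCH_READY;
	return SKETCH_IDLE;
}

/*
 * Loosely follows the description of __ublk_acquire_fcmd(): pick an
 * available fetch command only when events are pending and no command
 * is currently active.
 */
static inline struct sketch_fcmd *sketch_acquire(struct sketch_queue *q,
						 bool events_pending)
{
	if (!events_pending || q->active_fcmd || !q->fcmd_head)
		return NULL;

	q->active_fcmd = q->fcmd_head;
	q->fcmd_head = q->fcmd_head->next;
	return q->active_fcmd;
}
```

Deriving the state from two pointers avoids keeping a separate state field in sync, which is presumably why the commit calls the state management implicit.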
2026-01-22 | ublk: handle UBLK_U_IO_COMMIT_IO_CMDS | Ming Lei
Handle UBLK_U_IO_COMMIT_IO_CMDS by walking the uring_cmd fixed buffer:

- read each element into one temp buffer in batch style
- parse and apply each element for committing io result

Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-22 | ublk: handle UBLK_U_IO_PREP_IO_CMDS | Ming Lei
This commit implements the handling of the UBLK_U_IO_PREP_IO_CMDS command, which allows userspace to prepare a batch of I/O requests.

The core of this change is the `ublk_walk_cmd_buf` function, which iterates over the elements in the uring_cmd fixed buffer. For each element, it parses the I/O details, finds the corresponding `ublk_io` structure, and prepares it for future dispatch.

Add per-io lock for protecting concurrent delivery and committing.

Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-22 | ublk: add new batch command UBLK_U_IO_PREP_IO_CMDS & UBLK_U_IO_COMMIT_IO_CMDS | Ming Lei
Add new command UBLK_U_IO_PREP_IO_CMDS, which is the batch version of UBLK_IO_FETCH_REQ.

Add new command UBLK_U_IO_COMMIT_IO_CMDS, which is for committing io command result only, still the batch version.

The new command header type is `struct ublk_batch_io`.

This patch doesn't actually implement these commands yet, just validates the SQE fields.

Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-22 | io_uring: introduce non-circular SQ | Pavel Begunkov
Outside of SQPOLL, normally SQ entries are consumed by the time the submission syscall returns. For those cases we don't need a circular buffer and the head/tail tracking; instead the kernel can assume that entries always start from the beginning of the SQ at index 0. This patch introduces a setup flag doing exactly that. It's simpler and helps keep SQEs hot in cache.

The feature is optional and enabled by setting IORING_SETUP_SQ_REWIND. The flag is rejected if passed together with SQPOLL as it'd require waiting for SQ before each submission. It also requires IORING_SETUP_NO_SQARRAY; the SQ array could be supported, but it's unlikely there would be users, so leave more room for future optimisations.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
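The two constraints on the new flag (rejected with SQPOLL, requires NO_SQARRAY) amount to a small flag check, sketched below. Assumptions are marked in the code: the bit chosen for the rewind flag is a placeholder because the commit message does not give the real value of IORING_SETUP_SQ_REWIND, the function name is hypothetical, and returning -EINVAL is a guess at how the rejection is reported.

```c
#include <errno.h>

/* Values believed to match include/uapi/linux/io_uring.h. */
#define IORING_SETUP_SQPOLL      (1U << 1)
#define IORING_SETUP_NO_SQARRAY  (1U << 16)

/* Placeholder bit only: the real IORING_SETUP_SQ_REWIND value is not
 * stated in the commit message. */
#define SKETCH_SETUP_SQ_REWIND   (1U << 31)

/* Hypothetical validation helper: 0 if the combination is acceptable. */
static inline int sketch_check_sq_rewind(unsigned int flags)
{
	if (!(flags & SKETCH_SETUP_SQ_REWIND))
		return 0;               /* nothing to validate */
	if (flags & IORING_SETUP_SQPOLL)
		return -EINVAL;         /* would need to wait for the SQ */
	if (!(flags & IORING_SETUP_NO_SQARRAY))
		return -EINVAL;         /* SQ array not supported here */
	return 0;
}
```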
2026-01-22 | PCI: Introduce pcie_is_cxl() | Terry Bowman
CXL is a protocol that runs on top of PCIe electricals. Its error model also runs on top of the PCIe AER error model by standardizing "internal" errors as "CXL" errors. Linux has historically ignored internal errors. CXL protocol error handling is then a task of enhancing the PCIe AER core to understand that PCIe ports (upstream and downstream) and endpoints may throw internal errors that represent standard CXL protocol errors.

The proposed method to make that determination is to teach 'struct pci_dev' to cache when its link has trained the CXL.mem and/or CXL.cache protocols and then treat all internal errors as CXL errors. A design goal is to not burden the PCIe AER core with CXL knowledge beyond just enough to forward error notifications to the CXL RAS core. The forwarded notification looks up a 'struct cxl_port' or 'struct cxl_dport' companion device to the PCI device.

Introduce set_pcie_cxl() with logic checking for CXL.mem or CXL.cache status in the CXL Flex Bus DVSEC status register. The CXL Flex Bus DVSEC presence is used because it is required for all the CXL PCIe devices.[1]

[1] CXL 3.1 Spec, 8.1.1 PCIe Designated Vendor-Specific Extended Capability (DVSEC) ID Assignment, Table 8-2

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260114182055.46029-4-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
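The status check that set_pcie_cxl() is described as performing boils down to testing two bits of the Flex Bus DVSEC status register. The sketch below is illustrative only: the macro and function names are hypothetical, and the bit positions (bit 0 for CXL.cache, bit 2 for CXL.mem) are an assumption about the register layout that should be verified against the CXL specification and pci_regs.h before being relied on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout of the Flex Bus port status register. */
#define SKETCH_FLEXBUS_STATUS_CACHE  (1u << 0)  /* CXL.cache trained */
#define SKETCH_FLEXBUS_STATUS_IO     (1u << 1)  /* CXL.io trained */
#define SKETCH_FLEXBUS_STATUS_MEM    (1u << 2)  /* CXL.mem trained */

/*
 * Hypothetical helper: a link counts as CXL once either CXL.mem or
 * CXL.cache has trained; CXL.io alone is not sufficient.
 */
static inline bool sketch_link_is_cxl(uint16_t flexbus_status)
{
	return flexbus_status &
	       (SKETCH_FLEXBUS_STATUS_CACHE | SKETCH_FLEXBUS_STATUS_MEM);
}
```

Caching this result once, as the commit proposes for 'struct pci_dev', means the AER core only has to test a flag when deciding whether an internal error should be forwarded as a CXL error.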
2026-01-22 | PCI: Update CXL DVSEC definitions | Terry Bowman
CXL DVSEC definitions were recently moved into uapi/pci_regs.h, but the newly added macros do not follow the file's existing naming conventions. The current format uses CXL_DVSEC_XYZ, while the new CXL entries must instead use the PCI_DVSEC_CXL_XYZ prefix to match the conventions already established in pci_regs.h.

The new CXL DVSEC macros also introduce _MASK and _OFFSET suffixes, which are not used anywhere else in the file. These suffixes lengthen the identifiers and reduce readability. Remove _MASK and _OFFSET from the recently added definitions.

Additionally, remove PCI_DVSEC_HEADER1_LENGTH, as it duplicates the existing PCI_DVSEC_HEADER1_LEN() macro.

Update all existing references to use the new macro names.

Finally, update the inline documentation to reference the latest revision of the CXL specification.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260114182055.46029-3-terry.bowman@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2026-01-22 | PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h | Terry Bowman
The CXL DVSECs are currently defined in cxl/core/cxlpci.h. These are not accessible to other subsystems. Move these to uapi/linux/pci_regs.h.

The CXL DVSEC definitions will be renamed and reformatted to fit better with existing defines.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260114182055.46029-2-terry.bowman@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2026-01-22 | Merge tag 'net-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Linus Torvalds
Pull networking fixes from Jakub Kicinski:
 "Including fixes from CAN and wireless.

  Pretty big, but hard to make up any cohesive story that would explain it, a random collection of fixes. The two reverts of bad patches from this release here feel like stuff that'd normally show up by rc5 or rc6. Perhaps obvious thing to say, given the holiday timing.

  That said, no active investigations / regressions. Let's see what the next week brings.

  Current release - fix to a fix:
   - can: alloc_candev_mqs(): add missing default CAN capabilities

  Current release - regressions:
   - usbnet: fix crash due to missing BQL accounting after resume
   - Revert "net: wwan: mhi_wwan_mbim: Avoid -Wflex-array-member-not ...

  Previous releases - regressions:
   - Revert "nfc/nci: Add the inconsistency check between the input ...

  Previous releases - always broken:
   - number of driver fixes for incorrect use of seqlocks on stats
   - rxrpc: fix recvmsg() unconditional requeue, don't corrupt rcv queue when MSG_PEEK was set
   - ipvlan: make the addrs_lock be per port avoid races in the port hash table
   - sched: enforce that teql can only be used as root qdisc
   - virtio: coalesce only linear skb
   - wifi: ath12k: fix dead lock while flushing management frames
   - eth: igc: reduce TSN TX packet buffer from 7KB to 5KB per queue"

* tag 'net-6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (96 commits)
  Octeontx2-af: Add proper checks for fwdata
  dpll: Prevent duplicate registrations
  net/sched: act_ife: avoid possible NULL deref
  hinic3: Fix netif_queue_set_napi queue_index input parameter error
  vsock/test: add stream TX credit bounds test
  vsock/virtio: cap TX credit to local buffer size
  vsock/test: fix seqpacket message bounds test
  vsock/virtio: fix potential underflow in virtio_transport_get_credit()
  net: fec: account for VLAN header in frame length calculations
  net: openvswitch: fix data race in ovs_vport_get_upcall_stats
  octeontx2-af: Fix error handling
  net: pcs: pcs-mtk-lynxi: report in-band capability for 2500Base-X
  rxrpc: Fix data-race warning and potential load/store tearing
  net: dsa: fix off-by-one in maximum bridge ID determination
  net: bcmasp: Fix network filter wake for asp-3.0
  bonding: provide a net pointer to __skb_flow_dissect()
  selftests: net: amt: wait longer for connection before sending packets
  be2net: Fix NULL pointer dereference in be_cmd_get_mac_from_list
  Revert "net: wwan: mhi_wwan_mbim: Avoid -Wflex-array-member-not-at-end warning"
  netrom: fix double-free in nr_route_frame()
  ...
2026-01-22 | Merge tag 'wireless-2026-11-22' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless | Jakub Kicinski
Johannes Berg says:

====================
Another set of updates:
 - various small fixes for ath10k/ath12k/mwifiex/rsi
 - cfg80211 fix for HE bitrate overflow
 - mac80211 fixes
 - S1G beacon handling in scan
 - skb tailroom handling for HW encryption
 - CSA fix for multi-link
 - handling of disabled links during association

* tag 'wireless-2026-11-22' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: cfg80211: ignore link disabled flag from userspace
  wifi: mac80211: apply advertised TTLM from association response
  wifi: mac80211: parse all TTLM entries
  wifi: mac80211: don't increment crypto_tx_tailroom_needed_cnt twice
  wifi: mac80211: don't perform DA check on S1G beacon
  wifi: ath12k: Fix wrong P2P device link id issue
  wifi: ath12k: fix dead lock while flushing management frames
  wifi: ath12k: Fix scan state stuck in ABORTING after cancel_remain_on_channel
  wifi: ath12k: cancel scan only on active scan vdev
  wifi: mwifiex: Fix a loop in mwifiex_update_ampdu_rxwinsize()
  wifi: mac80211: correctly check if CSA is active
  wifi: cfg80211: Fix bitrate calculation overflow for HE rates
  wifi: rsi: Fix memory corruption due to not set vif driver data size
  wifi: ath12k: don't force radio frequency check in freq_to_idx()
  wifi: ath12k: fix dma_free_coherent() pointer
  wifi: ath10k: fix dma_free_coherent() pointer
====================

Link: https://patch.msgid.link/20260122110248.15450-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-22 | KVM: arm64: Add exit to userspace on {LD,ST}64B* outside of memslots | Marc Zyngier
The main use of {LD,ST}64B* is to talk to a device, which is hopefully directly assigned to the guest and requires no additional handling.

However, this does not preclude a VMM from exposing a virtual device to the guest, and to allow 64 byte accesses as part of the programming interface. A direct consequence of this is that we need to be able to forward such access to userspace.

Given that such a contraption is very unlikely to ever exist, we choose to offer a limited service: userspace gets (as part of a new exit reason) the ESR, the IPA, and that's it. It is fully expected to handle the full semantics of the instructions, deal with ACCDATA, the return values and increment PC. Much fun.

A canonical implementation can also simply inject an abort and be done with it. Frankly, don't try to do anything else unless you have time to waste.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Oliver Upton <oupton@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>
2026-01-22 | rseq: Allow registering RSEQ with slice extension | Peter Zijlstra
Since glibc cares about the number of syscalls required to initialize a new thread, allow initializing rseq with slice extension on. This avoids having to do another prctl().

Requested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260121143207.814193010@infradead.org
2026-01-22 | rseq: Add prctl() to enable time slice extensions | Thomas Gleixner
Implement a prctl() so that tasks can enable the time slice extension mechanism. This fails when time slice extensions are disabled at compile time or on the kernel command line, or when no rseq pointer is registered in the kernel.

That allows implementing a single trivial check in the exit-to-user-mode hotpath to decide whether the whole mechanism needs to be invoked.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251215155708.858717691@linutronix.de
2026-01-22 | rseq: Add fields and constants for time slice extension | Thomas Gleixner
Aside from a Kconfig knob, add the following items:

- Two flag bits for the rseq user space ABI, which allow user space to query the availability and enablement without a syscall.
- A new member to the user space ABI struct rseq, which is going to be used to communicate request and grant between kernel and user space.
- A rseq state struct to hold the kernel state of this
- Documentation of the new mechanism

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251215155708.669472597@linutronix.de
2026-01-21 | block: make the new blkzoned UAPI constants discoverable | Christoph Hellwig
The Linux 6.19 merge window added the new BLKREPORTZONESV2 ioctl, and with it the new BLK_ZONE_REP_CACHED and BLK_ZONE_COND_ACTIVE constants.

The two constants are defined as part of enums, which makes it very painful for userspace to discover if they are present in the installed system headers.

Use the #define to the same name trick to make them trivially discoverable using CPP directives.

Fixes: 0bf0e2e46668 ("block: track zone conditions")
Fixes: b30ffcdc0c15 ("block: introduce BLKREPORTZONESV2 ioctl")
Reported-by: Andrey Albershteyn <aalbersh@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
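The "#define to the same name" trick mentioned above works because the preprocessor cannot see enum constants but can see macros. A minimal sketch follows; the `SKETCH_` enum and its values are made up for illustration and are not the real blkzoned.h definitions.

```c
/* Hypothetical header fragment: one old and one newly added enumerator. */
enum sketch_zone_cond {
	SKETCH_ZONE_COND_OLD = 0,
	SKETCH_ZONE_COND_NEW = 1,
/* Self-referential define: expands to the enumerator itself, but makes
 * the name visible to #ifdef / #if defined(). */
#define SKETCH_ZONE_COND_NEW SKETCH_ZONE_COND_NEW
};

/* Userspace side: feature detection with plain CPP directives. */
static inline int sketch_have_new_cond(void)
{
#ifdef SKETCH_ZONE_COND_NEW
	return 1;
#else
	return 0;
#endif
}

/* The old enumerator has no matching define, so CPP cannot detect it. */
static inline int sketch_have_old_cond(void)
{
#ifdef SKETCH_ZONE_COND_OLD
	return 1;
#else
	return 0;
#endif
}
```

Because a macro is never re-expanded inside its own expansion, the define is harmless in ordinary code: `SKETCH_ZONE_COND_NEW` still evaluates to the enum constant.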
2026-01-21 | media: v4l2-ctrls: Add hevc_ext_sps_[ls]t_rps controls | Detlev Casanova
The vdpu381 decoder found on newer Rockchip SoCs needs the information from the long term and short term ref pic sets from the SPS.

So far, it wasn't included in the v4l2 API, so add it with new dynamic sized controls.

Each element of the hevc_ext_sps_lt_rps array contains the long term ref pic set at that index. Each element of the hevc_ext_sps_st_rps contains the short term ref pic set at that index, as the raw data. It is the role of the drivers to calculate the reference sets values.

Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
2026-01-20mm/block/fs: remove laptop_modeJohannes Weiner
Laptop mode was introduced to save battery, by delaying and consolidating writes and thereby maximize the time rotating hard drives wouldn't have to spin. Luckily, rotating hard drives, with their high spin-up times and power draw, are a thing of the past for battery-powered devices. Reclaim has also since changed to not write single filesystem pages anymore, and regular filesystem writeback is lumpy by design. The juice doesn't appear worth the squeeze anymore. The footprint of the feature is small, but nevertheless it's a complicating factor in mm, block, filesystems. Developers don't think about it, and it likely hasn't been tested with new reclaim and writeback changes in years. Let's sunset it. Keep the sysctl with a deprecation warning around for a few more cycles, but remove all functionality behind it. [akpm@linux-foundation.org: fix Documentation/admin-guide/laptops/index.rst] Link: https://lkml.kernel.org/r/20251216185201.GH905277@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Deepanshu Kartikey <kartikey406@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20Revert "Merge branch 'netkit-support-for-io_uring-zero-copy-and-af_xdp'"Jakub Kicinski
This reverts commit 77b9c4a438fc66e2ab004c411056b3fb71a54f2c, reversing changes made to 4515ec4ad58a37e70a9e1256c0b993958c9b7497: 931420a2fc36 ("selftests/net: Add netkit container tests") ab771c938d9a ("selftests/net: Make NetDrvContEnv support queue leasing") 6be87fbb2776 ("selftests/net: Add env for container based tests") 61d99ce3dfc2 ("selftests/net: Add bpf skb forwarding program") 920da3634194 ("netkit: Add xsk support for af_xdp applications") eef51113f8af ("netkit: Add netkit notifier to check for unregistering devices") b5ef109d22d4 ("netkit: Implement rtnl_link_ops->alloc and ndo_queue_create") b5c3fa4a0b16 ("netkit: Add single device mode for netkit") 0073d2fd679d ("xsk: Proxy pool management for leased queues") 1ecea95dd3b5 ("xsk: Extend xsk_rcv_check validation") 804bf334d08a ("net: Proxy netdev_queue_get_dma_dev for leased queues") 0caa9a8ddec3 ("net: Proxy net_mp_{open,close}_rxq for leased queues") ff8889ff9107 ("net, ethtool: Disallow leased real rxqs to be resized") 9e2103f36110 ("net: Add lease info to queue-get response") 31127deddef4 ("net: Implement netdev_nl_queue_create_doit") a5546e18f77c ("net: Add queue-create operation") The series will conflict with io_uring work, and the code needs more polish. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20netkit: Add single device mode for netkitDaniel Borkmann
Add a single device mode for netkit instead of netkit pairs. The primary target for the paired devices is to connect network namespaces, of course, and support has been implemented in projects like Cilium [0]. For the rxq leasing the plan is to support two main scenarios related to single device mode: * For the use-case of io_uring zero-copy, the control plane can either set up a netkit pair where the peer device can perform rxq leasing which is then tied to the lifetime of the peer device, or the control plane can use a regular netkit pair to connect the hostns to a Pod/container and dynamically add/remove rxq leasing through a single device without having to interrupt the device pair. In the case of io_uring, the memory pool is used as skb non-linear pages, and thus the skb will go its way through the regular stack into netkit. Things like the netkit policy when no BPF is attached or skb scrubbing etc apply as-is in case the paired devices are used, or if the backend memory is tied to the single device and traffic goes through a paired device. * For the use-case of AF_XDP, the control plane needs to use netkit in the single device mode. The single device mode currently enforces only a pass policy when no BPF is attached, and does not yet support BPF link attachments for AF_XDP. skbs sent to that device get dropped at the moment. Given AF_XDP operates at a lower layer of the stack tying this to the netkit pair did not make sense. In future, the plan is to allow BPF at the XDP layer which can: i) process traffic coming from the AF_XDP application (e.g. QEMU with AF_XDP backend) to filter egress traffic or to push selected egress traffic up to the single netkit device to the local stack (e.g. DHCP requests), and ii) vice-versa skbs sent to the single netkit into the AF_XDP application (e.g. DHCP replies). Also, the control-plane can dynamically manage rxq leasing for the single netkit device without having to interrupt (e.g. 
down/up cycle) the main netkit pair for the Pod which has traffic going in and out. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Jordan Rife <jordan@jrife.io> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://docs.cilium.io/en/stable/operations/performance/tuning/#netkit-device-mode [0] Link: https://patch.msgid.link/20260115082603.219152-10-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20net: Add queue-create operationDaniel Borkmann
Add a ynl netdev family operation called queue-create that creates a new queue on a netdevice: name: queue-create attribute-set: queue flags: [admin-perm] do: request: attributes: - ifindex - type - lease reply: &queue-create-op attributes: - id This is a generic operation such that it can be extended for various use cases in future. Right now it is mandatory to specify ifindex, the queue type which is enforced to rx and a lease. The newly created queue id is returned to the caller. A queue from a virtual device can have a lease which refers to another queue from a physical device. This is useful for memory providers and AF_XDP operations which take an ifindex and queue id to allow applications to bind against virtual devices in containers. The lease couples both queues together and allows to proxy the operations from a virtual device in a container to the physical device. In future, the nested lease attribute can be lifted and made optional for other use-cases such as dynamic queue creation for physical netdevs. The lack of lease and the specification of the physical device as an ifindex will imply that we need a real queue to be allocated. Similarly, the queue type enforcement to rx can then be lifted as well to support tx. An early implementation had only driver-specific integration [0], but in order for other virtual devices to reuse, it makes sense to have this as a generic API in core net. For leasing queues, the virtual netdev must have real_num_rx_queue less than num_rx_queues at the time of calling queue-create. The queue-type must be rx as only rx queues are supported for leasing for now. We also enforce that the queue-create ifindex must point to a virtual device, and that the nested lease attribute's ifindex must point to a physical device. The nested lease attribute set contains a netns-id attribute which is currently only intended for dumping as part of the queue-get operation. 
Also, it is modeled as an s32 type similarly as done elsewhere in the stack. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Link: https://bpfconf.ebpf.io/bpfconf2025/bpfconf2025_material/lsfmmbpf_2025_netkit_borkmann.pdf [0] Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260115082603.219152-2-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-20wifi: cfg80211: ignore link disabled flag from userspaceBenjamin Berg
When the AP has an advertised TID to Link Mapping (TTLM) it shall include the element in the association response. As such, when this element is present it needs to be used for the currently dormant links. See Draft P802.11REVmf_D1.0 section 35.3.7.2.3 ("Negotiation of TTLM") for the details. The flag is also not usable in case userspace wants to specify a negotiated TTLM during association. Note that for the link reconfiguration case, mac80211 did not use the information. Draft P802.11REVmf_D1.0 states in section 35.3.6.4 ("Link reconfiguration to the setup links") that we "shall operate with all the TIDs mapped to the newly added links ..." All this means that the flag is not needed. The implementation should parse the information from the association response. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260118093904.754e057896a5.Ifd06f5ef839a93bfd54d0593dc932870f95f3242@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-01-19net: ethtool: Add support for 80Gbps speedMika Westerberg
USB4 v2 link used in peer-to-peer networking is symmetric 80Gbps so in order to support reading this link speed, add support for it to ethtool. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260115115646.328898-3-mika.westerberg@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-19wifi: nl80211: ignore cluster id after NAN startedMiri Korenblit
After NAN was started, cluster id updates from user space should not happen, since the device already started a cluster with the previously provided id. Since NL80211_CMD_CHANGE_NAN_CONFIG requires setting the full NAN configuration, we can't require that NL80211_NAN_CONF_CLUSTER_ID won't be included in this command, and keeping the last configured value just to be able to compare it against the new one seems a bit overkill. Therefore, just ignore cluster id in this command and clarify the documentation. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260107142229.fb55e5853269.I10d18c8f69d98b28916596d6da4207c15ea4abb5@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-01-18Merge tag 'landlock-6.19-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux Pull landlock fixes from Mickaël Salaün: "This fixes TCP handling, tests, documentation, non-audit elided code, and minor cosmetic changes" * tag 'landlock-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: landlock: Clarify documentation for the IOCTL access right selftests/landlock: Properly close a file descriptor landlock: Improve the comment for domain_is_scoped selftests/landlock: Use scoped_base_variants.h for ptrace_test selftests/landlock: Fix missing semicolon selftests/landlock: Fix typo in fs_test landlock: Optimize stack usage when !CONFIG_AUDIT landlock: Fix spelling landlock: Clean up hook_ptrace_access_check() landlock: Improve erratum documentation landlock: Remove useless include landlock: Fix wrong type usage selftests/landlock: NULL-terminate unix pathname addresses selftests/landlock: Remove invalid unix socket bind() selftests/landlock: Add missing connect(minimal AF_UNSPEC) test selftests/landlock: Fix TCP bind(AF_UNSPEC) test case landlock: Fix TCP handling of short AF_UNSPEC addresses landlock: Fix formatting
2026-01-18Merge tag 'ext4_for_linus-6.19-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: - Fix an inconsistency in structure size on 32-bit platforms caused by padding differences for the new EXT4_IOC_[GS]ET_TUNE_SB_PARAM ioctls - Fix a buffer leak on the error path when dropping the refcount of an xattr value stored in an inode - Fix missing locking on the error path for the file defragmentation ioctl leading to a BUG * tag 'ext4_for_linus-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix iloc.bh leak in ext4_xattr_inode_update_ref ext4: add missing down_write_data_sem in mext_move_extent(). ext4: fix ext4_tune_sb_params padding
2026-01-18ext4: fix ext4_tune_sb_params paddingArnd Bergmann
The padding at the end of struct ext4_tune_sb_params is architecture specific and in particular is different between x86-32 and x86-64, since the __u64 member only enforces struct alignment on the latter. This shows up as a new warning when test-building the headers with -Wpadded: include/linux/ext4.h:144:1: error: padding struct size to alignment boundary with 4 bytes [-Werror=padded] All members inside the structure are naturally aligned, so the only difference here is the amount of padding at the end. Make the padding explicit, to have a consistent sizeof(struct ext4_tune_sb_params) of 232 on all architectures and avoid adding compat ioctl handling for EXT4_IOC_GET_TUNE_SB_PARAM/EXT4_IOC_SET_TUNE_SB_PARAM. This is an ABI break on x86-32 but hopefully this can go into 6.18.y early enough as a fixup so no actual users will be affected. Alternatively, the kernel could handle the ioctl commands for both sizes (232 and 228 bytes) on all architectures. Fixes: 04a91570ac67 ("ext4: implemet new ioctls to set and get superblock parameters") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20251204101914.1037148-1-arnd@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
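The tail padding difference described above can be sketched with a much smaller structure. This is only an illustration under stated assumptions: the struct names and members below are hypothetical and unrelated to the real struct ext4_tune_sb_params layout. On i386 a 64-bit member only needs 4-byte alignment inside a struct, so the implicit variant ends at 12 bytes there, while x86-64 rounds it up to 16. Spelling out the padding gives the same size everywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct: tail padding is left to the compiler and ABI. */
struct demo_params_implicit {
	uint64_t features;
	uint32_t flags;
	/* 0 or 4 bytes of invisible tail padding, depending on the ABI */
};

/* Same members, but the trailing padding is an explicit field. */
struct demo_params_explicit {
	uint64_t features;
	uint32_t flags;
	uint32_t pad;	/* explicit padding, expected to be zero */
};

/* With explicit padding the size no longer depends on the ABI. */
_Static_assert(sizeof(struct demo_params_explicit) == 16,
	       "explicitly padded struct must be 16 bytes");

/* Size of the explicitly padded layout, for callers that want to check it. */
static size_t demo_explicit_size(void)
{
	return sizeof(struct demo_params_explicit);
}
```

Since ioctl command numbers commonly encode the argument size, a fixed sizeof is what lets 32-bit and 64-bit callers share one command without compat handling.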
2026-01-18iommufd: Introduce data struct for AMD nested domain allocationSuravee Suthikulpanit
Introduce IOMMU_HWPT_DATA_AMD_GUEST data type for IOMMU guest page table, which is used for stage-1 in nested translation. The data structure contains information necessary for setting up the AMD HW-vIOMMU support. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-18iommu/amd: Add support for hw_info for iommu capability querySuravee Suthikulpanit
AMD IOMMU Extended Feature (EFR) and Extended Feature 2 (EFR2) registers specify features supported by each IOMMU hardware instance. The IOMMU driver checks each feature-specific bits before enabling each feature at run time. For IOMMUFD, the hypervisor passes the raw value of amd_iommu_efr and amd_iommu_efr2 to VMM via iommufd IOMMU_DEVICE_GET_HW_INFO ioctl. Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-17netfilter: uapi: Use UAPI definition of INT_MAX and INT_MINThomas Weißschuh
Using <limits.h> to gain access to INT_MAX and INT_MIN introduces a dependency on a libc, which UAPI headers should not do. Use the equivalent UAPI constants. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20260113-uapi-limits-v2-3-93c20f4b2c1a@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-17ethtool: uapi: Use UAPI definition of INT_MAXThomas Weißschuh
Using <limits.h> to gain access to INT_MAX introduces a dependency on a libc, which UAPI headers should not do. Use the equivalent UAPI constant. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20260113-uapi-limits-v2-2-93c20f4b2c1a@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-17uapi: add INT_MAX and INT_MIN constantsThomas Weißschuh
Some UAPI headers use INT_MAX and INT_MIN. Currently they include <limits.h> for their definitions, which introduces a problematic dependency on libc. Add custom, namespaced definitions of INT_MAX and INT_MIN using the same values as the regular kernel code. These definitions are not added to uapi/linux/limits.h, as that header will conflict with libc definitions on some platforms. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20260113-uapi-limits-v2-1-93c20f4b2c1a@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
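A libc-independent definition along these lines might look like the sketch below. The DEMO_* macro names are hypothetical and are not the names the patch adds; the expressions follow the usual kernel-style formulation. <limits.h> is included here only so the demo can compare against the libc values.

```c
#include <assert.h>
#include <limits.h>	/* only used to cross-check the demo values */

/* Hypothetical namespaced constants that need no libc header. */
#define DEMO_KERNEL_INT_MAX	((int)(~0U >> 1))
#define DEMO_KERNEL_INT_MIN	(-DEMO_KERNEL_INT_MAX - 1)

/* Compile-time cross-check against the libc definitions. */
_Static_assert(DEMO_KERNEL_INT_MAX == INT_MAX, "INT_MAX mismatch");
_Static_assert(DEMO_KERNEL_INT_MIN == INT_MIN, "INT_MIN mismatch");

/* Returns 1 if v fits in an int, using only the namespaced limits. */
static int demo_fits_int(long long v)
{
	return v >= DEMO_KERNEL_INT_MIN && v <= DEMO_KERNEL_INT_MAX;
}
```

The namespaced prefix avoids clashing with a libc that already defines INT_MAX and INT_MIN when both headers are included in the same translation unit.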
2026-01-17compiler_types.h: Attributes: Add __counted_by_ptr macroBill Wendling
Introduce __counted_by_ptr(), which works like __counted_by(), but for pointer struct members. struct foo { int a, b, c; char *buffer __counted_by_ptr(bytes); short nr_bars; struct bar *bars __counted_by_ptr(nr_bars); size_t bytes; }; Because "counted_by" can only be applied to pointer members in very recent compiler versions, its application ends up needing to be distinct from flexible array "counted_by" annotations, hence a separate macro. This is a reworking of Kees' previous patch [1]. Link: https://lore.kernel.org/all/20251020220118.1226740-1-kees@kernel.org/ [1] Co-developed-by: Kees Cook <kees@kernel.org> Signed-off-by: Bill Wendling <morbo@google.com> Link: https://patch.msgid.link/20260116005838.2419118-1-morbo@google.com Signed-off-by: Kees Cook <kees@kernel.org>
2026-01-16virt: vbox: uapi: Mark inner unions in packed structs as packedThomas Weißschuh
The unpacked unions within a packed struct generate alignment warnings on clang for 32-bit ARM: ./usr/include/linux/vbox_vmmdev_types.h:239:4: error: field u within 'struct vmmdev_hgcm_function_parameter32' is less aligned than 'union (unnamed union at ./usr/include/linux/vbox_vmmdev_types.h:223:2)' and is usually due to 'struct vmmdev_hgcm_function_parameter32' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access] 239 | } u; | ^ ./usr/include/linux/vbox_vmmdev_types.h:254:6: error: field u within 'struct vmmdev_hgcm_function_parameter64::(anonymous union)::(unnamed at ./usr/include/linux/vbox_vmmdev_types.h:249:3)' is less aligned than 'union (unnamed union at ./usr/include/linux/vbox_vmmdev_types.h:251:4)' and is usually due to 'struct vmmdev_hgcm_function_parameter64::(anonymous union)::(unnamed at ./usr/include/linux/vbox_vmmdev_types.h:249:3)' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access] With the recent changes to compile-test the UAPI headers in more cases, these warnings in combination with CONFIG_WERROR break the build. Fix the warnings. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202512140314.DzDxpIVn-lkp@intel.com/ Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://lore.kernel.org/linux-kbuild/20260110-uapi-test-disable-headers-arm-clang-unaligned-access-v1-1-b7b0fa541daa@kernel.org/ Suggested-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/linux-kbuild/29b2e736-d462-45b7-a0a9-85f8d8a3de56@app.fastmail.com/ Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Tested-by: Nicolas Schier <nsc@kernel.org> Reviewed-by: Nicolas Schier <nsc@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/20260115-kbuild-alignment-vbox-v1-2-076aed1623ff@linutronix.de Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2026-01-16hyper-v: Mark inner union in hv_kvp_exchg_msg_value as packedThomas Weißschuh
The unpacked union within a packed struct generates alignment warnings on clang for 32-bit ARM: ./usr/include/linux/hyperv.h:361:2: error: field within 'struct hv_kvp_exchg_msg_value' is less aligned than 'union hv_kvp_exchg_msg_value::(anonymous at ./usr/include/linux/hyperv.h:361:2)' and is usually due to 'struct hv_kvp_exchg_msg_value' being packed, which can lead to unaligned accesses [-Werror,-Wunaligned-access] 361 | union { | ^ With the recent changes to compile-test the UAPI headers in more cases, this warning in combination with CONFIG_WERROR breaks the build. Fix the warning. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202512140314.DzDxpIVn-lkp@intel.com/ Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://lore.kernel.org/linux-kbuild/20260110-uapi-test-disable-headers-arm-clang-unaligned-access-v1-1-b7b0fa541daa@kernel.org/ Suggested-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/linux-kbuild/29b2e736-d462-45b7-a0a9-85f8d8a3de56@app.fastmail.com/ Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Acked-by: Wei Liu (Microsoft) <wei.liu@kernel.org> Tested-by: Nicolas Schier <nsc@kernel.org> Reviewed-by: Nicolas Schier <nsc@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/20260115-kbuild-alignment-vbox-v1-1-076aed1623ff@linutronix.de Signed-off-by: Nathan Chancellor <nathan@kernel.org>
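The shape of the fix in this commit and the preceding vbox one can be sketched on a hypothetical structure. The struct demo_msg type below is made up for illustration and is not taken from hyperv.h or vbox_vmmdev_types.h. Marking the inner union packed as well lowers the union's own alignment requirement to 1, which is the mismatch clang's -Wunaligned-access complains about; the sketch assumes a GCC or clang compiler that supports __attribute__((packed)).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical packed message. Without the attribute on the inner
 * union, the union type would still ask for 8-byte alignment while
 * the packed outer struct places it at offset 1.
 */
struct demo_msg {
	uint8_t type;
	union {
		uint32_t value_u32;
		uint64_t value_u64;
	} __attribute__((packed)) u;	/* inner union is packed too */
} __attribute__((packed));

/* Layout is unchanged by the extra attribute: 1 byte tag + 8 byte union. */
_Static_assert(sizeof(struct demo_msg) == 9, "unexpected demo_msg size");

/* Byte offset of the union inside the packed struct. */
static size_t demo_union_offset(void)
{
	return offsetof(struct demo_msg, u);
}
```

In this sketch the extra attribute does not move any field, which is why such a change can be made to a UAPI header without altering the binary layout of this kind of structure.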
2026-01-16Merge tag 'pm-6.19-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix an error path memory leak in the energy model management code, fix a kerneldoc comment in it, and fix and revamp the energy model YNL specification added recently along with the new energy model management netlink interface (that received feedback after being added): - Fix a memory leak in em_create_pd() error path (Malaya Kumar Rout) - Fix stale description of the cost field in struct em_perf_state to reflect the current code (Yaxiong Tian) - Fix and revamp the energy model YNL specification added recently along with the energy model netlink interface (Changwoo Min)" * tag 'pm-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: EM: Add dump to get-perf-domains in the EM YNL spec PM: EM: Change cpus' type from string to u64 array in the EM YNL spec PM: EM: Rename em.yaml to dev-energymodel.yaml PM: EM: Fix yamllint warnings in the EM YNL spec PM: EM: Fix memory leak in em_create_pd() error path PM: EM: Fix incorrect description of the cost field in struct em_perf_state
2026-01-16mount: add OPEN_TREE_NAMESPACEChristian Brauner
When creating containers the setup usually involves using CLONE_NEWNS via clone3() or unshare(). This copies the caller's complete mount namespace. The runtime will also assemble a new rootfs and then use pivot_root() to switch the old mount tree with the new rootfs. Afterward it will recursively umount the old mount tree thereby getting rid of all mounts. On a basic system here where the mount table isn't particularly large this still copies about 30 mounts. Copying all of these mounts only to get rid of them later is pretty wasteful. This is exacerbated if intermediary mount namespaces are used that only exist for a very short amount of time and are immediately destroyed again causing a ton of mounts to be copied and destroyed needlessly. With a large mount table and a system where thousands or ten-thousands of containers are spawned in parallel this quickly becomes a bottleneck increasing contention on the semaphore. Extend open_tree() with a new OPEN_TREE_NAMESPACE flag. Similar to OPEN_TREE_CLONE only the indicated mount tree is copied. Instead of returning a file descriptor referring to that mount tree OPEN_TREE_NAMESPACE will cause open_tree() to return a file descriptor to a new mount namespace. In that new mount namespace the copied mount tree has been mounted on top of a copy of the real rootfs. The caller can setns() into that mount namespace and perform any additionally required setup such as move_mount() detached mounts in there. This allows OPEN_TREE_NAMESPACE to function as a combined unshare(CLONE_NEWNS) and pivot_root(). A caller may for example choose to create an extremely minimal rootfs: fd_mntns = open_tree(-EBADF, "/var/lib/containers/wootwoot", OPEN_TREE_NAMESPACE); This will create a mount namespace where "wootwoot" has become the rootfs mounted on top of the real rootfs. The caller can now setns() into this new mount namespace and assemble additional mounts. 
This also works with user namespaces: unshare(CLONE_NEWUSER); fd_mntns = open_tree(-EBADF, "/var/lib/containers/wootwoot", OPEN_TREE_NAMESPACE); which creates a new mount namespace owned by the earlier created user namespace with "wootwoot" as the rootfs mounted on top of the real rootfs. Link: https://patch.msgid.link/20251229-work-empty-namespace-v1-1-bfb24c7b061f@kernel.org Tested-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Aleksa Sarai <cyphar@cyphar.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Suggested-by: Christian Brauner <brauner@kernel.org> Suggested-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Christian Brauner <brauner@kernel.org>