summaryrefslogtreecommitdiff
path: root/lib/group_cpus.c
AgeCommit message (Collapse)Author
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-12lib/group_cpus: handle const qualifier from clusters allocation typeKees Cook
In preparation for making the kmalloc family of allocators type aware, we need to make sure that the returned type from the allocation matches the type of the variable being assigned. (Before, the allocator would always return "void *", which can be implicitly cast to any pointer type.) The assigned type is "const struct cpumask **", but the returned type, while matching, is not const qualified. To get them exactly matching, just use the dereferenced pointer for the sizeof(). Link: https://lkml.kernel.org/r/20260206222010.work.349-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org> Cc: Wangyang Guo <wangyang.guo@intel.com> Cc: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26lib/group_cpus: make group CPU cluster awareWangyang Guo
As CPU core counts increase, the number of NVMe IRQs may be smaller than the total number of CPUs. This forces multiple CPUs to share the same IRQ. If the IRQ affinity and the CPU's cluster do not align, a performance penalty can be observed on some platforms. This patch improves IRQ affinity by grouping CPUs by cluster within each NUMA domain, ensuring better locality between CPUs and their assigned NVMe IRQs. Details: Intel Xeon E platform packs 4 CPU cores as 1 module (cluster) and share the L2 cache. Let's say, if there are 40 CPUs in 1 NUMA domain and 11 IRQs to dispatch. The existing algorithm will map first 7 IRQs each with 4 CPUs and remained 4 IRQs each with 3 CPUs. The last 4 IRQs may have cross cluster issue. For example, the 9th IRQ which pinned to CPU32, then for CPU31, it will have cross L2 memory access. CPU |28 29 30 31|32 33 34 35|36 ... -------- -------- -------- IRQ 8 9 10 If this patch applied, then first 2 IRQs each mapped with 2 CPUs and rest 9 IRQs each mapped with 4 CPUs, which avoids the cross cluster memory access. CPU |00 01 02 03|04 05 06 07|08 09 10 11| ... ----- ----- ----------- ----------- IRQ 1 2 3 4 As a result, 15%+ performance difference is observed in FIO libaio/randread/bs=8k. Changes since V1: - Add more performance details in commit messages. - Fix endless loop when topology_cluster_cpumask return invalid mask. History: v1: https://lore.kernel.org/all/20251024023038.872616-1-wangyang.guo@intel.com/ v1 [RESEND]: https://lore.kernel.org/all/20251111020608.1501543-1-wangyang.guo@intel.com/ Link: https://lkml.kernel.org/r/20260113022958.3379650-1-wangyang.guo@intel.com Signed-off-by: Wangyang Guo <wangyang.guo@intel.com> Reviewed-by: Tianyou Li <tianyou.li@intel.com> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Tested-by: Dan Liang <dan.liang@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@fb.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Ming Lei <ming.lei@redhat.com> Cc: Radu Rendec <rrendec@redhat.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-01lib/group_cpus: Let group_cpu_evenly() return the number of initialized masksDaniel Wagner
group_cpu_evenly() might have allocated less groups then requested: group_cpu_evenly() __group_cpus_evenly() alloc_nodes_groups() # allocated total groups may be less than numgrps when # active total CPU number is less then numgrps In this case, the caller will do an out of bound access because the caller assumes the masks returned has numgrps. Return the number of groups created so the caller can limit the access range accordingly. Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250617-isolcpus-queue-counters-v1-1-13923686b54b@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-06-25lib/group_cpus: fix NULL pointer dereference from group_cpus_evenly()Yu Kuai
While testing null_blk with configfs, echo 0 > poll_queues will trigger following panic: BUG: kernel NULL pointer dereference, address: 0000000000000010 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 27 UID: 0 PID: 920 Comm: bash Not tainted 6.15.0-02023-gadbdb95c8696-dirty #1238 PREEMPT(undef) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 RIP: 0010:__bitmap_or+0x48/0x70 Call Trace: <TASK> __group_cpus_evenly+0x822/0x8c0 group_cpus_evenly+0x2d9/0x490 blk_mq_map_queues+0x1e/0x110 null_map_queues+0xc9/0x170 [null_blk] blk_mq_update_queue_map+0xdb/0x160 blk_mq_update_nr_hw_queues+0x22b/0x560 nullb_update_nr_hw_queues+0x71/0xf0 [null_blk] nullb_device_poll_queues_store+0xa4/0x130 [null_blk] configfs_write_iter+0x109/0x1d0 vfs_write+0x26e/0x6f0 ksys_write+0x79/0x180 __x64_sys_write+0x1d/0x30 x64_sys_call+0x45c4/0x45f0 do_syscall_64+0xa5/0x240 entry_SYSCALL_64_after_hwframe+0x76/0x7e Root cause is that numgrps is set to 0, and ZERO_SIZE_PTR is returned from kcalloc(), and later ZERO_SIZE_PTR will be deferenced. Fix the problem by checking numgrps first in group_cpus_evenly(), and return NULL directly if numgrps is zero. [yukuai3@huawei.com: also fix the non-SMP version] Link: https://lkml.kernel.org/r/20250620010958.1265984-1-yukuai1@huaweicloud.com Link: https://lkml.kernel.org/r/20250619132655.3318883-1-yukuai1@huaweicloud.com Fixes: 6a6dcae8f486 ("blk-mq: Build default queue map via group_cpus_evenly()") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Cc: ErKun Yang <yangerkun@huawei.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: "zhangyi (F)" <yi.zhang@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06lib/group_cpus.c: avoid acquiring cpu hotplug lock in group_cpus_evenlyMing Lei
group_cpus_evenly() could be part of storage driver's error handler, such as nvme driver, when may happen during CPU hotplug, in which storage queue has to drain its pending IOs because all CPUs associated with the queue are offline and the queue is becoming inactive. And handling IO needs error handler to provide forward progress. Then deadlock is caused: 1) inside CPU hotplug handler, CPU hotplug lock is held, and blk-mq's handler is waiting for inflight IO 2) error handler is waiting for CPU hotplug lock 3) inflight IO can't be completed in blk-mq's CPU hotplug handler because error handling can't provide forward progress. Solve the deadlock by not holding CPU hotplug lock in group_cpus_evenly(), in which two stage spreads are taken: 1) the 1st stage is over all present CPUs; 2) the end stage is over all other CPUs. Turns out the two stage spread just needs consistent 'cpu_present_mask', and remove the CPU hotplug lock by storing it into one local cache. This way doesn't change correctness, because all CPUs are still covered. Link: https://lkml.kernel.org/r/20231120083559.285174-1-ming.lei@redhat.com Signed-off-by: Ming Lei <ming.lei@redhat.com> Reported-by: Yi Zhang <yi.zhang@redhat.com> Reported-by: Guangwu Zhang <guazhang@redhat.com> Tested-by: Guangwu Zhang <guazhang@redhat.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Cc: Keith Busch <kbusch@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-21lib/group_cpus: Export group_cpus_evenly()Xie Yongji
Export group_cpus_evenly() so that some modules can make use of it to group CPUs evenly according to NUMA and CPU locality. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-2-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-01-18genirq/affinity: Only build SMP-only helper functions on SMP kernelsIngo Molnar
allnoconfig grew these new build warnings in lib/group_cpus.c: lib/group_cpus.c:247:12: warning: ‘__group_cpus_evenly’ defined but not used [-Wunused-function] lib/group_cpus.c:75:13: warning: ‘build_node_to_cpumask’ defined but not used [-Wunused-function] lib/group_cpus.c:66:13: warning: ‘free_node_to_cpumask’ defined but not used [-Wunused-function] lib/group_cpus.c:43:23: warning: ‘alloc_node_to_cpumask’ defined but not used [-Wunused-function] Widen the #ifdef CONFIG_SMP block to not expose unused helpers on non-SMP builds. Also annotate the preprocessor branches for better readability. Fixes: f7b3ea8cf72f ("genirq/affinity: Move group_cpus_evenly() into lib/") Cc: Ming Lei <ming.lei@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20221227022905.352674-6-ming.lei@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-01-17genirq/affinity: Move group_cpus_evenly() into lib/Ming Lei
group_cpus_evenly() has become a generic function which can be used for other subsystems than the interrupt subsystem, so move it into lib/. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20221227022905.352674-6-ming.lei@redhat.com