summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2026-01-31mm/damon: remove damon_operations->cleanup()SeongJae Park
Patch series "mm/damon: cleanup kdamond, damon_call(), damos filter and DAMON_MIN_REGION". Do miscellaneous code cleanups for improving readability. First three patches cleanup kdamond termination process, by removing unused operation set cleanup callback (patch 1) and moving damon_ctx specific resource cleanups on kdamond termination to synchronization-easy place (patches 2 and 3). Next two patches touch damon_call() infrastructure, by refactoring kdamond_call() function to do less and simpler locking operations (patch 4), and documenting when dealloc_on_free does work (patch 5). Final three patches rename things for clear uses of those. Those rename damos_filter_out() to be more explicit about the fact that it is only for core-handled filters (patch 6), DAMON_MIN_REGION macro to be more explicit it is not about number of regions but size of each region (patch 7), and damon_ctx->min_sz_region to be different from damos_access_patern->min_sz_region (patch 8), so that those are not confusing and easy to grep. This patch (of 8): damon_operations->cleanup() was added for a case that an operation set implementation requires additional cleanups. But no such implementation exists at the moment. Remove it. Link: https://lkml.kernel.org/r/20260117175256.82826-1-sj@kernel.org Link: https://lkml.kernel.org/r/20260117175256.82826-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-31mm: page_isolation: introduce page_is_unmovable()Kefeng Wang
Patch series "mm: accelerate gigantic folio allocation". Optimize pfn_range_valid_contig() and replace_free_hugepage_folios() in alloc_contig_frozen_pages() to speed up gigantic folio allocation. The allocation time for 120*1G folios drops from 3.605s to 0.431s. This patch (of 5): Factor out the check if a page is unmovable into a new helper, and will be reused in the following patch. No functional change intended, the minor changes are as follows, 1) Avoid unnecessary calls by checking CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION 2) Directly call PageCompound since PageTransCompound may be dropped 3) Using folio_test_hugetlb() Link: https://lkml.kernel.org/r/20260112150954.1802953-1-wangkefeng.wang@huawei.com Link: https://lkml.kernel.org/r/20260112150954.1802953-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Brendan Jackman <jackmanb@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jane Chu <jane.chu@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-31mm/vmscan: add tracepoint and reason for kswapd_failures resetJiayuan Chen
Currently, kswapd_failures is reset in multiple places (kswapd, direct reclaim, PCP freeing, memory-tiers), but there's no way to trace when and why it was reset, making it difficult to debug memory reclaim issues. This patch: 1. Introduce kswapd_clear_hopeless() as a wrapper function to centralize kswapd_failures reset logic. 2. Introduce kswapd_test_hopeless() to encapsulate hopeless node checks, replacing all open-coded kswapd_failures comparisons. 3. Add kswapd_clear_hopeless_reason enum to distinguish reset sources: - KSWAPD_CLEAR_HOPELESS_KSWAPD: reset from kswapd context - KSWAPD_CLEAR_HOPELESS_DIRECT: reset from direct reclaim - KSWAPD_CLEAR_HOPELESS_PCP: reset from PCP page freeing - KSWAPD_CLEAR_HOPELESS_OTHER: reset from other paths 4. Add tracepoints for better observability: - mm_vmscan_kswapd_clear_hopeless: traces each reset with reason - mm_vmscan_kswapd_reclaim_fail: traces each kswapd reclaim failure Test results: $ trace-cmd record -e vmscan:mm_vmscan_kswapd_clear_hopeless -e vmscan:mm_vmscan_kswapd_reclaim_fail $ # generate memory pressure $ trace-cmd report cpus=4 kswapd0-71 [000] 27.216563: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=1 kswapd0-71 [000] 27.217169: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=2 kswapd0-71 [000] 27.217764: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=3 kswapd0-71 [000] 27.218353: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=4 kswapd0-71 [000] 27.218993: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=5 kswapd0-71 [000] 27.219744: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=6 kswapd0-71 [000] 27.220488: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=7 kswapd0-71 [000] 27.221206: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=8 kswapd0-71 [000] 27.221806: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=9 kswapd0-71 [000] 27.222634: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=10 kswapd0-71 [000] 27.223286: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=11 kswapd0-71 [000] 27.223894: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=12 kswapd0-71 [000] 27.224712: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=13 kswapd0-71 [000] 27.225424: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=14 kswapd0-71 [000] 27.226082: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=15 kswapd0-71 [000] 27.226810: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=16 kswapd1-72 [002] 27.386869: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=1 kswapd1-72 [002] 27.387435: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=2 kswapd1-72 [002] 27.388016: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=3 kswapd1-72 [002] 27.388586: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=4 kswapd1-72 [002] 27.389155: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=5 kswapd1-72 [002] 27.389723: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=6 kswapd1-72 [002] 27.390292: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=7 kswapd1-72 [002] 27.392364: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=8 kswapd1-72 [002] 27.392934: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=9 kswapd1-72 [002] 27.393504: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=10 kswapd1-72 [002] 27.394073: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=11 kswapd1-72 [002] 27.394899: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=12 kswapd1-72 [002] 27.395472: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=13 kswapd1-72 [002] 27.396055: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=14 kswapd1-72 [002] 27.396628: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=15 kswapd1-72 [002] 27.397199: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=16 kworker/u18:0-40 [002] 27.410151: mm_vmscan_kswapd_clear_hopeless: nid=0 reason=DIRECT kswapd0-71 [000] 27.439454: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=1 kswapd0-71 [000] 27.440048: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=2 kswapd0-71 [000] 27.440634: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=3 kswapd0-71 [000] 27.441211: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=4 kswapd0-71 [000] 27.441787: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=5 kswapd0-71 [000] 27.442363: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=6 kswapd0-71 [000] 27.443030: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=7 kswapd0-71 [000] 27.443725: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=8 kswapd0-71 [000] 27.444315: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=9 kswapd0-71 [000] 27.444898: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=10 kswapd0-71 [000] 27.445476: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=11 kswapd0-71 [000] 27.446053: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=12 kswapd0-71 [000] 27.446646: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=13 kswapd0-71 [000] 27.447230: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=14 kswapd0-71 [000] 27.447812: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=15 kswapd0-71 [000] 27.448391: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=16 ann-423 [003] 28.028285: mm_vmscan_kswapd_clear_hopeless: nid=0 reason=PCP Link: https://lkml.kernel.org/r/20260120024402.387576-3-jiayuan.chen@linux.dev Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> [tracing] Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Xu <weixugc@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-31mm/vmscan: mitigate spurious kswapd_failures reset from direct reclaimJiayuan Chen
Patch series "mm/vmscan: add tracepoint and reason for kswapd_failures reset", v4. Currently, kswapd_failures is reset in multiple places (kswapd, direct reclaim, PCP freeing, memory-tiers), but there's no way to trace when and why it was reset, making it difficult to debug memory reclaim issues. This patch: 1. Introduce kswapd_clear_hopeless() as a wrapper function to centralize kswapd_failures reset logic. 2. Introduce kswapd_test_hopeless() to encapsulate hopeless node checks, replacing all open-coded kswapd_failures comparisons. 3. Add kswapd_clear_hopeless_reason enum to distinguish reset sources: - KSWAPD_CLEAR_HOPELESS_KSWAPD: reset from kswapd context - KSWAPD_CLEAR_HOPELESS_DIRECT: reset from direct reclaim - KSWAPD_CLEAR_HOPELESS_PCP: reset from PCP page freeing - KSWAPD_CLEAR_HOPELESS_OTHER: reset from other paths 4. Add tracepoints for better observability: - mm_vmscan_kswapd_clear_hopeless: traces each reset with reason - mm_vmscan_kswapd_reclaim_fail: traces each kswapd reclaim failure Test results: $ trace-cmd record -e vmscan:mm_vmscan_kswapd_clear_hopeless -e vmscan:mm_vmscan_kswapd_reclaim_fail $ # generate memory pressure $ trace-cmd report cpus=4 kswapd0-71 [000] 27.216563: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=1 kswapd0-71 [000] 27.217169: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=2 kswapd0-71 [000] 27.217764: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=3 kswapd0-71 [000] 27.218353: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=4 kswapd0-71 [000] 27.218993: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=5 kswapd0-71 [000] 27.219744: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=6 kswapd0-71 [000] 27.220488: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=7 kswapd0-71 [000] 27.221206: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=8 kswapd0-71 [000] 27.221806: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=9 kswapd0-71 [000] 27.222634: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=10 kswapd0-71 [000] 27.223286: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=11 kswapd0-71 [000] 27.223894: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=12 kswapd0-71 [000] 27.224712: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=13 kswapd0-71 [000] 27.225424: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=14 kswapd0-71 [000] 27.226082: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=15 kswapd0-71 [000] 27.226810: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=16 kswapd1-72 [002] 27.386869: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=1 kswapd1-72 [002] 27.387435: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=2 kswapd1-72 [002] 27.388016: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=3 kswapd1-72 [002] 27.388586: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=4 kswapd1-72 [002] 27.389155: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=5 kswapd1-72 [002] 27.389723: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=6 kswapd1-72 [002] 27.390292: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=7 kswapd1-72 [002] 27.392364: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=8 kswapd1-72 [002] 27.392934: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=9 kswapd1-72 [002] 27.393504: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=10 kswapd1-72 [002] 27.394073: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=11 kswapd1-72 [002] 27.394899: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=12 kswapd1-72 [002] 27.395472: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=13 kswapd1-72 [002] 27.396055: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=14 kswapd1-72 [002] 27.396628: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=15 kswapd1-72 [002] 27.397199: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=16 kworker/u18:0-40 [002] 27.410151: mm_vmscan_kswapd_clear_hopeless: nid=0 reason=DIRECT kswapd0-71 [000] 27.439454: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=1 kswapd0-71 [000] 27.440048: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=2 kswapd0-71 [000] 27.440634: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=3 kswapd0-71 [000] 27.441211: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=4 kswapd0-71 [000] 27.441787: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=5 kswapd0-71 [000] 27.442363: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=6 kswapd0-71 [000] 27.443030: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=7 kswapd0-71 [000] 27.443725: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=8 kswapd0-71 [000] 27.444315: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=9 kswapd0-71 [000] 27.444898: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=10 kswapd0-71 [000] 27.445476: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=11 kswapd0-71 [000] 27.446053: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=12 kswapd0-71 [000] 27.446646: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=13 kswapd0-71 [000] 27.447230: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=14 kswapd0-71 [000] 27.447812: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=15 kswapd0-71 [000] 27.448391: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=16 ann-423 [003] 28.028285: mm_vmscan_kswapd_clear_hopeless: nid=0 reason=PCP This patch (of 2): When kswapd fails to reclaim memory, kswapd_failures is incremented. Once it reaches MAX_RECLAIM_RETRIES, kswapd stops running to avoid futile reclaim attempts. However, any successful direct reclaim unconditionally resets kswapd_failures to 0, which can cause problems. We observed an issue in production on a multi-NUMA system where a process allocated large amounts of anonymous pages on a single NUMA node, causing its watermark to drop below high and evicting most file pages: $ numastat -m Per-node system memory usage (in MBs): Node 0 Node 1 Total --------------- --------------- --------------- MemTotal 128222.19 127983.91 256206.11 MemFree 1414.48 1432.80 2847.29 MemUsed 126807.71 126551.11 252358.82 SwapCached 0.00 0.00 0.00 Active 29017.91 25554.57 54572.48 Inactive 92749.06 95377.00 188126.06 Active(anon) 28998.96 23356.47 52355.43 Inactive(anon) 92685.27 87466.11 180151.39 Active(file) 18.95 2198.10 2217.05 Inactive(file) 63.79 7910.89 7974.68 With swap disabled, only file pages can be reclaimed. When kswapd is woken (e.g., via wake_all_kswapds()), it runs continuously but cannot raise free memory above the high watermark since reclaimable file pages are insufficient. Normally, kswapd would eventually stop after kswapd_failures reaches MAX_RECLAIM_RETRIES. However, containers on this machine have memory.high set in their cgroup. Business processes continuously trigger the high limit, causing frequent direct reclaim that keeps resetting kswapd_failures to 0. This prevents kswapd from ever stopping. The key insight is that direct reclaim triggered by cgroup memory.high performs aggressive scanning to throttle the allocating process. With sufficiently aggressive scanning, even hot pages will eventually be reclaimed, making direct reclaim "successful" at freeing some memory. However, this success does not mean the node has reached a balanced state - the freed memory may still be insufficient to bring free pages above the high watermark. Unconditionally resetting kswapd_failures in this case keeps kswapd alive indefinitely. The result is that kswapd runs endlessly. Unlike direct reclaim which only reclaims from the allocating cgroup, kswapd scans the entire node's memory. This causes hot file pages from all workloads on the node to be evicted, not just those from the cgroup triggering memory.high. These pages constantly refault, generating sustained heavy IO READ pressure across the entire system. Fix this by only resetting kswapd_failures when the node is actually balanced. This allows both kswapd and direct reclaim to clear kswapd_failures upon successful reclaim, but only when the reclaim actually resolves the memory pressure (i.e., the node becomes balanced). Link: https://lkml.kernel.org/r/20260120024402.387576-1-jiayuan.chen@linux.dev Link: https://lkml.kernel.org/r/20260120024402.387576-2-jiayuan.chen@linux.dev Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Xu <weixugc@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-31mm: fix OOM killer inaccuracy on large many-core systemsMathieu Desnoyers
Use the precise, albeit slower, precise RSS counter sums for the OOM killer task selection and console dumps. The approximated value is too imprecise on large many-core systems. The following rss tracking issues were noted by Sweet Tea Dorminy [1], which lead to picking wrong tasks as OOM kill target: Recently, several internal services had an RSS usage regression as part of a kernel upgrade. Previously, they were on a pre-6.2 kernel and were able to read RSS statistics in a backup watchdog process to monitor and decide if they'd overrun their memory budget. Now, however, a representative service with five threads, expected to use about a hundred MB of memory, on a 250-cpu machine had memory usage tens of megabytes different from the expected amount -- this constituted a significant percentage of inaccuracy, causing the watchdog to act. This was a result of commit f1a7941243c1 ("mm: convert mm's rss stats into percpu_counter") [1]. Previously, the memory error was bounded by 64*nr_threads pages, a very livable megabyte. Now, however, as a result of scheduler decisions moving the threads around the CPUs, the memory error could be as large as a gigabyte. This is a really tremendous inaccuracy for any few-threaded program on a large machine and impedes monitoring significantly. These stat counters are also used to make OOM killing decisions, so this additional inaccuracy could make a big difference in OOM situations -- either resulting in the wrong process being killed, or in less memory being returned from an OOM-kill than expected. Here is a (possibly incomplete) list of the prior approaches that were used or proposed, along with their downside: 1) Per-thread rss tracking: large error on many-thread processes. 2) Per-CPU counters: up to 12% slower for short-lived processes and 9% increased system time in make test workloads [1]. Moreover, the inaccuracy increases with O(n^2) with the number of CPUs. 3) Per-NUMA-node counters: requires atomics on fast-path (overhead), error is high with systems that have lots of NUMA nodes (32 times the number of NUMA nodes). commit 82241a83cd15 ("mm: fix the inaccurate memory statistics issue for users") introduced get_mm_counter_sum() for precise proc memory status queries for some proc files. The simple fix proposed here is to do the precise per-cpu counters sum every time a counter value needs to be read. This applies to the OOM killer task selection, oom task console dumps (printk). This change increases the latency introduced when the OOM killer executes in favor of doing a more precise OOM target task selection. Effectively, the OOM killer iterates on all tasks, for all relevant page types, for which the precise sum iterates on all possible CPUs. As a reference, here is the execution time of the OOM killer before/after the change: AMD EPYC 9654 96-Core (2 sockets) Within a KVM, configured with 256 logical cpus. | before | after | ----------------------------------|----------|----------| nr_processes=40 | 0.3 ms | 0.5 ms | nr_processes=10000 | 3.0 ms | 80.0 ms | Link: https://lkml.kernel.org/r/20260114143642.47333-1-mathieu.desnoyers@efficios.com Fixes: f1a7941243c1 ("mm: convert mm's rss stats into percpu_counter") Link: https://lore.kernel.org/lkml/20250331223516.7810-2-sweettea-kernel@dorminy.me/ # [1] Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Suggested-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Martin Liu <liumartin@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: SeongJae Park <sj@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Liam R . Howlett" <liam.howlett@oracle.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Yu Zhao <yuzhao@google.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Aboorva Devarajan <aboorvad@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-31mm: rename CONFIG_MEMORY_BALLOON -> CONFIG_BALLOONDavid Hildenbrand (Red Hat)
Let's make it consistent with the naming of the files but also with the naming of CONFIG_BALLOON_MIGRATION. While at it, add a "/* CONFIG_BALLOON */". Link: https://lkml.kernel.org/r/20260119230133.3551867-24-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-31mm: rename CONFIG_BALLOON_COMPACTION to CONFIG_BALLOON_MIGRATIONDavid Hildenbrand (Red Hat)
While compaction depends on migration, the other direction is not the case. So let's make it clearer that this is all about migration of balloon pages. Adjust all comments/docs in the core to talk about "migration" instead of "compaction". While at it add some "/* CONFIG_BALLOON_MIGRATION */". Link: https://lkml.kernel.org/r/20260119230133.3551867-23-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-31mm: rename balloon_compaction.(c|h) to balloon.(c|h)David Hildenbrand (Red Hat)
Even without CONFIG_BALLOON_COMPACTION this infrastructure implements basic list and page management for a memory balloon. Link: https://lkml.kernel.org/r/20260119230133.3551867-21-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-31mm/balloon_compaction: remove "extern" from functionsDavid Hildenbrand (Red Hat)
Adding "extern" to functions is frowned-upon. Let's just get rid of it for all functions here. Link: https://lkml.kernel.org/r/20260119230133.3551867-19-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-31mm/balloon_compaction: move internal helpers to balloon_compaction.cDavid Hildenbrand (Red Hat)
Let's move the helpers that are not required by drivers anymore. While at it, drop the doc of balloon_page_device() as it is trivial. [david@kernel.org: move balloon_page_device() under CONFIG_BALLOON_COMPACTION] Link: https://lkml.kernel.org/r/27f0adf1-54c1-4d99-8b7f-fd45574e7f41@kernel.org Link: https://lkml.kernel.org/r/20260119230133.3551867-16-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-31mm/balloon_compaction: fold balloon_mapping_gfp_mask() into balloon_page_alloc()David Hildenbrand (Red Hat)
Let's just remove balloon_mapping_gfp_mask(). Link: https://lkml.kernel.org/r/20260119230133.3551867-15-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-31mm/balloon_compaction: remove balloon_page_push/pop()David Hildenbrand (Red Hat)
Let's remove these helpers as they are unused now. Link: https://lkml.kernel.org/r/20260119230133.3551867-14-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-31mm/balloon_compaction: drop fs.h include from balloon_compaction.hDavid Hildenbrand (Red Hat)
Ever since commit 68f2736a8583 ("mm: Convert all PageMovable users to movable_operations") we no longer store an inode in balloon_dev_info, so we can stop including "fs.h". Link: https://lkml.kernel.org/r/20260119230133.3551867-12-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-31mm/balloon_compaction: make balloon_mops staticDavid Hildenbrand (Red Hat)
There is no need to expose this anymore, so let's just make it static. Link: https://lkml.kernel.org/r/20260119230133.3551867-11-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-31mm/balloon_compaction: remove dependency on page lockDavid Hildenbrand (Red Hat)
Let's stop using the page lock in balloon code and instead use only the balloon_device_lock. As soon as we set the PG_movable_ops flag, we might now get isolation callbacks for that page as we are no longer holding the page lock. In there, we'll simply synchronize using the balloon_device_lock. So in balloon_page_isolate() lookup the balloon_dev_info through page->private under balloon_device_lock. It's crucial that we update page->private under the balloon_device_lock, so the isolation callback can properly deal with concurrent deflation. Consequently, make sure that balloon_page_finalize() is called under balloon_device_lock as we remove a page from the list and clear page->private. balloon_page_insert() is already called with the balloon_device_lock held. Note that the core will still lock the pages, for example in isolate_movable_ops_page(). The lock is there still relevant for handling the PageMovableOpsIsolated flag, but that can be later changed to use an atomic test-and-set instead, or moved into the movable_ops backends. Link: https://lkml.kernel.org/r/20260119230133.3551867-10-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-31mm/balloon_compaction: use a device-independent balloon (list) lockDavid Hildenbrand (Red Hat)
In order to remove the dependency on the page lock for balloon pages, we need a lock that is independent of the page. It's crucial that we can handle the scenario where balloon deflation (clearing page->private) can race with page isolation (using page->private to obtain the balloon_dev_info where the lock currently resides). The current lock in balloon_dev_info is therefore not suitable. Fortunately, we never really have more than a single balloon device per VM, so we can just keep it simple and use a static lock to protect all balloon devices. Based on this change we will remove the dependency on the page lock next. Link: https://lkml.kernel.org/r/20260119230133.3551867-9-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-31mm/balloon_compaction: centralize adjust_managed_page_count() handlingDavid Hildenbrand (Red Hat)
Let's centralize it, by allowing for the driver to enable this handling through a new flag (bool for now) in the balloon device info. Note that we now adjust the counter when adding/removing a page into the balloon list: when removing a page to deflate it, it will now happen before the driver communicated with hypervisor, not afterwards. This shouldn't make a difference in practice. Link: https://lkml.kernel.org/r/20260119230133.3551867-7-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-31mm/khugepaged: change collapse_pte_mapped_thp() to return voidShivank Garg
The only external caller of collapse_pte_mapped_thp() is uprobe, which ignores the return value. Change the external API to return void to simplify the interface. Introduce try_collapse_pte_mapped_thp() for internal use that preserves the return value. This prepares for future patch that will convert the return type to use enum scan_result. Link: https://lkml.kernel.org/r/20260118192253.9263-10-shivankg@amd.com Signed-off-by: Shivank Garg <shivankg@amd.com> Suggested-by: David Hildenbrand (Red Hat) <david@kernel.org> Acked-by: Lance Yang <lance.yang@linux.dev> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Tested-by: Nico Pache <npache@redhat.com> Reviewed-by: Nico Pache <npache@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-31Merge branch 'mm-hotfixes-stable' into mm-stable to pick up "mm/shmem,Andrew Morton
swap: fix race of truncate and swap entry split", needed for merging "mm, swap: cleanup swap entry management workflow".
2026-01-31bpf, arm64: Add fsession supportLeon Hwang
Implement fsession support in the arm64 BPF JIT trampoline. Extend the trampoline stack layout to store function metadata and session cookies, and pass the appropriate metadata to fentry and fexit programs. This mirrors the existing x86 behavior and enables session cookies on arm64. Acked-by: Puranjay Mohan <puranjay@kernel.org> Tested-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260131144950.16294-3-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-31bpf: Add bpf_jit_supports_fsession()Leon Hwang
The added fsession does not prevent running on those architectures, that haven't added fsession support. For example, try to run fsession tests on arm64: test_fsession_basic:PASS:fsession_test__open_and_load 0 nsec test_fsession_basic:PASS:fsession_attach 0 nsec check_result:FAIL:test_run_opts err unexpected error: -14 (errno 14) In order to prevent such errors, add bpf_jit_supports_fsession() to guard those architectures. Fixes: 2d419c44658f ("bpf: add fsession support") Acked-by: Puranjay Mohan <puranjay@kernel.org> Tested-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260131144950.16294-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-31crypto: hisilicon/qm - obtain the mailbox configuration at one timeWeili Qian
The malibox needs to be triggered by a 128bit atomic operation. The reason is that the PF and VFs of the device share the mmio memory of the mailbox, and the mutex cannot lock mailbox operations in different functions, especially when passing through VFs to virtual machines. Currently, the write operation to the mailbox is already a 128-bit atomic write. The read operation also needs to be modified to a 128-bit atomic read. Since there is no general 128-bit IO memory access API in the current ARM64 architecture, and the stp and ldp instructions do not guarantee atomic access to device memory, they cannot be extracted as a general API. Therefore, the 128-bit atomic read and write operations need to be implemented in the driver. Signed-off-by: Weili Qian <qianweili@huawei.com> Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2026-01-30net: wwan: add NMEA port supportSergey Ryazanov
Many WWAN modems come with embedded GNSS receiver inside and have a dedicated port to output geopositioning data. On the one hand, the GNSS receiver has little in common with WWAN modem and just shares a host interface and should be exported using the GNSS subsystem. On the other hand, GNSS receiver is not automatically activated and needs a generic WWAN control port (AT, MBIM, etc.) to be turned on. And a user space software needs extra information to find the control port. Introduce the new type of WWAN port - NMEA. When driver asks to register a NMEA port, the core allocates common parent WWAN device as usual, but exports the NMEA port via the GNSS subsystem and acts as a proxy between the device driver and the GNSS subsystem. From the WWAN device driver perspective, a NMEA port is registered as a regular WWAN port without any difference. And the driver interacts only with the WWAN core. From the user space perspective, the NMEA port is a GNSS device which parent can be used to enumerate and select the proper control port for the GNSS receiver management. CC: Slark Xiao <slark_xiao@163.com> CC: Muhammad Nuzaihan <zaihan@unrealasia.net> CC: Qiang Yu <quic_qianyu@quicinc.com> CC: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> CC: Johan Hovold <johan@kernel.org> Suggested-by: Loic Poulain <loic.poulain@oss.qualcomm.com> Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com> Link: https://patch.msgid.link/20260126062158.308598-6-slark_xiao@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-31PCI/MSI: Convert the boolean no_64bit_msi flag to a DMA address maskVivian Wang
Some PCI devices have PCI_MSI_FLAGS_64BIT in the MSI capability, but implement less than 64 address bits. This breaks on platforms where such a device is assigned an MSI address higher than what's supported. Currently, no_64bit_msi bit is set for these devices, meaning that only 32-bit MSI addresses are allowed for them. However, on some platforms the MSI doorbell address is above the 32-bit limit but within the addressable range of the device. As a first step to enable MSI on those combinations of devices and platforms, convert the boolean no_64bit_msi flag to a DMA mask and fixup the affected usage sites: - no_64bit_msi = 1 -> msi_addr_mask = DMA_BIT_MASK(32) - no_64bit_msi = 0 -> msi_addr_mask = DMA_BIT_MASK(64) - if (no_64bit_msi) -> if (msi_addr_mask < DMA_BIT_MASK(64)) Since no values other than DMA_BIT_MASK(32) and DMA_BIT_MASK(64) are used, this is functionally equivalent. This prepares for changing the binary decision between 32 and 64 bit to a DMA mask based decision which allows to support systems which have a DMA address space less than 64bit but a MSI doorbell address above the 32-bit limit. [ tglx: Massaged changelog ] Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Brett Creeley <brett.creeley@amd.com> # ionic Reviewed-by: Thomas Gleixner <tglx@kernel.org> Acked-by: Takashi Iwai <tiwai@suse.de> # sound Link: https://patch.msgid.link/20260129-pci-msi-addr-mask-v4-1-70da998f2750@iscas.ac.cn
2026-01-31i3c: master: Add i3c_master_do_daa_ext() for post-hibernation address recoveryAdrian Hunter
After system hibernation, I3C Dynamic Addresses may be reassigned at boot and no longer match the values recorded before suspend. Introduce i3c_master_do_daa_ext() to handle this situation. The restore procedure is straightforward: issue a Reset Dynamic Address Assignment (RSTDAA), then run the standard DAA sequence. The existing DAA logic already supports detecting and updating devices whose dynamic addresses differ from previously known values. Refactor the DAA path by introducing a shared helper used by both the normal i3c_master_do_daa() path and the new extended restore function, and correct the kernel-doc in the process. Export i3c_master_do_daa_ext() so that master drivers can invoke it from their PM restore callbacks. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260123063325.8210-2-adrian.hunter@intel.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2026-01-30perf: sched: Fix perf crash with new is_user_task() helperSteven Rostedt
In order to do a user space stacktrace the current task needs to be a user task that has executed in user space. It use to be possible to test if a task is a user task or not by simply checking the task_struct mm field. If it was non NULL, it was a user task and if not it was a kernel task. But things have changed over time, and some kernel tasks now have their own mm field. An idea was made to instead test PF_KTHREAD and two functions were used to wrap this check in case it became more complex to test if a task was a user task or not[1]. But this was rejected and the C code simply checked the PF_KTHREAD directly. It was later found that not all kernel threads set PF_KTHREAD. The io-uring helpers instead set PF_USER_WORKER and this needed to be added as well. But checking the flags is still not enough. There's a very small window when a task exits that it frees its mm field and it is set back to NULL. If perf were to trigger at this moment, the flags test would say its a user space task but when perf would read the mm field it would crash with at NULL pointer dereference. Now there are flags that can be used to test if a task is exiting, but they are set in areas that perf may still want to profile the user space task (to see where it exited). The only real test is to check both the flags and the mm field. Instead of making this modification in every location, create a new is_user_task() helper function that does all the tests needed to know if it is safe to read the user space memory or not. [1] https://lore.kernel.org/all/20250425204120.639530125@goodmis.org/ Fixes: 90942f9fac05 ("perf: Use current->flags & PF_KTHREAD|PF_USER_WORKER instead of current->mm == NULL") Closes: https://lore.kernel.org/all/0d877e6f-41a7-4724-875d-0b0a27b8a545@roeck-us.net/ Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260129102821.46484722@gandalf.local.home
2026-01-30NFS: fix delayed delegation return handlingChristoph Hellwig
Rework this code that was totally busted at least as of my most recent changes. Introduce a separate list for delayed delegations so that they can't get lost and don't clutter up the returns list. Add a missing spin_unlock in the helper marking it as a regular pending return. Fixes: 0ebe655bd033 ("NFS: add a separate delegation return list") Reported-by: Chris Mason <clm@meta.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30NFS: return void from ->return_delegationChristoph Hellwig
The caller doesn't check the return value, so drop it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30Merge tag 'dma-mapping-6.19-2026-01-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fixes from Marek Szyprowski: - important fix for ARM 32-bit based systems using cma= kernel parameter (Oreoluwa Babatunde) - a fix for the corner case of the DMA atomic pool based allocations (Sai Sree Kartheek Adivi) * tag 'dma-mapping-6.19-2026-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma/pool: distinguish between missing and exhausted atomic pools of: reserved_mem: Allow reserved_mem framework detect "cma=" kernel param
2026-01-30bpf: Allow sleepable programs to use tail callsJiri Olsa
Allowing sleepable programs to use tail calls. Making sure we can't mix sleepable and non-sleepable bpf programs in tail call map (BPF_MAP_TYPE_PROG_ARRAY) and allowing it to be used in sleepable programs. Sleepable programs can be preempted and sleep which might bring new source of race conditions, but both direct and indirect tail calls should not be affected. Direct tail calls work by patching direct jump to callee into bpf caller program, so no problem there. We atomically switch from nop to jump instruction. Indirect tail call reads the callee from the map and then jumps to it. The callee bpf program can't disappear (be released) from the caller, because it is executed under rcu lock (rcu_read_lock_trace). Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260130081208.1130204-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-30NFS: Merge CONFIG_NFS_V4_1 with CONFIG_NFS_V4Anna Schumaker
Compiling the NFSv4 module without any minorversion support doesn't make much sense, so this patch sets NFS v4.1 as the default, always enabled NFS version allowing us to replace all the CONFIG_NFS_V4_1s scattered throughout the code with CONFIG_NFS_V4. Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30NFS: Move sequence slot operations into minorversion operationsAnna Schumaker
At the same time, I move the NFS v4.0 functions into nfs40proc.c to keep v4.0 features together in their own files. Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-01-30tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fastSteven Rostedt
The current use of guard(preempt_notrace)() within __DECLARE_TRACE() to protect invocation of __DO_TRACE_CALL() means that BPF programs attached to tracepoints are non-preemptible. This is unhelpful in real-time systems, whose users apparently wish to use BPF while also achieving low latencies. (Who knew?) One option would be to use preemptible RCU, but this introduces many opportunities for infinite recursion, which many consider to be counterproductive, especially given the relatively small stacks provided by the Linux kernel. These opportunities could be shut down by sufficiently energetic duplication of code, but this sort of thing is considered impolite in some circles. Therefore, use the shiny new SRCU-fast API, which provides somewhat faster readers than those of preemptible RCU, at least on Paul E. McKenney's laptop, where task_struct access is more expensive than access to per-CPU variables. And SRCU-fast provides way faster readers than does SRCU, courtesy of being able to avoid the read-side use of smp_mb(). Also, it is quite straightforward to create srcu_read_{,un}lock_fast_notrace() functions. Link: https://lore.kernel.org/all/20250613152218.1924093-1-bigeasy@linutronix.de/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Alexei Starovoitov <ast@kernel.org> Link: https://patch.msgid.link/20260126231256.499701982@kernel.org Co-developed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-30block: introduce bdev_rot()Damien Le Moal
Introduce the helper function bdev_rot() to test if a block device is a rotational one. The existing function bdev_nonrot() which tests for the opposite condition is redefined using this new helper. This avoids the double negation (operator and name) that appears when testing if a block device is a rotational device, thus making the code a little easier to read. Call sites of bdev_nonrot() in the block layer are updated to use this new helper. Remaining users in other subsystems are left unchanged for now. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-30Merge branch 'core/entry' into sched/coreThomas Gleixner
Pull the entry update to avoid merge conflicts with the time slice extension changes. Signed-off-by: Thomas Gleixner <tglx@kernel.org>
2026-01-30entry: Inline syscall_exit_work() and syscall_trace_enter()Jinjie Ruan
After switching ARM64 to the generic entry code, a syscall_exit_work() appeared as a profiling hotspot because it is not inlined. Inlining both syscall_trace_enter() and syscall_exit_work() provides a performance gain when any of the work items is enabled. With audit enabled this results in a ~4% performance gain for perf bench basic syscall on a kunpeng920 system: | Metric | Baseline | Inlined | Change | | ---------- | ----------- | ----------- | ------ | | Total time | 2.353 [sec] | 2.264 [sec] | ↓3.8% | | usecs/op | 0.235374 | 0.226472 | ↓3.8% | | ops/sec | 4,248,588 | 4,415,554 | ↑3.9% | Small gains can be observed on x86 as well, though the generated code optimizes for the work case, which is counterproductive for high performance scenarios where such entry/exit work is usually avoided. Avoid this by marking the work check in syscall_enter_from_user_mode_work() unlikely, which is what the corresponding check in the exit path does already. [ tglx: Massage changelog and add the unlikely() ] Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260128031934.3906955-14-ruanjinjie@huawei.com
2026-01-30entry: Add arch_ptrace_report_syscall_entry/exit()Jinjie Ruan
ARM64 requires a architecture specific ptrace wrapper as it needs to save and restore scratch registers. Provide arch_ptrace_report_syscall_entry/exit() wrappers which fall back to ptrace_report_syscall_entry/exit() if the architecture does not provide them. No functional change intended. [ tglx: Massaged changelog and comments ] Suggested-by: Mark Rutland <mark.rutland@arm.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Link: https://patch.msgid.link/20260128031934.3906955-11-ruanjinjie@huawei.com
2026-01-30entry: Rework syscall_exit_to_user_mode_work() for architecture reuseJinjie Ruan
syscall_exit_to_user_mode_work() invokes local_irq_disable_exit_to_user() and syscall_exit_to_user_mode_prepare() after handling pending syscall exit work. The conversion of ARM64 to the generic entry code requires this to be split up, so move the invocations of local_irq_disable_exit_to_user() and syscall_exit_to_user_mode_prepare() into the only caller. No functional change intended. [ tglx: Massaged changelog and comments ] Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20260128031934.3906955-10-ruanjinjie@huawei.com
2026-01-30entry: Remove unused syscall argument from syscall_trace_enter()Jinjie Ruan
The 'syscall' argument of syscall_trace_enter() is immediately overwritten before any real use and serves only as a local variable, so drop the parameter. No functional change intended. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260128031934.3906955-2-ruanjinjie@huawei.com
2026-01-30MIPS: copy pic32.h header file from asm/mach-pic32/ to include/platform-data/Brian Masney
There are currently some pic32 MIPS drivers that are in tree, and are only configured to be compiled on the pic32 platform. There's a risk of breaking some of these drivers when migrating drivers away from legacy APIs. It happened to me with a pic32 clk driver. Let's go ahead and copy the MIPS pic32.h header to include/linux/platform_data/, and make a minor update to allow compiling this on other architectures. This will make it easier, and cleaner to enable COMPILE_TEST for some of these pic32 drivers. The asm variant of the header file will be dropped once all drivers have been updated. Link: https://lore.kernel.org/linux-clk/CABx5tq+eOocJ41X-GSgkGy6S+s+Am1yCS099wqP695NtwALTmg@mail.gmail.com/T/ Signed-off-by: Brian Masney <bmasney@redhat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2026-01-30pkcs7, x509: Add ML-DSA supportDavid Howells
Add support for ML-DSA keys and signatures to the CMS/PKCS#7 and X.509 implementations. ML-DSA-44, -65 and -87 are all supported. For X.509 certificates, the TBSCertificate is required to be signed directly; for CMS, direct signing of the data is preferred, though use of SHA512 (and only that) as an intermediate hash of the content is permitted with signedAttrs. Signed-off-by: David Howells <dhowells@redhat.com> cc: Lukas Wunner <lukas@wunner.de> cc: Ignat Korchagin <ignat@cloudflare.com> cc: Stephan Mueller <smueller@chronox.de> cc: Eric Biggers <ebiggers@kernel.org> cc: Herbert Xu <herbert@gondor.apana.org.au> cc: keyrings@vger.kernel.org cc: linux-crypto@vger.kernel.org
2026-01-30irqchip/gic-v5: Check if impl is virt capableSascha Bischoff
Now that there is support for creating a GICv5-based guest with KVM, check that the hardware itself supports virtualisation, skipping the setting of struct gic_kvm_info if not. Note: If native GICv5 virt is not supported, then nor is FEAT_GCIE_LEGACY, so we are able to skip altogether. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260128175919.3828384-33-sascha.bischoff@arm.com [maz: cosmetic changes] Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-30platform/chrome: lightbar: Add support for large sequenceGwendal Grignou
Current sequences are limited to 192 bytes. Increase support to whatever the EC support. If the sequence is too long, the EC will return an OVERFLOW error. Test: Check sending a large sequence is received by the EC. Signed-off-by: Gwendal Grignou <gwendal@google.com> Link: https://lore.kernel.org/r/20260130081351.487517-2-gwendal@google.com Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
2026-01-30platform/chrome: lightbar: Report number of segmentsGwendal Grignou
Add attribue `num_segments` to return the number of exposed LED segments in the lightbar. It can be smaller than the number of physical leds in the lightbar. Test: Check the attribute is present and returns a value when read. Signed-off-by: Gwendal Grignou <gwendal@google.com> Link: https://lore.kernel.org/r/20260130081351.487517-1-gwendal@google.com Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
2026-01-29Merge tag 'wireless-next-2026-01-29' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== Another fairly large set of changes, notably: - cfg80211/mac80211 - most of EPPKE/802.1X over auth frames support - additional FTM capabilities - split up drop reasons better, removing generic RX_DROP - NAN cleanups/fixes - ath11k: - support for Channel Frequency Response measurement - ath12k: - support for the QCC2072 chipset - iwlwifi: - partial NAN support - UNII-9 support - some UHR/802.11bn FW APIs - remove most of MLO/EHT from iwlmvm (such devices use iwlmld) - rtw89: - preparations for RTL8922DE support * tag 'wireless-next-2026-01-29' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (184 commits) wifi: iwlegacy: add missing mutex protection in il4965_store_tx_power() wifi: iwlegacy: add missing mutex protection in il3945_store_measurement() wifi: mac80211: use u64_stats_t with u64_stats_sync properly wifi: p54: Fix memory leak in p54_beacon_update() wifi: cfg80211: treat deprecated INDOOR_SP_AP_OLD control value as LPI mode wifi: rtw88: sdio: Migrate to use sdio specific shutdown function wifi: rsi: sdio: Migrate to use sdio specific shutdown function sdio: Provide a bustype shutdown function wifi: nl80211/cfg80211: support operating as RSTA in PMSR FTM request wifi: nl80211/cfg80211: add negotiated burst period to FTM result wifi: nl80211/cfg80211: clarify periodic FTM parameters for non-EDCA based ranging wifi: nl80211/cfg80211: add new FTM capabilities wifi: iwlwifi: rename struct iwl_mcc_allowed_ap_type_cmd::offset_map wifi: iwlwifi: mvm: Remove link_id from time_events wifi: iwlwifi: mld: change cluster_id type to u8 array wifi: iwlwifi: support V13 of iwl_lari_config_change_cmd wifi: iwlwifi: split bios_value_u32 to separate the header wifi: iwlwifi: uefi: cache the DSM functions wifi: iwlwifi: acpi: cache the DSM functions wifi: iwlwifi: mvm: Cleanup MLO code ... ==================== Link: https://patch.msgid.link/20260129110136.176980-39-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-29net: add skb_header_pointer_careful() helperEric Dumazet
This variant of skb_header_pointer() should be used in contexts where @offset argument is user-controlled and could be negative. Negative offsets are supported, as long as the zone starts between skb->head and skb->data. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260128141539.3404400-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.19-rc8). No adjacent changes, conflicts: drivers/net/ethernet/spacemit/k1_emac.c 2c84959167d64 ("net: spacemit: Check for netif_carrier_ok() in emac_stats_update()") f66086798f91f ("net: spacemit: Remove broken flow control support") https://lore.kernel.org/aXjAqZA3iEWD_DGM@sirena.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-29block: introduce blk_queue_rot()Damien Le Moal
To check if a request queue is for a rotational device, a double negation is needed with the pattern "!blk_queue_nonrot(q)". Simplify this with the introduction of the helper blk_queue_rot() which tests if a requests queue limit has the BLK_FEAT_ROTATIONAL feature set. All call sites of blk_queue_nonrot() are modified to use blk_queue_rot() and blk_queue_nonrot() definition removed. No functional changes. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-29block: cleanup queue limit features definitionDamien Le Moal
Unwrap the definition of BLK_FEAT_ATOMIC_WRITES and renumber this feature to be sequential with BLK_FEAT_SKIP_TAGSET_QUIESCE. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-01-29mtd: spinand: Add octal DTR supportMiquel Raynal
Create a new bus interface named ODTR for "octal DTR", which matches the following pattern: 8D-8D-8D. Add octal DTR support for all the existing core operations. Add a second set of templates for this bus interface. Give the possibility for drivers to register their read, write and update cache variants as well as their vendor specific operations. Check the SPI controller driver supports all the octal DTR commands that we might need before switching to the ODTR bus interface. Make the switch by calling ->configure_chip() with the ODTR parameter. Fallback in case this step fails. If someone ever attempts to suspend a chip in octal DTR mode, there are changes that it will loose its configuration at resume. Prevent any problem by explicitly switching back to SSDR while suspending. Note: there is a limitation in the current approach, page I/Os are not available as the dirmaps will be created for the ODTR bus interface if that option is supported and not switched back to SSDR during suspend. Switching them is possible but would be costly and would not bring anything as right after resuming we will switch again to ODTR. In case this capability is used for debug, developpers should mind to destroy and recreate suitable direct mappings. Finally, as a side effect, we increase the buffer for reading IDs to 6. No device at this point returns 6 bytes, but we support 5 bytes IDs, which means in octal DTR mode we have no other choice than reading an even number of bytes, hence 6. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>