summaryrefslogtreecommitdiff
path: root/kernel/trace/bpf_trace.c
AgeCommit message (Collapse)Author
2026-02-26bpf: Fix kprobe_multi cookies access in show_fdinfo callbackJiri Olsa
We don't check if cookies are available on the kprobe_multi link before accessing them in show_fdinfo callback, we should. Cc: stable@vger.kernel.org Fixes: da7e9c0a7fbc ("bpf: Add show_fdinfo for kprobe_multi") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260225111249.186230-1-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-13Merge tag 'trace-v7.0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: "User visible changes: - Add an entry into MAINTAINERS file for RUST versions of code There's now RUST code for tracing and static branches. To differentiate that code from the C code, add entries in for the RUST version (with "[RUST]" around it) so that the right maintainers get notified on changes. - New bitmask-list option added to tracefs When this is set, bitmasks in trace event are not displayed as hex numbers, but instead as lists: e.g. 0-5,7,9 instead of 0000015f - New show_event_filters file in tracefs Instead of having to search all events/*/*/filter for any active filters enabled in the trace instance, the file show_event_filters will list them so that there's only one file that needs to be examined to see if any filters are active. - New show_event_triggers file in tracefs Instead of having to search all events/*/*/trigger for any active triggers enabled in the trace instance, the file show_event_triggers will list them so that there's only one file that needs to be examined to see if any triggers are active. - Have traceoff_on_warning disable trace pintk buffer too Recently recording of trace_printk() could go to other trace instances instead of the top level instance. But if traceoff_on_warning triggers, it doesn't stop the buffer with trace_printk() and that data can easily be lost by being overwritten. Have traceoff_on_warning also disable the instance that has trace_printk() being written to it. - Update the hist_debug file to show what function the field uses When CONFIG_HIST_TRIGGERS_DEBUG is enabled, a hist_debug file exists for every event. This displays the internal data of any histogram enabled for that event. But it is lacking the function that is called to process one of its fields. This is very useful information that was missing when debugging histograms. - Up the histogram stack size from 16 to 31 Stack traces can be used as keys for event histograms. Currently the size of the stack that is stored is limited to just 16 entries. But the storage space in the histogram is 256 bytes, meaning that it can store up to 31 entries (plus one for the count of entries). Instead of letting that space go to waste, up the limit from 16 to 31. This makes the keys much more useful. - Fix permissions of per CPU file buffer_size_kb The per CPU file of buffer_size_kb was incorrectly set to read only in a previous cleanup. It should be writable. - Reset "last_boot_info" if the persistent buffer is cleared The last_boot_info shows address information of a persistent ring buffer if it contains data from a previous boot. It is cleared when recording starts again, but it is not cleared when the buffer is reset. The data is useless after a reset so clear it on reset too. Internal changes: - A change was made to allow tracepoint callbacks to have preemption enabled, and instead be protected by SRCU. This required some updates to the callbacks for perf and BPF. perf needed to disable preemption directly in its callback because it expects preemption disabled in the later code. BPF needed to disable migration, as its code expects to run completely on the same CPU. - Have irq_work wake up other CPU if current CPU is "isolated" When there's a waiter waiting on ring buffer data and a new event happens, an irq work is triggered to wake up that waiter. This is noisy on isolated CPUs (running NO_HZ_FULL). Trigger an IPI to a house keeping CPU instead. - Use proper free of trigger_data instead of open coding it in. - Remove redundant call of event_trigger_reset_filter() It was called immediately in a function that was called right after it. - Workqueue cleanups - Report errors if tracing_update_buffers() were to fail. - Make the enum update workqueue generic for other parts of tracing On boot up, a work queue is created to convert enum names into their numbers in the trace event format files. This work queue can also be used for other aspects of tracing that takes some time and shouldn't be called by the init call code. The blk_trace initialization takes a bit of time. Have the initialization code moved to the new tracing generic work queue function. - Skip kprobe boot event creation call if there's no kprobes defined on cmdline The kprobe initialization to set up kprobes if they are defined on the cmdline requires taking the event_mutex lock. This can be held by other tracing code doing initialization for a long time. Since kprobes added to the kernel command line need to be setup immediately, as they may be tracing early initialization code, they cannot be postponed in a work queue and must be setup in the initcall code. If there's no kprobe on the kernel cmdline, there's no reason to take the mutex and slow down the boot up code waiting to get the lock only to find out there's nothing to do. Simply exit out early if there's no kprobes on the kernel cmdline. If there are kprobes on the cmdline, then someone cares more about tracing over the speed of boot up. - Clean up the trigger code a bit - Move code out of trace.c and into their own files trace.c is now over 11,000 lines of code and has become more difficult to maintain. Start splitting it up so that related code is in their own files. Move all the trace_printk() related code into trace_printk.c. Move the __always_inline stack functions into trace.h. Move the pid filtering code into a new trace_pid.c file. - Better define the max latency and snapshot code The latency tracers have a "max latency" buffer that is a copy of the main buffer and gets swapped with it when a new high latency is detected. This keeps the trace up to the highest latency around where this max_latency buffer is never written to. It is only used to save the last max latency trace. A while ago a snapshot feature was added to tracefs to allow user space to perform the same logic. It could also enable events to trigger a "snapshot" if one of their fields hit a new high. This was built on top of the latency max_latency buffer logic. Because snapshots came later, they were dependent on the latency tracers to be enabled. In reality, the latency tracers depend on the snapshot code and not the other way around. It was just that they came first. Restructure the code and the kconfigs to have the latency tracers depend on snapshot code instead. This actually simplifies the logic a bit and allows to disable more when the latency tracers are not defined and the snapshot code is. - Fix a "false sharing" in the hwlat tracer code The loop to search for latency in hardware was using a variable that could be changed by user space for each sample. If the user change this variable, it could cause a bus contention, and reading that variable can show up as a large latency in the trace causing a false positive. Read this variable at the start of the sample with a READ_ONCE() into a local variable and keep the code from sharing cache lines with readers. - Fix function graph tracer static branch optimization code When only one tracer is defined for function graph tracing, it uses a static branch to call that tracer directly. When another tracer is added, it goes into loop logic to call all the registered callbacks. The code was incorrect when going back to one tracer and never re-enabled the static branch again to do the optimization code. - And other small fixes and cleanups" * tag 'trace-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (46 commits) function_graph: Restore direct mode when callbacks drop to one tracing: Fix indentation of return statement in print_trace_fmt() tracing: Reset last_boot_info if ring buffer is reset tracing: Fix to set write permission to per-cpu buffer_size_kb tracing: Fix false sharing in hwlat get_sample() tracing: Move d_max_latency out of CONFIG_FSNOTIFY protection tracing: Better separate SNAPSHOT and MAX_TRACE options tracing: Add tracer_uses_snapshot() helper to remove #ifdefs tracing: Rename trace_array field max_buffer to snapshot_buffer tracing: Move pid filtering into trace_pid.c tracing: Move trace_printk functions out of trace.c and into trace_printk.c tracing: Use system_state in trace_printk_init_buffers() tracing: Have trace_printk functions use flags instead of using global_trace tracing: Make tracing_update_buffers() take NULL for global_trace tracing: Make printk_trace global for tracing system tracing: Move ftrace_trace_stack() out of trace.c and into trace.h tracing: Move __trace_buffer_{un}lock_*() functions to trace.h tracing: Make tracing_selftest_running global to the tracing subsystem tracing: Make tracing_disabled global for tracing system tracing: Clean up use of trace_create_maxlat_file() ...
2026-01-30bpf: Have __bpf_trace_run() use rcu_read_lock_dont_migrate()Steven Rostedt
In order to switch the protection of tracepoint callbacks from preempt_disable() to srcu_read_lock_fast() the BPF callback from tracepoints needs to have migration prevention as the BPF programs expect to stay on the same CPU as they execute. Put together the RCU protection with migration prevention and use rcu_read_lock_dont_migrate() in __bpf_trace_run(). This will allow tracepoints callbacks to be preemptible. Link: https://lore.kernel.org/all/CAADnVQKvY026HSFGOsavJppm3-Ajm-VsLzY-OeFUe+BaKMRnDg@mail.gmail.com/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Alexei Starovoitov <ast@kernel.org> Link: https://patch.msgid.link/20260126231256.335034877@kernel.org Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-24bpf: support fsession for bpf_session_is_returnMenglong Dong
If fsession exists, we will use the bit (1 << BPF_TRAMP_IS_RETURN_SHIFT) in ((u64 *)ctx)[-1] to store the "is_return" flag. The logic of bpf_session_is_return() for fsession is implemented in the verifier by inline following code: bool bpf_session_is_return(void *ctx) { return (((u64 *)ctx)[-1] >> BPF_TRAMP_IS_RETURN_SHIFT) & 1; } Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Co-developed-by: Leon Hwang <leon.hwang@linux.dev> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260124062008.8657-5-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-24bpf: change prototype of bpf_session_{cookie,is_return}Menglong Dong
Add the function argument of "void *ctx" to bpf_session_cookie() and bpf_session_is_return(), which is a preparation of the next patch. The two kfunc is seldom used now, so it will not introduce much effect to change their function prototype. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20260124062008.8657-4-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-24bpf: use the least significant byte for the nr_args in trampolineMenglong Dong
For now, ((u64 *)ctx)[-1] is used to store the nr_args in the trampoline. However, 1 byte is enough to store such information. Therefore, we use only the least significant byte of ((u64 *)ctx)[-1] to store the nr_args, and reserve the rest for other usages. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20260124062008.8657-3-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-21bpf: support bpf_get_func_arg() for BPF_TRACE_RAW_TPMenglong Dong
For now, bpf_get_func_arg() and bpf_get_func_arg_cnt() is not supported by the BPF_TRACE_RAW_TP, which is not convenient to get the argument of the tracepoint, especially for the case that the position of the arguments in a tracepoint can change. The target tracepoint BTF type id is specified during loading time, therefore we can get the function argument count from the function prototype instead of the stack. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20260121044348.113201-2-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20bpf: Fix memory access flags in helper prototypesZesen Liu
After commit 37cce22dbd51 ("bpf: verifier: Refactor helper access type tracking"), the verifier started relying on the access type flags in helper function prototypes to perform memory access optimizations. Currently, several helper functions utilizing ARG_PTR_TO_MEM lack the corresponding MEM_RDONLY or MEM_WRITE flags. This omission causes the verifier to incorrectly assume that the buffer contents are unchanged across the helper call. Consequently, the verifier may optimize away subsequent reads based on this wrong assumption, leading to correctness issues. For bpf_get_stack_proto_raw_tp, the original MEM_RDONLY was incorrect since the helper writes to the buffer. Change it to ARG_PTR_TO_UNINIT_MEM which correctly indicates write access to potentially uninitialized memory. Similar issues were recently addressed for specific helpers in commit ac44dcc788b9 ("bpf: Fix verifier assumptions of bpf_d_path's output buffer") and commit 2eb7648558a7 ("bpf: Specify access type of bpf_sysctl_get_name args"). Fix these prototypes by adding the correct memory access flags. Fixes: 37cce22dbd51 ("bpf: verifier: Refactor helper access type tracking") Co-developed-by: Shuran Liu <electronlsr@gmail.com> Signed-off-by: Shuran Liu <electronlsr@gmail.com> Co-developed-by: Peili Gao <gplhust955@gmail.com> Signed-off-by: Peili Gao <gplhust955@gmail.com> Co-developed-by: Haoran Ni <haoran.ni.cs@gmail.com> Signed-off-by: Haoran Ni <haoran.ni.cs@gmail.com> Signed-off-by: Zesen Liu <ftyghome@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260120-helper_proto-v3-1-27b0180b4e77@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-16bpf: Add __force annotations to silence sparse warningsMykyta Yatsenko
Add __force annotations to casts that convert between __user and kernel address spaces. These casts are intentional: - In bpf_send_signal_common(), the value is stored in si_value.sival_ptr which is typed as void __user *, but the value comes from a BPF program parameter. - In the bpf_*_dynptr() kfuncs, user pointers are cast to const void * before being passed to copy helper functions that correctly handle the user address space through copy_from_user variants. Without __force, sparse reports: warning: cast removes address space '__user' of expression Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20260115184509.3585759-1-mykyta.yatsenko5@gmail.com Closes: https://lore.kernel.org/oe-kbuild-all/202601131740.6C3BdBaB-lkp@intel.com/
2026-01-15arm64/ftrace,bpf: Fix partial regs after bpf_prog_runJiri Olsa
Mahe reported issue with bpf_override_return helper not working when executed from kprobe.multi bpf program on arm. The problem is that on arm we use alternate storage for pt_regs object that is passed to bpf_prog_run and if any register is changed (which is the case of bpf_override_return) it's not propagated back to actual pt_regs object. Fixing this by introducing and calling ftrace_partial_regs_update function to propagate the values of changed registers (ip and stack). Reported-by: Mahe Tardy <mahe.tardy@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/bpf/20260112121157.854473-1-jolsa@kernel.org
2025-12-21bpf: move recursion detection logic to helpersPuranjay Mohan
BPF programs detect recursion by doing atomic inc/dec on a per-cpu active counter from the trampoline. Create two helpers for operations on this active counter, this makes it easy to changes the recursion detection logic in future. This commit makes no functional changes. Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/r/20251219184422.2899902-2-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-10bpf: Fix verifier assumptions of bpf_d_path's output bufferShuran Liu
Commit 37cce22dbd51 ("bpf: verifier: Refactor helper access type tracking") started distinguishing read vs write accesses performed by helpers. The second argument of bpf_d_path() is a pointer to a buffer that the helper fills with the resulting path. However, its prototype currently uses ARG_PTR_TO_MEM without MEM_WRITE. Before 37cce22dbd51, helper accesses were conservatively treated as potential writes, so this mismatch did not cause issues. Since that commit, the verifier may incorrectly assume that the buffer contents are unchanged across the helper call and base its optimizations on this wrong assumption. This can lead to misbehaviour in BPF programs that read back the buffer, such as prefix comparisons on the returned path. Fix this by marking the second argument of bpf_d_path() as ARG_PTR_TO_MEM | MEM_WRITE so that the verifier correctly models the write to the caller-provided buffer. Fixes: 37cce22dbd51 ("bpf: verifier: Refactor helper access type tracking") Co-developed-by: Zesen Liu <ftyg@live.com> Signed-off-by: Zesen Liu <ftyg@live.com> Co-developed-by: Peili Gao <gplhust955@gmail.com> Signed-off-by: Peili Gao <gplhust955@gmail.com> Co-developed-by: Haoran Ni <haoran.ni.cs@gmail.com> Signed-off-by: Haoran Ni <haoran.ni.cs@gmail.com> Signed-off-by: Shuran Liu <electronlsr@gmail.com> Reviewed-by: Matt Bobrowski <mattbobrowski@google.com> Link: https://lore.kernel.org/r/20251206141210.3148-2-electronlsr@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-29bpf: make kprobe_multi_link_prog_run always_inlineMenglong Dong
Make kprobe_multi_link_prog_run() always inline to obtain better performance. Before this patch, the bench performance is: ./bench trig-kprobe-multi Setting up benchmark 'trig-kprobe-multi'... Benchmark 'trig-kprobe-multi' started. Iter 0 ( 95.485us): hits 62.462M/s ( 62.462M/prod), [...] Iter 1 (-80.054us): hits 62.486M/s ( 62.486M/prod), [...] Iter 2 ( 13.572us): hits 62.287M/s ( 62.287M/prod), [...] Iter 3 ( 76.961us): hits 62.293M/s ( 62.293M/prod), [...] Iter 4 (-77.698us): hits 62.394M/s ( 62.394M/prod), [...] Iter 5 (-13.399us): hits 62.319M/s ( 62.319M/prod), [...] Iter 6 ( 77.573us): hits 62.250M/s ( 62.250M/prod), [...] Summary: hits 62.338 ± 0.083M/s ( 62.338M/prod) And after this patch, the performance is: Iter 0 (454.148us): hits 66.900M/s ( 66.900M/prod), [...] Iter 1 (-435.540us): hits 68.925M/s ( 68.925M/prod), [...] Iter 2 ( 8.223us): hits 68.795M/s ( 68.795M/prod), [...] Iter 3 (-12.347us): hits 68.880M/s ( 68.880M/prod), [...] Iter 4 ( 2.291us): hits 68.767M/s ( 68.767M/prod), [...] Iter 5 ( -1.446us): hits 68.756M/s ( 68.756M/prod), [...] Iter 6 ( 13.882us): hits 68.657M/s ( 68.657M/prod), [...] Summary: hits 68.792 ± 0.087M/s ( 68.792M/prod) As we can see, the performance of kprobe-multi increase from 62M/s to 68M/s. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20251126085246.309942-1-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-27bpf: widen dynptr size/offset to 64 bitMykyta Yatsenko
Dynptr currently caps size and offset at 24 bits, which isn’t sufficient for file-backed use cases; even 32 bits can be limiting. Refactor dynptr helpers/kfuncs to use 64-bit size and offset, ensuring consistency across the APIs. This change does not affect internals of xdp, skb or other dynptrs, which continue to behave as before. Also it does not break binary compatibility. The widening enables large-file access support via dynptr, implemented in the next patches. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20251026203853.135105-3-mykyta.yatsenko5@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-03Merge tag 'pull-f_path' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull file->f_path constification from Al Viro: "Only one thing was modifying ->f_path of an opened file - acct(2). Massaging that away and constifying a bunch of struct path * arguments in functions that might be given &file->f_path ends up with the situation where we can turn ->f_path into an anon union of const struct path f_path and struct path __f_path, the latter modified only in a few places in fs/{file_table,open,namei}.c, all for struct file instances that are yet to be opened" * tag 'pull-f_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (23 commits) Have cc(1) catch attempts to modify ->f_path kernel/acct.c: saner struct file treatment configfs:get_target() - release path as soon as we grab configfs_item reference apparmor/af_unix: constify struct path * arguments ovl_is_real_file: constify realpath argument ovl_sync_file(): constify path argument ovl_lower_dir(): constify path argument ovl_get_verity_digest(): constify path argument ovl_validate_verity(): constify {meta,data}path arguments ovl_ensure_verity_loaded(): constify datapath argument ksmbd_vfs_set_init_posix_acl(): constify path argument ksmbd_vfs_inherit_posix_acl(): constify path argument ksmbd_vfs_kern_path_unlock(): constify path argument ksmbd_vfs_path_lookup_locked(): root_share_path can be const struct path * check_export(): constify path argument export_operations->open(): constify path argument rqst_exp_get_by_name(): constify path argument nfs: constify path argument of __vfs_getattr() bpf...d_path(): constify path argument done_path_create(): constify path argument ...
2025-09-24bpf: Allow uprobe program to change context registersJiri Olsa
Currently uprobe (BPF_PROG_TYPE_KPROBE) program can't write to the context registers data. While this makes sense for kprobe attachments, for uprobe attachment it might make sense to be able to change user space registers to alter application execution. Since uprobe and kprobe programs share the same type (BPF_PROG_TYPE_KPROBE), we can't deny write access to context during the program load. We need to check on it during program attachment to see if it's going to be kprobe or uprobe. Storing the program's write attempt to context and checking on it during the attachment. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20250916215301.664963-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-18bpf: Move the signature kfuncs to helpers.cKP Singh
No functional changes, except for the addition of the headers for the kfuncs so that they can be used for signature verification. Signed-off-by: KP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/r/20250914215141.15144-8-kpsingh@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-15bpf...d_path(): constify path argumentAl Viro
Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-08-15bpf: Remove migrate_disable in kprobe_multi_link_prog_runTao Chen
Graph tracer framework ensures we won't migrate, kprobe_multi_link_prog_run called all the way from graph tracer, which disables preemption in function_graph_enter_regs, as Jiri and Yonghong suggested, there is no need to use migrate_disable. As a result, some overhead may will be reduced. And add cant_sleep check for __this_cpu_inc_return. Fixes: 0dcac2725406 ("bpf: Add multi kprobe link") Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250814121430.2347454-1-chen.dylane@linux.dev
2025-07-16bpf: Clean up individual BTF_ID codeFeng Yang
Use BTF_ID_LIST_SINGLE(a, b, c) instead of BTF_ID_LIST(a) BTF_ID(b, c) Signed-off-by: Feng Yang <yangfeng@kylinos.cn> Link: https://lore.kernel.org/r/20250710055419.70544-1-yangfeng59949@163.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-11bpf: Add attach_type field to bpf_linkTao Chen
Attach_type will be set when a link is created by user. It is better to record attach_type in bpf_link generically and have it available universally for all link types. So add the attach_type field in bpf_link and move the sleepable field to avoid unnecessary gap padding. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20250710032038.888700-2-chen.dylane@linux.dev
2025-07-03bpf: Add show_fdinfo for kprobe_multiTao Chen
Show kprobe_multi link info with fdinfo, the info as follows: link_type: kprobe_multi link_id: 1 prog_tag: a69740b9746f7da8 prog_id: 21 kprobe_cnt: 8 missed: 0 cookie func 1 bpf_fentry_test1+0x0/0x20 7 bpf_fentry_test2+0x0/0x20 2 bpf_fentry_test3+0x0/0x20 3 bpf_fentry_test4+0x0/0x20 4 bpf_fentry_test5+0x0/0x20 5 bpf_fentry_test6+0x0/0x20 6 bpf_fentry_test7+0x0/0x20 8 bpf_fentry_test8+0x0/0x10 Signed-off-by: Tao Chen <chen.dylane@linux.dev> Link: https://lore.kernel.org/r/20250702153958.639852-3-chen.dylane@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03bpf: Add show_fdinfo for uprobe_multiTao Chen
Show uprobe_multi link info with fdinfo, the info as follows: link_type: uprobe_multi link_id: 9 prog_tag: e729f789e34a8eca prog_id: 39 uprobe_cnt: 3 pid: 0 path: /home/dylane/bpf/tools/testing/selftests/bpf/test_progs cookie offset ref_ctr_offset 3 0xa69f13 0x0 1 0xa69f1e 0x0 2 0xa69f29 0x0 Signed-off-by: Tao Chen <chen.dylane@linux.dev> Link: https://lore.kernel.org/r/20250702153958.639852-2-chen.dylane@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03bpf: Show precise link_type for {uprobe,kprobe}_multi fdinfoTao Chen
Alexei suggested, 'link_type' can be more precise and differentiate for human in fdinfo. In fact BPF_LINK_TYPE_KPROBE_MULTI includes kretprobe_multi type, the same as BPF_LINK_TYPE_UPROBE_MULTI, so we can show it more concretely. link_type: kprobe_multi link_id: 1 prog_tag: d2b307e915f0dd37 ... link_type: kretprobe_multi link_id: 2 prog_tag: ab9ea0545870781d ... link_type: uprobe_multi link_id: 9 prog_tag: e729f789e34a8eca ... link_type: uretprobe_multi link_id: 10 prog_tag: 7db356c03e61a4d4 Co-developed-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Tao Chen <chen.dylane@linux.dev> Link: https://lore.kernel.org/r/20250702153958.639852-1-chen.dylane@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-17bpf: Fix key serial argument of bpf_lookup_user_key()James Bottomley
The underlying lookup_user_key() function uses a signed 32 bit integer for key serial numbers because legitimate serial numbers are positive (and > 3) and keyrings are negative. Using a u32 for the keyring in the bpf function doesn't currently cause any conversion problems but will start to trip the signed to unsigned conversion warnings when the kernel enables them, so convert the argument to signed (and update the tests accordingly) before it acquires more users. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com> Link: https://lore.kernel.org/r/84cdb0775254d297d75e21f577089f64abdfbd28.camel@HansenPartnership.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-23bpf: Fix error return value in bpf_copy_from_user_dynptrMykyta Yatsenko
On error, copy_from_user returns number of bytes not copied to destination, but current implementation of copy_user_data_sleepable does not handle that correctly and returns it as error value, which may confuse user, expecting meaningful negative error value. Fixes: a498ee7576de ("bpf: Implement dynptr copy kfuncs") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250523181705.261585-1-mykyta.yatsenko5@gmail.com
2025-05-22bpf: Revert "bpf: remove unnecessary rcu_read_{lock,unlock}() in ↵Di Shen
multi-uprobe attach logic" This reverts commit 4a8f635a6054. Althought get_pid_task() internally already calls rcu_read_lock() and rcu_read_unlock(), the find_vpid() was not. The documentation for find_vpid() clearly states: "Must be called with the tasklist_lock or rcu_read_lock() held." Add proper rcu_read_lock/unlock() to protect the find_vpid(). Fixes: 4a8f635a6054 ("bpf: remove unnecessary rcu_read_{lock,unlock}() in multi-uprobe attach logic") Reported-by: Xuewen Yan <xuewen.yan@unisoc.com> Signed-off-by: Di Shen <di.shen@unisoc.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20250520054943.5002-1-xuewen.yan@unisoc.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-13bpf: Fix WARN() in get_bpf_raw_tp_regsTao Chen
syzkaller reported an issue: WARNING: CPU: 3 PID: 5971 at kernel/trace/bpf_trace.c:1861 get_bpf_raw_tp_regs+0xa4/0x100 kernel/trace/bpf_trace.c:1861 Modules linked in: CPU: 3 UID: 0 PID: 5971 Comm: syz-executor205 Not tainted 6.15.0-rc5-syzkaller-00038-g707df3375124 #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:get_bpf_raw_tp_regs+0xa4/0x100 kernel/trace/bpf_trace.c:1861 RSP: 0018:ffffc90003636fa8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000003 RCX: ffffffff81c6bc4c RDX: ffff888032efc880 RSI: ffffffff81c6bc83 RDI: 0000000000000005 RBP: ffff88806a730860 R08: 0000000000000005 R09: 0000000000000003 R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000004 R13: 0000000000000001 R14: ffffc90003637008 R15: 0000000000000900 FS: 0000000000000000(0000) GS:ffff8880d6cdf000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7baee09130 CR3: 0000000029f5a000 CR4: 0000000000352ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ____bpf_get_stack_raw_tp kernel/trace/bpf_trace.c:1934 [inline] bpf_get_stack_raw_tp+0x24/0x160 kernel/trace/bpf_trace.c:1931 bpf_prog_ec3b2eefa702d8d3+0x43/0x47 bpf_dispatcher_nop_func include/linux/bpf.h:1316 [inline] __bpf_prog_run include/linux/filter.h:718 [inline] bpf_prog_run include/linux/filter.h:725 [inline] __bpf_trace_run kernel/trace/bpf_trace.c:2363 [inline] bpf_trace_run3+0x23f/0x5a0 kernel/trace/bpf_trace.c:2405 __bpf_trace_mmap_lock_acquire_returned+0xfc/0x140 include/trace/events/mmap_lock.h:47 __traceiter_mmap_lock_acquire_returned+0x79/0xc0 include/trace/events/mmap_lock.h:47 __do_trace_mmap_lock_acquire_returned include/trace/events/mmap_lock.h:47 [inline] trace_mmap_lock_acquire_returned include/trace/events/mmap_lock.h:47 [inline] __mmap_lock_do_trace_acquire_returned+0x138/0x1f0 mm/mmap_lock.c:35 __mmap_lock_trace_acquire_returned include/linux/mmap_lock.h:36 [inline] mmap_read_trylock include/linux/mmap_lock.h:204 [inline] stack_map_get_build_id_offset+0x535/0x6f0 kernel/bpf/stackmap.c:157 __bpf_get_stack+0x307/0xa10 kernel/bpf/stackmap.c:483 ____bpf_get_stack kernel/bpf/stackmap.c:499 [inline] bpf_get_stack+0x32/0x40 kernel/bpf/stackmap.c:496 ____bpf_get_stack_raw_tp kernel/trace/bpf_trace.c:1941 [inline] bpf_get_stack_raw_tp+0x124/0x160 kernel/trace/bpf_trace.c:1931 bpf_prog_ec3b2eefa702d8d3+0x43/0x47 Tracepoint like trace_mmap_lock_acquire_returned may cause nested call as the corner case show above, which will be resolved with more general method in the future. As a result, WARN_ON_ONCE will be triggered. As Alexei suggested, remove the WARN_ON_ONCE first. Fixes: 9594dc3c7e71 ("bpf: fix nested bpf tracepoints with per-cpu data") Reported-by: syzbot+45b0c89a0fc7ae8dbadc@syzkaller.appspotmail.com Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250513042747.757042-1-chen.dylane@linux.dev Closes: https://lore.kernel.org/bpf/8bc2554d-1052-4922-8832-e0078a033e1d@gmail.com
2025-05-12bpf: Implement dynptr copy kfuncsMykyta Yatsenko
This patch introduces a new set of kfuncs for working with dynptrs in BPF programs, enabling reading variable-length user or kernel data into dynptr directly. To enable memory-safety, verifier allows only constant-sized reads via existing bpf_probe_read_{user|kernel} etc. kfuncs, dynptr-based kfuncs allow dynamically-sized reads without memory safety shortcomings. The following kfuncs are introduced: * `bpf_probe_read_kernel_dynptr()`: probes kernel-space data into a dynptr * `bpf_probe_read_user_dynptr()`: probes user-space data into a dynptr * `bpf_probe_read_kernel_str_dynptr()`: probes kernel-space string into a dynptr * `bpf_probe_read_user_str_dynptr()`: probes user-space string into a dynptr * `bpf_copy_from_user_dynptr()`: sleepable, copies user-space data into a dynptr for the current task * `bpf_copy_from_user_str_dynptr()`: sleepable, copies user-space string into a dynptr for the current task * `bpf_copy_from_user_task_dynptr()`: sleepable, copies user-space data of the task into a dynptr * `bpf_copy_from_user_task_str_dynptr()`: sleepable, copies user-space string of the task into a dynptr The implementation is built on two generic functions: * __bpf_dynptr_copy * __bpf_dynptr_copy_str These functions take function pointers as arguments, enabling the copying of data from various sources, including both kernel and user space. Use __always_inline for generic functions and callbacks to make sure the compiler doesn't generate indirect calls into callbacks, which is more expensive, especially on some kernel configurations. Inlining allows compiler to put direct calls into all the specific callback implementations (copy_user_data_sleepable, copy_user_data_nofault, and so on). Reviewed-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/20250512205348.191079-3-mykyta.yatsenko5@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-09bpf: Allow some trace helpers for all prog typesFeng Yang
if it works under NMI and doesn't use any context-dependent things, should be fine for any program type. The detailed discussion is in [1]. [1] https://lore.kernel.org/all/CAEf4Bza6gK3dsrTosk6k3oZgtHesNDSrDd8sdeQ-GiS6oJixQg@mail.gmail.com/ Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Signed-off-by: Feng Yang <yangfeng@kylinos.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/bpf/20250506061434.94277-2-yangfeng59949@163.com
2025-04-23bpf: Streamline allowed helpers between tracing and base setsFeng Yang
Many conditional checks in switch-case are redundant with bpf_base_func_proto and should be removed. Regarding the permission checks bpf_base_func_proto: The permission checks in bpf_prog_load (as outlined below) ensure that the trace has both CAP_BPF and CAP_PERFMON capabilities, thus enabling the use of corresponding prototypes in bpf_base_func_proto without adverse effects. bpf_prog_load ...... bpf_cap = bpf_token_capable(token, CAP_BPF); ...... if (type != BPF_PROG_TYPE_SOCKET_FILTER && type != BPF_PROG_TYPE_CGROUP_SKB && !bpf_cap) goto put_token; ...... if (is_perfmon_prog_type(type) && !bpf_token_capable(token, CAP_PERFMON)) goto put_token; ...... Signed-off-by: Feng Yang <yangfeng@kylinos.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/bpf/20250423073151.297103-1-yangfeng59949@163.com
2025-04-09bpf: Check link_create.flags parameter for multi_uprobeTao Chen
The link_create.flags are currently not used for multi-uprobes, so return -EINVAL if it is set, same as for other attach APIs. We allow target_fd to have an arbitrary value for multi-uprobe, though, as there are existing users (libbpf) relying on this. Fixes: 89ae89f53d20 ("bpf: Add multi uprobe link") Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250407035752.1108927-2-chen.dylane@linux.dev
2025-04-09bpf: Check link_create.flags parameter for multi_kprobeTao Chen
The link_create.flags are currently not used for multi-kprobes, so return -EINVAL if it is set, same as for other attach APIs. We allow target_fd, on the other hand, to have an arbitrary value for multi-kprobe, as there are existing users (libbpf) relying on this. Fixes: 0dcac2725406 ("bpf: Add multi kprobe link") Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250407035752.1108927-1-chen.dylane@linux.dev
2025-03-30Merge tag 'modules-6.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux Pull modules updates from Petr Pavlu: - Use RCU instead of RCU-sched The mix of rcu_read_lock(), rcu_read_lock_sched() and preempt_disable() in the module code and its users has been replaced with just rcu_read_lock() - The rest of changes are smaller fixes and updates * tag 'modules-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux: (32 commits) MAINTAINERS: Update the MODULE SUPPORT section module: Remove unnecessary size argument when calling strscpy() module: Replace deprecated strncpy() with strscpy() params: Annotate struct module_param_attrs with __counted_by() bug: Use RCU instead RCU-sched to protect module_bug_list. static_call: Use RCU in all users of __module_text_address(). kprobes: Use RCU in all users of __module_text_address(). bpf: Use RCU in all users of __module_text_address(). jump_label: Use RCU in all users of __module_text_address(). jump_label: Use RCU in all users of __module_address(). x86: Use RCU in all users of __module_address(). cfi: Use RCU while invoking __module_address(). powerpc/ftrace: Use RCU in all users of __module_text_address(). LoongArch: ftrace: Use RCU in all users of __module_text_address(). LoongArch/orc: Use RCU in all users of __module_address(). arm64: module: Use RCU in all users of __module_text_address(). ARM: module: Use RCU in all users of __module_text_address(). module: Use RCU in all users of __module_text_address(). module: Use RCU in all users of __module_address(). module: Use RCU in search_module_extables(). ...
2025-03-30Merge tag 'bpf-next-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: "For this merge window we're splitting BPF pull request into three for higher visibility: main changes, res_spin_lock, try_alloc_pages. These are the main BPF changes: - Add DFA-based live registers analysis to improve verification of programs with loops (Eduard Zingerman) - Introduce load_acquire and store_release BPF instructions and add x86, arm64 JIT support (Peilin Ye) - Fix loop detection logic in the verifier (Eduard Zingerman) - Drop unnecesary lock in bpf_map_inc_not_zero() (Eric Dumazet) - Add kfunc for populating cpumask bits (Emil Tsalapatis) - Convert various shell based tests to selftests/bpf/test_progs format (Bastien Curutchet) - Allow passing referenced kptrs into struct_ops callbacks (Amery Hung) - Add a flag to LSM bpf hook to facilitate bpf program signing (Blaise Boscaccy) - Track arena arguments in kfuncs (Ihor Solodrai) - Add copy_remote_vm_str() helper for reading strings from remote VM and bpf_copy_from_user_task_str() kfunc (Jordan Rome) - Add support for timed may_goto instruction (Kumar Kartikeya Dwivedi) - Allow bpf_get_netns_cookie() int cgroup_skb programs (Mahe Tardy) - Reduce bpf_cgrp_storage_busy false positives when accessing cgroup local storage (Martin KaFai Lau) - Introduce bpf_dynptr_copy() kfunc (Mykyta Yatsenko) - Allow retrieving BTF data with BTF token (Mykyta Yatsenko) - Add BPF kfuncs to set and get xattrs with 'security.bpf.' prefix (Song Liu) - Reject attaching programs to noreturn functions (Yafang Shao) - Introduce pre-order traversal of cgroup bpf programs (Yonghong Song)" * tag 'bpf-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (186 commits) selftests/bpf: Add selftests for load-acquire/store-release when register number is invalid bpf: Fix out-of-bounds read in check_atomic_load/store() libbpf: Add namespace for errstr making it libbpf_errstr bpf: Add struct_ops context information to struct bpf_prog_aux selftests/bpf: Sanitize pointer prior fclose() selftests/bpf: Migrate test_xdp_vlan.sh into test_progs selftests/bpf: test_xdp_vlan: Rename BPF sections bpf: clarify a misleading verifier error message selftests/bpf: Add selftest for attaching fexit to __noreturn functions bpf: Reject attaching fexit/fmod_ret to __noreturn functions bpf: Only fails the busy counter check in bpf_cgrp_storage_get if it creates storage bpf: Make perf_event_read_output accessible in all program types. bpftool: Using the right format specifiers bpftool: Add -Wformat-signedness flag to detect format errors selftests/bpf: Test freplace from user namespace libbpf: Pass BPF token from find_prog_btf_id to BPF_BTF_GET_FD_BY_ID bpf: Return prog btf_id without capable check bpf: BPF token support for BPF_BTF_GET_FD_BY_ID bpf, x86: Fix objtool warning for timed may_goto bpf: Check map->record at the beginning of check_and_free_fields() ...
2025-03-18bpf: Make perf_event_read_output accessible in all program types.Emil Tsalapatis
The perf_event_read_event_output helper is currently only available to tracing protrams, but is useful for other BPF programs like sched_ext schedulers. When the helper is available, provide its bpf_func_proto directly from the bpf base_proto. Signed-off-by: Emil Tsalapatis (Meta) <emil@etsalapatis.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20250318030753.10949-1-emil@etsalapatis.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-10bpf: Use RCU in all users of __module_text_address().Sebastian Andrzej Siewior
__module_address() can be invoked within a RCU section, there is no requirement to have preemption disabled. Replace the preempt_disable() section around __module_address() with RCU. Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Matt Bobrowski <mattbobrowski@google.com> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@fomichev.me> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Yonghong Song <yonghong.song@linux.dev> Cc: bpf@vger.kernel.org Cc: linux-trace-kernel@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20250129084751.tH6iidUO@linutronix.de Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2025-02-26bpf: Fix deadlock between rcu_tasks_trace and event_mutex.Alexei Starovoitov
Fix the following deadlock: CPU A _free_event() perf_kprobe_destroy() mutex_lock(&event_mutex) perf_trace_event_unreg() synchronize_rcu_tasks_trace() There are several paths where _free_event() grabs event_mutex and calls sync_rcu_tasks_trace. Above is one such case. CPU B bpf_prog_test_run_syscall() rcu_read_lock_trace() bpf_prog_run_pin_on_cpu() bpf_prog_load() bpf_tracing_func_proto() trace_set_clr_event() mutex_lock(&event_mutex) Delegate trace_set_clr_event() to workqueue to avoid such lock dependency. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250224221637.4780-1-alexei.starovoitov@gmail.com
2025-02-20bpf: Use preempt_count() directly in bpf_send_signal_common()Hou Tao
bpf_send_signal_common() uses preemptible() to check whether or not the current context is preemptible. If it is preemptible, it will use irq_work to send the signal asynchronously instead of trying to hold a spin-lock, because spin-lock is sleepable under PREEMPT_RT. However, preemptible() depends on CONFIG_PREEMPT_COUNT. When CONFIG_PREEMPT_COUNT is turned off (e.g., CONFIG_PREEMPT_VOLUNTARY=y), !preemptible() will be evaluated as 1 and bpf_send_signal_common() will use irq_work unconditionally. Fix it by unfolding "!preemptible()" and using "preempt_count() != 0 || irqs_disabled()" instead. Fixes: 87c544108b61 ("bpf: Send signals asynchronously if !preemptible") Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20250220042259.1583319-1-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-14x86/ibt: Clean up is_endbr()Peter Zijlstra
Pretty much every caller of is_endbr() actually wants to test something at an address and ends up doing get_kernel_nofault(). Fold the lot into a more convenient helper. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20250207122546.181367417@infradead.org
2025-01-23Merge tag 'bpf-next-6.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: "A smaller than usual release cycle. The main changes are: - Prepare selftest to run with GCC-BPF backend (Ihor Solodrai) In addition to LLVM-BPF runs the BPF CI now runs GCC-BPF in compile only mode. Half of the tests are failing, since support for btf_decl_tag is still WIP, but this is a great milestone. - Convert various samples/bpf to selftests/bpf/test_progs format (Alexis Lothoré and Bastien Curutchet) - Teach verifier to recognize that array lookup with constant in-range index will always succeed (Daniel Xu) - Cleanup migrate disable scope in BPF maps (Hou Tao) - Fix bpf_timer destroy path in PREEMPT_RT (Hou Tao) - Always use bpf_mem_alloc in bpf_local_storage in PREEMPT_RT (Martin KaFai Lau) - Refactor verifier lock support (Kumar Kartikeya Dwivedi) This is a prerequisite for upcoming resilient spin lock. - Remove excessive 'may_goto +0' instructions in the verifier that LLVM leaves when unrolls the loops (Yonghong Song) - Remove unhelpful bpf_probe_write_user() warning message (Marco Elver) - Add fd_array_cnt attribute for prog_load command (Anton Protopopov) This is a prerequisite for upcoming support for static_branch" * tag 'bpf-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (125 commits) selftests/bpf: Add some tests related to 'may_goto 0' insns bpf: Remove 'may_goto 0' instruction in opt_remove_nops() bpf: Allow 'may_goto 0' instruction in verifier selftests/bpf: Add test case for the freeing of bpf_timer bpf: Cancel the running bpf_timer through kworker for PREEMPT_RT bpf: Free element after unlock in __htab_map_lookup_and_delete_elem() bpf: Bail out early in __htab_map_lookup_and_delete_elem() bpf: Free special fields after unlock in htab_lru_map_delete_node() tools: Sync if_xdp.h uapi tooling header libbpf: Work around kernel inconsistently stripping '.llvm.' suffix bpf: selftests: verifier: Add nullness elision tests bpf: verifier: Support eliding map lookup nullness bpf: verifier: Refactor helper access type tracking bpf: tcp: Mark bpf_load_hdr_opt() arg2 as read-write bpf: verifier: Add missing newline on verbose() call selftests/bpf: Add distilled BTF test about marking BTF_IS_EMBEDDED libbpf: Fix incorrect traversal end type ID when marking BTF_IS_EMBEDDED libbpf: Fix return zero when elf_begin failed selftests/bpf: Fix btf leak on new btf alloc failure in btf_distill test veristat: Load struct_ops programs only once ...
2025-01-21Merge tag 'ftrace-v6.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ftrace updates from Steven Rostedt: - Have fprobes built on top of function graph infrastructure The fprobe logic is an optimized kprobe that uses ftrace to attach to functions when a probe is needed at the start or end of the function. The fprobe and kretprobe logic implements a similar method as the function graph tracer to trace the end of the function. That is to hijack the return address and jump to a trampoline to do the trace when the function exits. To do this, a shadow stack needs to be created to store the original return address. Fprobes and function graph do this slightly differently. Fprobes (and kretprobes) has slots per callsite that are reserved to save the return address. This is fine when just a few points are traced. But users of fprobes, such as BPF programs, are starting to add many more locations, and this method does not scale. The function graph tracer was created to trace all functions in the kernel. In order to do this, when function graph tracing is started, every task gets its own shadow stack to hold the return address that is going to be traced. The function graph tracer has been updated to allow multiple users to use its infrastructure. Now have fprobes be one of those users. This will also allow for the fprobe and kretprobe methods to trace the return address to become obsolete. With new technologies like CFI that need to know about these methods of hijacking the return address, going toward a solution that has only one method of doing this will make the kernel less complex. - Cleanup with guard() and free() helpers There were several places in the code that had a lot of "goto out" in the error paths to either unlock a lock or free some memory that was allocated. But this is error prone. Convert the code over to use the guard() and free() helpers that let the compiler unlock locks or free memory when the function exits. - Remove disabling of interrupts in the function graph tracer When function graph tracer was first introduced, it could race with interrupts and NMIs. To prevent that race, it would disable interrupts and not trace NMIs. But the code has changed to allow NMIs and also interrupts. This change was done a long time ago, but the disabling of interrupts was never removed. Remove the disabling of interrupts in the function graph tracer is it is not needed. This greatly improves its performance. - Allow the :mod: command to enable tracing module functions on the kernel command line. The function tracer already has a way to enable functions to be traced in modules by writing ":mod:<module>" into set_ftrace_filter. That will enable either all the functions for the module if it is loaded, or if it is not, it will cache that command, and when the module is loaded that matches <module>, its functions will be enabled. This also allows init functions to be traced. But currently events do not have that feature. Because enabling function tracing can be done very early at boot up (before scheduling is enabled), the commands that can be done when function tracing is started is limited. Having the ":mod:" command to trace module functions as they are loaded is very useful. Update the kernel command line function filtering to allow it. * tag 'ftrace-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (26 commits) ftrace: Implement :mod: cache filtering on kernel command line tracing: Adopt __free() and guard() for trace_fprobe.c bpf: Use ftrace_get_symaddr() for kprobe_multi probes ftrace: Add ftrace_get_symaddr to convert fentry_ip to symaddr Documentation: probes: Update fprobe on function-graph tracer selftests/ftrace: Add a test case for repeating register/unregister fprobe selftests: ftrace: Remove obsolate maxactive syntax check tracing/fprobe: Remove nr_maxactive from fprobe fprobe: Add fprobe_header encoding feature fprobe: Rewrite fprobe on function-graph tracer s390/tracing: Enable HAVE_FTRACE_GRAPH_FUNC ftrace: Add CONFIG_HAVE_FTRACE_GRAPH_FUNC bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS tracing: Add ftrace_fill_perf_regs() for perf event tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs fprobe: Use ftrace_regs in fprobe exit handler fprobe: Use ftrace_regs in fprobe entry handler fgraph: Pass ftrace_regs to retfunc fgraph: Replace fgraph_ret_regs with ftrace_regs ...
2025-01-21Merge tag 'perf-core-2025-01-20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull performance events updates from Ingo Molnar: "Seqlock optimizations that arose in a perf context and were merged into the perf tree: - seqlock: Add raw_seqcount_try_begin (Suren Baghdasaryan) - mm: Convert mm_lock_seq to a proper seqcount (Suren Baghdasaryan) - mm: Introduce mmap_lock_speculate_{try_begin|retry} (Suren Baghdasaryan) - mm/gup: Use raw_seqcount_try_begin() (Peter Zijlstra) Core perf enhancements: - Reduce 'struct page' footprint of perf by mapping pages in advance (Lorenzo Stoakes) - Save raw sample data conditionally based on sample type (Yabin Cui) - Reduce sampling overhead by checking sample_type in perf_sample_save_callchain() and perf_sample_save_brstack() (Yabin Cui) - Export perf_exclude_event() (Namhyung Kim) Uprobes scalability enhancements: (Andrii Nakryiko) - Simplify find_active_uprobe_rcu() VMA checks - Add speculative lockless VMA-to-inode-to-uprobe resolution - Simplify session consumer tracking - Decouple return_instance list traversal and freeing - Ensure return_instance is detached from the list before freeing - Reuse return_instances between multiple uretprobes within task - Guard against kmemdup() failing in dup_return_instance() AMD core PMU driver enhancements: - Relax privilege filter restriction on AMD IBS (Namhyung Kim) AMD RAPL energy counters support: (Dhananjay Ugwekar) - Introduce topology_logical_core_id() (K Prateek Nayak) - Remove the unused get_rapl_pmu_cpumask() function - Remove the cpu_to_rapl_pmu() function - Rename rapl_pmu variables - Make rapl_model struct global - Add arguments to the init and cleanup functions - Modify the generic variable names to *_pkg* - Remove the global variable rapl_msrs - Move the cntr_mask to rapl_pmus struct - Add core energy counter support for AMD CPUs Intel core PMU driver enhancements: - Support RDPMC 'metrics clear mode' feature (Kan Liang) - Clarify adaptive PEBS processing (Kan Liang) - Factor out functions for PEBS records processing (Kan Liang) - Simplify the PEBS records processing for adaptive PEBS (Kan Liang) Intel uncore driver enhancements: (Kan Liang) - Convert buggy pmu->func_id use to pmu->registered - Support more units on Granite Rapids" * tag 'perf-core-2025-01-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) perf: map pages in advance perf/x86/intel/uncore: Support more units on Granite Rapids perf/x86/intel/uncore: Clean up func_id perf/x86/intel: Support RDPMC metrics clear mode uprobes: Guard against kmemdup() failing in dup_return_instance() perf/x86: Relax privilege filter restriction on AMD IBS perf/core: Export perf_exclude_event() uprobes: Reuse return_instances between multiple uretprobes within task uprobes: Ensure return_instance is detached from the list before freeing uprobes: Decouple return_instance list traversal and freeing uprobes: Simplify session consumer tracking uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution uprobes: simplify find_active_uprobe_rcu() VMA checks mm: introduce mmap_lock_speculate_{try_begin|retry} mm: convert mm_lock_seq to a proper seqcount mm/gup: Use raw_seqcount_try_begin() seqlock: add raw_seqcount_try_begin perf/x86/rapl: Add core energy counter support for AMD CPUs perf/x86/rapl: Move the cntr_mask to rapl_pmus struct perf/x86/rapl: Remove the global variable rapl_msrs ...
2025-01-15bpf: Send signals asynchronously if !preemptiblePuranjay Mohan
BPF programs can execute in all kinds of contexts and when a program running in a non-preemptible context uses the bpf_send_signal() kfunc, it will cause issues because this kfunc can sleep. Change `irqs_disabled()` to `!preemptible()`. Reported-by: syzbot+97da3d7e0112d59971de@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67486b09.050a0220.253251.0084.GAE@google.com/ Fixes: 1bc7896e9ef4 ("bpf: Fix deadlock with rq_lock in bpf_send_signal()") Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250115103647.38487-1-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-13bpf: Use ftrace_get_symaddr() for kprobe_multi probesMasami Hiramatsu (Google)
Add ftrace_get_entry_ip() which is only for ftrace based probes, and use it for kprobe multi probes because they are based on fprobe which uses ftrace instead of kprobes. Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/173566081414.878879.10631096557346094362.stgit@devnote2 Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-01-08bpf: Return error for missed kprobe multi bpf program executionJiri Olsa
When kprobe multi bpf program can't be executed due to recursion check, we currently return 0 (success) to fprobe layer where it's ignored for standard kprobe multi probes. For kprobe session the success return value will make fprobe layer to install return probe and try to execute it as well. But the return session probe should not get executed, because the entry part did not run. FWIW the return probe bpf program most likely won't get executed, because its recursion check will likely fail as well, but we don't need to run it in the first place.. also we can make this clear and obvious. It also affects missed counts for kprobe session program execution, which are now doubled (extra count for not executed return probe). Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20250106175048.1443905-1-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-08bpf: Move out synchronize_rcu_tasks_trace from mutex CSPu Lehui
Commit ef1b808e3b7c ("bpf: Fix UAF via mismatching bpf_prog/attachment RCU flavors") resolved a possible UAF issue in uprobes that attach non-sleepable bpf prog by explicitly waiting for a tasks-trace-RCU grace period. But, in the current implementation, synchronize_rcu_tasks_trace is included within the mutex critical section, which increases the length of the critical section and may affect performance. So let's move out synchronize_rcu_tasks_trace from mutex CS. Signed-off-by: Pu Lehui <pulehui@huawei.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20250104013946.1111785-1-pulehui@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-26bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabledMasami Hiramatsu (Google)
Enable kprobe_multi feature if CONFIG_FPROBE is enabled. The pt_regs is converted from ftrace_regs by ftrace_partial_regs(), thus some registers may always returns 0. But it should be enough for function entry (access arguments) and exit (access return value). Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/173519000417.391279.14011193569589886419.stgit@devnote2 Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Florent Revest <revest@chromium.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>