summaryrefslogtreecommitdiff
path: root/net/xfrm
AgeCommit message (Collapse)Author
12 daysMerge tag 'ipsec-2026-03-23' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2026-03-23 1) Add missing extack for XFRMA_SA_PCPU in add_acquire and allocspi. From Sabrina Dubroca. 2) Fix the condition on x->pcpu_num in xfrm_sa_len by using the proper check. From Sabrina Dubroca. 3) Call xdo_dev_state_delete during state update to properly cleanup the xdo device state. From Sabrina Dubroca. 4) Fix a potential skb leak in espintcp when async crypto is used. From Sabrina Dubroca. 5) Validate inner IPv4 header length in IPTFS payload to avoid parsing malformed packets. From Roshan Kumar. 6) Fix skb_put() panic on non-linear skb during IPTFS reassembly. From Fernando Fernandez Mancera. 7) Silence various sparse warnings related to RCU, state, and policy handling. From Sabrina Dubroca. 8) Fix work re-schedule race after cancel in xfrm_nat_keepalive_net_fini(). From Hyunwoo Kim. 9) Prevent policy_hthresh.work from racing with netns teardown by using a proper cleanup mechanism. From Minwoo Ra. 10) Validate that the family of the source and destination addresses match in pfkey_send_migrate(). From Eric Dumazet. 11) Only publish mode_data after the clone is setup in the IPTFS receive path. This prevents leaving x->mode_data pointing at freed memory on error. From Paul Moses. Please pull or let me know if there are problems. ipsec-2026-03-23 * tag 'ipsec-2026-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: iptfs: only publish mode_data after clone setup af_key: validate families in pfkey_send_migrate() xfrm: prevent policy_hthresh.work from racing with netns teardown xfrm: Fix work re-schedule after cancel in xfrm_nat_keepalive_net_fini() xfrm: avoid RCU warnings around the per-netns netlink socket xfrm: add rcu_access_pointer to silence sparse warning for xfrm_input_afinfo xfrm: policy: silence sparse warning in xfrm_policy_unregister_afinfo xfrm: policy: fix sparse warnings in xfrm_policy_{init,fini} xfrm: state: silence sparse warnings during netns exit xfrm: remove rcu/state_hold from xfrm_state_lookup_spi_proto xfrm: state: add xfrm_state_deref_prot to state_by* walk under lock xfrm: state: fix sparse warnings around XFRM_STATE_INSERT xfrm: state: fix sparse warnings in xfrm_state_init xfrm: state: fix sparse warnings on xfrm_state_hold_rcu xfrm: iptfs: fix skb_put() panic on non-linear skb during reassembly xfrm: iptfs: validate inner IPv4 header length in IPTFS payload esp: fix skb leak with espintcp and async crypto xfrm: call xdo_dev_state_delete during state update xfrm: fix the condition on x->pcpu_num in xfrm_sa_len xfrm: add missing extack for XFRMA_SA_PCPU in add_acquire and allocspi ==================== Link: https://patch.msgid.link/20260323083440.2741292-1-steffen.klassert@secunet.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-17xfrm: iptfs: only publish mode_data after clone setupPaul Moses
iptfs_clone_state() stores x->mode_data before allocating the reorder window. If that allocation fails, the code frees the cloned state and returns -ENOMEM, leaving x->mode_data pointing at freed memory. The xfrm clone unwind later runs destroy_state() through x->mode_data, so the failed clone path tears down IPTFS state that clone_state() already freed. Keep the cloned IPTFS state private until all allocations succeed so failed clones leave x->mode_data unset. The destroy path already handles a NULL mode_data pointer. Fixes: 6be02e3e4f37 ("xfrm: iptfs: handle reordering of received packets") Cc: stable@vger.kernel.org Signed-off-by: Paul Moses <p@1g4.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-16xfrm: prevent policy_hthresh.work from racing with netns teardownMinwoo Ra
A XFRM_MSG_NEWSPDINFO request can queue the per-net work item policy_hthresh.work onto the system workqueue. The queued callback, xfrm_hash_rebuild(), retrieves the enclosing struct net via container_of(). If the net namespace is torn down before that work runs, the associated struct net may already have been freed, and xfrm_hash_rebuild() may then dereference stale memory. xfrm_policy_fini() already flushes policy_hash_work during teardown, but it does not synchronize policy_hthresh.work. Synchronize policy_hthresh.work in xfrm_policy_fini() as well, so the queued work cannot outlive the net namespace teardown and access a freed struct net. Fixes: 880a6fab8f6b ("xfrm: configure policy hash table thresholds by netlink") Signed-off-by: Minwoo Ra <raminwo0202@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-13xfrm: Fix work re-schedule after cancel in xfrm_nat_keepalive_net_fini()Hyunwoo Kim
After cancel_delayed_work_sync() is called from xfrm_nat_keepalive_net_fini(), xfrm_state_fini() flushes remaining states via __xfrm_state_delete(), which calls xfrm_nat_keepalive_state_updated() to re-schedule nat_keepalive_work. The following is a simple race scenario: cpu0 cpu1 cleanup_net() [Round 1] ops_undo_list() xfrm_net_exit() xfrm_nat_keepalive_net_fini() cancel_delayed_work_sync(nat_keepalive_work); xfrm_state_fini() xfrm_state_flush() xfrm_state_delete(x) __xfrm_state_delete(x) xfrm_nat_keepalive_state_updated(x) schedule_delayed_work(nat_keepalive_work); rcu_barrier(); net_complete_free(); net_passive_dec(net); llist_add(&net->defer_free_list, &defer_free_list); cleanup_net() [Round 2] rcu_barrier(); net_complete_free() kmem_cache_free(net_cachep, net); nat_keepalive_work() // on freed net To prevent this, cancel_delayed_work_sync() is replaced with disable_delayed_work_sync(). Fixes: f531d13bdfe3 ("xfrm: support sending NAT keepalives in ESP in UDP states") Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-12xfrm: avoid RCU warnings around the per-netns netlink socketSabrina Dubroca
net->xfrm.nlsk is used in 2 types of contexts: - fully under RCU, with rcu_read_lock + rcu_dereference and a NULL check - in the netlink handlers, with requests coming from a userspace socket In the 2nd case, net->xfrm.nlsk is guaranteed to stay non-NULL and the object is alive, since we can't enter the netns destruction path while the user socket holds a reference on the netns. After adding the __rcu annotation to netns_xfrm.nlsk (which silences sparse warnings in the RCU users and __net_init code), we need to tell sparse that the 2nd case is safe. Add a helper for that. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-12xfrm: add rcu_access_pointer to silence sparse warning for xfrm_input_afinfoSabrina Dubroca
xfrm_input_afinfo is __rcu, we should use rcu_access_pointer to avoid a sparse warning: net/xfrm/xfrm_input.c:78:21: error: incompatible types in comparison expression (different address spaces): net/xfrm/xfrm_input.c:78:21: struct xfrm_input_afinfo const [noderef] __rcu * net/xfrm/xfrm_input.c:78:21: struct xfrm_input_afinfo const * Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-12xfrm: policy: silence sparse warning in xfrm_policy_unregister_afinfoSabrina Dubroca
xfrm_policy_afinfo is __rcu, use rcu_access_pointer to silence: net/xfrm/xfrm_policy.c:4152:43: error: incompatible types in comparison expression (different address spaces): net/xfrm/xfrm_policy.c:4152:43: struct xfrm_policy_afinfo const [noderef] __rcu * net/xfrm/xfrm_policy.c:4152:43: struct xfrm_policy_afinfo const * Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-12xfrm: policy: fix sparse warnings in xfrm_policy_{init,fini}Sabrina Dubroca
In xfrm_policy_init: add rcu_assign_pointer to fix warning: net/xfrm/xfrm_policy.c:4238:29: warning: incorrect type in assignment (different address spaces) net/xfrm/xfrm_policy.c:4238:29: expected struct hlist_head [noderef] __rcu *table net/xfrm/xfrm_policy.c:4238:29: got struct hlist_head * add rcu_dereference_protected to silence warning: net/xfrm/xfrm_policy.c:4265:36: warning: incorrect type in argument 1 (different address spaces) net/xfrm/xfrm_policy.c:4265:36: expected struct hlist_head *n net/xfrm/xfrm_policy.c:4265:36: got struct hlist_head [noderef] __rcu *table The netns is being created, no concurrent access is possible yet. In xfrm_policy_fini, net is going away, there shouldn't be any concurrent changes to the hashtables, so we can use rcu_dereference_protected to silence warnings: net/xfrm/xfrm_policy.c:4291:17: warning: incorrect type in argument 1 (different address spaces) net/xfrm/xfrm_policy.c:4291:17: expected struct hlist_head const *h net/xfrm/xfrm_policy.c:4291:17: got struct hlist_head [noderef] __rcu *table net/xfrm/xfrm_policy.c:4292:36: warning: incorrect type in argument 1 (different address spaces) net/xfrm/xfrm_policy.c:4292:36: expected struct hlist_head *n net/xfrm/xfrm_policy.c:4292:36: got struct hlist_head [noderef] __rcu *table Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-12xfrm: state: silence sparse warnings during netns exitSabrina Dubroca
Silence sparse warnings in xfrm_state_fini: net/xfrm/xfrm_state.c:3327:9: warning: incorrect type in argument 1 (different address spaces) net/xfrm/xfrm_state.c:3327:9: expected struct hlist_head const *h net/xfrm/xfrm_state.c:3327:9: got struct hlist_head [noderef] __rcu *state_byseq Add xfrm_state_deref_netexit() to wrap those calls. The netns is going away, we don't have to worry about the state_by* pointers being changed behind our backs. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-12xfrm: remove rcu/state_hold from xfrm_state_lookup_spi_protoSabrina Dubroca
xfrm_state_lookup_spi_proto is called under xfrm_state_lock by xfrm_alloc_spi, no need to take a reference on the state and pretend to be under RCU. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-12xfrm: state: add xfrm_state_deref_prot to state_by* walk under lockSabrina Dubroca
We're under xfrm_state_lock for all those walks, we can use xfrm_state_deref_prot to silence sparse warnings such as: net/xfrm/xfrm_state.c:933:17: warning: dereference of noderef expression Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-12xfrm: state: fix sparse warnings around XFRM_STATE_INSERTSabrina Dubroca
We're under xfrm_state_lock in all those cases, use xfrm_state_deref_prot(state_by*) to avoid sparse warnings: net/xfrm/xfrm_state.c:2597:25: warning: cast removes address space '__rcu' of expression net/xfrm/xfrm_state.c:2597:25: warning: incorrect type in argument 2 (different address spaces) net/xfrm/xfrm_state.c:2597:25: expected struct hlist_head *h net/xfrm/xfrm_state.c:2597:25: got struct hlist_head [noderef] __rcu * Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-12xfrm: state: fix sparse warnings in xfrm_state_initSabrina Dubroca
Use rcu_assign_pointer, and tmp variables for freeing on the error path without accessing net->xfrm.state_by*. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-12xfrm: state: fix sparse warnings on xfrm_state_hold_rcuSabrina Dubroca
In all callers, x is not an __rcu pointer. We can drop the annotation to avoid sparse warnings: net/xfrm/xfrm_state.c:58:39: warning: incorrect type in argument 1 (different address spaces) net/xfrm/xfrm_state.c:58:39: expected struct refcount_struct [usertype] *r net/xfrm/xfrm_state.c:58:39: got struct refcount_struct [noderef] __rcu * net/xfrm/xfrm_state.c:1166:42: warning: incorrect type in argument 1 (different address spaces) net/xfrm/xfrm_state.c:1166:42: expected struct xfrm_state [noderef] __rcu *x net/xfrm/xfrm_state.c:1166:42: got struct xfrm_state *[assigned] x (repeated for each caller) Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-09xfrm: iptfs: fix skb_put() panic on non-linear skb during reassemblyFernando Fernandez Mancera
In iptfs_reassem_cont(), IP-TFS attempts to append data to the new inner packet 'newskb' that is being reassembled. First a zero-copy approach is tried if it succeeds then newskb becomes non-linear. When a subsequent fragment in the same datagram does not meet the fast-path conditions, a memory copy is performed. It calls skb_put() to append the data and as newskb is non-linear it triggers SKB_LINEAR_ASSERT check. Oops: invalid opcode: 0000 [#1] SMP NOPTI [...] RIP: 0010:skb_put+0x3c/0x40 [...] Call Trace: <IRQ> iptfs_reassem_cont+0x1ab/0x5e0 [xfrm_iptfs] iptfs_input_ordered+0x2af/0x380 [xfrm_iptfs] iptfs_input+0x122/0x3e0 [xfrm_iptfs] xfrm_input+0x91e/0x1a50 xfrm4_esp_rcv+0x3a/0x110 ip_protocol_deliver_rcu+0x1d7/0x1f0 ip_local_deliver_finish+0xbe/0x1e0 __netif_receive_skb_core.constprop.0+0xb56/0x1120 __netif_receive_skb_list_core+0x133/0x2b0 netif_receive_skb_list_internal+0x1ff/0x3f0 napi_complete_done+0x81/0x220 virtnet_poll+0x9d6/0x116e [virtio_net] __napi_poll.constprop.0+0x2b/0x270 net_rx_action+0x162/0x360 handle_softirqs+0xdc/0x510 __irq_exit_rcu+0xe7/0x110 irq_exit_rcu+0xe/0x20 common_interrupt+0x85/0xa0 </IRQ> <TASK> Fix this by checking if the skb is non-linear. If it is, linearize it by calling skb_linearize(). As the initial allocation of newskb originally reserved enough tailroom for the entire reassembled packet we do not need to check if we have enough tailroom or extend it. Fixes: 5f2b6a909574 ("xfrm: iptfs: add skb-fragment sharing code") Reported-by: Hao Long <me@imlonghao.com> Closes: https://lore.kernel.org/netdev/DGRCO9SL0T5U.JTINSHJQ9KPK@imlonghao.com/ Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-03-02xfrm: iptfs: validate inner IPv4 header length in IPTFS payloadRoshan Kumar
Add validation of the inner IPv4 packet tot_len and ihl fields parsed from decrypted IPTFS payloads in __input_process_payload(). A crafted ESP packet containing an inner IPv4 header with tot_len=0 causes an infinite loop: iplen=0 leads to capturelen=min(0, remaining)=0, so the data offset never advances and the while(data < tail) loop never terminates, spinning forever in softirq context. Reject inner IPv4 packets where tot_len < ihl*4 or ihl*4 < sizeof(struct iphdr), which catches both the tot_len=0 case and malformed ihl values. The normal IP stack performs this validation in ip_rcv_core(), but IPTFS extracts and processes inner packets before they reach that layer. Reported-by: Roshan Kumar <roshaen09@gmail.com> Fixes: 6c82d2433671 ("xfrm: iptfs: add basic receive packet (tunnel egress) handling") Cc: stable@vger.kernel.org Signed-off-by: Roshan Kumar <roshaen09@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-02-26Merge tag 'net-7.0-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from IPsec, Bluetooth and netfilter Current release - regressions: - wifi: fix dev_alloc_name() return value check - rds: fix recursive lock in rds_tcp_conn_slots_available Current release - new code bugs: - vsock: lock down child_ns_mode as write-once Previous releases - regressions: - core: - do not pass flow_id to set_rps_cpu() - consume xmit errors of GSO frames - netconsole: avoid OOB reads, msg is not nul-terminated - netfilter: h323: fix OOB read in decode_choice() - tcp: re-enable acceptance of FIN packets when RWIN is 0 - udplite: fix null-ptr-deref in __udp_enqueue_schedule_skb(). - wifi: brcmfmac: fix potential kernel oops when probe fails - phy: register phy led_triggers during probe to avoid AB-BA deadlock - eth: - bnxt_en: fix deleting of Ntuple filters - wan: farsync: fix use-after-free bugs caused by unfinished tasklets - xscale: check for PTP support properly Previous releases - always broken: - tcp: fix potential race in tcp_v6_syn_recv_sock() - kcm: fix zero-frag skb in frag_list on partial sendmsg error - xfrm: - fix race condition in espintcp_close() - always flush state and policy upon NETDEV_UNREGISTER event - bluetooth: - purge error queues in socket destructors - fix response to L2CAP_ECRED_CONN_REQ - eth: - mlx5: - fix circular locking dependency in dump - fix "scheduling while atomic" in IPsec MAC address query - gve: fix incorrect buffer cleanup for QPL - team: avoid NETDEV_CHANGEMTU event when unregistering slave - usb: validate USB endpoints" * tag 'net-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits) netfilter: nf_conntrack_h323: fix OOB read in decode_choice() dpaa2-switch: validate num_ifs to prevent out-of-bounds write net: consume xmit errors of GSO frames vsock: document write-once behavior of the child_ns_mode sysctl vsock: lock down child_ns_mode as write-once selftests/vsock: change tests to respect write-once child ns mode net/mlx5e: Fix "scheduling while atomic" in IPsec MAC address query net/mlx5: Fix missing devlink lock in SRIOV enable error path net/mlx5: E-switch, Clear legacy flag when moving to switchdev net/mlx5: LAG, disable MPESW in lag_disable_change() net/mlx5: DR, Fix circular locking dependency in dump selftests: team: Add a reference count leak test team: avoid NETDEV_CHANGEMTU event when unregistering slave net: mana: Fix double destroy_workqueue on service rescan PCI path MAINTAINERS: Update maintainer entry for QUALCOMM ETHQOS ETHERNET DRIVER dpll: zl3073x: Remove redundant cleanup in devm_dpll_init() selftests/net: packetdrill: Verify acceptance of FIN packets when RWIN is 0 tcp: re-enable acceptance of FIN packets when RWIN is 0 vsock: Use container_of() to get net namespace in sysctl handlers net: usb: kaweth: validate USB endpoints ...
2026-02-25xfrm: call xdo_dev_state_delete during state updateSabrina Dubroca
When we update an SA, we construct a new state and call xdo_dev_state_add, but never insert it. The existing state is updated, then we immediately destroy the new state. Since we haven't added it, we don't go through the standard state delete code, and we're skipping removing it from the device (but xdo_dev_state_free will get called when we destroy the temporary state). This is similar to commit c5d4d7d83165 ("xfrm: Fix deletion of offloaded SAs on failure."). Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-02-25xfrm: fix the condition on x->pcpu_num in xfrm_sa_lenSabrina Dubroca
pcpu_num = 0 is a valid value. The marker for "unset pcpu_num" which makes copy_to_user_state_extra not add the XFRMA_SA_PCPU attribute is UINT_MAX. Fixes: 1ddf9916ac09 ("xfrm: Add support for per cpu xfrm state handling.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-02-25xfrm: add missing extack for XFRMA_SA_PCPU in add_acquire and allocspiSabrina Dubroca
We're returning an error caused by invalid user input without setting an extack. Add one. Fixes: 1ddf9916ac09 ("xfrm: Add support for per cpu xfrm state handling.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-02-22Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL usesKees Cook
Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-20Merge tag 'ipsec-2026-02-20' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2026-02-20 1) Check the value of ipv6_dev_get_saddr() to fix an uninitialized saddr in xfrm6_get_saddr(). From Jiayuan Chen. 2) Skip the templates check for packet offload in tunnel mode. Is was already done by the hardware and causes an unexpected XfrmInTmplMismatch increase. From Leon Romanovsky. 3) Fix a unregister_netdevice stall due to not dropped refcounts by always flushing xfrm state and policy on a NETDEV_UNREGISTER event. From Tetsuo Handa. * tag 'ipsec-2026-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: always flush state and policy upon NETDEV_UNREGISTER event xfrm: skip templates check for packet offload tunnel mode xfrm6: fix uninitialized saddr in xfrm6_get_saddr() ==================== Link: https://patch.msgid.link/20260220094133.14219-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-19espintcp: Fix race condition in espintcp_close()Hyunwoo Kim
This issue was discovered during a code audit. After cancel_work_sync() is called from espintcp_close(), espintcp_tx_work() can still be scheduled from paths such as the Delayed ACK handler or ksoftirqd. As a result, the espintcp_tx_work() worker may dereference a freed espintcp ctx or sk. The following is a simple race scenario: cpu0 cpu1 espintcp_close() cancel_work_sync(&ctx->work); espintcp_write_space() schedule_work(&ctx->work); To prevent this race condition, cancel_work_sync() is replaced with disable_work_sync(). Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)") Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/aZSie7rEdh9Nu0eM@v4bel Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-10Merge tag 'bpf-next-7.0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Support associating BPF program with struct_ops (Amery Hung) - Switch BPF local storage to rqspinlock and remove recursion detection counters which were causing false positives (Amery Hung) - Fix live registers marking for indirect jumps (Anton Protopopov) - Introduce execution context detection BPF helpers (Changwoo Min) - Improve verifier precision for 32bit sign extension pattern (Cupertino Miranda) - Optimize BTF type lookup by sorting vmlinux BTF and doing binary search (Donglin Peng) - Allow states pruning for misc/invalid slots in iterator loops (Eduard Zingerman) - In preparation for ASAN support in BPF arenas teach libbpf to move global BPF variables to the end of the region and enable arena kfuncs while holding locks (Emil Tsalapatis) - Introduce support for implicit arguments in kfuncs and migrate a number of them to new API. This is a prerequisite for cgroup sub-schedulers in sched-ext (Ihor Solodrai) - Fix incorrect copied_seq calculation in sockmap (Jiayuan Chen) - Fix ORC stack unwind from kprobe_multi (Jiri Olsa) - Speed up fentry attach by using single ftrace direct ops in BPF trampolines (Jiri Olsa) - Require frozen map for calculating map hash (KP Singh) - Fix lock entry creation in TAS fallback in rqspinlock (Kumar Kartikeya Dwivedi) - Allow user space to select cpu in lookup/update operations on per-cpu array and hash maps (Leon Hwang) - Make kfuncs return trusted pointers by default (Matt Bobrowski) - Introduce "fsession" support where single BPF program is executed upon entry and exit from traced kernel function (Menglong Dong) - Allow bpf_timer and bpf_wq use in all programs types (Mykyta Yatsenko, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Alexei Starovoitov) - Make KF_TRUSTED_ARGS the default for all kfuncs and clean up their definition across the tree (Puranjay Mohan) - Allow BPF arena calls from non-sleepable context (Puranjay Mohan) - Improve register id comparison logic in the verifier and extend linked registers with negative offsets (Puranjay Mohan) - In preparation for BPF-OOM introduce kfuncs to access memcg events (Roman Gushchin) - Use CFI compatible destructor kfunc type (Sami Tolvanen) - Add bitwise tracking for BPF_END in the verifier (Tianci Cao) - Add range tracking for BPF_DIV and BPF_MOD in the verifier (Yazhou Tang) - Make BPF selftests work with 64k page size (Yonghong Song) * tag 'bpf-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (268 commits) selftests/bpf: Fix outdated test on storage->smap selftests/bpf: Choose another percpu variable in bpf for btf_dump test selftests/bpf: Remove test_task_storage_map_stress_lookup selftests/bpf: Update task_local_storage/task_storage_nodeadlock test selftests/bpf: Update task_local_storage/recursion test selftests/bpf: Update sk_storage_omem_uncharge test bpf: Switch to bpf_selem_unlink_nofail in bpf_local_storage_{map_free, destroy} bpf: Support lockless unlink when freeing map or local storage bpf: Prepare for bpf_selem_unlink_nofail() bpf: Remove unused percpu counter from bpf_local_storage_map_free bpf: Remove cgroup local storage percpu counter bpf: Remove task local storage percpu counter bpf: Change local_storage->lock and b->lock to rqspinlock bpf: Convert bpf_selem_unlink to failable bpf: Convert bpf_selem_link_map to failable bpf: Convert bpf_selem_unlink_map to failable bpf: Select bpf_local_storage_map_bucket based on bpf_local_storage selftests/xsk: fix number of Tx frags in invalid packet selftests/xsk: properly handle batch ending in the middle of a packet bpf: Prevent reentrance into call_rcu_tasks_trace() ...
2026-02-09xfrm: always flush state and policy upon NETDEV_UNREGISTER eventTetsuo Handa
syzbot is reporting that "struct xfrm_state" refcount is leaking. unregister_netdevice: waiting for netdevsim0 to become free. Usage count = 2 ref_tracker: netdev@ffff888052f24618 has 1/1 users at __netdev_tracker_alloc include/linux/netdevice.h:4400 [inline] netdev_tracker_alloc include/linux/netdevice.h:4412 [inline] xfrm_dev_state_add+0x3a5/0x1080 net/xfrm/xfrm_device.c:316 xfrm_state_construct net/xfrm/xfrm_user.c:986 [inline] xfrm_add_sa+0x34ff/0x5fa0 net/xfrm/xfrm_user.c:1022 xfrm_user_rcv_msg+0x58e/0xc00 net/xfrm/xfrm_user.c:3507 netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2550 xfrm_netlink_rcv+0x71/0x90 net/xfrm/xfrm_user.c:3529 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline] netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1344 netlink_sendmsg+0x8c8/0xdd0 net/netlink/af_netlink.c:1894 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg net/socket.c:742 [inline] ____sys_sendmsg+0xa5d/0xc30 net/socket.c:2592 ___sys_sendmsg+0x134/0x1d0 net/socket.c:2646 __sys_sendmsg+0x16d/0x220 net/socket.c:2678 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f This is because commit d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") implemented xfrm_dev_unregister() as no-op despite xfrm_dev_state_add() from xfrm_state_construct() acquires a reference to "struct net_device". I guess that that commit expected that NETDEV_DOWN event is fired before NETDEV_UNREGISTER event fires, and also assumed that xfrm_dev_state_add() is called only if (dev->features & NETIF_F_HW_ESP) != 0. Sabrina Dubroca identified steps to reproduce the same symptoms as below. echo 0 > /sys/bus/netdevsim/new_device dev=$(ls -1 /sys/bus/netdevsim/devices/netdevsim0/net/) ip xfrm state add src 192.168.13.1 dst 192.168.13.2 proto esp \ spi 0x1000 mode tunnel aead 'rfc4106(gcm(aes))' $key 128 \ offload crypto dev $dev dir out ethtool -K $dev esp-hw-offload off echo 0 > /sys/bus/netdevsim/del_device Like these steps indicate, the NETIF_F_HW_ESP bit can be cleared after xfrm_dev_state_add() acquired a reference to "struct net_device". Also, xfrm_dev_state_add() does not check for the NETIF_F_HW_ESP bit when acquiring a reference to "struct net_device". Commit 03891f820c21 ("xfrm: handle NETDEV_UNREGISTER for xfrm device") re-introduced the NETDEV_UNREGISTER event to xfrm_dev_event(), but that commit for unknown reason chose to share xfrm_dev_down() between the NETDEV_DOWN event and the NETDEV_UNREGISTER event. I guess that that commit missed the behavior in the previous paragraph. Therefore, we need to re-introduce xfrm_dev_unregister() in order to release the reference to "struct net_device" by unconditionally flushing state and policy. Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84 Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") Cc: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-02-02xfrm: skip templates check for packet offload tunnel modeLeon Romanovsky
In packet offload, hardware is responsible to check templates. The result of its operation is forwarded through secpath by relevant drivers. That secpath is actually removed in __xfrm_policy_check2(). In case packet is forwarded, this secpath is reset in RX, but pushed again to TX where policy is rechecked again against dummy secpath in xfrm_policy_ok(). Such situation causes to unexpected XfrmInTmplMismatch increase. As a solution, simply skip template mismatch check. Fixes: 600258d555f0 ("xfrm: delete intermediate secpath entry in packet offload mode") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-01-02bpf: xfrm: drop dead NULL check in bpf_xdp_get_xfrm_state()Puranjay Mohan
As KF_TRUSTED_ARGS is now considered the default for all kfuncs, the opts parameter in bpf_xdp_get_xfrm_state() can never be NULL. Verifier will detect this at load time and will not allow passing NULL to this function. This matches the documentation above the kfunc that says this parameter (opts) Cannot be NULL. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/r/20260102180038.2708325-5-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-15xfrm: set ipv4 no_pmtu_disc flag only on output sa when direction is setAntony Antony
The XFRM_STATE_NOPMTUDISC flag is only meaningful for output SAs, but it was being applied regardless of the SA direction when the sysctl ip_no_pmtu_disc is enabled. This can unintentionally affect input SAs. Limit setting XFRM_STATE_NOPMTUDISC to output SAs when the SA direction is configured. Closes: https://github.com/strongswan/strongswan/issues/2946 Fixes: a4a87fa4e96c ("xfrm: Add Direction to the SA in or out") Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-11-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.18-rc7). No conflicts, adjacent changes: tools/testing/selftests/net/af_unix/Makefile e1bb28bf13f4 ("selftest: af_unix: Add test for SO_PEEK_OFF.") 45a1cd8346ca ("selftests: af_unix: Add tests for ECONNRESET and EOF semantics") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-18Merge tag 'ipsec-2025-11-18' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2025-11-18 1) Misc fixes for xfrm_state creation/modification/deletion. Patchset from Sabrina Dubroca. 2) Fix inner packet family determination for xfrm offloads. From Jianbo Liu. 3) Don't push locally generated packets directly to L2 tunnel mode offloading, they still need processing from the standard xfrm path. From Jianbo Liu. 4) Fix memory leaks in xfrm_add_acquire for policy offloads and policy security contexts. From Zilin Guan. * tag 'ipsec-2025-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: fix memory leak in xfrm_add_acquire() xfrm: Prevent locally generated packets from direct output in tunnel mode xfrm: Determine inner GSO type from packet inner protocol xfrm: Check inner packet family directly from skb_dst xfrm: check all hash buckets for leftover states during netns deletion xfrm: set err and extack on failure to create pcpu SA xfrm: call xfrm_dev_state_delete when xfrm_state_migrate fails to add the state xfrm: make state as DEAD before final put when migrate fails xfrm: also call xfrm_state_delete_tunnel at destroy time for states that were never added xfrm: drop SA reference in xfrm_state_update if dir doesn't match ==================== Link: https://patch.msgid.link/20251118085344.2199815-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-14xfrm: fix memory leak in xfrm_add_acquire()Zilin Guan
The xfrm_add_acquire() function constructs an xfrm policy by calling xfrm_policy_construct(). This allocates the policy structure and potentially associates a security context and a device policy with it. However, at the end of the function, the policy object is freed using only kfree() . This skips the necessary cleanup for the security context and device policy, leading to a memory leak. To fix this, invoke the proper cleanup functions xfrm_dev_policy_delete(), xfrm_dev_policy_free(), and security_xfrm_policy_free() before freeing the policy object. This approach mirrors the error handling path in xfrm_add_policy(), ensuring that all associated resources are correctly released. Fixes: 980ebd25794f ("[IPSEC]: Sync series - acquire insert") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-10-30xfrm: Prevent locally generated packets from direct output in tunnel modeJianbo Liu
Add a check to ensure locally generated packets (skb->sk != NULL) do not use direct output in tunnel mode, as these packets require proper L2 header setup that is handled by the normal XFRM processing path. Fixes: 5eddd76ec2fd ("xfrm: fix tunnel mode TX datapath in packet offload mode") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-10-30xfrm: Check inner packet family directly from skb_dstJianbo Liu
In the output path, xfrm_dev_offload_ok and xfrm_get_inner_ipproto need to determine the protocol family of the inner packet (skb) before it gets encapsulated. In xfrm_dev_offload_ok, the code checked x->inner_mode.family. This is unreliable because, for states handling both IPv4 and IPv6, the relevant inner family could be either x->inner_mode.family or x->inner_mode_iaf.family. Checking only the former can lead to a mismatch with the actual packet being processed. In xfrm_get_inner_ipproto, the code checked x->outer_mode.family. This is also incorrect for tunnel mode, as the inner packet's family can be different from the outer header's family. At both of these call sites, the skb variable holds the original inner packet. The most direct and reliable source of truth for its protocol family is its destination entry. This patch fixes the issue by using skb_dst(skb)->ops->family to ensure protocol-specific headers are only accessed for the correct packet type. Fixes: 91d8a53db219 ("xfrm: fix offloading of cross-family tunnels") Fixes: 45a98ef4922d ("net/xfrm: IPsec tunnel mode fix inner_ipproto setting in sec_path") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-10-30pfkey: Deprecate pfkeySteffen Klassert
The pfkey user configuration interface was replaced by the netlink user configuration interface more than a decade ago. In between all maintained IKE implementations moved to the netlink interface. So let config NET_KEY default to no in Kconfig. The pfkey code will be removed in a second step. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Antony Antony <antony.antony@secunet.com> Acked-by: Tobias Brunner <tobias@strongswan.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Tuomo Soini <tis@foobar.fi> Acked-by: Paul Wouters <paul@nohats.ca>
2025-10-27xfrm: Skip redundant replay recheck for the hardware offload pathJianbo Liu
The xfrm_replay_recheck() function was introduced to handle the issues arising from asynchronous crypto algorithms. The crypto offload path is now effectively synchronous, as it holds the state lock throughout its operation. This eliminates the race condition, making the recheck an unnecessary overhead. This patch improves performance by skipping the redundant call when crypto_done is true. Additionally, the sequence number assignment is moved to an earlier point in the function. This improves performance by reducing lock contention and places the logic at a more appropriate point, as the full sequence number (including the higher-order bits) can be determined as soon as the packet is received. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-10-27xfrm: Refactor xfrm_input lock to reduce contention with RSSJianbo Liu
With newer NICs like mlx5 supporting RSS for IPsec crypto offload, packets for a single Security Association (SA) are scattered across multiple CPU cores for parallel processing. The xfrm_state spinlock (x->lock) is held for each packet during xfrm processing. When multiple connections or flows share the same SA, this parallelism causes high lock contention on x->lock, creating a performance bottleneck and limiting scalability. The original xfrm_input() function exacerbated this issue by releasing and immediately re-acquiring x->lock. For hardware crypto offload paths, this unlock/relock sequence is unnecessary and introduces significant overhead. This patch refactors the function to relocate the type_offload->input_tail call for the offload path, performing all necessary work while continuously holding the lock. This reordering is safe, since packets which don't pass the checks below will still fail them with the new code. Performance testing with iperf using multiple parallel streams over a single IPsec SA shows significant improvement in throughput as the number of queues (and thus CPU cores) increases: +-----------+---------------+--------------+-----------------+ | RX queues | Before (Gbps) | After (Gbps) | Improvement (%) | +-----------+---------------+--------------+-----------------+ | 2 | 32.3 | 34.4 | 6.5 | | 4 | 34.4 | 40.0 | 16.3 | | 6 | 24.5 | 38.3 | 56.3 | | 8 | 23.1 | 38.3 | 65.8 | | 12 | 18.1 | 29.9 | 65.2 | | 16 | 16.0 | 25.2 | 57.5 | +-----------+---------------+--------------+-----------------+ Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-10-23espintcp: use datagram_poll_queue for socket readinessRalf Lici
espintcp uses a custom queue (ike_queue) to deliver packets to userspace. The polling logic relies on datagram_poll, which checks sk_receive_queue, which can lead to false readiness signals when that queue contains non-userspace packets. Switch espintcp_poll to use datagram_poll_queue with ike_queue, ensuring poll only signals readiness when userspace data is actually available. Fixes: e27cca96cd68 ("xfrm: add espintcp (RFC 8229)") Signed-off-by: Ralf Lici <ralf@mandelbit.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20251021100942.195010-3-ralf@mandelbit.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-21xfrm: check all hash buckets for leftover states during netns deletionSabrina Dubroca
The current hlist_empty checks only test the first bucket of each hashtable, ignoring any other bucket. They should be caught by the WARN_ON for state_all, but better to make all the checks accurate. Fixes: 73d189dce486 ("netns xfrm: per-netns xfrm_state_bydst hash") Fixes: d320bbb306f2 ("netns xfrm: per-netns xfrm_state_bysrc hash") Fixes: b754a4fd8f58 ("netns xfrm: per-netns xfrm_state_byspi hash") Fixes: fe9f1d8779cb ("xfrm: add state hashtable keyed by seq") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-10-21xfrm: set err and extack on failure to create pcpu SASabrina Dubroca
xfrm_state_construct can fail without setting an error if the requested pcpu_num value is too big. Set err and add an extack message to avoid confusing userspace. Fixes: 1ddf9916ac09 ("xfrm: Add support for per cpu xfrm state handling.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-10-21xfrm: call xfrm_dev_state_delete when xfrm_state_migrate fails to add the stateSabrina Dubroca
In case xfrm_state_migrate fails after calling xfrm_dev_state_add, we directly release the last reference and destroy the new state, without calling xfrm_dev_state_delete (this only happens in __xfrm_state_delete, which we're not calling on this path, since the state was never added). Call xfrm_dev_state_delete on error when an offload configuration was provided. Fixes: ab244a394c7f ("xfrm: Migrate offload configuration") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-10-21xfrm: make state as DEAD before final put when migrate failsSabrina Dubroca
xfrm_state_migrate/xfrm_state_clone_and_setup create a new state, and call xfrm_state_put to destroy it in case of failure. __xfrm_state_destroy expects the state to be in XFRM_STATE_DEAD, but we currently don't do that. Reported-by: syzbot+5cd6299ede4d4f70987b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=5cd6299ede4d4f70987b Fixes: 78347c8c6b2d ("xfrm: Fix xfrm_state_migrate leak") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-10-21xfrm: also call xfrm_state_delete_tunnel at destroy time for states that ↵Sabrina Dubroca
were never added In commit b441cf3f8c4b ("xfrm: delete x->tunnel as we delete x"), I missed the case where state creation fails between full initialization (->init_state has been called) and being inserted on the lists. In this situation, ->init_state has been called, so for IPcomp tunnels, the fallback tunnel has been created and added onto the lists, but the user state never gets added, because we fail before that. The user state doesn't go through __xfrm_state_delete, so we don't call xfrm_state_delete_tunnel for those states, and we end up leaking the FB tunnel. There are several codepaths affected by this: the add/update paths, in both net/key and xfrm, and the migrate code (xfrm_migrate, xfrm_state_migrate). A "proper" rollback of the init_state work would probably be doable in the add/update code, but for migrate it gets more complicated as multiple states may be involved. At some point, the new (not-inserted) state will be destroyed, so call xfrm_state_delete_tunnel during xfrm_state_gc_destroy. Most states will have their fallback tunnel cleaned up during __xfrm_state_delete, which solves the issue that b441cf3f8c4b (and other patches before it) aimed at. All states (including FB tunnels) will be removed from the lists once xfrm_state_fini has called flush_work(&xfrm_state_gc_work). Reported-by: syzbot+999eb23467f83f9bf9bf@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=999eb23467f83f9bf9bf Fixes: b441cf3f8c4b ("xfrm: delete x->tunnel as we delete x") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-10-21xfrm: drop SA reference in xfrm_state_update if dir doesn't matchSabrina Dubroca
We're not updating x1, but we still need to put() it. Fixes: a4a87fa4e96c ("xfrm: Add Direction to the SA in or out") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-09-26Merge tag 'ipsec-next-2025-09-26' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2025-09-26 1) Fix field-spanning memcpy warning in AH output. From Charalampos Mitrodimas. 2) Replace the strcpy() calls for alg_name by strscpy(). From Miguel García. * tag 'ipsec-next-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: xfrm_user: use strscpy() for alg_name net: ipv6: fix field-spanning memcpy warning in AH output ==================== Link: https://patch.msgid.link/20250926053025.2242061-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.17-rc8). Conflicts: drivers/net/can/spi/hi311x.c 6b6968084721 ("can: hi311x: fix null pointer dereference when resuming from sleep before interface was enabled") 27ce71e1ce81 ("net: WQ_PERCPU added to alloc_workqueue users") https://lore.kernel.org/72ce7599-1b5b-464a-a5de-228ff9724701@kernel.org net/smc/smc_loopback.c drivers/dibs/dibs_loopback.c a35c04de2565 ("net/smc: fix warning in smc_rx_splice() when calling get_page()") cc21191b584c ("dibs: Move data path to dibs layer") https://lore.kernel.org/74368a5c-48ac-4f8e-a198-40ec1ed3cf5f@kernel.org Adjacent changes: drivers/net/dsa/lantiq/lantiq_gswip.c c0054b25e2f1 ("net: dsa: lantiq_gswip: move gswip_add_single_port_br() call to port_setup()") 7a1eaef0a791 ("net: dsa: lantiq_gswip: support model-specific mac_select_pcs()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-15xfrm: fix offloading of cross-family tunnelsSabrina Dubroca
Xiumei reported a regression in IPsec offload tests over xfrmi, where the traffic for IPv6 over IPv4 tunnels is processed in SW instead of going through crypto offload, after commit cc18f482e8b6 ("xfrm: provide common xdo_dev_offload_ok callback implementation"). Commit cc18f482e8b6 added a generic version of existing checks attempting to prevent packets with IPv4 options or IPv6 extension headers from being sent to HW that doesn't support offloading such packets. The check mistakenly uses x->props.family (the outer family) to determine the inner packet's family and verify if options/extensions are present. In the case of IPv6 over IPv4, the check compares some of the traffic class bits to the expected no-options ihl value (5). The original check was introduced in commit 2ac9cfe78223 ("net/mlx5e: IPSec, Add Innova IPSec offload TX data path"), and then duplicated in the other drivers. Before commit cc18f482e8b6, the loose check (ihl > 5) passed because those traffic class bits were not set to a value that triggered the no-offload codepath. Packets with options/extension headers that should have been handled in SW went through the offload path, and were likely dropped by the NIC or incorrectly processed. Since commit cc18f482e8b6, the check is now strict (ihl != 5), and in a basic setup (no traffic class configured), all packets go through the no-offload codepath. The commits that introduced the incorrect family checks in each driver are: 2ac9cfe78223 ("net/mlx5e: IPSec, Add Innova IPSec offload TX data path") 8362ea16f69f ("crypto: chcr - ESN for Inline IPSec Tx") 859a497fe80c ("nfp: implement xfrm callbacks and expose ipsec offload feature to upper layer") 32188be805d0 ("cn10k-ipsec: Allow ipsec crypto offload for skb with SA") [ixgbe/ixgbevf commits are ignored, as that HW does not support tunnel mode, thus no cross-family setups are possible] Fixes: cc18f482e8b6 ("xfrm: provide common xdo_dev_offload_ok callback implementation") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-09-08xfrm: snmp: do not use SNMP_MIB_SENTINEL anymoreEric Dumazet
Use ARRAY_SIZE(), so that we know the limit at compile time. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20250905165813.1470708-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-01xfrm: xfrm_alloc_spi shouldn't use 0 as SPISabrina Dubroca
x->id.spi == 0 means "no SPI assigned", but since commit 94f39804d891 ("xfrm: Duplicate SPI Handling"), we now create states and add them to the byspi list with this value. __xfrm_state_delete doesn't remove those states from the byspi list, since they shouldn't be there, and this shows up as a UAF the next time we go through the byspi list. Reported-by: syzbot+a25ee9d20d31e483ba7b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a25ee9d20d31e483ba7b Fixes: 94f39804d891 ("xfrm: Duplicate SPI Handling") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>