| Age | Commit message (Collapse) | Author |
|
- Build zero-sized resources when a BAR is larger than 4G but
pci_bus_addr_t or resource_size_t can't represent 64-bit addresses (Ilpo
Järvinen)
- Fix bridge window alignment with optional resources, where we previously
lost the additional alignment requirement (Ilpo Järvinen)
- Stop over-estimating bridge window size since we now assign them without
any gaps between them (Ilpo Järvinen)
- Increase resource MAX_IORES_LEVEL to avoid /proc/iomem flattening for
nested bridges and endpoints (Ilpo Järvinen)
- Remove old_size limit from bridge window sizing (Ilpo Järvinen)
- Push realloc check into pbus_size_mem() to simplify callers (Ilpo
Järvinen)
- Pass bridge window resource to pbus_size_mem() to avoid looking it up
again (Ilpo Järvinen)
- Use res_to_dev_res() instead of open-coding the same search (Ilpo
Järvinen)
- Add pci_resource_is_bridge_win() helper (Ilpo Järvinen)
- Add more logging of resource assignment (Ilpo Järvinen)
- Add pbus_mem_size_optional() to handle sizes of optional resources
(SR-IOV VF BARs, expansion ROMs, bridge windows) (Ilpo Järvinen)
- Move CardBus code to setup-cardbus.c and only build it when
CONFIG_CARDBUS is set (Ilpo Järvinen)
- Use scnprintf() instead of sprintf() (Ilpo Järvinen)
- Add pbus_validate_busn() for Bus Number validation (Ilpo Järvinen)
- Don't claim disabled bridge windows to avoid spurious claim failures
(Ilpo Järvinen)
* pci/resource:
PCI: Don't claim disabled bridge windows
PCI: Move CardBus bridge scanning to setup-cardbus.c
PCI: Add pbus_validate_busn() for Bus Number validation
PCI: Add dword #defines for Bus Number + Secondary Latency Timer
PCI: Use scnprintf() instead of sprintf()
PCI: Handle CardBus-specific params in setup-cardbus.c
PCI: Separate CardBus setup & build it only with CONFIG_CARDBUS
PCI: Add 'pci' prefix to struct pci_dev_resource handling functions
PCI: Use resource_assigned() in setup-bus.c algorithm
resource: Mark res given to resource_assigned() as const
PCI: Add pbus_mem_size_optional() to handle optional sizes
PCI: Check invalid align earlier in pbus_size_mem()
PCI: Log reset and restore of resources
PCI: Add pci_resource_is_bridge_win()
PCI: Fetch dev_res to local var in __assign_resources_sorted()
PCI: Use res_to_dev_res() in reassign_resources_sorted()
PCI: Pass bridge window resource to pbus_size_mem()
PCI: Push realloc check into pbus_size_mem()
PCI: Remove old_size limit from bridge window sizing
resource: Increase MAX_IORES_LEVEL to 8
PCI: Stop over-estimating bridge window size
PCI: Rewrite bridge window head alignment function
PCI: Fix bridge window alignment with optional resources
PCI: Use resource_set_range() that correctly sets ->end
|
|
- Rename pwrseq, tc9563, and slot driver structs, variables, and functions
for consistency (Bjorn Helgaas)
- Add power_on/off callbacks with generic signature to pwrseq, tc9563, and
slot drivers so they can be used by pwrctrl core (Manivannan Sadhasivam)
- Add interfaces to create and destroy pwrctrl devices (Krishna Chaitanya
Chundru)
- Add interfaces to power devices on and off (Manivannan Sadhasivam)
- Switch to pwrctrl interfaces to create, destroy, and power on/off
devices, calling them from host controller drivers instead of the PCI
core (Manivannan Sadhasivam)
- Drop qcom .assert_perst() callbacks since this is now done by the
controller driver instead of the pwrctrl driver (Manivannan Sadhasivam)
- Add PCIe M.2 connector support to the slot pwrctrl driver (Manivannan
Sadhasivam)
- Create pwrctrl devices for devicetree PCIe M.2 connector nodes
(Manivannan Sadhasivam)
* pci/pwrctrl:
PCI/pwrctrl: Create pwrctrl device if graph port is found
PCI/pwrctrl: Add PCIe M.2 connector support
PCI: Drop the assert_perst() callback
PCI: qcom: Drop the assert_perst() callbacks
PCI/pwrctrl: Switch to pwrctrl create, power on/off, destroy APIs
PCI/pwrctrl: Add APIs to power on/off pwrctrl devices
PCI/pwrctrl: Add APIs to create, destroy pwrctrl devices
PCI/pwrctrl: Add 'struct pci_pwrctrl::power_{on/off}' callbacks
PCI/pwrctrl: pwrseq: Factor out power on/off code to helpers
PCI/pwrctrl: slot: Factor out power on/off code to helpers
PCI/pwrctrl: tc9563: Rename private struct and pointers for consistency
PCI/pwrctrl: tc9563: Add local variables to reduce repetition
PCI/pwrctrl: tc9563: Clean up whitespace
PCI/pwrctrl: tc9563: Use put_device() instead of i2c_put_adapter()
PCI/pwrctrl: slot: Rename private struct and pointers for consistency
PCI/pwrctrl: pwrseq: Rename private struct and pointers for consistency
# Conflicts:
# drivers/pci/bus.c
|
|
- Add PCI_BRIDGE_NO_ALIAS quirk for ASPEED AST1150, where VGA and USB are
behind a PCIe-to-PCI bridge and share the same StreamID (Nirmoy Das)
* pci/iommu:
PCI: Add PCI_BRIDGE_NO_ALIAS quirk for ASPEED AST1150
PCI: Add ASPEED vendor ID to pci_ids.h
|
|
The commit 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as
PCIe BW controller") was found to lead to a boot hang on a Intel P45
system. Testing without setting Link Bandwidth Management Interrupt Enable
(LBMIE) and Link Autonomous Bandwidth Interrupt Enable (LABIE) (PCIe r7.0,
sec 7.5.3.7) in bwctrl allowed system to come up.
P45 is a very old chipset and supports only up to gen2 PCIe, so not having
bwctrl does not seem a huge deficiency.
Add no_bw_notif in struct pci_dev and quirk Intel P45 Root Port with it.
Reported-by: Adam Stylinski <kungfujesus06@gmail.com>
Link: https://lore.kernel.org/linux-pci/aUCt1tHhm_-XIVvi@eggsbenedict/
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Adam Stylinski <kungfujesus06@gmail.com>
Link: https://patch.msgid.link/20260116131513.2359-1-ilpo.jarvinen@linux.intel.com
|
|
The ACS Capability register is read-only. Cache it to allow quirks to
override it and to avoid re-reading it.
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Link: https://patch.msgid.link/20260102-pci_acs-v3-2-72280b94d288@oss.qualcomm.com
|
|
Take care of rqspinlock error in bpf_local_storage_{map_free, destroy}()
properly by switching to bpf_selem_unlink_nofail().
Both functions iterate their own RCU-protected list of selems and call
bpf_selem_unlink_nofail(). In map_free(), to prevent infinite loop when
both map_free() and destroy() fail to remove a selem from b->list
(extremely unlikely), switch to hlist_for_each_entry_rcu(). In destroy(),
also switch to hlist_for_each_entry_rcu() since we no longer iterate
local_storage->list under local_storage->lock.
bpf_selem_unlink() now becomes dedicated to helpers and syscalls paths
so reuse_now should always be false. Remove it from the argument and
hardcode it.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Co-developed-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-12-ameryhung@gmail.com
|
|
Introduce bpf_selem_unlink_nofail() to properly handle errors returned
from rqspinlock in bpf_local_storage_map_free() and
bpf_local_storage_destroy() where the operation must succeeds.
The idea of bpf_selem_unlink_nofail() is to allow an selem to be
partially linked and use atomic operation on a bit field, selem->state,
to determine when and who can free the selem if any unlink under lock
fails. An selem initially is fully linked to a map and a local storage.
Under normal circumstances, bpf_selem_unlink_nofail() will be able to
grab locks and unlink a selem from map and local storage in sequeunce,
just like bpf_selem_unlink(), and then free it after an RCU grace period.
However, if any of the lock attempts fails, it will only clear
SDATA(selem)->smap or selem->local_storage depending on the caller and
set SELEM_MAP_UNLINKED or SELEM_STORAGE_UNLINKED according to the
caller. Then, after both map_free() and destroy() see the selem and the
state becomes SELEM_UNLINKED, one of two racing caller can succeed in
cmpxchg the state from SELEM_UNLINKED to SELEM_TOFREE, ensuring no
double free or memory leak.
To make sure bpf_obj_free_fields() is done only once and when map is
still present, it is called when unlinking an selem from b->list under
b->lock.
To make sure uncharging memory is done only when the owner is still
present in map_free(), block destroy() from returning until there is no
pending map_free().
Since smap may not be valid in destroy(), bpf_selem_unlink_nofail()
skips bpf_selem_unlink_storage_nolock_misc() when called from destroy().
This is okay as bpf_local_storage_destroy() will return the remaining
amount of memory charge tracked by mem_charge to the owner to uncharge.
It is also safe to skip clearing local_storage->owner and owner_storage
as the owner is being freed and no users or bpf programs should be able
to reference the owner and using local_storage.
Finally, access of selem, SDATA(selem)->smap and selem->local_storage
are racy. Callers will protect these fields with RCU.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Co-developed-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-11-ameryhung@gmail.com
|
|
The next patch will introduce bpf_selem_unlink_nofail() to handle
rqspinlock errors. bpf_selem_unlink_nofail() will allow an selem to be
partially unlinked from map or local storage. Save memory allocation
method in selem so that later an selem can be correctly freed even when
SDATA(selem)->smap is init to NULL.
In addition, keep track of memory charge to the owner in local storage
so that later bpf_selem_unlink_nofail() can return the correct memory
charge to the owner. Updating local_storage->mem_charge is protected by
local_storage->lock.
Finally, extract miscellaneous tasks performed when unlinking an selem
from local_storage into bpf_selem_unlink_storage_nolock_misc(). It will
be reused by bpf_selem_unlink_nofail().
This patch also takes the chance to remove local_storage->smap, which
is no longer used since commit f484f4a3e058 ("bpf: Replace bpf memory
allocator with kmalloc_nolock() in local storage").
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-10-ameryhung@gmail.com
|
|
Percpu locks have been removed from cgroup and task local storage. Now
that all local storage no longer use percpu variables as locks preventing
recursion, there is no need to pass them to bpf_local_storage_map_free().
Remove the argument from the function.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-9-ameryhung@gmail.com
|
|
Change bpf_local_storage::lock and bpf_local_storage_map_bucket::lock
from raw_spin_lock to rqspinlock.
Finally, propagate errors from raw_res_spin_lock_irqsave() to syscall
return or BPF helper return.
In bpf_local_storage_destroy(), ignore return from
raw_res_spin_lock_irqsave() for now. A later patch will correctly
handle errors correctly in bpf_local_storage_destroy() so that it can
unlink selems even when failing to acquire locks.
For __bpf_local_storage_map_cache(), instead of handling the error,
skip updating the cache.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-6-ameryhung@gmail.com
|
|
To prepare changing both bpf_local_storage_map_bucket::lock and
bpf_local_storage::lock to rqspinlock, convert bpf_selem_unlink() to
failable. It still always succeeds and returns 0 until the change
happens. No functional change.
Open code bpf_selem_unlink_storage() in the only caller,
bpf_selem_unlink(), since unlink_map and unlink_storage must be done
together after all the necessary locks are acquired.
For bpf_local_storage_map_free(), ignore the return from
bpf_selem_unlink() for now. A later patch will allow it to unlink selems
even when failing to acquire locks.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-5-ameryhung@gmail.com
|
|
To prepare for changing bpf_local_storage_map_bucket::lock to rqspinlock,
convert bpf_selem_link_map() to failable. It still always succeeds and
returns 0 until the change happens. No functional change.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-4-ameryhung@gmail.com
|
|
A later bpf_local_storage refactor will acquire all locks before
performing any update. To simplified the number of locks needed to take
in bpf_local_storage_map_update(), determine the bucket based on the
local_storage an selem belongs to instead of the selem pointer.
Currently, when a new selem needs to be created to replace the old selem
in bpf_local_storage_map_update(), locks of both buckets need to be
acquired to prevent racing. This can be simplified if the two selem
belongs to the same bucket so that only one bucket needs to be locked.
Therefore, instead of hashing selem, hashing the local_storage pointer
the selem belongs.
Performance wise, this is slightly better as update now requires locking
one bucket. It should not change the level of contention on one bucket
as the pointers to local storages of selems in a map are just as unique
as pointers to selems.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-2-ameryhung@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"A couple of late-breaking MM fixes. One against a new-in-this-cycle
patch and the other addresses a locking issue which has been there for
over a year"
* tag 'mm-hotfixes-stable-2026-02-06-12-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/memory-failure: reject unsupported non-folio compound page
procfs: avoid fetching build ID while holding VMA lock
|
|
Pull ceph fixes from Ilya Dryomov:
"One RBD and two CephFS fixes which address potential oopses.
The RBD thing is more of a rare edge case that pops up in our CI,
while the two CephFS scenarios are regressions that were reported by
users and can be triggered trivially in normal operation. All marked
for stable"
* tag 'ceph-for-6.19-rc9' of https://github.com/ceph/ceph-client:
ceph: fix NULL pointer dereference in ceph_mds_auth_match()
ceph: fix oops due to invalid pointer for kfree() in parse_longname()
rbd: check for EOD after exclusive lock is ensured to be held
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux
Pull dma-mapping fixes from Marek Szyprowski:
"Two minor fixes for the DMA-mapping subsystem:
- check for the rare case of the allocation failure of the global CMA
pool (Shanker Donthineni)
- avoid perf buffer overflow when tracing large scatter-gather lists
(Deepanshu Kartikey)"
* tag 'dma-mapping-6.19-2026-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
dma: contiguous: Check return value of dma_contiguous_reserve_area()
tracing/dma: Cap dma_map_sg tracepoint arrays to prevent buffer overflow
|
|
- Move ABI requirement next to each access right to prepare adding more
access rights;
- Mention the possibility to remove the random component of a socket's
ephemeral port choice within the netns-wide ephemeral port range,
since it allows choosing the "random" ephemeral port.
Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Link: https://lore.kernel.org/r/20251212163704.142301-2-matthieu@buffet.re
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Introduce the LANDLOCK_RESTRICT_SELF_TSYNC flag. With this flag, a
given Landlock ruleset is applied to all threads of the calling
process, instead of only the current one.
Without this flag, multithreaded userspace programs currently resort
to using the nptl(7)/libpsx hack for multithreaded policy enforcement,
which is also used by libcap and for setuid(2). Using this
userspace-based scheme, the threads of a process enforce the same
Landlock policy, but the resulting Landlock domains are still
separate. The domains being separate causes multiple problems:
* When using Landlock's "scoped" access rights, the domain identity is
used to determine whether an operation is permitted. As a result,
when using LANLDOCK_SCOPE_SIGNAL, signaling between sibling threads
stops working. This is a problem for programming languages and
frameworks which are inherently multithreaded (e.g. Go).
* In audit logging, the domains of separate threads in a process will
get logged with different domain IDs, even when they are based on
the same ruleset FD, which might confuse users.
Cc: Andrew G. Morgan <morgan@kernel.org>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Paul Moore <paul@paul-moore.com>
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20251127115136.3064948-2-gnoack@google.com
[mic: Fix restrict_self_flags test, clean up Makefile, allign comments,
reduce local variable scope, add missing includes]
Closes: https://github.com/landlock-lsm/linux/issues/2
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
This reverts commit 62eb557580eb2177cf16c3fd2b6efadff297b29a.
The revocable implementation uses two separate abstractions, struct
revocable_provider and struct revocable, in order to store the SRCU read
lock index which must be passed unaltered to srcu_read_unlock() in the
same context when a resource is no longer needed.
With the merged revocable API, multiple threads could however share the
same struct revocable and therefore potentially overwrite the SRCU index
of another thread which can cause the SRCU synchronisation in
revocable_provider_revoke() to never complete. [1]
An example revocable conversion of the gpiolib code also turned out to
be fundamentally flawed and could lead to use-after-free. [2]
An attempt to address both issues was quickly put together and merged,
but revocable is still fundamentally broken. [3]
Specifically, the latest design relies on RCU for storing a pointer to
the revocable provider, but since the resource can be shared by value
(e.g. as in the now reverted selftests) this does not work at all and
can also lead to use-after-free:
static void revocable_provider_release(struct kref *kref)
{
struct revocable_provider *rp = container_of(kref,
struct revocable_provider, kref);
cleanup_srcu_struct(&rp->srcu);
kfree_rcu(rp, rcu);
}
void revocable_provider_revoke(struct revocable_provider __rcu **rp_ptr)
{
struct revocable_provider *rp;
rp = rcu_replace_pointer(*rp_ptr, NULL, 1);
...
kref_put(&rp->kref, revocable_provider_release);
}
int revocable_init(struct revocable_provider __rcu *_rp,
struct revocable *rev)
{
struct revocable_provider *rp;
...
scoped_guard(rcu) {
rp = rcu_dereference(_rp);
if (!rp)
return -ENODEV;
if (!kref_get_unless_zero(&rp->kref))
return -ENODEV;
}
...
}
producer:
priv->rp = revocable_provider_alloc(&priv->res);
// pass priv->rp by value to consumer
revocable_provider_revoke(&priv->rp);
consumer:
struct revocable_provider __rcu *rp = filp->private_data;
struct revocable *rev;
revocable_init(rp, &rev);
as _rp would still be non-NULL in revocable_init() regardless of whether
the producer has revoked the resource and set its pointer to NULL.
Essentially revocable still relies on having a pointer to reference
counted driver data which holds the revocable provider, which makes all
the RCU protection unnecessary along with most of the current revocable
design and implementation.
As the above shows, and as has been pointed out repeatedly elsewhere,
these kind of issues are not something that should be addressed
incrementally. [4]
Revert the revocable implementation until a redesign has been proposed
and evaluated properly.
Link: https://lore.kernel.org/all/20260124170535.11756-4-johan@kernel.org/ [1]
Link: https://lore.kernel.org/all/aXT45B6vLf9R3Pbf@hovoldconsulting.com/ [2]
Link: https://lore.kernel.org/all/20260129143733.45618-1-tzungbi@kernel.org/ [3]
Link: https://lore.kernel.org/all/aXobzoeooJqxMkEj@hovoldconsulting.com/ [4]
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20260204142849.22055-4-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Currently io_uring supports restricting operations on a per-ring basis.
To use those, the ring must be setup in a disabled state by setting
IORING_SETUP_R_DISABLED. Then restrictions can be set for the ring, and
the ring can then be enabled.
This commit adds support for IORING_REGISTER_RESTRICTIONS with ring_fd
== -1, like the other "blind" register opcodes which work on the task
rather than a specific ring. This allows registration of the same kind
of restrictions as can been done on a specific ring, but with the task
itself. Once done, any ring created will inherit these restrictions.
If a restriction filter is registered with a task, then it's inherited
on fork for its children. Children may only further restrict operations,
not extend them.
Inheriting restrictions include both the classic
IORING_REGISTER_RESTRICTIONS based restrictions, as well as the BPF
filters that have been registered with the task via
IORING_REGISTER_BPF_FILTER.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Called when copy_process() is called to copy state to a new child.
Right now this is just a stub, but will be used shortly to properly
handle fork'ing of task based io_uring restrictions.
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Currently, KVM assumes the minimum of implemented HGEIE bits and
"BIT(gc->guest_index_bits) - 1" as the number of guest files available
across all CPUs. This will not work when CPUs have different number
of guest files because KVM may incorrectly allocate a guest file on a
CPU with fewer guest files.
To address above, during initialization, calculate the number of
available guest interrupt files according to MMIO resources and
constrain the number of guest interrupt files that can be allocated
by KVM.
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Acked-by: Thomas Gleixner <tglx@kernel.org>
Link: https://lore.kernel.org/r/20260104133457.57742-1-luxu.kernel@bytedance.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Open intervals do not have an end element, in particular an open
interval at the end of the set is hard to validate because of it is
lacking the end element, and interval validation relies on such end
element to perform the checks.
This patch adds a new flag field to struct nft_set_elem, this is not an
issue because this is a temporary object that is allocated in the stack
from the insert/deactivate path. This flag field is used to specify that
this is the last element in this add/delete command.
The last flag is used, in combination with the start element cookie, to
check if there is a partial overlap, eg.
Already exists: 255.255.255.0-255.255.255.254
Add interval: 255.255.255.0-255.255.255.255
~~~~~~~~~~~~~
start element overlap
Basically, the idea is to check for an existing end element in the set
if there is an overlap with an existing start element.
However, the last open interval can come in any position in the add
command, the corner case can get a bit more complicated:
Already exists: 255.255.255.0-255.255.255.254
Add intervals: 255.255.255.0-255.255.255.255,255.255.255.0-255.255.255.254
~~~~~~~~~~~~~
start element overlap
To catch this overlap, annotate that the new start element is a possible
overlap, then report the overlap if the next element is another start
element that confirms that previous element in an open interval at the
end of the set.
For deletions, do not update the start cookie when deleting an open
interval, otherwise this can trigger spurious EEXIST when adding new
elements.
Unfortunately, there is no NFT_SET_ELEM_INTERVAL_OPEN flag which would
make easier to detect open interval overlaps.
Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
nft_counter_reset() calls u64_stats_add() with a negative value to reset
the counter. This will work on 64bit archs, hence the negative value
added will wrap as a 64bit value which then can wrap the stat counter as
well.
On 32bit archs, the added negative value will wrap as a 32bit value and
_not_ wrapping the stat counter properly. In most cases, this would just
lead to a very large 32bit value being added to the stat counter.
Fix by introducing u64_stats_sub().
Fixes: 4a1d3acd6ea8 ("netfilter: nft_counter: Use u64_stats_t for statistic.")
Signed-off-by: Anders Grahn <anders.grahn@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
Ulrich reports a regression with nfqueue:
If an application did not set the 'F_GSO' capability flag and a gso
packet with an unconfirmed nf_conn entry is received all packets are
now dropped instead of queued, because the check happens after
skb_gso_segment(). In that case, we did have exclusive ownership
of the skb and its associated conntrack entry. The elevated use
count is due to skb_clone happening via skb_gso_segment().
Move the check so that its peformed vs. the aggregated packet.
Then, annotate the individual segments except the first one so we
can do a 2nd check at reinject time.
For the normal case, where userspace does in-order reinjects, this avoids
packet drops: first reinjected segment continues traversal and confirms
entry, remaining segments observe the confirmed entry.
While at it, simplify nf_ct_drop_unconfirmed(): We only care about
unconfirmed entries with a refcnt > 1, there is no need to special-case
dying entries.
This only happens with UDP. With TCP, the only unconfirmed packet will
be the TCP SYN, those aren't aggregated by GRO.
Next patch adds a udpgro test case to cover this scenario.
Reported-by: Ulrich Weber <ulrich.weber@gmail.com>
Fixes: 7d8dc1c7be8d ("netfilter: nf_queue: drop packets with cloned unconfirmed conntracks")
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
pinconf_generic_dt_node_to_map_pinmux() is not actually a generic
function, and really belongs in the amlogic-am4 driver. There are three
reasons why.
First, and least, of the reasons is that this function behaves
differently to the other dt_node_to_map functions in a way that is not
obvious from a first glance. This difference stems for the devicetree
properties that the function is intended for use with, and how they are
typically used. The other generic dt_node_to_map functions support
platforms where the pins, groups and functions are described statically
in the driver and require a function that will produce a mapping from dt
nodes to these pre-established descriptions. No other code in the driver
is require to be executed at runtime.
pinconf_generic_dt_node_to_map_pinmux() on the other hand is intended for
use with the pinmux property, where groups and functions are determined
entirely from the devicetree. As a result, there are no statically
defined groups and functions in the driver for this function to perform
a mapping to. Other drivers that use the pinmux property (e.g. the k1)
their dt_node_to_map function creates the groups and functions as the
devicetree is parsed. Instead of that,
pinconf_generic_dt_node_to_map_pinmux() requires that the devicetree is
parsed twice, once by it and once at probe, so that the driver
dynamically creates the groups and functions before the dt_node_to_map
callback is executed. I don't believe this double parsing requirement is
how developers would expect this to work and is not necessary given
there are drivers that do not have this behaviour.
Secondly and thirdly, the function bakes in some assumptions that only
really match the amlogic platform about how the devicetree is constructed.
These, to me, are problematic for something that claims to be generic.
The other dt_node_to_map implementations accept a being called for
either a node containing pin configuration properties or a node
containing child nodes that each contain the configuration properties.
IOW, they support the following two devicetree configurations:
| cfg {
| label: group {
| pinmux = <asjhdasjhlajskd>;
| config-item1;
| };
| };
| label: cfg {
| group1 {
| pinmux = <dsjhlfka>;
| config-item2;
| };
| group2 {
| pinmux = <lsdjhaf>;
| config-item1;
| };
| };
pinconf_generic_dt_node_to_map_pinmux() only supports the latter.
The other assumption about devicetree configuration that the function
makes is that the labeled node's parent is a "function node". The amlogic
driver uses these "function nodes" to create the functions at probe
time, and pinconf_generic_dt_node_to_map_pinmux() finds the parent of
the node it is operating on's name as part of the mapping. IOW, it
requires that the devicetree look like:
| pinctrl@bla {
|
| func-foo {
| label: group-default {
| pinmuxes = <lskdf>;
| };
| };
| };
and couldn't be used if the nodes containing the pinmux and
configuration properties are children of the pinctrl node itself:
| pinctrl@bla {
|
| label: group-default {
| pinmuxes = <lskdf>;
| };
| };
These final two reasons are mainly why I believe this is not suitable as
a generic function, and should be moved into the driver that is the sole
user and originator of the "generic" function.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Linus Walleij <linusw@kernel.org>
|
|
Currently, hwrng_fill is not cleared until the hwrng_fillfn() thread
exits. Since hwrng_unregister() reads hwrng_fill outside the rng_mutex
lock, a concurrent hwrng_unregister() may call kthread_stop() again on
the same task.
Additionally, if hwrng_unregister() is called immediately after
hwrng_register(), the stopped thread may have never been executed. Thus,
hwrng_fill remains dirty even after hwrng_unregister() returns. In this
case, subsequent calls to hwrng_register() will fail to start new
threads, and hwrng_unregister() will call kthread_stop() on the same
freed task. In both cases, a use-after-free occurs:
refcount_t: addition on 0; use-after-free.
WARNING: ... at lib/refcount.c:25 refcount_warn_saturate+0xec/0x1c0
Call Trace:
kthread_stop+0x181/0x360
hwrng_unregister+0x288/0x380
virtrng_remove+0xe3/0x200
This patch fixes the race by protecting the global hwrng_fill pointer
inside the rng_mutex lock, so that hwrng_fillfn() thread is stopped only
once, and calls to kthread_run() and kthread_stop() are serialized
with the lock held.
To avoid deadlock in hwrng_fillfn() while being stopped with the lock
held, we convert current_rng to RCU, so that get_current_rng() can read
current_rng without holding the lock. To remove the lock from put_rng(),
we also delay the actual cleanup into a work_struct.
Since get_current_rng() no longer returns ERR_PTR values, the IS_ERR()
checks are removed from its callers.
With hwrng_fill protected by the rng_mutex lock, hwrng_fillfn() can no
longer clear hwrng_fill itself. Therefore, if hwrng_fillfn() returns
directly after current_rng is dropped, kthread_stop() would be called on
a freed task_struct later. To fix this, hwrng_fillfn() calls schedule()
now to keep the task alive until being stopped. The kthread_stop() call
is also moved from hwrng_unregister() to drop_current_rng(), ensuring
kthread_stop() is called on all possible paths where current_rng becomes
NULL, so that the thread would not wait forever.
Fixes: be4000bc4644 ("hwrng: create filler thread")
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Lianjie Wang <karin0.zst@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The newly added structure causes a warning about implied padding:
./usr/include/linux/kvm.h:1247:1: error: padding struct size to alignment boundary with 6 bytes [-Werror=padded]
The padding can lead to leaking kernel data and ABI incompatibilies
when used on x86. Neither of these is a problem in this specific
patch, but it's best to avoid it and use explicit padding fields
in general.
Fixes: 0ee4ddc1647b ("KVM: s390: Storage key manipulation IOCTL")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
|
|
'core' into next
|
|
The printf() format attributes are applied inconsistently for the binary
printf helpers, which causes warnings for the bpf_trace code using
them from functions that pass down format strings:
kernel/trace/bpf_trace.c: In function '____bpf_trace_printk':
kernel/trace/bpf_trace.c:377:9: error: function '____bpf_trace_printk' might be a candidate for 'gnu_printf' format attribute [-Werror=suggest-attribute=format]
377 | ret = bstr_printf(data.buf, MAX_BPRINTF_BUF, fmt, data.bin_args);
| ^~~
This can be addressed either by annotating all five callers in bpf code,
or by removing the annotations on the callees that were added by Andy
Shevchenko last year.
As Alexei Starovoitov points out, there are no callers in C code that
would benefit from the __printf attributes, the only users are in BPF
code or in the do_trace_printk() helper that already checks the arguments.
Drop all three of these annotations, reverting the earlierl commits that
added these, in order to get a clean build with -Wsuggest-attribute=format.
Fixes: 6b2c1e30ad68 ("seq_file: Mark binary printing functions with __printf() attribute")
Fixes: 7bf819aa992f ("vsnprintf: Mark binary printing functions with __printf() attribute")
Link: https://lore.kernel.org/all/CAADnVQK3eZp3yp35OUx8j1UBsQFhgsn5-4VReqAJ=68PaaKYmg@mail.gmail.com/
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512061640.9hKTnB8p-lkp@intel.com/
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Petr Mladek <pmladek@suse.com>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20260204132643.1302967-1-arnd@kernel.org
Signed-off-by: Petr Mladek <pmladek@suse.com>
|
|
Rename TAUI/TBASE to GAUI/GBASE in 1600G link mode identifier and its
usage in ethtool and link-info tables.
Reported-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Reported-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Signed-off-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://patch.msgid.link/20260204194324.1723534-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix PROCMAP_QUERY to fetch optional build ID only after dropping mmap_lock
or per-VMA lock, whichever was used to lock VMA under question, to avoid
deadlock reported by syzbot:
-> #1 (&mm->mmap_lock){++++}-{4:4}:
__might_fault+0xed/0x170
_copy_to_iter+0x118/0x1720
copy_page_to_iter+0x12d/0x1e0
filemap_read+0x720/0x10a0
blkdev_read_iter+0x2b5/0x4e0
vfs_read+0x7f4/0xae0
ksys_read+0x12a/0x250
do_syscall_64+0xcb/0xf80
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}:
__lock_acquire+0x1509/0x26d0
lock_acquire+0x185/0x340
down_read+0x98/0x490
blkdev_read_iter+0x2a7/0x4e0
__kernel_read+0x39a/0xa90
freader_fetch+0x1d5/0xa80
__build_id_parse.isra.0+0xea/0x6a0
do_procmap_query+0xd75/0x1050
procfs_procmap_ioctl+0x7a/0xb0
__x64_sys_ioctl+0x18e/0x210
do_syscall_64+0xcb/0xf80
entry_SYSCALL_64_after_hwframe+0x77/0x7f
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
rlock(&mm->mmap_lock);
lock(&sb->s_type->i_mutex_key#8);
lock(&mm->mmap_lock);
rlock(&sb->s_type->i_mutex_key#8);
*** DEADLOCK ***
This seems to be exacerbated (as we haven't seen these syzbot reports
before that) by the recent:
777a8560fd29 ("lib/buildid: use __kernel_read() for sleepable context")
To make this safe, we need to grab file refcount while VMA is still locked, but
other than that everything is pretty straightforward. Internal build_id_parse()
API assumes VMA is passed, but it only needs the underlying file reference, so
just add another variant build_id_parse_file() that expects file passed
directly.
[akpm@linux-foundation.org: fix up kerneldoc]
Link: https://lkml.kernel.org/r/20260129215340.3742283-1-andrii@kernel.org
Fixes: ed5d583a88a9 ("fs/procfs: implement efficient VMA querying API for /proc/<pid>/maps")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reported-by: <syzbot+4e70c8e0a2017b432f7a@syzkaller.appspotmail.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.19-rc9).
No adjacent changes, conflicts:
drivers/net/ethernet/spacemit/k1_emac.c
3125fc1701694 ("net: spacemit: k1-emac: fix jumbo frame support")
f66086798f91f ("net: spacemit: Remove broken flow control support")
https://lore.kernel.org/aYIysFIE9ooavWia@sirena.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless and Netfilter.
Previous releases - regressions:
- eth: stmmac: fix stm32 (and potentially others) resume regression
- nf_tables: fix inverted genmask check in nft_map_catchall_activate()
- usb: r8152: fix resume reset deadlock
- fix reporting RXH_XFRM_NO_CHANGE as input_xfrm for RSS contexts
Previous releases - always broken:
- sched: cls_u32: use skb_header_pointer_careful() to avoid OOB reads
with malicious u32 rules
- eth: ice: timestamping related fixes"
* tag 'net-6.19-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits)
ipv6: Fix ECMP sibling count mismatch when clearing RTF_ADDRCONF
netfilter: nf_tables: fix inverted genmask check in nft_map_catchall_activate()
net: cpsw: Execute ndo_set_rx_mode callback in a work queue
net: cpsw_new: Execute ndo_set_rx_mode callback in a work queue
gve: Correct ethtool rx_dropped calculation
gve: Fix stats report corruption on queue count change
selftest: net: add a test-case for encap segmentation after GRO
net: gro: fix outer network offset
net: add proper RCU protection to /proc/net/ptype
net: ethernet: adi: adin1110: Check return value of devm_gpiod_get_optional() in adin1110_check_spi()
wifi: iwlwifi: mvm: pause TCM on fast resume
wifi: iwlwifi: mld: cancel mlo_scan_start_wk
net: spacemit: k1-emac: fix jumbo frame support
net: enetc: Convert 16-bit register reads to 32-bit for ENETC v4
net: enetc: Convert 16-bit register writes to 32-bit for ENETC v4
net: enetc: Remove CBDR cacheability AXI settings for ENETC v4
net: enetc: Remove SI/BDR cacheability AXI settings for ENETC v4
tipc: use kfree_sensitive() for session key material
net: stmmac: fix stm32 (and potentially others) resume regression
net: rss: fix reporting RXH_XFRM_NO_CHANGE as input_xfrm for contexts
...
|
|
Currently we are registering one dynamic lockdep key for each allocated
qdisc, to avoid false deadlock reports when mirred (or TC eBPF) redirects
packets to another device while the root lock is acquired [1].
Since dynamic keys are a limited resource, we can save them at least for
qdiscs that are not meant to acquire the root lock in the traffic path,
or to carry traffic at all, like:
- clsact
- ingress
- noqueue
Don't register dynamic keys for the above schedulers, so that we hit
MAX_LOCKDEP_KEYS later in our tests.
[1] https://github.com/multipath-tcp/mptcp_net-next/issues/451
Changes in v2:
- change ordering of spin_lock_init() vs. lockdep_register_key()
(Jakub Kicinski)
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://patch.msgid.link/94448f7fa7c4f52d2ce416a4895ec87d456d7417.1770220576.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Inlining __reqsk_free() is overkill, let's reclaim 2 Kbytes of text.
$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 2/4 grow/shrink: 2/14 up/down: 225/-2338 (-2113)
Function old new delta
__reqsk_free - 114 +114
sock_edemux 18 82 +64
inet_csk_listen_start 233 264 +31
__pfx___reqsk_free - 16 +16
__pfx_reqsk_queue_alloc 16 - -16
__pfx_reqsk_free 16 - -16
reqsk_queue_alloc 46 - -46
tcp_req_err 272 177 -95
reqsk_fastopen_remove 348 253 -95
cookie_bpf_check 157 62 -95
cookie_tcp_reqsk_alloc 387 290 -97
cookie_v4_check 1568 1465 -103
reqsk_free 105 - -105
cookie_v6_check 1519 1412 -107
sock_gen_put 187 78 -109
sock_pfree 212 82 -130
tcp_try_fastopen 1818 1683 -135
tcp_v4_rcv 3478 3294 -184
reqsk_put 306 90 -216
tcp_get_cookie_sock 551 318 -233
tcp_v6_rcv 3404 3141 -263
tcp_conn_request 2677 2384 -293
Total: Before=24887415, After=24885302, chg -0.01%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260204055147.1682705-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Only called once from inet_csk_listen_start(), it can be static.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260204055147.1682705-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for using the state of pins on the amplifier to indicate
the type of speaker fitted.
Previously, where there were alternate speaker vendors, this was
indicated using host CPU GPIOs.
Some new Dell models use spare pins on the CS35L63 as GPIOs for the
speaker ID detection.
Cirrus-specific SDCA Disco properties provide a list of the pins to be
used, and pull-up/down settings for the pads. This list is ordered,
MSbit to LSbit.
The code to set the firmware filename has been modified to check for
using chip pins for speaker ID. The entire block of code to set
firmware name has been moved out of cs35l56_component_probe() into
its own function to make it easier to KUnit test.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://patch.msgid.link/20260205164838.1611295-2-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The PORTSC1 register has a different offset in Tegra20 compared to
Tegra30+, yet they share a crucial set of registers required for HSIC
functionality. Reflect this register offset change in the SoC config.
Signed-off-by: Svyatoslav Ryhel <clamor95@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Link: https://patch.msgid.link/20260202080526.23487-5-clamor95@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The parallel transceiver select used in HSIC mode differs on Tegra20,
where it uses the UTMI value (0), whereas Tegra30+ uses a dedicated HSIC
value. Reflect this in the SoC config.
Signed-off-by: Svyatoslav Ryhel <clamor95@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Link: https://patch.msgid.link/20260202080526.23487-4-clamor95@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Change TEGRA_USB_HOSTPC1_DEVLC_PTS_HSIC to its literal value instead of
using the BIT macro, as it is an enumeration. Correct the spelling in the
comment and rename uhsic_registers_shift to uhsic_registers_offset.
These changes are cosmetic and do not affect code behavior.
Signed-off-by: Svyatoslav Ryhel <clamor95@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Link: https://patch.msgid.link/20260202080526.23487-2-clamor95@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The linker script scripts/module.lds.S specifies that all input
__klp_objects sections should be consolidated into an output section of
the same name, and start/stop symbols should be created to enable
scripts/livepatch/init.c to locate this data.
This start/stop pattern is not ideal for modules because the symbols are
created even if no __klp_objects input sections are present.
Consequently, a dummy __klp_objects section also appears in the
resulting module. This unnecessarily pollutes non-livepatch modules.
Instead, since modules are relocatable files, the usual method for
locating consolidated data in a module is to read its section table.
This approach avoids the aforementioned problem.
The klp_modinfo already stores a copy of the entire section table with
the final addresses. Introduce a helper function that
scripts/livepatch/init.c can call to obtain the location of the
__klp_objects section from this data.
Fixes: dd590d4d57eb ("objtool/klp: Introduce klp diff subcommand for diffing object files")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Aaron Tomlin <atomlin@atomlin.com>
Link: https://patch.msgid.link/20260123102825.3521961-2-petr.pavlu@suse.com
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
skb_protocol() is bloated, and forces slow stack canaries in many
fast paths.
Add vlan_get_protocol_offset_inline() which deals with the non-vlan
common cases.
__vlan_get_protocol_offset() is now out of line.
It returns a vlan_type_depth struct to avoid stack canaries in callers.
struct vlan_type_depth {
__be16 type;
u16 depth;
};
$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/2 grow/shrink: 0/22 up/down: 0/-6320 (-6320)
Function old new delta
vlan_get_protocol_dgram 61 59 -2
__pfx_skb_protocol 16 - -16
__vlan_get_protocol_offset 307 273 -34
tap_get_user 1374 1207 -167
ip_md_tunnel_xmit 1625 1452 -173
tap_sendmsg 940 753 -187
netif_skb_features 1079 866 -213
netem_enqueue 3017 2800 -217
vlan_parse_protocol 271 50 -221
tso_start 567 344 -223
fq_dequeue 1908 1685 -223
skb_network_protocol 434 205 -229
ip6_tnl_xmit 2639 2409 -230
br_dev_queue_push_xmit 474 236 -238
skb_protocol 258 - -258
packet_parse_headers 621 357 -264
__ip6_tnl_rcv 1306 1039 -267
skb_csum_hwoffload_help 515 224 -291
ip_tunnel_xmit 2635 2339 -296
sch_frag_xmit_hook 1582 1233 -349
bpf_skb_ecn_set_ce 868 457 -411
IP6_ECN_decapsulate 1297 768 -529
ip_tunnel_rcv 2121 1489 -632
ipip6_rcv 2572 1922 -650
Total: Before=24892803, After=24886483, chg -0.03%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260204053023.1622775-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Some functions do not modify the pointed-to data, but lack const
qualifiers. Add const qualifiers to the arguments of
flow_rule_match_has_control_flags() and flow_cls_offload_flow_rule().
Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260204052839.198602-1-mmyangfl@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add support for the REF_TRACKER infrastructure to the DPLL subsystem.
When enabled, this allows developers to track and debug reference counting
leaks or imbalances for dpll_device and dpll_pin objects. It records stack
traces for every get/put operation and exposes this information via
debugfs at:
/sys/kernel/debug/ref_tracker/dpll_device_*
/sys/kernel/debug/ref_tracker/dpll_pin_*
The following API changes are made to support this:
1. dpll_device_get() / dpll_device_put() now accept a 'dpll_tracker *'
(which is a typedef to 'struct ref_tracker *' when enabled, or an empty
struct otherwise).
2. dpll_pin_get() / dpll_pin_put() and fwnode_dpll_pin_find() similarly
accept the tracker argument.
3. Internal registration structures now hold a tracker to associate the
reference held by the registration with the specific owner.
All existing in-tree drivers (ice, mlx5, ptp_ocp, zl3073x) are updated
to pass NULL for the new tracker argument, maintaining current behavior
while enabling future debugging capabilities.
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Co-developed-by: Petr Oros <poros@redhat.com>
Signed-off-by: Petr Oros <poros@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20260203174002.705176-8-ivecera@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Allow drivers to register DPLL pins without manually specifying a pin
index.
Currently, drivers must provide a unique pin index when calling
dpll_pin_get(). This works well for hardware-mapped pins but creates
friction for drivers handling virtual pins or those without a strict
hardware indexing scheme.
Introduce DPLL_PIN_IDX_UNSPEC (U32_MAX). When a driver passes this
value as the pin index:
1. The core allocates a unique index using an IDA
2. The allocated index is mapped to a range starting above `INT_MAX`
This separation ensures that dynamically allocated indices never collide
with standard driver-provided hardware indices, which are assumed to be
within the `0` to `INT_MAX` range. The index is automatically freed when
the pin is released in dpll_pin_put().
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20260203174002.705176-5-ivecera@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently, the DPLL subsystem reports events (creation, deletion, changes)
to userspace via Netlink. However, there is no mechanism for other kernel
components to be notified of these events directly.
Add a raw notifier chain to the DPLL core protected by dpll_lock. This
allows other kernel subsystems or drivers to register callbacks and
receive notifications when DPLL devices or pins are created, deleted,
or modified.
Define the following:
- Registration helpers: {,un}register_dpll_notifier()
- Event types: DPLL_DEVICE_CREATED, DPLL_PIN_CREATED, etc.
- Context structures: dpll_{device,pin}_notifier_info to pass relevant
data to the listeners.
The notification chain is invoked alongside the existing Netlink event
generation to ensure in-kernel listeners are kept in sync with the
subsystem state.
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Co-developed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Link: https://patch.msgid.link/20260203174002.705176-4-ivecera@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Extend the DPLL core to support associating a DPLL pin with a firmware
node. This association is required to allow other subsystems (such as
network drivers) to locate and request specific DPLL pins defined in
the Device Tree or ACPI.
* Add a .fwnode field to the struct dpll_pin
* Introduce dpll_pin_fwnode_set() helper to allow the provider driver
to associate a pin with a fwnode after the pin has been allocated
* Introduce fwnode_dpll_pin_find() helper to allow consumers to search
for a registered DPLL pin using its associated fwnode handle
* Ensure the fwnode reference is properly released in dpll_pin_put()
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20260203174002.705176-2-ivecera@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Merge ACPI APEI support updates for 6.20-rc1/7.0-rc1:
- Make read-only array non_mmio_desc[] static const (Colin Ian King)
- Prevent the APEI GHES support code on ARM from accessing memory out
of bounds or going past the ARM processor CPER record buffer (Mauro
Carvalho Chehab)
- Prevent cper_print_fw_err() from dumping the entire memory on systems
with defective firmware (Mauro Carvalho Chehab)
- Improve ghes_notify_nmi() status check to avoid unnecessary overhead
in the NMI handler by carrying out all of the requisite preparations
and the NMI registration time (Tony Luck)
- Refactor the GHES driver by extracting common functionality into
reusable helper functions to reduce code duplication and improve
the ghes_notify_sea() status check in analogy with the previous
ghes_notify_nmi() status check improvement (Shuai Xue)
- Make ELOG and GHES log and trace consistently and support the CPER
CXL protocol analogously (Fabio De Francesco)
- Disable KASAN instrumentation in the APEI GHES driver when compile
testing with clang < 18 (Nathan Chancellor)
- Let ghes_edac be the preferred driver to load on __ZX__ and _BYO_
systems by extending the platform detection list in the APEI GHES
driver (Tony W Wang-oc)
* acpi-apei:
ACPI: APEI: GHES: Add ghes_edac support for __ZX__ and _BYO_ systems
ACPI: APEI: GHES: Disable KASAN instrumentation when compile testing with clang < 18
ACPI: extlog: Trace CPER CXL Protocol Error Section
ACPI: APEI: GHES: Add helper to copy CPER CXL protocol error info to work struct
ACPI: APEI: GHES: Add helper for CPER CXL protocol errors checks
ACPI: extlog: Trace CPER PCI Express Error Section
ACPI: extlog: Trace CPER Non-standard Section Body
ACPI: APEI: GHES: Improve ghes_notify_sea() status check
ACPI: APEI: GHES: Extract helper functions for error status handling
ACPI: APEI: GHES: Improve ghes_notify_nmi() status check
EFI/CPER: don't dump the entire memory region
APEI/GHES: ensure that won't go past CPER allocated record
EFI/CPER: don't go past the ARM processor CPER record buffer
APEI/GHES: ARM processor Error: don't go past allocated memory
ACPI: APEI: EINJ: make read-only array non_mmio_desc static const
|
|
Merge ACPI processor driver changes for 6.20-rc1/7.0-rc1:
- Rework the ACPI idle driver initialization to register it directly
from the common initialization code instead of doing that from a
CPU hotplug "online" callback and clean it up (Huisong Li, Rafael
Wysocki)
- Fix a possible NULL pointer dereference in
acpi_processor_errata_piix4() (Tuo Li)
* acpi-processor:
ACPI: processor: idle: Rework the handling of acpi_processor_ffh_lpi_probe()
ACPI: processor: idle: Convert acpi_processor_setup_cpuidle_dev() to void
ACPI: processor: idle: Convert acpi_processor_setup_cpuidle_states() to void
ACPI: processor: idle: Add debug log for states with invalid entry methods
ACPI: processor: Fix NULL-pointer dereference in acpi_processor_errata_piix4()
ACPI: processor: Do not expose global variable acpi_idle_driver
ACPI: processor: idle: Rearrange declarations in header file
ACPI: processor: idle: Redefine two functions as void
ACPI: processor: Update cpuidle driver check in __acpi_processor_start()
ACPI: processor: Remove unused empty stubs of some functions
ACPI: processor: idle: Optimize ACPI idle driver registration
|