| Age | Commit message (Collapse) | Author |
|
Failure to read the descriptor (because it is outside of a memslot) should
result in a SEA being injected in the guest.
Suggested-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/86ms1m9lp3.wl-maz@kernel.org
Signed-off-by: Zenghui Yu (Huawei) <zenghui.yu@linux.dev>
Link: https://patch.msgid.link/20260225173515.20490-4-zenghui.yu@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
As per R_BFHQH,
" When an Address size fault is generated, the reported fault code
indicates one of the following:
If the fault was generated due to the TTBR_ELx used in the translation
having nonzero address bits above the OA size, then a fault at level 0. "
Fix the reported Address size fault level as being 0 if the base address is
wrongly programmed by L1.
Fixes: 61e30b9eef7f ("KVM: arm64: nv: Implement nested Stage-2 page table walk logic")
Signed-off-by: Zenghui Yu (Huawei) <zenghui.yu@linux.dev>
Link: https://patch.msgid.link/20260225173515.20490-3-zenghui.yu@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
check_base_s2_limits() checks the validity of SL0 and inputsize against
ia_size (inputsize again!) but the pseudocode from DDI0487 G.a
AArch64.TranslationTableWalk() says that we should check against the
implemented PA size.
We would otherwise fail to walk S2 with a valid configuration. E.g.,
granule size = 4KB, inputsize = 40 bits, initial lookup level = 0 (no
concatenation) on a system with 48 bits PA range supported is allowed by
architecture.
Fix it by obtaining PA size by kvm_get_pa_bits(). Note that
kvm_get_pa_bits() returns the fixed limit now and should eventually reflect
the per VM PARange (one day!). Given that the configured PARange should not
be greater that kvm_ipa_limit, it at least fixes the problem described
above.
While at it, inject a level 0 translation fault to guest if
check_base_s2_limits() fails, as per the pseudocode.
Fixes: 61e30b9eef7f ("KVM: arm64: nv: Implement nested Stage-2 page table walk logic")
Signed-off-by: Zenghui Yu (Huawei) <zenghui.yu@linux.dev>
Link: https://patch.msgid.link/20260225173515.20490-2-zenghui.yu@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 7.0, take #1
- Make sure we don't leak any S1POE state from guest to guest when
the feature is supported on the HW, but not enabled on the host
- Propagate the ID registers from the host into non-protected VMs
managed by pKVM, ensuring that the guest sees the intended feature set
- Drop double kern_hyp_va() from unpin_host_sve_state(), which could
bite us if we were to change kern_hyp_va() to not being idempotent
- Don't leak stage-2 mappings in protected mode
- Correctly align the faulting address when dealing with single page
stage-2 mappings for PAGE_SIZE > 4kB
- Fix detection of virtualisation-capable GICv5 IRS, due to the
maintainer being obviously fat fingered...
- Remove duplication of code retrieving the ASID for the purpose of
S1 PT handling
- Fix slightly abusive const-ification in vgic_set_kvm_info()
|
|
We currently have three versions of the ASID retrieval code, one
in the S1 walker, and two in the VNCR handling (although the last
two are limited to the EL2&0 translation regime).
Make this code common, and take this opportunity to also simplify
the code a bit while switching over to the TTBRx_EL1_ASID macro.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260225104718.14209-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Commit 0c4762e26879 ("KVM: arm64: nv: Avoid NV stage-2 code when NV is
not supported") added an early return to several functions in
arch/arm64/kvm/nested.c to prevent a UBSAN shift-out-of-bounds error
when accessing the pgt union for non-nested VMs.
However, this early return was inadvertently applied to
kvm_arch_flush_shadow_all() as well, causing it to skip the call to
kvm_uninit_stage2_mmu(kvm) for all non-nested VMs.
For pKVM, skipping this teardown means the host never unshares the
guest's memory with the EL2 hypervisor. When the host kernel later
recycles these leaked pages for a new VM, it attempts to re-share them.
The hypervisor correctly rejects this with -EPERM, triggering a host
WARN_ON and hanging the guest.
Fix this by dropping the early return from kvm_arch_flush_shadow_all().
The for-loop guarding the nested MMU cleanup already bounds itself when
nested_mmus_size == 0, allowing execution to proceed to
kvm_uninit_stage2_mmu() as intended.
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/all/60916cb6-f460-4751-b910-f63c58700ad0@sirena.org.uk/
Fixes: 0c4762e26879 ("KVM: arm64: nv: Avoid NV stage-2 code when NV is not supported")
Signed-off-by: Fuad Tabba <tabba@google.com>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://patch.msgid.link/20260222083352.89503-1-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
* kvm-arm64/misc-6.20:
: .
: Misc KVM/arm64 changes for 6.20
:
: - Trivial FPSIMD cleanups
:
: - Calculate hyp VA size only once, avoiding potential mapping issues when
: VA bits is smaller than expected
:
: - Silence sparse warning for the HYP stack base
:
: - Fix error checking when handling FFA_VERSION
:
: - Add missing trap configuration for DBGWCR15_EL1
:
: - Don't try to deal with nested S2 when NV isn't enabled for a guest
:
: - Various spelling fixes
: .
KVM: arm64: nv: Avoid NV stage-2 code when NV is not supported
KVM: arm64: Fix various comments
KVM: arm64: nv: Add trap config for DBGWCR<15>_EL1
KVM: arm64: Fix error checking for FFA_VERSION
KVM: arm64: Fix missing <asm/stackpage/nvhe.h> include
KVM: arm64: Calculate hyp VA size only once
KVM: arm64: Remove ISB after writing FPEXC32_EL2
KVM: arm64: Shuffle KVM_HOST_DATA_FLAG_* indices
KVM: arm64: Fix comment in fpsimd_lazy_switch_to_host()
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Sanitise SCTLR_EL2 the usual way. The most important aspect of
this is that we benefit from SCTLR_EL2.SPAN being RES1 when
HCR_EL2.E2H==0.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-20-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
FEAT_TME has been dropped from the architecture. Retrospectively.
I'm sure someone is crying somewhere, but most of us won't.
Clean-up time.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-18-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Add a new helper to retrieve the RESx values for a given system
register, and use it for the runtime sanitisation.
This results in slightly better code generation for a fairly hot
path in the hypervisor, and additionally covers all sanitised
registers in all conditions, not just the VNCR-based ones.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-6-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
We have so far mostly tracked RES0 bits, but only made a few attempts
at being just as strict for RES1 bits (probably because they are both
rarer and harder to handle).
Start scratching the surface by introducing a data structure tracking
RES0 and RES1 bits at the same time.
Note that contrary to the usual idiom, this structure is mostly passed
around by value -- the ABI handles it nicely, and the resulting code is
much nicer.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202184329.2724080-5-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The NV stage-2 manipulation functions kvm_nested_s2_unmap(),
kvm_nested_s2_wp(), and others, are being called for any stage-2
manipulation regardless of whether nested virtualization is supported or
enabled for the VM.
For protected KVM (pKVM), `struct kvm_pgtable` uses the
`pkvm_mappings` member of the union. This member aliases `ia_bits`,
which is used by the non-protected NV code paths. Attempting to
read `pgt->ia_bits` in these functions results in treating
protected mapping pointers or state values as bit-shift amounts.
This triggers a UBSAN shift-out-of-bounds error:
UBSAN: shift-out-of-bounds in arch/arm64/kvm/nested.c:1127:34
shift exponent 174565952 is too large for 64-bit type 'unsigned long'
Call trace:
__ubsan_handle_shift_out_of_bounds+0x28c/0x2c0
kvm_nested_s2_unmap+0x228/0x248
kvm_arch_flush_shadow_memslot+0x98/0xc0
kvm_set_memslot+0x248/0xce0
Since pKVM and NV are mutually exclusive, prevent entry into these
NV handling functions if the VM has not allocated any nested MMUs
(i.e., `kvm->arch.nested_mmus_size` is 0).
Fixes: 7270cc9157f47 ("KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers")
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20260202152310.113467-1-tabba@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Describe all the VTCR_EL2 fields and their respective configurations,
making sure that we correctly ignore the bits that are not defined
for a given guest configuration.
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20251210173024.561160-6-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Our definition of VTCR_EL2 is both partial (tons of fields are
missing) and totally inconsistent (some constants are shifted,
some are not). They are also expressed in terms of TCR, which is
rather inconvenient.
Replace the ad-hoc definitions with the the generated version.
This results in a bunch of additional changes to make the code
with the unshifted nature of generated enumerations.
The register data was extracted from the BSD licenced AARCHMRS
(AARCHMRS_OPENSOURCE_A_profile_FAT-2025-09_ASL0).
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://patch.msgid.link/20251210173024.561160-4-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Pull KVM updates from Paolo Bonzini:
"ARM:
- Support for userspace handling of synchronous external aborts
(SEAs), allowing the VMM to potentially handle the abort in a
non-fatal manner
- Large rework of the VGIC's list register handling with the goal of
supporting more active/pending IRQs than available list registers
in hardware. In addition, the VGIC now supports EOImode==1 style
deactivations for IRQs which may occur on a separate vCPU than the
one that acked the IRQ
- Support for FEAT_XNX (user / privileged execute permissions) and
FEAT_HAF (hardware update to the Access Flag) in the software page
table walkers and shadow MMU
- Allow page table destruction to reschedule, fixing long
need_resched latencies observed when destroying a large VM
- Minor fixes to KVM and selftests
Loongarch:
- Get VM PMU capability from HW GCFG register
- Add AVEC basic support
- Use 64-bit register definition for EIOINTC
- Add KVM timer test cases for tools/selftests
RISC/V:
- SBI message passing (MPXY) support for KVM guest
- Give a new, more specific error subcode for the case when in-kernel
AIA virtualization fails to allocate IMSIC VS-file
- Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually
in small chunks
- Fix guest page fault within HLV* instructions
- Flush VS-stage TLB after VCPU migration for Andes cores
s390:
- Always allocate ESCA (Extended System Control Area), instead of
starting with the basic SCA and converting to ESCA with the
addition of the 65th vCPU. The price is increased number of exits
(and worse performance) on z10 and earlier processor; ESCA was
introduced by z114/z196 in 2010
- VIRT_XFER_TO_GUEST_WORK support
- Operation exception forwarding support
- Cleanups
x86:
- Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO
SPTE caching is disabled, as there can't be any relevant SPTEs to
zap
- Relocate a misplaced export
- Fix an async #PF bug where KVM would clear the completion queue
when the guest transitioned in and out of paging mode, e.g. when
handling an SMI and then returning to paged mode via RSM
- Leave KVM's user-return notifier registered even when disabling
virtualization, as long as kvm.ko is loaded. On reboot/shutdown,
keeping the notifier registered is ok; the kernel does not use the
MSRs and the callback will run cleanly and restore host MSRs if the
CPU manages to return to userspace before the system goes down
- Use the checked version of {get,put}_user()
- Fix a long-lurking bug where KVM's lack of catch-up logic for
periodic APIC timers can result in a hard lockup in the host
- Revert the periodic kvmclock sync logic now that KVM doesn't use a
clocksource that's subject to NTP corrections
- Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the
latter behind CONFIG_CPU_MITIGATIONS
- Context switch XCR0, XSS, and PKRU outside of the entry/exit fast
path; the only reason they were handled in the fast path was to
paper of a bug in the core #MC code, and that has long since been
fixed
- Add emulator support for AVX MOV instructions, to play nice with
emulated devices whose guest drivers like to access PCI BARs with
large multi-byte instructions
x86 (AMD):
- Fix a few missing "VMCB dirty" bugs
- Fix the worst of KVM's lack of EFER.LMSLE emulation
- Add AVIC support for addressing 4k vCPUs in x2AVIC mode
- Fix incorrect handling of selective CR0 writes when checking
intercepts during emulation of L2 instructions
- Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32]
on VMRUN and #VMEXIT
- Fix a bug where KVM corrupt the guest code stream when re-injecting
a soft interrupt if the guest patched the underlying code after the
VM-Exit, e.g. when Linux patches code with a temporary INT3
- Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits
to userspace, and extend KVM "support" to all policy bits that
don't require any actual support from KVM
x86 (Intel):
- Use the root role from kvm_mmu_page to construct EPTPs instead of
the current vCPU state, partly as worthwhile cleanup, but mostly to
pave the way for tracking per-root TLB flushes, and elide EPT
flushes on pCPU migration if the root is clean from a previous
flush
- Add a few missing nested consistency checks
- Rip out support for doing "early" consistency checks via hardware
as the functionality hasn't been used in years and is no longer
useful in general; replace it with an off-by-default module param
to WARN if hardware fails a check that KVM does not perform
- Fix a currently-benign bug where KVM would drop the guest's
SPEC_CTRL[63:32] on VM-Enter
- Misc cleanups
- Overhaul the TDX code to address systemic races where KVM (acting
on behalf of userspace) could inadvertantly trigger lock contention
in the TDX-Module; KVM was either working around these in weird,
ugly ways, or was simply oblivious to them (though even Yan's
devilish selftests could only break individual VMs, not the host
kernel)
- Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a
TDX vCPU, if creating said vCPU failed partway through
- Fix a few sparse warnings (bad annotation, 0 != NULL)
- Use struct_size() to simplify copying TDX capabilities to userspace
- Fix a bug where TDX would effectively corrupt user-return MSR
values if the TDX Module rejects VP.ENTER and thus doesn't clobber
host MSRs as expected
Selftests:
- Fix a math goof in mmu_stress_test when running on a single-CPU
system/VM
- Forcefully override ARCH from x86_64 to x86 to play nice with
specifying ARCH=x86_64 on the command line
- Extend a bunch of nested VMX to validate nested SVM as well
- Add support for LA57 in the core VM_MODE_xxx macro, and add a test
to verify KVM can save/restore nested VMX state when L1 is using
5-level paging, but L2 is not
- Clean up the guest paging code in anticipation of sharing the core
logic for nested EPT and nested NPT
guest_memfd:
- Add NUMA mempolicy support for guest_memfd, and clean up a variety
of rough edges in guest_memfd along the way
- Define a CLASS to automatically handle get+put when grabbing a
guest_memfd from a memslot to make it harder to leak references
- Enhance KVM selftests to make it easer to develop and debug
selftests like those added for guest_memfd NUMA support, e.g. where
test and/or KVM bugs often result in hard-to-debug SIGBUS errors
- Misc cleanups
Generic:
- Use the recently-added WQ_PERCPU when creating the per-CPU
workqueue for irqfd cleanup
- Fix a goof in the dirty ring documentation
- Fix choice of target for directed yield across different calls to
kvm_vcpu_on_spin(); the function was always starting from the first
vCPU instead of continuing the round-robin search"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (260 commits)
KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
KVM: arm64: selftests: Add test for AT emulation
KVM: arm64: nv: Expose hardware access flag management to NV guests
KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
KVM: arm64: Implement HW access flag management in stage-1 SW PTW
KVM: arm64: Propagate PTW errors up to AT emulation
KVM: arm64: Add helper for swapping guest descriptor
KVM: arm64: nv: Use pgtable definitions in stage-2 walk
KVM: arm64: Handle endianness in read helper for emulated PTW
KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
KVM: arm64: Call helper for reading descriptors directly
KVM: arm64: nv: Advertise support for FEAT_XNX
KVM: arm64: Teach ptdump about FEAT_XNX permissions
KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"These are the arm64 updates for 6.19.
The biggest part is the Arm MPAM driver under drivers/resctrl/.
There's a patch touching mm/ to handle spurious faults for huge pmd
(similar to the pte version). The corresponding arm64 part allows us
to avoid the TLB maintenance if a (huge) page is reused after a write
fault. There's EFI refactoring to allow runtime services with
preemption enabled and the rest is the usual perf/PMU updates and
several cleanups/typos.
Summary:
Core features:
- Basic Arm MPAM (Memory system resource Partitioning And Monitoring)
driver under drivers/resctrl/ which makes use of the fs/rectrl/ API
Perf and PMU:
- Avoid cycle counter on multi-threaded CPUs
- Extend CSPMU device probing and add additional filtering support
for NVIDIA implementations
- Add support for the PMUs on the NoC S3 interconnect
- Add additional compatible strings for new Cortex and C1 CPUs
- Add support for data source filtering to the SPE driver
- Add support for i.MX8QM and "DB" PMU in the imx PMU driver
Memory managemennt:
- Avoid broadcast TLBI if page reused in write fault
- Elide TLB invalidation if the old PTE was not valid
- Drop redundant cpu_set_*_tcr_t0sz() macros
- Propagate pgtable_alloc() errors outside of __create_pgd_mapping()
- Propagate return value from __change_memory_common()
ACPI and EFI:
- Call EFI runtime services without disabling preemption
- Remove unused ACPI function
Miscellaneous:
- ptrace support to disable streaming on SME-only systems
- Improve sysreg generation to include a 'Prefix' descriptor
- Replace __ASSEMBLY__ with __ASSEMBLER__
- Align register dumps in the kselftest zt-test
- Remove some no longer used macros/functions
- Various spelling corrections"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits)
arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic
arm64/pageattr: Propagate return value from __change_memory_common
arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS
KVM: arm64: selftests: Consider all 7 possible levels of cache
KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user
arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros
Documentation/arm64: Fix the typo of register names
ACPI: GTDT: Get rid of acpi_arch_timer_mem_init()
perf: arm_spe: Add support for filtering on data source
perf: Add perf_event_attr::config4
perf/imx_ddr: Add support for PMU in DB (system interconnects)
perf/imx_ddr: Get and enable optional clks
perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe()
dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL
arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT
arm64: mm: use untagged address to calculate page index
MAINTAINERS: new entry for MPAM Driver
arm_mpam: Add kunit tests for props_mismatch()
arm_mpam: Add kunit test for bitmap reset
arm_mpam: Add helper to reset saved mbwu state
...
|
|
Keep sparse quiet by explicitly casting endianness conversion
when swapping S1 and S2 descriptors.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511260246.JQDGsQKa-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202511260344.9XehvH5Q-lkp@intel.com/
Fixes: c59ca4b5b0c3f ("KVM: arm64: Implement HW access flag management in stage-1 SW PTW")
Fixes: 39db933ba67f8 ("KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251125204848.1136383-1-maz@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Everything is in place to update the access flag at S1 and S2. Expose
support for the access flag flavor of FEAT_HAFDBS to NV guests.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-15-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Give the stage-2 walk similar treatment to stage-1: update the access
flag during the table walk and do so for any walk context.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-14-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Use the existing page table definitions instead of magic numbers for the
stage-2 table walk.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-10-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Implementing FEAT_HAFDBS means adding another descriptor accessor that
needs to deal with the guest-configured endianness. Prepare by moving
the endianness handling into the read accessor and out of the main body
of the S1/S2 PTWs.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-9-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
The stage-2 table walker passes down the vCPU as a void pointer. That
might've made sense if the walker was generic although at this point it
is clear this will only ever be used in the context of a vCPU.
Suggested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-8-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Going through a function pointer doesn't serve much purpose when there's
only one implementation.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-7-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Everything is in place to support FEAT_XNX, advertise support.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-6-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
Add support for FEAT_XNX to shadow stage-2 MMUs, being careful to only
evaluate XN[0] when the feature is actually exposed to the VM.
Restructure the layering of permissions in the fault handler to assume
pX and uX then restricting based on the guest's stage-2 afterwards.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Link: https://msgid.link/20251124190158.177318-4-oupton@kernel.org
Signed-off-by: Oliver Upton <oupton@kernel.org>
|
|
This patch corrects several minor typographical and spelling errors
in comments across multiple arm64 source files.
No functional changes.
Signed-off-by: mrigendrachaubey <mrigendra.chaubey@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Marc reports that the performance of running an L3 guest has regressed
by 60% as a result of setting MDCR_EL2.TDA to hide bad architecture.
That's of course terrible for the single user of recursive NV ;-)
While there's nothing to be done on non-FGT systems, take advantage of
the precise write trap of MDSCR_EL1 and leave the rest of the debug
registers untrapped.
Reported-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.18
- Add support for FF-A 1.2 as the secure memory conduit for pKVM,
allowing more registers to be used as part of the message payload.
- Change the way pKVM allocates its VM handles, making sure that the
privileged hypervisor is never tricked into using uninitialised
data.
- Speed up MMIO range registration by avoiding unnecessary RCU
synchronisation, which results in VMs starting much quicker.
- Add the dump of the instruction stream when panic-ing in the EL2
payload, just like the rest of the kernel has always done. This will
hopefully help debugging non-VHE setups.
- Add 52bit PA support to the stage-1 page-table walker, and make use
of it to populate the fault level reported to the guest on failing
to translate a stage-1 walk.
- Add NV support to the GICv3-on-GICv5 emulation code, ensuring
feature parity for guests, irrespective of the host platform.
- Fix some really ugly architecture problems when dealing with debug
in a nested VM. This has some bad performance impacts, but is at
least correct.
- Add enough infrastructure to be able to disable EL2 features and
give effective values to the EL2 control registers. This then allows
a bunch of features to be turned off, which helps cross-host
migration.
- Large rework of the selftest infrastructure to allow most tests to
transparently run at EL2. This is the first step towards enabling
NV testing.
- Various fixes and improvements all over the map, including one BE
fix, just in time for the removal of the feature.
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.17, round #3
- Invalidate nested MMUs upon freeing the PGD to avoid WARNs when
visiting from an MMU notifier
- Fixes to the TLB match process and TLB invalidation range for
managing the VCNR pseudo-TLB
- Prevent SPE from erroneously profiling guests due to UNKNOWN reset
values in PMSCR_EL1
- Fix save/restore of host MDCR_EL2 to account for eagerly programming
at vcpu_load() on VHE systems
- Correct lock ordering when dealing with VGIC LPIs, avoiding scenarios
where an xarray's spinlock was nested with a *raw* spinlock
- Permit stage-2 read permission aborts which are possible in the case
of NV depending on the guest hypervisor's stage-2 translation
- Call raw_spin_unlock() instead of the internal spinlock API
- Fix parameter ordering when assigning VBAR_EL1
[Pull into kvm/master to fix conflicts. - Paolo]
|
|
* kvm-arm64/el2-feature-control: (23 commits)
: .
: General rework of EL2 features that can be disabled to satisfy
: the requirement of migration between heterogeneous hosts:
:
: - Handle effective RES0 behaviour of undefined registers, making sure
: that disabling a feature affects full registeres, and not just
: individual control bits. (20250918151402.1665315-1-maz@kernel.org)
:
: - Allow ID_AA64MMFR1_EL1.{TWED,HCX} to be disabled from userspace.
: (20250911114621.3724469-1-yangjinqian1@huawei.com)
:
: - Turn the NV feature management into a deny-list, and expose
: missing features to EL2 guests.
: (20250912212258.407350-1-oliver.upton@linux.dev)
: .
KVM: arm64: nv: Expose up to FEAT_Debugv8p8 to NV-enabled VMs
KVM: arm64: nv: Advertise FEAT_TIDCP1 to NV-enabled VMs
KVM: arm64: nv: Advertise FEAT_SpecSEI to NV-enabled VMs
KVM: arm64: nv: Expose FEAT_TWED to NV-enabled VMs
KVM: arm64: nv: Exclude guest's TWED configuration when TWE isn't set
KVM: arm64: nv: Expose FEAT_AFP to NV-enabled VMs
KVM: arm64: nv: Expose FEAT_ECBHB to NV-enabled VMs
KVM: arm64: nv: Expose FEAT_RASv1p1 via RAS_frac
KVM: arm64: nv: Expose FEAT_DF2 to NV-enabled VMs
KVM: arm64: nv: Don't erroneously claim FEAT_DoubleLock for NV VMs
KVM: arm64: nv: Convert masks to denylists in limit_nv_id_reg()
KVM: arm64: selftests: Test writes to ID_AA64MMFR1_EL1.{HCX, TWED}
KVM: arm64: Make ID_AA64MMFR1_EL1.{HCX, TWED} writable from userspace
KVM: arm64: Convert MDCR_EL2 RES0 handling to compute_reg_res0_bits()
KVM: arm64: Convert SCTLR_EL1 RES0 handling to compute_reg_res0_bits()
KVM: arm64: Enforce absence of FEAT_TCR2 on TCR2_EL2
KVM: arm64: Enforce absence of FEAT_SCTLR2 on SCTLR2_EL{1,2}
KVM: arm64: Convert HCR_EL2 RES0 handling to compute_reg_res0_bits()
KVM: arm64: Enforce absence of FEAT_HCX on HCRX_EL2
KVM: arm64: Enforce absence of FEAT_FGT2 on FGT2 registers
...
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/nv-debug:
: .
: Fix handling of MDSCR_EL1 in NV context, which is unfortunately
: mishandled by the architecture. Patches courtesy of Oliver Upton
: (20250917203125.283116-2-oliver.upton@linux.dev)
: .
KVM: arm64: nv: Apply guest's MDCR traps in nested context
KVM: arm64: nv: Trap debug registers when in hyp context
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Adjust the computation of the max OA to account for 52bit PAs.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The changes to the debug architecture up to v8.8 are concerned with
external debug, which of course has no direct impact on VMs. Raise the
feature limit and document what's preventing us from raising it further.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
While KVM does not expose IMPDEF features to VMs, FEAT_TIDCP1 is an
architecturally-defined EL1 trap of a particular sysreg encoding range.
Furthermore, KVM already advertises this feature to non-NV VMs.
As there is no interaction with EL2 traps, expose the feature.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
FEAT_SpecSEI is an informational feature describing whether speculative
loads may generate SErrors. Since there are already cases where KVM
reinjects an SError into the VM it is already possible this may happen
due to a speculative load within the VM.
Stop hiding the feature from NV-enabled VMs.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
KVM now handles HCR_EL2.{TWEDEn,TWEDEL} correctly when computing the
effective HCR for a nested context. Advertise the feature.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
FEAT_AFP doesn't intersect with any EL2 trap behavior, expose to
NV-enabled VMs.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The exact wording of the restrictions on branch prediction due to
FEAT_ECBHB in DDI0487L.b is as follows:
When FEAT_ECBHB is implemented, the branch history information created
in a context before an exception to a higher Exception level using
AArch64 cannot be used by code before that exception to exploitatively
control the execution of any indirect branches in code in a different
context after the exception.
While vEL2 and EL1 are multiplexed at EL1, they exist in different
hardware-described contexts as KVM uses different stage-2 MMUs to
represent the corresponding translation regimes. Additionally, exception
entries into vEL2 always imply a hardware exception entry into literal EL2
for the emulated regime change.
Given all of this, and the fact that FEAT_ECBHB places no limitation on
the EL of the protected context after the exception, we can claim
FEAT_ECBHB on supporting hardware.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
KVM already supports FEAT_RASv1p1 for NV-enabled VMs but only when
advertised through the canonical field. Stop masking the silly frac
field to expose the feature on systems without FEAT_DF.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The supporting infrastructure in KVM's abort injection code was merged a
while ago, but the author (me!) forgot to relax the NV limitation when
FEAT_DF2 got exposed to non-NV VMs. Fix it.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
ID_AA64DFR0_EL1.DoubleLock is one of those annoying signed feature
fields where a non-negative value implies that a feature is implemented
and a negative value implies that it is not. While the intention of
masking this field was likely to hide the feature, KVM actually
advertises it, even on unsupporting hardware.
Remove FEAT_DoubleLock from the mask, making the NI value visible to the
VM. Take care to accept the old, incorrect values for this field as
we've lied to userspace.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Consistently use denylisting of features such that the limitations of
KVM's nested implementation are explicitly documented (rather than
implied).
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
KVM needs to ensure the guest hypervisor's traps take effect when the
vCPU is in a nested context. While supporting infrastructure is in place
for most of the EL2 trap registers, MDCR_EL2 is not.
Fold the guest's trap configuration into the effective MDCR_EL2. Apply
it directly to the in-memory representation as it gets recomputed on
every vcpu_load() anyway.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In case you haven't realized it yet, the architecture is _slightly_
broken in the context of nested virt. Here we have another example of
FEAT_NV2 redirecting a sysreg (MDSCR_EL1) to memory that actually
affects execution at vEL2.
Fortunately, MDCR_EL2.TDA provides the necessary traps to hide this
mess at the expense of unnecessarily trapping the breakpoint/watchpoint
registers. Yes, FEAT_FGT gives us a precise trap but let's just opt for
obvious correctness to start.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The code for invalidating VNCR entries in both kvm_invalidate_vncr_ipa()
and invalidate_vncr_va() incorrectly uses a bitwise AND with `(size - 1)`
instead of `~(size - 1)` to align the start address. This results
in masking the address bits instead of aligning them down to the start
of the block.
This bug may cause stale VNCR TLB entries to remain valid even after a
TLBI or MMU notifier, leading to incorrect memory translation and
unexpected guest behavior.
Credit to Team 0xB6 in bob14: DongHa Lee, Gyujeong Jin, Daehyeon Ko,
Geonha Lee, Hyungyu Oh, and Jaewon Yang.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Dongha Lee <p@sswd.pw>
Link: https://lore.kernel.org/r/20250906040724.72960-1-p@sswd.pw
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
kvm_vncr_tlb_lookup() is supposed to return true when the cached VNCR
TLB entry is valid for the current context. For non-Global entries, that
means the entry’s ASID must match the current ASID.
The current code returns true when the ASIDs do *not* match, which
inverts the logic. This is a potential vulnerability:
- Valid entries are ignored and we fall back to kvm_translate_vncr(),
hurting performance.
- Mismatched entries are treated as permission faults (-EPERM) instead
of triggering a fresh translation.
- This can also cause stale translations to be (wrongly) considered
valid across address spaces.
Flip the predicate so non-Global entries only hit when ASIDs match.
Special credit to Team 0xB6 for reporting: DongHa Lee, Gyujeong Jin,
Daehyeon Ko, Geonha Lee, Hyungyu Oh, and Jaewon Yang.
Signed-off-by: Geonha Lee <w1nsom3gna@korea.ac.kr>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250903150421.90752-1-w1nsom3gna@korea.ac.kr
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.17, take #2
- Correctly handle 'invariant' system registers for protected VMs
- Improved handling of VNCR data aborts, including external aborts
- Fixes for handling of FEAT_RAS for NV guests, providing a sane
fault context during SEA injection and preventing the use of
RASv1p1 fault injection hardware
- Ensure that page table destruction when a VM is destroyed gives an
opportunity to reschedule
- Large fix to KVM's infrastructure for managing guest context loaded
on the CPU, addressing issues where the output of AT emulation
doesn't get reflected to the guest
- Fix AT S12 emulation to actually perform stage-2 translation when
necessary
- Avoid attempting vLPI irqbypass when GICv4 has been explicitly
disabled for a VM
- Minor KVM + selftest fixes
|
|
Handle faults for memslots backed by guest_memfd in arm64 nested
virtualization triggered by VNCR_EL2.
* Introduce is_gmem output parameter to kvm_translate_vncr(), indicating
whether the faulted memory slot is backed by guest_memfd.
* Dispatch faults backed by guest_memfd to kvm_gmem_get_pfn().
* Update kvm_handle_vncr_abort() to handle potential guest_memfd errors.
Some of the guest_memfd errors need to be handled by userspace instead
of attempting to (implicitly) retry by returning to the guest.
Suggested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20250729225455.670324-20-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
System register accesses redirected to the VNCR page can also generate
external aborts just like any other form of memory access. Route to
kvm_handle_guest_sea() for potential APEI handling, falling back to a
vSError if the kernel didn't handle the abort.
Take the opportunity to throw out the useless kvm_ras.h which provided a
helper with a single callsite...
Cc: Jiaqi Yan <jiaqiyan@google.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250729182342.3281742-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|