summaryrefslogtreecommitdiff
path: root/arch/arm64/kvm/nested.c
AgeCommit message (Collapse)Author
2026-03-05KVM: arm64: nv: Inject a SEA if failed to read the descriptorZenghui Yu (Huawei)
Failure to read the descriptor (because it is outside of a memslot) should result in a SEA being injected in the guest. Suggested-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/86ms1m9lp3.wl-maz@kernel.org Signed-off-by: Zenghui Yu (Huawei) <zenghui.yu@linux.dev> Link: https://patch.msgid.link/20260225173515.20490-4-zenghui.yu@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-05KVM: arm64: nv: Report addrsz fault at level 0 with a bad VTTBR.BADDRZenghui Yu (Huawei)
As per R_BFHQH, " When an Address size fault is generated, the reported fault code indicates one of the following: If the fault was generated due to the TTBR_ELx used in the translation having nonzero address bits above the OA size, then a fault at level 0. " Fix the reported Address size fault level as being 0 if the base address is wrongly programmed by L1. Fixes: 61e30b9eef7f ("KVM: arm64: nv: Implement nested Stage-2 page table walk logic") Signed-off-by: Zenghui Yu (Huawei) <zenghui.yu@linux.dev> Link: https://patch.msgid.link/20260225173515.20490-3-zenghui.yu@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-05KVM: arm64: nv: Check S2 limits based on implemented PA sizeZenghui Yu (Huawei)
check_base_s2_limits() checks the validity of SL0 and inputsize against ia_size (inputsize again!) but the pseudocode from DDI0487 G.a AArch64.TranslationTableWalk() says that we should check against the implemented PA size. We would otherwise fail to walk S2 with a valid configuration. E.g., granule size = 4KB, inputsize = 40 bits, initial lookup level = 0 (no concatenation) on a system with 48 bits PA range supported is allowed by architecture. Fix it by obtaining PA size by kvm_get_pa_bits(). Note that kvm_get_pa_bits() returns the fixed limit now and should eventually reflect the per VM PARange (one day!). Given that the configured PARange should not be greater that kvm_ipa_limit, it at least fixes the problem described above. While at it, inject a level 0 translation fault to guest if check_base_s2_limits() fails, as per the pseudocode. Fixes: 61e30b9eef7f ("KVM: arm64: nv: Implement nested Stage-2 page table walk logic") Signed-off-by: Zenghui Yu (Huawei) <zenghui.yu@linux.dev> Link: https://patch.msgid.link/20260225173515.20490-2-zenghui.yu@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-28Merge tag 'kvmarm-fixes-7.0-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 7.0, take #1 - Make sure we don't leak any S1POE state from guest to guest when the feature is supported on the HW, but not enabled on the host - Propagate the ID registers from the host into non-protected VMs managed by pKVM, ensuring that the guest sees the intended feature set - Drop double kern_hyp_va() from unpin_host_sve_state(), which could bite us if we were to change kern_hyp_va() to not being idempotent - Don't leak stage-2 mappings in protected mode - Correctly align the faulting address when dealing with single page stage-2 mappings for PAGE_SIZE > 4kB - Fix detection of virtualisation-capable GICv5 IRS, due to the maintainer being obviously fat fingered... - Remove duplication of code retrieving the ASID for the purpose of S1 PT handling - Fix slightly abusive const-ification in vgic_set_kvm_info()
2026-02-25KVM: arm64: Deduplicate ASID retrieval codeMarc Zyngier
We currently have three versions of the ASID retrieval code, one in the S1 walker, and two in the VNCR handling (although the last two are limited to the EL2&0 translation regime). Make this code common, and take this opportunity to also simplify the code a bit while switching over to the TTBRx_EL1_ASID macro. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260225104718.14209-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-23KVM: arm64: Revert accidental drop of kvm_uninit_stage2_mmu() for non-NV VMsFuad Tabba
Commit 0c4762e26879 ("KVM: arm64: nv: Avoid NV stage-2 code when NV is not supported") added an early return to several functions in arch/arm64/kvm/nested.c to prevent a UBSAN shift-out-of-bounds error when accessing the pgt union for non-nested VMs. However, this early return was inadvertently applied to kvm_arch_flush_shadow_all() as well, causing it to skip the call to kvm_uninit_stage2_mmu(kvm) for all non-nested VMs. For pKVM, skipping this teardown means the host never unshares the guest's memory with the EL2 hypervisor. When the host kernel later recycles these leaked pages for a new VM, it attempts to re-share them. The hypervisor correctly rejects this with -EPERM, triggering a host WARN_ON and hanging the guest. Fix this by dropping the early return from kvm_arch_flush_shadow_all(). The for-loop guarding the nested MMU cleanup already bounds itself when nested_mmus_size == 0, allowing execution to proceed to kvm_uninit_stage2_mmu() as intended. Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/all/60916cb6-f460-4751-b910-f63c58700ad0@sirena.org.uk/ Fixes: 0c4762e26879 ("KVM: arm64: nv: Avoid NV stage-2 code when NV is not supported") Signed-off-by: Fuad Tabba <tabba@google.com> Tested-by: Mark Brown <broonie@kernel.org> Link: https://patch.msgid.link/20260222083352.89503-1-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-05Merge branch kvm-arm64/misc-6.20 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/misc-6.20: : . : Misc KVM/arm64 changes for 6.20 : : - Trivial FPSIMD cleanups : : - Calculate hyp VA size only once, avoiding potential mapping issues when : VA bits is smaller than expected : : - Silence sparse warning for the HYP stack base : : - Fix error checking when handling FFA_VERSION : : - Add missing trap configuration for DBGWCR15_EL1 : : - Don't try to deal with nested S2 when NV isn't enabled for a guest : : - Various spelling fixes : . KVM: arm64: nv: Avoid NV stage-2 code when NV is not supported KVM: arm64: Fix various comments KVM: arm64: nv: Add trap config for DBGWCR<15>_EL1 KVM: arm64: Fix error checking for FFA_VERSION KVM: arm64: Fix missing <asm/stackpage/nvhe.h> include KVM: arm64: Calculate hyp VA size only once KVM: arm64: Remove ISB after writing FPEXC32_EL2 KVM: arm64: Shuffle KVM_HOST_DATA_FLAG_* indices KVM: arm64: Fix comment in fpsimd_lazy_switch_to_host() Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05KVM: arm64: Add sanitisation to SCTLR_EL2Marc Zyngier
Sanitise SCTLR_EL2 the usual way. The most important aspect of this is that we benefit from SCTLR_EL2.SPAN being RES1 when HCR_EL2.E2H==0. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202184329.2724080-20-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05KVM: arm64: Remove all traces of FEAT_TMEMarc Zyngier
FEAT_TME has been dropped from the architecture. Retrospectively. I'm sure someone is crying somewhere, but most of us won't. Clean-up time. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202184329.2724080-18-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05KVM: arm64: Extend unified RESx handling to runtime sanitisationMarc Zyngier
Add a new helper to retrieve the RESx values for a given system register, and use it for the runtime sanitisation. This results in slightly better code generation for a fairly hot path in the hypervisor, and additionally covers all sanitised registers in all conditions, not just the VNCR-based ones. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202184329.2724080-6-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-05KVM: arm64: Introduce data structure tracking both RES0 and RES1 bitsMarc Zyngier
We have so far mostly tracked RES0 bits, but only made a few attempts at being just as strict for RES1 bits (probably because they are both rarer and harder to handle). Start scratching the surface by introducing a data structure tracking RES0 and RES1 bits at the same time. Note that contrary to the usual idiom, this structure is mostly passed around by value -- the ABI handles it nicely, and the resulting code is much nicer. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202184329.2724080-5-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-02-02KVM: arm64: nv: Avoid NV stage-2 code when NV is not supportedFuad Tabba
The NV stage-2 manipulation functions kvm_nested_s2_unmap(), kvm_nested_s2_wp(), and others, are being called for any stage-2 manipulation regardless of whether nested virtualization is supported or enabled for the VM. For protected KVM (pKVM), `struct kvm_pgtable` uses the `pkvm_mappings` member of the union. This member aliases `ia_bits`, which is used by the non-protected NV code paths. Attempting to read `pgt->ia_bits` in these functions results in treating protected mapping pointers or state values as bit-shift amounts. This triggers a UBSAN shift-out-of-bounds error: UBSAN: shift-out-of-bounds in arch/arm64/kvm/nested.c:1127:34 shift exponent 174565952 is too large for 64-bit type 'unsigned long' Call trace: __ubsan_handle_shift_out_of_bounds+0x28c/0x2c0 kvm_nested_s2_unmap+0x228/0x248 kvm_arch_flush_shadow_memslot+0x98/0xc0 kvm_set_memslot+0x248/0xce0 Since pKVM and NV are mutually exclusive, prevent entry into these NV handling functions if the VM has not allocated any nested MMUs (i.e., `kvm->arch.nested_mmus_size` is 0). Fixes: 7270cc9157f47 ("KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers") Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260202152310.113467-1-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15KVM: arm64: Convert VTCR_EL2 to config-driven sanitisationMarc Zyngier
Describe all the VTCR_EL2 fields and their respective configurations, making sure that we correctly ignore the bits that are not defined for a given guest configuration. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251210173024.561160-6-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15arm64: Convert VTCR_EL2 to sysreg infratructureMarc Zyngier
Our definition of VTCR_EL2 is both partial (tons of fields are missing) and totally inconsistent (some constants are shifted, some are not). They are also expressed in terms of TCR, which is rather inconvenient. Replace the ad-hoc definitions with the the generated version. This results in a bunch of additional changes to make the code with the unshifted nature of generated enumerations. The register data was extracted from the BSD licenced AARCHMRS (AARCHMRS_OPENSOURCE_A_profile_FAT-2025-09_ASL0). Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251210173024.561160-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-12-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "ARM: - Support for userspace handling of synchronous external aborts (SEAs), allowing the VMM to potentially handle the abort in a non-fatal manner - Large rework of the VGIC's list register handling with the goal of supporting more active/pending IRQs than available list registers in hardware. In addition, the VGIC now supports EOImode==1 style deactivations for IRQs which may occur on a separate vCPU than the one that acked the IRQ - Support for FEAT_XNX (user / privileged execute permissions) and FEAT_HAF (hardware update to the Access Flag) in the software page table walkers and shadow MMU - Allow page table destruction to reschedule, fixing long need_resched latencies observed when destroying a large VM - Minor fixes to KVM and selftests Loongarch: - Get VM PMU capability from HW GCFG register - Add AVEC basic support - Use 64-bit register definition for EIOINTC - Add KVM timer test cases for tools/selftests RISC/V: - SBI message passing (MPXY) support for KVM guest - Give a new, more specific error subcode for the case when in-kernel AIA virtualization fails to allocate IMSIC VS-file - Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually in small chunks - Fix guest page fault within HLV* instructions - Flush VS-stage TLB after VCPU migration for Andes cores s390: - Always allocate ESCA (Extended System Control Area), instead of starting with the basic SCA and converting to ESCA with the addition of the 65th vCPU. The price is increased number of exits (and worse performance) on z10 and earlier processor; ESCA was introduced by z114/z196 in 2010 - VIRT_XFER_TO_GUEST_WORK support - Operation exception forwarding support - Cleanups x86: - Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO SPTE caching is disabled, as there can't be any relevant SPTEs to zap - Relocate a misplaced export - Fix an async #PF bug where KVM would clear the completion queue when the guest transitioned in and out of paging mode, e.g. when handling an SMI and then returning to paged mode via RSM - Leave KVM's user-return notifier registered even when disabling virtualization, as long as kvm.ko is loaded. On reboot/shutdown, keeping the notifier registered is ok; the kernel does not use the MSRs and the callback will run cleanly and restore host MSRs if the CPU manages to return to userspace before the system goes down - Use the checked version of {get,put}_user() - Fix a long-lurking bug where KVM's lack of catch-up logic for periodic APIC timers can result in a hard lockup in the host - Revert the periodic kvmclock sync logic now that KVM doesn't use a clocksource that's subject to NTP corrections - Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the latter behind CONFIG_CPU_MITIGATIONS - Context switch XCR0, XSS, and PKRU outside of the entry/exit fast path; the only reason they were handled in the fast path was to paper of a bug in the core #MC code, and that has long since been fixed - Add emulator support for AVX MOV instructions, to play nice with emulated devices whose guest drivers like to access PCI BARs with large multi-byte instructions x86 (AMD): - Fix a few missing "VMCB dirty" bugs - Fix the worst of KVM's lack of EFER.LMSLE emulation - Add AVIC support for addressing 4k vCPUs in x2AVIC mode - Fix incorrect handling of selective CR0 writes when checking intercepts during emulation of L2 instructions - Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32] on VMRUN and #VMEXIT - Fix a bug where KVM corrupt the guest code stream when re-injecting a soft interrupt if the guest patched the underlying code after the VM-Exit, e.g. when Linux patches code with a temporary INT3 - Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits to userspace, and extend KVM "support" to all policy bits that don't require any actual support from KVM x86 (Intel): - Use the root role from kvm_mmu_page to construct EPTPs instead of the current vCPU state, partly as worthwhile cleanup, but mostly to pave the way for tracking per-root TLB flushes, and elide EPT flushes on pCPU migration if the root is clean from a previous flush - Add a few missing nested consistency checks - Rip out support for doing "early" consistency checks via hardware as the functionality hasn't been used in years and is no longer useful in general; replace it with an off-by-default module param to WARN if hardware fails a check that KVM does not perform - Fix a currently-benign bug where KVM would drop the guest's SPEC_CTRL[63:32] on VM-Enter - Misc cleanups - Overhaul the TDX code to address systemic races where KVM (acting on behalf of userspace) could inadvertantly trigger lock contention in the TDX-Module; KVM was either working around these in weird, ugly ways, or was simply oblivious to them (though even Yan's devilish selftests could only break individual VMs, not the host kernel) - Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a TDX vCPU, if creating said vCPU failed partway through - Fix a few sparse warnings (bad annotation, 0 != NULL) - Use struct_size() to simplify copying TDX capabilities to userspace - Fix a bug where TDX would effectively corrupt user-return MSR values if the TDX Module rejects VP.ENTER and thus doesn't clobber host MSRs as expected Selftests: - Fix a math goof in mmu_stress_test when running on a single-CPU system/VM - Forcefully override ARCH from x86_64 to x86 to play nice with specifying ARCH=x86_64 on the command line - Extend a bunch of nested VMX to validate nested SVM as well - Add support for LA57 in the core VM_MODE_xxx macro, and add a test to verify KVM can save/restore nested VMX state when L1 is using 5-level paging, but L2 is not - Clean up the guest paging code in anticipation of sharing the core logic for nested EPT and nested NPT guest_memfd: - Add NUMA mempolicy support for guest_memfd, and clean up a variety of rough edges in guest_memfd along the way - Define a CLASS to automatically handle get+put when grabbing a guest_memfd from a memslot to make it harder to leak references - Enhance KVM selftests to make it easer to develop and debug selftests like those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs often result in hard-to-debug SIGBUS errors - Misc cleanups Generic: - Use the recently-added WQ_PERCPU when creating the per-CPU workqueue for irqfd cleanup - Fix a goof in the dirty ring documentation - Fix choice of target for directed yield across different calls to kvm_vcpu_on_spin(); the function was always starting from the first vCPU instead of continuing the round-robin search" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (260 commits) KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2 KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX} KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected" KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot() KVM: arm64: Add endian casting to kvm_swap_s[12]_desc() KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n KVM: arm64: selftests: Add test for AT emulation KVM: arm64: nv: Expose hardware access flag management to NV guests KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW KVM: arm64: Implement HW access flag management in stage-1 SW PTW KVM: arm64: Propagate PTW errors up to AT emulation KVM: arm64: Add helper for swapping guest descriptor KVM: arm64: nv: Use pgtable definitions in stage-2 walk KVM: arm64: Handle endianness in read helper for emulated PTW KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW KVM: arm64: Call helper for reading descriptors directly KVM: arm64: nv: Advertise support for FEAT_XNX KVM: arm64: Teach ptdump about FEAT_XNX permissions KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions ...
2025-12-02Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "These are the arm64 updates for 6.19. The biggest part is the Arm MPAM driver under drivers/resctrl/. There's a patch touching mm/ to handle spurious faults for huge pmd (similar to the pte version). The corresponding arm64 part allows us to avoid the TLB maintenance if a (huge) page is reused after a write fault. There's EFI refactoring to allow runtime services with preemption enabled and the rest is the usual perf/PMU updates and several cleanups/typos. Summary: Core features: - Basic Arm MPAM (Memory system resource Partitioning And Monitoring) driver under drivers/resctrl/ which makes use of the fs/rectrl/ API Perf and PMU: - Avoid cycle counter on multi-threaded CPUs - Extend CSPMU device probing and add additional filtering support for NVIDIA implementations - Add support for the PMUs on the NoC S3 interconnect - Add additional compatible strings for new Cortex and C1 CPUs - Add support for data source filtering to the SPE driver - Add support for i.MX8QM and "DB" PMU in the imx PMU driver Memory managemennt: - Avoid broadcast TLBI if page reused in write fault - Elide TLB invalidation if the old PTE was not valid - Drop redundant cpu_set_*_tcr_t0sz() macros - Propagate pgtable_alloc() errors outside of __create_pgd_mapping() - Propagate return value from __change_memory_common() ACPI and EFI: - Call EFI runtime services without disabling preemption - Remove unused ACPI function Miscellaneous: - ptrace support to disable streaming on SME-only systems - Improve sysreg generation to include a 'Prefix' descriptor - Replace __ASSEMBLY__ with __ASSEMBLER__ - Align register dumps in the kselftest zt-test - Remove some no longer used macros/functions - Various spelling corrections" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits) arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic arm64/pageattr: Propagate return value from __change_memory_common arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS KVM: arm64: selftests: Consider all 7 possible levels of cache KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros Documentation/arm64: Fix the typo of register names ACPI: GTDT: Get rid of acpi_arch_timer_mem_init() perf: arm_spe: Add support for filtering on data source perf: Add perf_event_attr::config4 perf/imx_ddr: Add support for PMU in DB (system interconnects) perf/imx_ddr: Get and enable optional clks perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe() dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT arm64: mm: use untagged address to calculate page index MAINTAINERS: new entry for MPAM Driver arm_mpam: Add kunit tests for props_mismatch() arm_mpam: Add kunit test for bitmap reset arm_mpam: Add helper to reset saved mbwu state ...
2025-12-01KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()Marc Zyngier
Keep sparse quiet by explicitly casting endianness conversion when swapping S1 and S2 descriptors. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202511260246.JQDGsQKa-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202511260344.9XehvH5Q-lkp@intel.com/ Fixes: c59ca4b5b0c3f ("KVM: arm64: Implement HW access flag management in stage-1 SW PTW") Fixes: 39db933ba67f8 ("KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW") Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://msgid.link/20251125204848.1136383-1-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01KVM: arm64: nv: Expose hardware access flag management to NV guestsOliver Upton
Everything is in place to update the access flag at S1 and S2. Expose support for the access flag flavor of FEAT_HAFDBS to NV guests. Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Link: https://msgid.link/20251124190158.177318-15-oupton@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTWOliver Upton
Give the stage-2 walk similar treatment to stage-1: update the access flag during the table walk and do so for any walk context. Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Link: https://msgid.link/20251124190158.177318-14-oupton@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01KVM: arm64: nv: Use pgtable definitions in stage-2 walkOliver Upton
Use the existing page table definitions instead of magic numbers for the stage-2 table walk. Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Link: https://msgid.link/20251124190158.177318-10-oupton@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01KVM: arm64: Handle endianness in read helper for emulated PTWOliver Upton
Implementing FEAT_HAFDBS means adding another descriptor accessor that needs to deal with the guest-configured endianness. Prepare by moving the endianness handling into the read accessor and out of the main body of the S1/S2 PTWs. Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Link: https://msgid.link/20251124190158.177318-9-oupton@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTWOliver Upton
The stage-2 table walker passes down the vCPU as a void pointer. That might've made sense if the walker was generic although at this point it is clear this will only ever be used in the context of a vCPU. Suggested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Link: https://msgid.link/20251124190158.177318-8-oupton@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01KVM: arm64: Call helper for reading descriptors directlyOliver Upton
Going through a function pointer doesn't serve much purpose when there's only one implementation. Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Link: https://msgid.link/20251124190158.177318-7-oupton@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-12-01KVM: arm64: nv: Advertise support for FEAT_XNXOliver Upton
Everything is in place to support FEAT_XNX, advertise support. Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Link: https://msgid.link/20251124190158.177318-6-oupton@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2Oliver Upton
Add support for FEAT_XNX to shadow stage-2 MMUs, being careful to only evaluate XN[0] when the feature is actually exposed to the VM. Restructure the layering of permissions in the fault handler to assume pX and uX then restricting based on the guest's stage-2 afterwards. Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Link: https://msgid.link/20251124190158.177318-4-oupton@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-12arm64: Fix typos and spelling errors in commentsmrigendrachaubey
This patch corrects several minor typographical and spelling errors in comments across multiple arm64 source files. No functional changes. Signed-off-by: mrigendrachaubey <mrigendra.chaubey@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-10-13KVM: arm64: nv: Use FGT write trap of MDSCR_EL1 when availableOliver Upton
Marc reports that the performance of running an L3 guest has regressed by 60% as a result of setting MDCR_EL2.TDA to hide bad architecture. That's of course terrible for the single user of recursive NV ;-) While there's nothing to be done on non-FGT systems, take advantage of the precise write trap of MDSCR_EL1 and leave the rest of the debug registers untrapped. Reported-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-30Merge tag 'kvmarm-6.18' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.18 - Add support for FF-A 1.2 as the secure memory conduit for pKVM, allowing more registers to be used as part of the message payload. - Change the way pKVM allocates its VM handles, making sure that the privileged hypervisor is never tricked into using uninitialised data. - Speed up MMIO range registration by avoiding unnecessary RCU synchronisation, which results in VMs starting much quicker. - Add the dump of the instruction stream when panic-ing in the EL2 payload, just like the rest of the kernel has always done. This will hopefully help debugging non-VHE setups. - Add 52bit PA support to the stage-1 page-table walker, and make use of it to populate the fault level reported to the guest on failing to translate a stage-1 walk. - Add NV support to the GICv3-on-GICv5 emulation code, ensuring feature parity for guests, irrespective of the host platform. - Fix some really ugly architecture problems when dealing with debug in a nested VM. This has some bad performance impacts, but is at least correct. - Add enough infrastructure to be able to disable EL2 features and give effective values to the EL2 control registers. This then allows a bunch of features to be turned off, which helps cross-host migration. - Large rework of the selftest infrastructure to allow most tests to transparently run at EL2. This is the first step towards enabling NV testing. - Various fixes and improvements all over the map, including one BE fix, just in time for the removal of the feature.
2025-09-30Merge tag 'kvmarm-fixes-6.17-2' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 changes for 6.17, round #3 - Invalidate nested MMUs upon freeing the PGD to avoid WARNs when visiting from an MMU notifier - Fixes to the TLB match process and TLB invalidation range for managing the VCNR pseudo-TLB - Prevent SPE from erroneously profiling guests due to UNKNOWN reset values in PMSCR_EL1 - Fix save/restore of host MDCR_EL2 to account for eagerly programming at vcpu_load() on VHE systems - Correct lock ordering when dealing with VGIC LPIs, avoiding scenarios where an xarray's spinlock was nested with a *raw* spinlock - Permit stage-2 read permission aborts which are possible in the case of NV depending on the guest hypervisor's stage-2 translation - Call raw_spin_unlock() instead of the internal spinlock API - Fix parameter ordering when assigning VBAR_EL1 [Pull into kvm/master to fix conflicts. - Paolo]
2025-09-20Merge branch kvm-arm64/el2-feature-control into kvmarm-master/nextMarc Zyngier
* kvm-arm64/el2-feature-control: (23 commits) : . : General rework of EL2 features that can be disabled to satisfy : the requirement of migration between heterogeneous hosts: : : - Handle effective RES0 behaviour of undefined registers, making sure : that disabling a feature affects full registeres, and not just : individual control bits. (20250918151402.1665315-1-maz@kernel.org) : : - Allow ID_AA64MMFR1_EL1.{TWED,HCX} to be disabled from userspace. : (20250911114621.3724469-1-yangjinqian1@huawei.com) : : - Turn the NV feature management into a deny-list, and expose : missing features to EL2 guests. : (20250912212258.407350-1-oliver.upton@linux.dev) : . KVM: arm64: nv: Expose up to FEAT_Debugv8p8 to NV-enabled VMs KVM: arm64: nv: Advertise FEAT_TIDCP1 to NV-enabled VMs KVM: arm64: nv: Advertise FEAT_SpecSEI to NV-enabled VMs KVM: arm64: nv: Expose FEAT_TWED to NV-enabled VMs KVM: arm64: nv: Exclude guest's TWED configuration when TWE isn't set KVM: arm64: nv: Expose FEAT_AFP to NV-enabled VMs KVM: arm64: nv: Expose FEAT_ECBHB to NV-enabled VMs KVM: arm64: nv: Expose FEAT_RASv1p1 via RAS_frac KVM: arm64: nv: Expose FEAT_DF2 to NV-enabled VMs KVM: arm64: nv: Don't erroneously claim FEAT_DoubleLock for NV VMs KVM: arm64: nv: Convert masks to denylists in limit_nv_id_reg() KVM: arm64: selftests: Test writes to ID_AA64MMFR1_EL1.{HCX, TWED} KVM: arm64: Make ID_AA64MMFR1_EL1.{HCX, TWED} writable from userspace KVM: arm64: Convert MDCR_EL2 RES0 handling to compute_reg_res0_bits() KVM: arm64: Convert SCTLR_EL1 RES0 handling to compute_reg_res0_bits() KVM: arm64: Enforce absence of FEAT_TCR2 on TCR2_EL2 KVM: arm64: Enforce absence of FEAT_SCTLR2 on SCTLR2_EL{1,2} KVM: arm64: Convert HCR_EL2 RES0 handling to compute_reg_res0_bits() KVM: arm64: Enforce absence of FEAT_HCX on HCRX_EL2 KVM: arm64: Enforce absence of FEAT_FGT2 on FGT2 registers ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-20Merge branch kvm-arm64/nv-debug into kvmarm-master/nextMarc Zyngier
* kvm-arm64/nv-debug: : . : Fix handling of MDSCR_EL1 in NV context, which is unfortunately : mishandled by the architecture. Patches courtesy of Oliver Upton : (20250917203125.283116-2-oliver.upton@linux.dev) : . KVM: arm64: nv: Apply guest's MDCR traps in nested context KVM: arm64: nv: Trap debug registers when in hyp context Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-20KVM: arm64: Account for 52bit when computing maximum OAMarc Zyngier
Adjust the computation of the max OA to account for 52bit PAs. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19KVM: arm64: nv: Expose up to FEAT_Debugv8p8 to NV-enabled VMsOliver Upton
The changes to the debug architecture up to v8.8 are concerned with external debug, which of course has no direct impact on VMs. Raise the feature limit and document what's preventing us from raising it further. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19KVM: arm64: nv: Advertise FEAT_TIDCP1 to NV-enabled VMsOliver Upton
While KVM does not expose IMPDEF features to VMs, FEAT_TIDCP1 is an architecturally-defined EL1 trap of a particular sysreg encoding range. Furthermore, KVM already advertises this feature to non-NV VMs. As there is no interaction with EL2 traps, expose the feature. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19KVM: arm64: nv: Advertise FEAT_SpecSEI to NV-enabled VMsOliver Upton
FEAT_SpecSEI is an informational feature describing whether speculative loads may generate SErrors. Since there are already cases where KVM reinjects an SError into the VM it is already possible this may happen due to a speculative load within the VM. Stop hiding the feature from NV-enabled VMs. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19KVM: arm64: nv: Expose FEAT_TWED to NV-enabled VMsOliver Upton
KVM now handles HCR_EL2.{TWEDEn,TWEDEL} correctly when computing the effective HCR for a nested context. Advertise the feature. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19KVM: arm64: nv: Expose FEAT_AFP to NV-enabled VMsOliver Upton
FEAT_AFP doesn't intersect with any EL2 trap behavior, expose to NV-enabled VMs. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19KVM: arm64: nv: Expose FEAT_ECBHB to NV-enabled VMsOliver Upton
The exact wording of the restrictions on branch prediction due to FEAT_ECBHB in DDI0487L.b is as follows: When FEAT_ECBHB is implemented, the branch history information created in a context before an exception to a higher Exception level using AArch64 cannot be used by code before that exception to exploitatively control the execution of any indirect branches in code in a different context after the exception. While vEL2 and EL1 are multiplexed at EL1, they exist in different hardware-described contexts as KVM uses different stage-2 MMUs to represent the corresponding translation regimes. Additionally, exception entries into vEL2 always imply a hardware exception entry into literal EL2 for the emulated regime change. Given all of this, and the fact that FEAT_ECBHB places no limitation on the EL of the protected context after the exception, we can claim FEAT_ECBHB on supporting hardware. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19KVM: arm64: nv: Expose FEAT_RASv1p1 via RAS_fracOliver Upton
KVM already supports FEAT_RASv1p1 for NV-enabled VMs but only when advertised through the canonical field. Stop masking the silly frac field to expose the feature on systems without FEAT_DF. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19KVM: arm64: nv: Expose FEAT_DF2 to NV-enabled VMsOliver Upton
The supporting infrastructure in KVM's abort injection code was merged a while ago, but the author (me!) forgot to relax the NV limitation when FEAT_DF2 got exposed to non-NV VMs. Fix it. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19KVM: arm64: nv: Don't erroneously claim FEAT_DoubleLock for NV VMsOliver Upton
ID_AA64DFR0_EL1.DoubleLock is one of those annoying signed feature fields where a non-negative value implies that a feature is implemented and a negative value implies that it is not. While the intention of masking this field was likely to hide the feature, KVM actually advertises it, even on unsupporting hardware. Remove FEAT_DoubleLock from the mask, making the NI value visible to the VM. Take care to accept the old, incorrect values for this field as we've lied to userspace. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-19KVM: arm64: nv: Convert masks to denylists in limit_nv_id_reg()Oliver Upton
Consistently use denylisting of features such that the limitations of KVM's nested implementation are explicitly documented (rather than implied). Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-18KVM: arm64: nv: Apply guest's MDCR traps in nested contextOliver Upton
KVM needs to ensure the guest hypervisor's traps take effect when the vCPU is in a nested context. While supporting infrastructure is in place for most of the EL2 trap registers, MDCR_EL2 is not. Fold the guest's trap configuration into the effective MDCR_EL2. Apply it directly to the in-memory representation as it gets recomputed on every vcpu_load() anyway. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-18KVM: arm64: nv: Trap debug registers when in hyp contextOliver Upton
In case you haven't realized it yet, the architecture is _slightly_ broken in the context of nested virt. Here we have another example of FEAT_NV2 redirecting a sysreg (MDSCR_EL1) to memory that actually affects execution at vEL2. Fortunately, MDCR_EL2.TDA provides the necessary traps to hide this mess at the expense of unnecessarily trapping the breakpoint/watchpoint registers. Yes, FEAT_FGT gives us a precise trap but let's just opt for obvious correctness to start. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-09-10KVM: arm64: nv: Fix incorrect VNCR invalidation range calculationDongha Lee
The code for invalidating VNCR entries in both kvm_invalidate_vncr_ipa() and invalidate_vncr_va() incorrectly uses a bitwise AND with `(size - 1)` instead of `~(size - 1)` to align the start address. This results in masking the address bits instead of aligning them down to the start of the block. This bug may cause stale VNCR TLB entries to remain valid even after a TLBI or MMU notifier, leading to incorrect memory translation and unexpected guest behavior. Credit to Team 0xB6 in bob14: DongHa Lee, Gyujeong Jin, Daehyeon Ko, Geonha Lee, Hyungyu Oh, and Jaewon Yang. Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Dongha Lee <p@sswd.pw> Link: https://lore.kernel.org/r/20250906040724.72960-1-p@sswd.pw Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-09-10KVM: arm64: nv: fix VNCR TLB ASID match logic for non-Global entriesGeonha Lee
kvm_vncr_tlb_lookup() is supposed to return true when the cached VNCR TLB entry is valid for the current context. For non-Global entries, that means the entry’s ASID must match the current ASID. The current code returns true when the ASIDs do *not* match, which inverts the logic. This is a potential vulnerability: - Valid entries are ignored and we fall back to kvm_translate_vncr(), hurting performance. - Mismatched entries are treated as permission faults (-EPERM) instead of triggering a fresh translation. - This can also cause stale translations to be (wrongly) considered valid across address spaces. Flip the predicate so non-Global entries only hit when ASIDs match. Special credit to Team 0xB6 for reporting: DongHa Lee, Gyujeong Jin, Daehyeon Ko, Geonha Lee, Hyungyu Oh, and Jaewon Yang. Signed-off-by: Geonha Lee <w1nsom3gna@korea.ac.kr> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250903150421.90752-1-w1nsom3gna@korea.ac.kr Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-08-29Merge tag 'kvmarm-fixes-6.17-1' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 changes for 6.17, take #2 - Correctly handle 'invariant' system registers for protected VMs - Improved handling of VNCR data aborts, including external aborts - Fixes for handling of FEAT_RAS for NV guests, providing a sane fault context during SEA injection and preventing the use of RASv1p1 fault injection hardware - Ensure that page table destruction when a VM is destroyed gives an opportunity to reschedule - Large fix to KVM's infrastructure for managing guest context loaded on the CPU, addressing issues where the output of AT emulation doesn't get reflected to the guest - Fix AT S12 emulation to actually perform stage-2 translation when necessary - Avoid attempting vLPI irqbypass when GICv4 has been explicitly disabled for a VM - Minor KVM + selftest fixes
2025-08-27KVM: arm64: nv: Handle VNCR_EL2-triggered faults backed by guest_memfdFuad Tabba
Handle faults for memslots backed by guest_memfd in arm64 nested virtualization triggered by VNCR_EL2. * Introduce is_gmem output parameter to kvm_translate_vncr(), indicating whether the faulted memory slot is backed by guest_memfd. * Dispatch faults backed by guest_memfd to kvm_gmem_get_pfn(). * Update kvm_handle_vncr_abort() to handle potential guest_memfd errors. Some of the guest_memfd errors need to be handled by userspace instead of attempting to (implicitly) retry by returning to the guest. Suggested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250729225455.670324-20-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-08-04KVM: arm64: nv: Handle SEAs due to VNCR redirectionOliver Upton
System register accesses redirected to the VNCR page can also generate external aborts just like any other form of memory access. Route to kvm_handle_guest_sea() for potential APEI handling, falling back to a vSError if the kernel didn't handle the abort. Take the opportunity to throw out the useless kvm_ras.h which provided a helper with a single callsite... Cc: Jiaqi Yan <jiaqiyan@google.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250729182342.3281742-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>