Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "Much x86 work was pushed out to 5.12, but ARM more than made up for it.

  ARM:
   - PSCI relay at EL2 when "protected KVM" is enabled
   - New exception injection code
   - Simplification of AArch32 system register handling
   - Fix PMU accesses when no PMU is enabled
   - Expose CSV3 on non-Meltdown hosts
   - Cache hierarchy discovery fixes
   - PV steal-time cleanups
   - Allow function pointers at EL2
   - Various host EL2 entry cleanups
   - Simplification of the EL2 vector allocation

  s390:
   - memcg accounting for s390-specific parts of kvm and gmap
   - selftest for diag318
   - new kvm_stat for when async_pf falls back to sync

  x86:
   - Tracepoints for the new pagetable code from 5.10
   - Catch VFIO and KVM irqfd events before userspace
   - Reporting dirty pages to userspace with a ring buffer
   - SEV-ES host support
   - Nested VMX support for wait-for-SIPI activity state
   - New feature flag (AVX512 FP16)
   - New system ioctl to report Hyper-V-compatible paravirtualization features

  Generic:
   - Selftest improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits)
  KVM: SVM: fix 32-bit compilation
  KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting
  KVM: SVM: Provide support to launch and run an SEV-ES guest
  KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests
  KVM: SVM: Provide support for SEV-ES vCPU loading
  KVM: SVM: Provide support for SEV-ES vCPU creation/loading
  KVM: SVM: Update ASID allocation to support SEV-ES guests
  KVM: SVM: Set the encryption mask for the SVM host save area
  KVM: SVM: Add NMI support for an SEV-ES guest
  KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest
  KVM: SVM: Do not report support for SMM for an SEV-ES guest
  KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES
  KVM: SVM: Add support for CR8 write traps for an SEV-ES guest
  KVM: SVM: Add support for CR4 write traps for an SEV-ES guest
  KVM: SVM: Add support for CR0 write traps for an SEV-ES guest
  KVM: SVM: Add support for EFER write traps for an SEV-ES guest
  KVM: SVM: Support string IO operations for an SEV-ES guest
  KVM: SVM: Support MMIO for an SEV-ES guest
  KVM: SVM: Create trace events for VMGEXIT MSR protocol processing
  KVM: SVM: Create trace events for VMGEXIT processing
  ...
Linus Torvalds 2020-12-20 10:44:05 -08:00
commit 6a447b0e31
171 changed files with 7254 additions and 2762 deletions

View file

@ -2254,6 +2254,16 @@
for all guests.
Default is 1 (enabled) if in 64-bit or 32-bit PAE mode.
kvm-arm.mode=
[KVM,ARM] Select one of KVM/arm64's modes of operation.
protected: nVHE-based mode with support for guests whose
state is kept private from the host.
Not valid if the kernel is running in EL2.
Defaults to VHE/nVHE based on hardware support and
the value of CONFIG_ARM64_VHE.
kvm-arm.vgic_v3_group0_trap=
[KVM,ARM] Trap guest accesses to GICv3 group-0
system registers
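For context, a purely illustrative note (not part of the hunk above): protected mode would be requested by appending the new parameter to the kernel command line at boot, e.g.

  kvm-arm.mode=protected

while omitting the parameter keeps the default VHE/nVHE selection described in the kvm-arm.mode entry.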

View file

@ -97,7 +97,7 @@ hypervisor maps kernel pages in EL2 at a fixed (and potentially
random) offset from the linear mapping. See the kern_hyp_va macro and
kvm_update_va_mask function for more details. MMIO devices such as
GICv2 gets mapped next to the HYP idmap page, as do vectors when
ARM64_HARDEN_EL2_VECTORS is selected for particular CPUs.
ARM64_SPECTRE_V3A is enabled for particular CPUs.
When using KVM with the Virtualization Host Extensions, no additional
mappings are created, since the host kernel runs directly in EL2.

View file

@ -262,6 +262,18 @@ The KVM_RUN ioctl (cf.) communicates with userspace via a shared
memory region. This ioctl returns the size of that region. See the
KVM_RUN documentation for details.
Besides the size of the KVM_RUN communication region, other areas of
the VCPU file descriptor can be mmap-ed, including:
- if KVM_CAP_COALESCED_MMIO is available, a page at
KVM_COALESCED_MMIO_PAGE_OFFSET * PAGE_SIZE; for historical reasons,
this page is included in the result of KVM_GET_VCPU_MMAP_SIZE.
KVM_CAP_COALESCED_MMIO is not documented yet.
- if KVM_CAP_DIRTY_LOG_RING is available, a number of pages at
KVM_DIRTY_LOG_PAGE_OFFSET * PAGE_SIZE. For more information on
KVM_CAP_DIRTY_LOG_RING, see section 8.3.
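As a rough, hedged sketch of the mappings listed above (not code from this series; the helper name is illustrative, the caller is assumed to have already created the vCPU and enabled the dirty ring, and KVM_DIRTY_LOG_PAGE_OFFSET is only provided by architectures that support the ring, x86 here)::

  /* Sketch: map the per-vCPU regions described above.
   * kvm_fd is the /dev/kvm fd, vcpu_fd an open vCPU fd, ring_bytes the
   * dirty ring size configured via KVM_CAP_DIRTY_LOG_RING. */
  #include <linux/kvm.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>
  #include <unistd.h>

  static int map_vcpu_regions(int kvm_fd, int vcpu_fd, size_t ring_bytes,
                              struct kvm_run **run,
                              struct kvm_dirty_gfn **ring)
  {
      long psz = sysconf(_SC_PAGESIZE);
      int run_size = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0);

      if (run_size < 0)
          return -1;

      /* kvm_run (and, historically, the coalesced MMIO page) at offset 0 */
      *run = mmap(NULL, run_size, PROT_READ | PROT_WRITE,
                  MAP_SHARED, vcpu_fd, 0);

      /* the dirty ring is a separate mapping at its own page offset */
      *ring = mmap(NULL, ring_bytes, PROT_READ | PROT_WRITE,
                   MAP_SHARED, vcpu_fd,
                   psz * KVM_DIRTY_LOG_PAGE_OFFSET);

      return (*run == MAP_FAILED || *ring == MAP_FAILED) ? -1 : 0;
  }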
4.6 KVM_SET_MEMORY_REGION
-------------------------
@ -4455,9 +4467,9 @@ that KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is present.
4.118 KVM_GET_SUPPORTED_HV_CPUID
--------------------------------
:Capability: KVM_CAP_HYPERV_CPUID
:Capability: KVM_CAP_HYPERV_CPUID (vcpu), KVM_CAP_SYS_HYPERV_CPUID (system)
:Architectures: x86
:Type: vcpu ioctl
:Type: system ioctl, vcpu ioctl
:Parameters: struct kvm_cpuid2 (in/out)
:Returns: 0 on success, -1 on error
@ -4502,9 +4514,6 @@ Currently, the following list of CPUID leaves are returned:
- HYPERV_CPUID_SYNDBG_INTERFACE
- HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES
HYPERV_CPUID_NESTED_FEATURES leaf is only exposed when Enlightened VMCS was
enabled on the corresponding vCPU (KVM_CAP_HYPERV_ENLIGHTENED_VMCS).
Userspace invokes KVM_GET_SUPPORTED_HV_CPUID by passing a kvm_cpuid2 structure
with the 'nent' field indicating the number of entries in the variable-size
array 'entries'. If the number of entries is too low to describe all Hyper-V
@ -4515,6 +4524,15 @@ number of valid entries in the 'entries' array, which is then filled.
'index' and 'flags' fields in 'struct kvm_cpuid_entry2' are currently reserved,
userspace should not expect to get any particular value there.
Note, the vcpu version of KVM_GET_SUPPORTED_HV_CPUID is currently deprecated. Unlike
the system ioctl, which exposes all supported feature bits unconditionally, the vcpu
version has the following quirks:
- HYPERV_CPUID_NESTED_FEATURES leaf and HV_X64_ENLIGHTENED_VMCS_RECOMMENDED
feature bit are only exposed when Enlightened VMCS was previously enabled
on the corresponding vCPU (KVM_CAP_HYPERV_ENLIGHTENED_VMCS).
- HV_STIMER_DIRECT_MODE_AVAILABLE bit is only exposed with an in-kernel LAPIC
(i.e. it presumes KVM_CREATE_IRQCHIP has already been called).
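As a non-authoritative sketch of the system-ioctl flavour described above (the helper name and the fixed entry count are illustrative assumptions; real code should grow the array and retry when the ioctl fails with E2BIG)::

  /* Query the Hyper-V compatible CPUID leaves from the /dev/kvm fd;
   * requires KVM_CAP_SYS_HYPERV_CPUID. */
  #include <linux/kvm.h>
  #include <sys/ioctl.h>
  #include <stdlib.h>

  #define HV_NENT 64   /* arbitrary guess, not a KVM constant */

  static struct kvm_cpuid2 *get_hv_cpuid(int kvm_fd)
  {
      struct kvm_cpuid2 *cpuid;

      cpuid = calloc(1, sizeof(*cpuid) +
                        HV_NENT * sizeof(struct kvm_cpuid_entry2));
      if (!cpuid)
          return NULL;

      cpuid->nent = HV_NENT;
      if (ioctl(kvm_fd, KVM_GET_SUPPORTED_HV_CPUID, cpuid) < 0) {
          free(cpuid);          /* e.g. E2BIG: HV_NENT was too small */
          return NULL;
      }

      return cpuid;             /* ->nent now holds the valid entry count */
  }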
4.119 KVM_ARM_VCPU_FINALIZE
---------------------------
@ -6390,3 +6408,91 @@ When enabled, KVM will disable paravirtual features provided to the
guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf
(0x40000001). Otherwise, a guest may use the paravirtual features
regardless of what has actually been exposed through the CPUID leaf.
8.29 KVM_CAP_DIRTY_LOG_RING
---------------------------
:Architectures: x86
:Parameters: args[0] - size of the dirty log ring
KVM is capable of tracking dirty memory using ring buffers that are
mmap-ed into userspace; there is one dirty ring per vcpu.
The dirty ring is available to userspace as an array of
``struct kvm_dirty_gfn``. Each dirty entry is defined as::
struct kvm_dirty_gfn {
__u32 flags;
__u32 slot; /* as_id | slot_id */
__u64 offset;
};
The following values are defined for the flags field to define the
current state of the entry::
#define KVM_DIRTY_GFN_F_DIRTY BIT(0)
#define KVM_DIRTY_GFN_F_RESET BIT(1)
#define KVM_DIRTY_GFN_F_MASK 0x3
Userspace should call KVM_ENABLE_CAP ioctl right after KVM_CREATE_VM
ioctl to enable this capability for the new guest and set the size of
the rings. Enabling the capability is only allowed before creating any
vCPU, and the size of the ring must be a power of two. The larger the
ring buffer, the less likely the ring is full and the VM is forced to
exit to userspace. The optimal size depends on the workload, but it is
recommended that it be at least 64 KiB (4096 entries).
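A minimal sketch of that enable step, assuming a freshly created VM fd (the helper name and the 64 KiB size are illustrative, not mandated)::

  /* Enable the per-vCPU dirty ring; must run after KVM_CREATE_VM and
   * before any vCPU is created. */
  #include <linux/kvm.h>
  #include <sys/ioctl.h>
  #include <string.h>

  static int enable_dirty_ring(int vm_fd)
  {
      struct kvm_enable_cap cap;

      memset(&cap, 0, sizeof(cap));
      cap.cap = KVM_CAP_DIRTY_LOG_RING;
      cap.args[0] = 65536;     /* ring size in bytes, must be a power of two */

      return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
  }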
Just like for dirty page bitmaps, the buffer tracks writes to
all user memory regions for which the KVM_MEM_LOG_DIRTY_PAGES flag was
set in KVM_SET_USER_MEMORY_REGION. Once a memory region is registered
with the flag set, userspace can start harvesting dirty pages from the
ring buffer.
An entry in the ring buffer can be unused (flag bits ``00``),
dirty (flag bits ``01``) or harvested (flag bits ``1X``). The
state machine for the entry is as follows::
          dirtied         harvested        reset
     00 -----------> 01 -------------> 1X -------+
      ^                                          |
      |                                          |
      +------------------------------------------+
To harvest the dirty pages, userspace accesses the mmap-ed ring buffer
to read the dirty GFNs. If the flags field has the DIRTY bit set (at this
stage the RESET bit must be cleared), the entry describes a dirty GFN.
Userspace should harvest this GFN and move the flags from state
``01b`` to ``1Xb`` (bit 0 will be ignored by KVM, but bit 1 must be set
to show that this GFN is harvested and waiting for a reset), and move
on to the next GFN. Userspace should continue to do this until the
flags of a GFN have the DIRTY bit cleared, meaning that it has harvested
all the dirty GFNs that were available.
It is not necessary for userspace to harvest all the dirty GFNs at once.
However, it must collect the dirty GFNs in sequence, i.e., the userspace
program cannot skip one dirty GFN to collect the one next to it.
After processing one or more entries in the ring buffer, userspace
calls the VM ioctl KVM_RESET_DIRTY_RINGS to notify the kernel about
it, so that the kernel will reprotect those collected GFNs.
Therefore, the ioctl must be called *before* reading the content of
the dirty pages.
The dirty ring can get full. When that happens, the vcpu's KVM_RUN will
return with exit reason KVM_EXIT_DIRTY_LOG_FULL.
The dirty ring interface has one major difference compared to the
KVM_GET_DIRTY_LOG interface: when reading the dirty ring from
userspace, it is still possible that the kernel has not yet flushed the
processor's dirty page buffers into the kernel buffer (with dirty bitmaps, the
flushing is done by the KVM_GET_DIRTY_LOG ioctl itself). To force that flush,
one needs to kick the vcpu out of KVM_RUN using a signal. The resulting
vmexit ensures that all dirty GFNs are flushed to the dirty rings.
NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding
ioctl KVM_RESET_DIRTY_RINGS are mutually exclusive with the existing ioctls
KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG. After enabling
KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
machine will switch to ring-buffer dirty page tracking and further
KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.
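To make the flag transitions above concrete, a hedged harvesting sketch (not from this series; names are illustrative, the per-vCPU cursor is maintained by the caller, and production code would additionally use acquire/release ordering when reading and writing the flags field)::

  /* Collect dirty GFNs from one vCPU's mapped ring, mark them harvested,
   * then let the kernel reprotect them via KVM_RESET_DIRTY_RINGS. */
  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  static void harvest_dirty_ring(int vm_fd, struct kvm_dirty_gfn *ring,
                                 __u32 nr_entries, __u32 *cursor)
  {
      int collected = 0;

      for (;;) {
          struct kvm_dirty_gfn *e = &ring[*cursor % nr_entries];

          if (!(e->flags & KVM_DIRTY_GFN_F_DIRTY))
              break;                         /* caught up with the producer */

          /* Record e->slot (as_id | slot_id) and e->offset here; as noted
           * above, the page contents should only be read after the reset
           * below has write-protected the pages again. */

          e->flags = KVM_DIRTY_GFN_F_RESET;  /* 01b -> 1Xb: harvested */
          (*cursor)++;
          collected++;
      }

      if (collected)
          ioctl(vm_fd, KVM_RESET_DIRTY_RINGS, 0);
  }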

View file

@ -19,8 +19,8 @@ Two new SMCCC compatible hypercalls are defined:
These are only available in the SMC64/HVC64 calling convention as
paravirtualized time is not available to 32 bit Arm guests. The existence of
the PV_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES
mechanism before calling it.
the PV_TIME_FEATURES hypercall should be probed using the SMCCC 1.1
ARCH_FEATURES mechanism before calling it.
PV_TIME_FEATURES
============= ======== ==========

View file

@ -19,7 +19,7 @@
#define ARM64_HAS_VIRT_HOST_EXTN 11
#define ARM64_WORKAROUND_CAVIUM_27456 12
#define ARM64_HAS_32BIT_EL0 13
#define ARM64_HARDEN_EL2_VECTORS 14
#define ARM64_SPECTRE_V3A 14
#define ARM64_HAS_CNP 15
#define ARM64_HAS_NO_FPSIMD 16
#define ARM64_WORKAROUND_REPEAT_TLBI 17
@ -65,7 +65,8 @@
#define ARM64_MTE 57
#define ARM64_WORKAROUND_1508412 58
#define ARM64_HAS_LDAPR 59
#define ARM64_KVM_PROTECTED_MODE 60
#define ARM64_NCAPS 60
#define ARM64_NCAPS 61
#endif /* __ASM_CPUCAPS_H */

View file

@ -705,6 +705,11 @@ static inline bool system_supports_generic_auth(void)
cpus_have_const_cap(ARM64_HAS_GENERIC_AUTH);
}
static inline bool system_has_full_ptr_auth(void)
{
return system_supports_address_auth() && system_supports_generic_auth();
}
static __always_inline bool system_uses_irq_prio_masking(void)
{
return IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) &&

View file

@ -0,0 +1,181 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright (C) 2012,2013 - ARM Ltd
* Author: Marc Zyngier <marc.zyngier@arm.com>
*/
#ifndef __ARM_KVM_INIT_H__
#define __ARM_KVM_INIT_H__
#ifndef __ASSEMBLY__
#error Assembly-only header
#endif
#include <asm/kvm_arm.h>
#include <asm/ptrace.h>
#include <asm/sysreg.h>
#include <linux/irqchip/arm-gic-v3.h>
.macro __init_el2_sctlr
mov_q x0, INIT_SCTLR_EL2_MMU_OFF
msr sctlr_el2, x0
isb
.endm
/*
* Allow Non-secure EL1 and EL0 to access physical timer and counter.
* This is not necessary for VHE, since the host kernel runs in EL2,
* and EL0 accesses are configured in the later stage of boot process.
* Note that when HCR_EL2.E2H == 1, CNTHCTL_EL2 has the same bit layout
* as CNTKCTL_EL1, and CNTKCTL_EL1 accessing instructions are redefined
* to access CNTHCTL_EL2. This allows the kernel designed to run at EL1
* to transparently mess with the EL0 bits via CNTKCTL_EL1 access in
* EL2.
*/
.macro __init_el2_timers mode
.ifeqs "\mode", "nvhe"
mrs x0, cnthctl_el2
orr x0, x0, #3 // Enable EL1 physical timers
msr cnthctl_el2, x0
.endif
msr cntvoff_el2, xzr // Clear virtual offset
.endm
.macro __init_el2_debug mode
mrs x1, id_aa64dfr0_el1
sbfx x0, x1, #ID_AA64DFR0_PMUVER_SHIFT, #4
cmp x0, #1
b.lt 1f // Skip if no PMU present
mrs x0, pmcr_el0 // Disable debug access traps
ubfx x0, x0, #11, #5 // to EL2 and allow access to
1:
csel x2, xzr, x0, lt // all PMU counters from EL1
/* Statistical profiling */
ubfx x0, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4
cbz x0, 3f // Skip if SPE not present
.ifeqs "\mode", "nvhe"
mrs_s x0, SYS_PMBIDR_EL1 // If SPE available at EL2,
and x0, x0, #(1 << SYS_PMBIDR_EL1_P_SHIFT)
cbnz x0, 2f // then permit sampling of physical
mov x0, #(1 << SYS_PMSCR_EL2_PCT_SHIFT | \
1 << SYS_PMSCR_EL2_PA_SHIFT)
msr_s SYS_PMSCR_EL2, x0 // addresses and physical counter
2:
mov x0, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
orr x2, x2, x0 // If we don't have VHE, then
// use EL1&0 translation.
.else
orr x2, x2, #MDCR_EL2_TPMS // For VHE, use EL2 translation
// and disable access from EL1
.endif
3:
msr mdcr_el2, x2 // Configure debug traps
.endm
/* LORegions */
.macro __init_el2_lor
mrs x1, id_aa64mmfr1_el1
ubfx x0, x1, #ID_AA64MMFR1_LOR_SHIFT, 4
cbz x0, 1f
msr_s SYS_LORC_EL1, xzr
1:
.endm
/* Stage-2 translation */
.macro __init_el2_stage2
msr vttbr_el2, xzr
.endm
/* GICv3 system register access */
.macro __init_el2_gicv3
mrs x0, id_aa64pfr0_el1
ubfx x0, x0, #ID_AA64PFR0_GIC_SHIFT, #4
cbz x0, 1f
mrs_s x0, SYS_ICC_SRE_EL2
orr x0, x0, #ICC_SRE_EL2_SRE // Set ICC_SRE_EL2.SRE==1
orr x0, x0, #ICC_SRE_EL2_ENABLE // Set ICC_SRE_EL2.Enable==1
msr_s SYS_ICC_SRE_EL2, x0
isb // Make sure SRE is now set
mrs_s x0, SYS_ICC_SRE_EL2 // Read SRE back,
tbz x0, #0, 1f // and check that it sticks
msr_s SYS_ICH_HCR_EL2, xzr // Reset ICC_HCR_EL2 to defaults
1:
.endm
.macro __init_el2_hstr
msr hstr_el2, xzr // Disable CP15 traps to EL2
.endm
/* Virtual CPU ID registers */
.macro __init_el2_nvhe_idregs
mrs x0, midr_el1
mrs x1, mpidr_el1
msr vpidr_el2, x0
msr vmpidr_el2, x1
.endm
/* Coprocessor traps */
.macro __init_el2_nvhe_cptr
mov x0, #0x33ff
msr cptr_el2, x0 // Disable copro. traps to EL2
.endm
/* SVE register access */
.macro __init_el2_nvhe_sve
mrs x1, id_aa64pfr0_el1
ubfx x1, x1, #ID_AA64PFR0_SVE_SHIFT, #4
cbz x1, 1f
bic x0, x0, #CPTR_EL2_TZ // Also disable SVE traps
msr cptr_el2, x0 // Disable copro. traps to EL2
isb
mov x1, #ZCR_ELx_LEN_MASK // SVE: Enable full vector
msr_s SYS_ZCR_EL2, x1 // length for EL1.
1:
.endm
.macro __init_el2_nvhe_prepare_eret
mov x0, #INIT_PSTATE_EL1
msr spsr_el2, x0
.endm
/**
* Initialize EL2 registers to sane values. This should be called early on all
* cores that were booted in EL2.
*
* Regs: x0, x1 and x2 are clobbered.
*/
.macro init_el2_state mode
.ifnes "\mode", "vhe"
.ifnes "\mode", "nvhe"
.error "Invalid 'mode' argument"
.endif
.endif
__init_el2_sctlr
__init_el2_timers \mode
__init_el2_debug \mode
__init_el2_lor
__init_el2_stage2
__init_el2_gicv3
__init_el2_hstr
/*
* When VHE is not in use, early init of EL2 needs to be done here.
* When VHE _is_ in use, EL1 will not be used in the host and
* requires no configuration, and all non-hyp-specific EL2 setup
* will be done via the _EL1 system register aliases in __cpu_setup.
*/
.ifeqs "\mode", "nvhe"
__init_el2_nvhe_idregs
__init_el2_nvhe_cptr
__init_el2_nvhe_sve
__init_el2_nvhe_prepare_eret
.endif
.endm
#endif /* __ARM_KVM_INIT_H__ */

View file

@ -80,6 +80,7 @@
HCR_FMO | HCR_IMO | HCR_PTW )
#define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
#define HCR_HOST_NVHE_FLAGS (HCR_RW | HCR_API | HCR_APK | HCR_ATA)
#define HCR_HOST_NVHE_PROTECTED_FLAGS (HCR_HOST_NVHE_FLAGS | HCR_TSC)
#define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)
/* TCR_EL2 Registers bits */

View file

@ -34,8 +34,6 @@
*/
#define KVM_VECTOR_PREAMBLE (2 * AARCH64_INSN_SIZE)
#define __SMCCC_WORKAROUND_1_SMC_SZ 36
#define KVM_HOST_SMCCC_ID(id) \
ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
ARM_SMCCC_SMC_64, \
@ -150,6 +148,14 @@ extern void *__vhe_undefined_symbol;
#endif
struct kvm_nvhe_init_params {
unsigned long mair_el2;
unsigned long tcr_el2;
unsigned long tpidr_el2;
unsigned long stack_hyp_va;
phys_addr_t pgd_pa;
};
/* Translate a kernel address @ptr into its equivalent linear mapping */
#define kvm_ksym_ref(ptr) \
({ \
@ -165,17 +171,14 @@ struct kvm_vcpu;
struct kvm_s2_mmu;
DECLARE_KVM_NVHE_SYM(__kvm_hyp_init);
DECLARE_KVM_NVHE_SYM(__kvm_hyp_host_vector);
DECLARE_KVM_HYP_SYM(__kvm_hyp_vector);
#define __kvm_hyp_init CHOOSE_NVHE_SYM(__kvm_hyp_init)
#define __kvm_hyp_host_vector CHOOSE_NVHE_SYM(__kvm_hyp_host_vector)
#define __kvm_hyp_vector CHOOSE_HYP_SYM(__kvm_hyp_vector)
extern unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
DECLARE_KVM_NVHE_SYM(__per_cpu_start);
DECLARE_KVM_NVHE_SYM(__per_cpu_end);
extern atomic_t arm64_el2_vector_last_slot;
DECLARE_KVM_HYP_SYM(__bp_harden_hyp_vecs);
#define __bp_harden_hyp_vecs CHOOSE_HYP_SYM(__bp_harden_hyp_vecs)
@ -189,8 +192,6 @@ extern void __kvm_timer_set_cntvoff(u64 cntvoff);
extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
extern void __kvm_enable_ssbs(void);
extern u64 __vgic_v3_get_ich_vtr_el2(void);
extern u64 __vgic_v3_read_vmcr(void);
extern void __vgic_v3_write_vmcr(u32 vmcr);
@ -198,8 +199,6 @@ extern void __vgic_v3_init_lrs(void);
extern u32 __kvm_get_mdcr_el2(void);
extern char __smccc_workaround_1_smc[__SMCCC_WORKAROUND_1_SMC_SZ];
#if defined(GCC_VERSION) && GCC_VERSION < 50000
#define SYM_CONSTRAINT "i"
#else

View file

@ -1,38 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright (C) 2012,2013 - ARM Ltd
* Author: Marc Zyngier <marc.zyngier@arm.com>
*
* Derived from arch/arm/include/asm/kvm_coproc.h
* Copyright (C) 2012 Rusty Russell IBM Corporation
*/
#ifndef __ARM64_KVM_COPROC_H__
#define __ARM64_KVM_COPROC_H__
#include <linux/kvm_host.h>
void kvm_reset_sys_regs(struct kvm_vcpu *vcpu);
struct kvm_sys_reg_table {
const struct sys_reg_desc *table;
size_t num;
};
int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu);
int kvm_handle_cp14_32(struct kvm_vcpu *vcpu);
int kvm_handle_cp14_64(struct kvm_vcpu *vcpu);
int kvm_handle_cp15_32(struct kvm_vcpu *vcpu);
int kvm_handle_cp15_64(struct kvm_vcpu *vcpu);
int kvm_handle_sys_reg(struct kvm_vcpu *vcpu);
#define kvm_coproc_table_init kvm_sys_reg_table_init
void kvm_sys_reg_table_init(void);
struct kvm_one_reg;
int kvm_arm_copy_sys_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
int kvm_arm_sys_reg_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
int kvm_arm_sys_reg_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
unsigned long kvm_arm_num_sys_reg_descs(struct kvm_vcpu *vcpu);
#endif /* __ARM64_KVM_COPROC_H__ */

View file

@ -21,20 +21,25 @@
#include <asm/cputype.h>
#include <asm/virt.h>
unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num);
unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu);
void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v);
#define CURRENT_EL_SP_EL0_VECTOR 0x0
#define CURRENT_EL_SP_ELx_VECTOR 0x200
#define LOWER_EL_AArch64_VECTOR 0x400
#define LOWER_EL_AArch32_VECTOR 0x600
enum exception_type {
except_type_sync = 0,
except_type_irq = 0x80,
except_type_fiq = 0x100,
except_type_serror = 0x180,
};
bool kvm_condition_valid32(const struct kvm_vcpu *vcpu);
void kvm_skip_instr32(struct kvm_vcpu *vcpu, bool is_wide_instr);
void kvm_skip_instr32(struct kvm_vcpu *vcpu);
void kvm_inject_undefined(struct kvm_vcpu *vcpu);
void kvm_inject_vabt(struct kvm_vcpu *vcpu);
void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
void kvm_inject_undef32(struct kvm_vcpu *vcpu);
void kvm_inject_dabt32(struct kvm_vcpu *vcpu, unsigned long addr);
void kvm_inject_pabt32(struct kvm_vcpu *vcpu, unsigned long addr);
static __always_inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
{
@ -168,30 +173,6 @@ static __always_inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
vcpu_gp_regs(vcpu)->regs[reg_num] = val;
}
static inline unsigned long vcpu_read_spsr(const struct kvm_vcpu *vcpu)
{
if (vcpu_mode_is_32bit(vcpu))
return vcpu_read_spsr32(vcpu);
if (vcpu->arch.sysregs_loaded_on_cpu)
return read_sysreg_el1(SYS_SPSR);
else
return __vcpu_sys_reg(vcpu, SPSR_EL1);
}
static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
{
if (vcpu_mode_is_32bit(vcpu)) {
vcpu_write_spsr32(vcpu, v);
return;
}
if (vcpu->arch.sysregs_loaded_on_cpu)
write_sysreg_el1(v, SYS_SPSR);
else
__vcpu_sys_reg(vcpu, SPSR_EL1) = v;
}
/*
* The layout of SPSR for an AArch32 state is different when observed from an
* AArch64 SPSR_ELx or an AArch32 SPSR_*. This function generates the AArch32
@ -477,32 +458,9 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
return data; /* Leave LE untouched */
}
static __always_inline void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr)
static __always_inline void kvm_incr_pc(struct kvm_vcpu *vcpu)
{
if (vcpu_mode_is_32bit(vcpu)) {
kvm_skip_instr32(vcpu, is_wide_instr);
} else {
*vcpu_pc(vcpu) += 4;
*vcpu_cpsr(vcpu) &= ~PSR_BTYPE_MASK;
}
/* advance the singlestep state machine */
*vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS;
}
/*
* Skip an instruction which has been emulated at hyp while most guest sysregs
* are live.
*/
static __always_inline void __kvm_skip_instr(struct kvm_vcpu *vcpu)
{
*vcpu_pc(vcpu) = read_sysreg_el2(SYS_ELR);
vcpu_gp_regs(vcpu)->pstate = read_sysreg_el2(SYS_SPSR);
kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
write_sysreg_el2(vcpu_gp_regs(vcpu)->pstate, SYS_SPSR);
write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR);
vcpu->arch.flags |= KVM_ARM64_INCREMENT_PC;
}
#endif /* __ARM64_KVM_EMULATE_H__ */

View file

@ -50,6 +50,16 @@
#define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
KVM_DIRTY_LOG_INITIALLY_SET)
/*
* Mode of operation configurable with kvm-arm.mode early param.
* See Documentation/admin-guide/kernel-parameters.txt for more information.
*/
enum kvm_mode {
KVM_MODE_DEFAULT,
KVM_MODE_PROTECTED,
};
enum kvm_mode kvm_get_mode(void);
DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
extern unsigned int kvm_sve_max_vl;
@ -58,8 +68,6 @@ int kvm_arm_init_sve(void);
int __attribute_const__ kvm_target_cpu(void);
int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu);
int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext);
void __extended_idmap_trampoline(phys_addr_t boot_pgd, phys_addr_t idmap_start);
struct kvm_vmid {
/* The VMID generation used for the virt. memory system */
@ -89,6 +97,9 @@ struct kvm_s2_mmu {
struct kvm *kvm;
};
struct kvm_arch_memory_slot {
};
struct kvm_arch {
struct kvm_s2_mmu mmu;
@ -120,6 +131,7 @@ struct kvm_arch {
unsigned int pmuver;
u8 pfr0_csv2;
u8 pfr0_csv3;
};
struct kvm_vcpu_fault_info {
@ -203,48 +215,6 @@ enum vcpu_sysreg {
NR_SYS_REGS /* Nothing after this line! */
};
/* 32bit mapping */
#define c0_MPIDR (MPIDR_EL1 * 2) /* MultiProcessor ID Register */
#define c0_CSSELR (CSSELR_EL1 * 2)/* Cache Size Selection Register */
#define c1_SCTLR (SCTLR_EL1 * 2) /* System Control Register */
#define c1_ACTLR (ACTLR_EL1 * 2) /* Auxiliary Control Register */
#define c1_CPACR (CPACR_EL1 * 2) /* Coprocessor Access Control */
#define c2_TTBR0 (TTBR0_EL1 * 2) /* Translation Table Base Register 0 */
#define c2_TTBR0_high (c2_TTBR0 + 1) /* TTBR0 top 32 bits */
#define c2_TTBR1 (TTBR1_EL1 * 2) /* Translation Table Base Register 1 */
#define c2_TTBR1_high (c2_TTBR1 + 1) /* TTBR1 top 32 bits */
#define c2_TTBCR (TCR_EL1 * 2) /* Translation Table Base Control R. */
#define c3_DACR (DACR32_EL2 * 2)/* Domain Access Control Register */
#define c5_DFSR (ESR_EL1 * 2) /* Data Fault Status Register */
#define c5_IFSR (IFSR32_EL2 * 2)/* Instruction Fault Status Register */
#define c5_ADFSR (AFSR0_EL1 * 2) /* Auxiliary Data Fault Status R */
#define c5_AIFSR (AFSR1_EL1 * 2) /* Auxiliary Instr Fault Status R */
#define c6_DFAR (FAR_EL1 * 2) /* Data Fault Address Register */
#define c6_IFAR (c6_DFAR + 1) /* Instruction Fault Address Register */
#define c7_PAR (PAR_EL1 * 2) /* Physical Address Register */
#define c7_PAR_high (c7_PAR + 1) /* PAR top 32 bits */
#define c10_PRRR (MAIR_EL1 * 2) /* Primary Region Remap Register */
#define c10_NMRR (c10_PRRR + 1) /* Normal Memory Remap Register */
#define c12_VBAR (VBAR_EL1 * 2) /* Vector Base Address Register */
#define c13_CID (CONTEXTIDR_EL1 * 2) /* Context ID Register */
#define c13_TID_URW (TPIDR_EL0 * 2) /* Thread ID, User R/W */
#define c13_TID_URO (TPIDRRO_EL0 * 2)/* Thread ID, User R/O */
#define c13_TID_PRIV (TPIDR_EL1 * 2) /* Thread ID, Privileged */
#define c10_AMAIR0 (AMAIR_EL1 * 2) /* Aux Memory Attr Indirection Reg */
#define c10_AMAIR1 (c10_AMAIR0 + 1)/* Aux Memory Attr Indirection Reg */
#define c14_CNTKCTL (CNTKCTL_EL1 * 2) /* Timer Control Register (PL1) */
#define cp14_DBGDSCRext (MDSCR_EL1 * 2)
#define cp14_DBGBCR0 (DBGBCR0_EL1 * 2)
#define cp14_DBGBVR0 (DBGBVR0_EL1 * 2)
#define cp14_DBGBXVR0 (cp14_DBGBVR0 + 1)
#define cp14_DBGWCR0 (DBGWCR0_EL1 * 2)
#define cp14_DBGWVR0 (DBGWVR0_EL1 * 2)
#define cp14_DBGDCCINT (MDCCINT_EL1 * 2)
#define cp14_DBGVCR (DBGVCR32_EL2 * 2)
#define NR_COPRO_REGS (NR_SYS_REGS * 2)
struct kvm_cpu_context {
struct user_pt_regs regs; /* sp = sp_el0 */
@ -255,10 +225,7 @@ struct kvm_cpu_context {
struct user_fpsimd_state fp_regs;
union {
u64 sys_regs[NR_SYS_REGS];
u32 copro[NR_COPRO_REGS];
};
struct kvm_vcpu *__hyp_running_vcpu;
};
@ -409,6 +376,31 @@ struct kvm_vcpu_arch {
#define KVM_ARM64_GUEST_HAS_SVE (1 << 5) /* SVE exposed to guest */
#define KVM_ARM64_VCPU_SVE_FINALIZED (1 << 6) /* SVE config completed */
#define KVM_ARM64_GUEST_HAS_PTRAUTH (1 << 7) /* PTRAUTH exposed to guest */
#define KVM_ARM64_PENDING_EXCEPTION (1 << 8) /* Exception pending */
#define KVM_ARM64_EXCEPT_MASK (7 << 9) /* Target EL/MODE */
/*
* When KVM_ARM64_PENDING_EXCEPTION is set, KVM_ARM64_EXCEPT_MASK can
* take the following values:
*
* For AArch32 EL1:
*/
#define KVM_ARM64_EXCEPT_AA32_UND (0 << 9)
#define KVM_ARM64_EXCEPT_AA32_IABT (1 << 9)
#define KVM_ARM64_EXCEPT_AA32_DABT (2 << 9)
/* For AArch64: */
#define KVM_ARM64_EXCEPT_AA64_ELx_SYNC (0 << 9)
#define KVM_ARM64_EXCEPT_AA64_ELx_IRQ (1 << 9)
#define KVM_ARM64_EXCEPT_AA64_ELx_FIQ (2 << 9)
#define KVM_ARM64_EXCEPT_AA64_ELx_SERR (3 << 9)
#define KVM_ARM64_EXCEPT_AA64_EL1 (0 << 11)
#define KVM_ARM64_EXCEPT_AA64_EL2 (1 << 11)
/*
* Overlaps with KVM_ARM64_EXCEPT_MASK on purpose so that it can't be
* set together with an exception...
*/
#define KVM_ARM64_INCREMENT_PC (1 << 9) /* Increment PC */
#define vcpu_has_sve(vcpu) (system_supports_sve() && \
((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_SVE))
@ -440,14 +432,96 @@ struct kvm_vcpu_arch {
u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg);
void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg);
/*
* CP14 and CP15 live in the same array, as they are backed by the
* same system registers.
static inline bool __vcpu_read_sys_reg_from_cpu(int reg, u64 *val)
{
/*
* *** VHE ONLY ***
*
* System registers listed in the switch are not saved on every
* exit from the guest but are only saved on vcpu_put.
*
* Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
* should never be listed below, because the guest cannot modify its
* own MPIDR_EL1 and MPIDR_EL1 is accessed for VCPU A from VCPU B's
* thread when emulating cross-VCPU communication.
*/
#define CPx_BIAS IS_ENABLED(CONFIG_CPU_BIG_ENDIAN)
if (!has_vhe())
return false;
#define vcpu_cp14(v,r) ((v)->arch.ctxt.copro[(r) ^ CPx_BIAS])
#define vcpu_cp15(v,r) ((v)->arch.ctxt.copro[(r) ^ CPx_BIAS])
switch (reg) {
case CSSELR_EL1: *val = read_sysreg_s(SYS_CSSELR_EL1); break;
case SCTLR_EL1: *val = read_sysreg_s(SYS_SCTLR_EL12); break;
case CPACR_EL1: *val = read_sysreg_s(SYS_CPACR_EL12); break;
case TTBR0_EL1: *val = read_sysreg_s(SYS_TTBR0_EL12); break;
case TTBR1_EL1: *val = read_sysreg_s(SYS_TTBR1_EL12); break;
case TCR_EL1: *val = read_sysreg_s(SYS_TCR_EL12); break;
case ESR_EL1: *val = read_sysreg_s(SYS_ESR_EL12); break;
case AFSR0_EL1: *val = read_sysreg_s(SYS_AFSR0_EL12); break;
case AFSR1_EL1: *val = read_sysreg_s(SYS_AFSR1_EL12); break;
case FAR_EL1: *val = read_sysreg_s(SYS_FAR_EL12); break;
case MAIR_EL1: *val = read_sysreg_s(SYS_MAIR_EL12); break;
case VBAR_EL1: *val = read_sysreg_s(SYS_VBAR_EL12); break;
case CONTEXTIDR_EL1: *val = read_sysreg_s(SYS_CONTEXTIDR_EL12);break;
case TPIDR_EL0: *val = read_sysreg_s(SYS_TPIDR_EL0); break;
case TPIDRRO_EL0: *val = read_sysreg_s(SYS_TPIDRRO_EL0); break;
case TPIDR_EL1: *val = read_sysreg_s(SYS_TPIDR_EL1); break;
case AMAIR_EL1: *val = read_sysreg_s(SYS_AMAIR_EL12); break;
case CNTKCTL_EL1: *val = read_sysreg_s(SYS_CNTKCTL_EL12); break;
case ELR_EL1: *val = read_sysreg_s(SYS_ELR_EL12); break;
case PAR_EL1: *val = read_sysreg_par(); break;
case DACR32_EL2: *val = read_sysreg_s(SYS_DACR32_EL2); break;
case IFSR32_EL2: *val = read_sysreg_s(SYS_IFSR32_EL2); break;
case DBGVCR32_EL2: *val = read_sysreg_s(SYS_DBGVCR32_EL2); break;
default: return false;
}
return true;
}
static inline bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
{
/*
* *** VHE ONLY ***
*
* System registers listed in the switch are not restored on every
* entry to the guest but are only restored on vcpu_load.
*
* Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
* should never be listed below, because the MPIDR should only be set
* once, before running the VCPU, and never changed later.
*/
if (!has_vhe())
return false;
switch (reg) {
case CSSELR_EL1: write_sysreg_s(val, SYS_CSSELR_EL1); break;
case SCTLR_EL1: write_sysreg_s(val, SYS_SCTLR_EL12); break;
case CPACR_EL1: write_sysreg_s(val, SYS_CPACR_EL12); break;
case TTBR0_EL1: write_sysreg_s(val, SYS_TTBR0_EL12); break;
case TTBR1_EL1: write_sysreg_s(val, SYS_TTBR1_EL12); break;
case TCR_EL1: write_sysreg_s(val, SYS_TCR_EL12); break;
case ESR_EL1: write_sysreg_s(val, SYS_ESR_EL12); break;
case AFSR0_EL1: write_sysreg_s(val, SYS_AFSR0_EL12); break;
case AFSR1_EL1: write_sysreg_s(val, SYS_AFSR1_EL12); break;
case FAR_EL1: write_sysreg_s(val, SYS_FAR_EL12); break;
case MAIR_EL1: write_sysreg_s(val, SYS_MAIR_EL12); break;
case VBAR_EL1: write_sysreg_s(val, SYS_VBAR_EL12); break;
case CONTEXTIDR_EL1: write_sysreg_s(val, SYS_CONTEXTIDR_EL12);break;
case TPIDR_EL0: write_sysreg_s(val, SYS_TPIDR_EL0); break;
case TPIDRRO_EL0: write_sysreg_s(val, SYS_TPIDRRO_EL0); break;
case TPIDR_EL1: write_sysreg_s(val, SYS_TPIDR_EL1); break;
case AMAIR_EL1: write_sysreg_s(val, SYS_AMAIR_EL12); break;
case CNTKCTL_EL1: write_sysreg_s(val, SYS_CNTKCTL_EL12); break;
case ELR_EL1: write_sysreg_s(val, SYS_ELR_EL12); break;
case PAR_EL1: write_sysreg_s(val, SYS_PAR_EL1); break;
case DACR32_EL2: write_sysreg_s(val, SYS_DACR32_EL2); break;
case IFSR32_EL2: write_sysreg_s(val, SYS_IFSR32_EL2); break;
case DBGVCR32_EL2: write_sysreg_s(val, SYS_DBGVCR32_EL2); break;
default: return false;
}
return true;
}
struct kvm_vm_stat {
ulong remote_tlb_flush;
@ -473,6 +547,12 @@ unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
unsigned long kvm_arm_num_sys_reg_descs(struct kvm_vcpu *vcpu);
int kvm_arm_copy_sys_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
int kvm_arm_sys_reg_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
int kvm_arm_sys_reg_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
int __kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu,
struct kvm_vcpu_events *events);
@ -535,6 +615,17 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
int handle_exit(struct kvm_vcpu *vcpu, int exception_index);
void handle_exit_early(struct kvm_vcpu *vcpu, int exception_index);
int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu);
int kvm_handle_cp14_32(struct kvm_vcpu *vcpu);
int kvm_handle_cp14_64(struct kvm_vcpu *vcpu);
int kvm_handle_cp15_32(struct kvm_vcpu *vcpu);
int kvm_handle_cp15_64(struct kvm_vcpu *vcpu);
int kvm_handle_sys_reg(struct kvm_vcpu *vcpu);
void kvm_reset_sys_regs(struct kvm_vcpu *vcpu);
void kvm_sys_reg_table_init(void);
/* MMIO helpers */
void kvm_mmio_write_buf(void *buf, unsigned int len, unsigned long data);
unsigned long kvm_mmio_read_buf(const void *buf, unsigned int len);
@ -654,4 +745,7 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
#define kvm_arm_vcpu_sve_finalized(vcpu) \
((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED)
#define kvm_vcpu_has_pmu(vcpu) \
(test_bit(KVM_ARM_VCPU_PMU_V3, (vcpu)->arch.features))
#endif /* __ARM64_KVM_HOST_H__ */

View file

@ -14,6 +14,7 @@
DECLARE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
DECLARE_PER_CPU(unsigned long, kvm_hyp_vector);
DECLARE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
#define read_sysreg_elx(r,nvh,vh) \
({ \
@ -92,10 +93,11 @@ void deactivate_traps_vhe_put(void);
u64 __guest_enter(struct kvm_vcpu *vcpu);
bool kvm_host_psci_handler(struct kvm_cpu_context *host_ctxt);
void __noreturn hyp_panic(void);
#ifdef __KVM_NVHE_HYPERVISOR__
void __noreturn __hyp_do_panic(bool restore_host, u64 spsr, u64 elr, u64 par);
#endif
#endif /* __ARM64_KVM_HYP_H__ */

View file

@ -72,6 +72,52 @@ alternative_cb kvm_update_va_mask
alternative_cb_end
.endm
/*
* Convert a kernel image address to a PA
* reg: kernel address to be converted in place
* tmp: temporary register
*
* The actual code generation takes place in kvm_get_kimage_voffset, and
* the instructions below are only there to reserve the space and
* perform the register allocation (kvm_get_kimage_voffset uses the
* specific registers encoded in the instructions).
*/
.macro kimg_pa reg, tmp
alternative_cb kvm_get_kimage_voffset
movz \tmp, #0
movk \tmp, #0, lsl #16
movk \tmp, #0, lsl #32
movk \tmp, #0, lsl #48
alternative_cb_end
/* reg = __pa(reg) */
sub \reg, \reg, \tmp
.endm
/*
* Convert a kernel image address to a hyp VA
* reg: kernel address to be converted in place
* tmp: temporary register
*
* The actual code generation takes place in kvm_get_kimage_voffset, and
* the instructions below are only there to reserve the space and
* perform the register allocation (kvm_update_kimg_phys_offset uses the
* specific registers encoded in the instructions).
*/
.macro kimg_hyp_va reg, tmp
alternative_cb kvm_update_kimg_phys_offset
movz \tmp, #0
movk \tmp, #0, lsl #16
movk \tmp, #0, lsl #32
movk \tmp, #0, lsl #48
alternative_cb_end
sub \reg, \reg, \tmp
mov_q \tmp, PAGE_OFFSET
orr \reg, \reg, \tmp
kern_hyp_va \reg
.endm
#else
#include <linux/pgtable.h>
@ -98,6 +144,24 @@ static __always_inline unsigned long __kern_hyp_va(unsigned long v)
#define kern_hyp_va(v) ((typeof(v))(__kern_hyp_va((unsigned long)(v))))
static __always_inline unsigned long __kimg_hyp_va(unsigned long v)
{
unsigned long offset;
asm volatile(ALTERNATIVE_CB("movz %0, #0\n"
"movk %0, #0, lsl #16\n"
"movk %0, #0, lsl #32\n"
"movk %0, #0, lsl #48\n",
kvm_update_kimg_phys_offset)
: "=r" (offset));
return __kern_hyp_va((v - offset) | PAGE_OFFSET);
}
#define kimg_fn_hyp_va(v) ((typeof(*v))(__kimg_hyp_va((unsigned long)(v))))
#define kimg_fn_ptr(x) (typeof(x) **)(x)
/*
* We currently support using a VM-specified IPA size. For backward
* compatibility, the default IPA size is fixed to 40bits.
@ -208,52 +272,6 @@ static inline int kvm_write_guest_lock(struct kvm *kvm, gpa_t gpa,
return ret;
}
/*
* EL2 vectors can be mapped and rerouted in a number of ways,
* depending on the kernel configuration and CPU present:
*
* - If the CPU is affected by Spectre-v2, the hardening sequence is
* placed in one of the vector slots, which is executed before jumping
* to the real vectors.
*
* - If the CPU also has the ARM64_HARDEN_EL2_VECTORS cap, the slot
* containing the hardening sequence is mapped next to the idmap page,
* and executed before jumping to the real vectors.
*
* - If the CPU only has the ARM64_HARDEN_EL2_VECTORS cap, then an
* empty slot is selected, mapped next to the idmap page, and
* executed before jumping to the real vectors.
*
* Note that ARM64_HARDEN_EL2_VECTORS is somewhat incompatible with
* VHE, as we don't have hypervisor-specific mappings. If the system
* is VHE and yet selects this capability, it will be ignored.
*/
extern void *__kvm_bp_vect_base;
extern int __kvm_harden_el2_vector_slot;
static inline void *kvm_get_hyp_vector(void)
{
struct bp_hardening_data *data = arm64_get_bp_hardening_data();
void *vect = kern_hyp_va(kvm_ksym_ref(__kvm_hyp_vector));
int slot = -1;
if (cpus_have_const_cap(ARM64_SPECTRE_V2) && data->fn) {
vect = kern_hyp_va(kvm_ksym_ref(__bp_harden_hyp_vecs));
slot = data->hyp_vectors_slot;
}
if (this_cpu_has_cap(ARM64_HARDEN_EL2_VECTORS) && !has_vhe()) {
vect = __kvm_bp_vect_base;
if (slot == -1)
slot = __kvm_harden_el2_vector_slot;
}
if (slot != -1)
vect += slot * SZ_2K;
return vect;
}
#define kvm_phys_to_vttbr(addr) phys_to_ttbr(addr)
static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)

View file

@ -12,9 +12,6 @@
#define USER_ASID_FLAG (UL(1) << USER_ASID_BIT)
#define TTBR_ASID_MASK (UL(0xffff) << 48)
#define BP_HARDEN_EL2_SLOTS 4
#define __BP_HARDEN_HYP_VECS_SZ (BP_HARDEN_EL2_SLOTS * SZ_2K)
#ifndef __ASSEMBLY__
#include <linux/refcount.h>
@ -41,32 +38,6 @@ static inline bool arm64_kernel_unmapped_at_el0(void)
return cpus_have_const_cap(ARM64_UNMAP_KERNEL_AT_EL0);
}
typedef void (*bp_hardening_cb_t)(void);
struct bp_hardening_data {
int hyp_vectors_slot;
bp_hardening_cb_t fn;
};
DECLARE_PER_CPU_READ_MOSTLY(struct bp_hardening_data, bp_hardening_data);
static inline struct bp_hardening_data *arm64_get_bp_hardening_data(void)
{
return this_cpu_ptr(&bp_hardening_data);
}
static inline void arm64_apply_bp_hardening(void)
{
struct bp_hardening_data *d;
if (!cpus_have_const_cap(ARM64_SPECTRE_V2))
return;
d = arm64_get_bp_hardening_data();
if (d->fn)
d->fn();
}
extern void arm64_memblock_init(void);
extern void paging_init(void);
extern void bootmem_init(void);

View file

@ -239,6 +239,12 @@ PERCPU_RET_OP(add, add, ldadd)
#define this_cpu_cmpxchg_8(pcp, o, n) \
_pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
#ifdef __KVM_NVHE_HYPERVISOR__
extern unsigned long __hyp_per_cpu_offset(unsigned int cpu);
#define __per_cpu_offset
#define per_cpu_offset(cpu) __hyp_per_cpu_offset((cpu))
#endif
#include <asm-generic/percpu.h>
/* Redefine macros for nVHE hyp under DEBUG_PREEMPT to avoid its dependencies. */

View file

@ -11,6 +11,7 @@ extern char __alt_instructions[], __alt_instructions_end[];
extern char __hibernate_exit_text_start[], __hibernate_exit_text_end[];
extern char __hyp_idmap_text_start[], __hyp_idmap_text_end[];
extern char __hyp_text_start[], __hyp_text_end[];
extern char __hyp_data_ro_after_init_start[], __hyp_data_ro_after_init_end[];
extern char __idmap_text_start[], __idmap_text_end[];
extern char __initdata_begin[], __initdata_end[];
extern char __inittext_begin[], __inittext_end[];

View file

@ -46,9 +46,9 @@ DECLARE_PER_CPU_READ_MOSTLY(int, cpu_number);
* Logical CPU mapping.
*/
extern u64 __cpu_logical_map[NR_CPUS];
extern u64 cpu_logical_map(int cpu);
extern u64 cpu_logical_map(unsigned int cpu);
static inline void set_cpu_logical_map(int cpu, u64 hwid)
static inline void set_cpu_logical_map(unsigned int cpu, u64 hwid)
{
__cpu_logical_map[cpu] = hwid;
}

View file

@ -9,7 +9,15 @@
#ifndef __ASM_SPECTRE_H
#define __ASM_SPECTRE_H
#define BP_HARDEN_EL2_SLOTS 4
#define __BP_HARDEN_HYP_VECS_SZ ((BP_HARDEN_EL2_SLOTS - 1) * SZ_2K)
#ifndef __ASSEMBLY__
#include <linux/percpu.h>
#include <asm/cpufeature.h>
#include <asm/virt.h>
/* Watch out, ordering is important here. */
enum mitigation_state {
@ -20,13 +28,70 @@ enum mitigation_state {
struct task_struct;
/*
* Note: the order of this enum corresponds to __bp_harden_hyp_vecs and
* we rely on having the direct vectors first.
*/
enum arm64_hyp_spectre_vector {
/*
* Take exceptions directly to __kvm_hyp_vector. This must be
* 0 so that it used by default when mitigations are not needed.
*/
HYP_VECTOR_DIRECT,
/*
* Bounce via a slot in the hypervisor text mapping of
* __bp_harden_hyp_vecs, which contains an SMC call.
*/
HYP_VECTOR_SPECTRE_DIRECT,
/*
* Bounce via a slot in a special mapping of __bp_harden_hyp_vecs
* next to the idmap page.
*/
HYP_VECTOR_INDIRECT,
/*
* Bounce via a slot in a special mapping of __bp_harden_hyp_vecs
* next to the idmap page, which contains an SMC call.
*/
HYP_VECTOR_SPECTRE_INDIRECT,
};
typedef void (*bp_hardening_cb_t)(void);
struct bp_hardening_data {
enum arm64_hyp_spectre_vector slot;
bp_hardening_cb_t fn;
};
DECLARE_PER_CPU_READ_MOSTLY(struct bp_hardening_data, bp_hardening_data);
static inline void arm64_apply_bp_hardening(void)
{
struct bp_hardening_data *d;
if (!cpus_have_const_cap(ARM64_SPECTRE_V2))
return;
d = this_cpu_ptr(&bp_hardening_data);
if (d->fn)
d->fn();
}
enum mitigation_state arm64_get_spectre_v2_state(void);
bool has_spectre_v2(const struct arm64_cpu_capabilities *cap, int scope);
void spectre_v2_enable_mitigation(const struct arm64_cpu_capabilities *__unused);
bool has_spectre_v3a(const struct arm64_cpu_capabilities *cap, int scope);
void spectre_v3a_enable_mitigation(const struct arm64_cpu_capabilities *__unused);
enum mitigation_state arm64_get_spectre_v4_state(void);
bool has_spectre_v4(const struct arm64_cpu_capabilities *cap, int scope);
void spectre_v4_enable_mitigation(const struct arm64_cpu_capabilities *__unused);
void spectre_v4_enable_task_mitigation(struct task_struct *tsk);
enum mitigation_state arm64_get_meltdown_state(void);
#endif /* __ASSEMBLY__ */
#endif /* __ASM_SPECTRE_H */

View file

@ -469,6 +469,7 @@
#define SYS_PMCCFILTR_EL0 sys_reg(3, 3, 14, 15, 7)
#define SYS_SCTLR_EL2 sys_reg(3, 4, 1, 0, 0)
#define SYS_ZCR_EL2 sys_reg(3, 4, 1, 2, 0)
#define SYS_DACR32_EL2 sys_reg(3, 4, 3, 0, 0)
#define SYS_SPSR_EL2 sys_reg(3, 4, 4, 0, 0)

View file

@ -65,9 +65,19 @@ extern u32 __boot_cpu_mode[2];
void __hyp_set_vectors(phys_addr_t phys_vector_base);
void __hyp_reset_vectors(void);
DECLARE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
/* Reports the availability of HYP mode */
static inline bool is_hyp_mode_available(void)
{
/*
* If KVM protected mode is initialized, all CPUs must have been booted
* in EL2. Avoid checking __boot_cpu_mode as CPUs now come up in EL1.
*/
if (IS_ENABLED(CONFIG_KVM) &&
static_branch_likely(&kvm_protected_mode_initialized))
return true;
return (__boot_cpu_mode[0] == BOOT_CPU_MODE_EL2 &&
__boot_cpu_mode[1] == BOOT_CPU_MODE_EL2);
}
@ -75,6 +85,14 @@ static inline bool is_hyp_mode_available(void)
/* Check if the bootloader has booted CPUs in different modes */
static inline bool is_hyp_mode_mismatched(void)
{
/*
* If KVM protected mode is initialized, all CPUs must have been booted
* in EL2. Avoid checking __boot_cpu_mode as CPUs now come up in EL1.
*/
if (IS_ENABLED(CONFIG_KVM) &&
static_branch_likely(&kvm_protected_mode_initialized))
return false;
return __boot_cpu_mode[0] != __boot_cpu_mode[1];
}
@ -97,6 +115,14 @@ static __always_inline bool has_vhe(void)
return cpus_have_final_cap(ARM64_HAS_VIRT_HOST_EXTN);
}
static __always_inline bool is_protected_kvm_enabled(void)
{
if (is_vhe_hyp_code())
return false;
else
return cpus_have_final_cap(ARM64_KVM_PROTECTED_MODE);
}
#endif /* __ASSEMBLY__ */
#endif /* ! __ASM__VIRT_H */

View file

@ -156,9 +156,6 @@ struct kvm_sync_regs {
__u64 device_irq_level;
};
struct kvm_arch_memory_slot {
};
/*
* PMU filter structure. Describe a range of events with a particular
* action. To be used with KVM_ARM_VCPU_PMU_V3_FILTER.

View file

@ -109,6 +109,11 @@ int main(void)
DEFINE(CPU_APGAKEYLO_EL1, offsetof(struct kvm_cpu_context, sys_regs[APGAKEYLO_EL1]));
DEFINE(HOST_CONTEXT_VCPU, offsetof(struct kvm_cpu_context, __hyp_running_vcpu));
DEFINE(HOST_DATA_CONTEXT, offsetof(struct kvm_host_data, host_ctxt));
DEFINE(NVHE_INIT_MAIR_EL2, offsetof(struct kvm_nvhe_init_params, mair_el2));
DEFINE(NVHE_INIT_TCR_EL2, offsetof(struct kvm_nvhe_init_params, tcr_el2));
DEFINE(NVHE_INIT_TPIDR_EL2, offsetof(struct kvm_nvhe_init_params, tpidr_el2));
DEFINE(NVHE_INIT_STACK_HYP_VA, offsetof(struct kvm_nvhe_init_params, stack_hyp_va));
DEFINE(NVHE_INIT_PGD_PA, offsetof(struct kvm_nvhe_init_params, pgd_pa));
#endif
#ifdef CONFIG_CPU_PM
DEFINE(CPU_CTX_SP, offsetof(struct cpu_suspend_ctx, sp));

View file

@ -196,16 +196,6 @@ has_neoverse_n1_erratum_1542419(const struct arm64_cpu_capabilities *entry,
return is_midr_in_range(midr, &range) && has_dic;
}
#ifdef CONFIG_RANDOMIZE_BASE
static const struct midr_range ca57_a72[] = {
MIDR_ALL_VERSIONS(MIDR_CORTEX_A57),
MIDR_ALL_VERSIONS(MIDR_CORTEX_A72),
{},
};
#endif
#ifdef CONFIG_ARM64_WORKAROUND_REPEAT_TLBI
static const struct arm64_cpu_capabilities arm64_repeat_tlbi_list[] = {
#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1009
@ -461,9 +451,12 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
},
#ifdef CONFIG_RANDOMIZE_BASE
{
.desc = "EL2 vector hardening",
.capability = ARM64_HARDEN_EL2_VECTORS,
ERRATA_MIDR_RANGE_LIST(ca57_a72),
/* Must come after the Spectre-v2 entry */
.desc = "Spectre-v3a",
.capability = ARM64_SPECTRE_V3A,
.type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM,
.matches = has_spectre_v3a,
.cpu_enable = spectre_v3a_enable_mitigation,
},
#endif
{

View file

@ -74,6 +74,7 @@
#include <asm/cpufeature.h>
#include <asm/cpu_ops.h>
#include <asm/fpsimd.h>
#include <asm/kvm_host.h>
#include <asm/mmu_context.h>
#include <asm/mte.h>
#include <asm/processor.h>
@ -1712,6 +1713,21 @@ static void cpu_enable_mte(struct arm64_cpu_capabilities const *cap)
}
#endif /* CONFIG_ARM64_MTE */
#ifdef CONFIG_KVM
static bool is_kvm_protected_mode(const struct arm64_cpu_capabilities *entry, int __unused)
{
if (kvm_get_mode() != KVM_MODE_PROTECTED)
return false;
if (is_kernel_in_hyp_mode()) {
pr_warn("Protected KVM not available with VHE\n");
return false;
}
return true;
}
#endif /* CONFIG_KVM */
/* Internal helper functions to match cpu capability type */
static bool
cpucap_late_cpu_optional(const struct arm64_cpu_capabilities *cap)
@ -1803,6 +1819,12 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
.field_pos = ID_AA64PFR0_EL1_SHIFT,
.min_field_value = ID_AA64PFR0_EL1_32BIT_64BIT,
},
{
.desc = "Protected KVM",
.capability = ARM64_KVM_PROTECTED_MODE,
.type = ARM64_CPUCAP_SYSTEM_FEATURE,
.matches = is_kvm_protected_mode,
},
#endif
{
.desc = "Kernel page table isolation (KPTI)",
@ -2831,14 +2853,28 @@ static int __init enable_mrs_emulation(void)
core_initcall(enable_mrs_emulation);
enum mitigation_state arm64_get_meltdown_state(void)
{
if (__meltdown_safe)
return SPECTRE_UNAFFECTED;
if (arm64_kernel_unmapped_at_el0())
return SPECTRE_MITIGATED;
return SPECTRE_VULNERABLE;
}
ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr,
char *buf)
{
if (__meltdown_safe)
switch (arm64_get_meltdown_state()) {
case SPECTRE_UNAFFECTED:
return sprintf(buf, "Not affected\n");
if (arm64_kernel_unmapped_at_el0())
case SPECTRE_MITIGATED:
return sprintf(buf, "Mitigation: PTI\n");
default:
return sprintf(buf, "Vulnerable\n");
}
}

View file

@ -11,7 +11,6 @@
#include <linux/linkage.h>
#include <linux/init.h>
#include <linux/irqchip/arm-gic-v3.h>
#include <linux/pgtable.h>
#include <asm/asm_pointer_auth.h>
@ -21,6 +20,7 @@
#include <asm/asm-offsets.h>
#include <asm/cache.h>
#include <asm/cputype.h>
#include <asm/el2_setup.h>
#include <asm/elf.h>
#include <asm/image.h>
#include <asm/kernel-pgtable.h>
@ -493,155 +493,56 @@ SYM_INNER_LABEL(init_el1, SYM_L_LOCAL)
eret
SYM_INNER_LABEL(init_el2, SYM_L_LOCAL)
mov_q x0, INIT_SCTLR_EL2_MMU_OFF
msr sctlr_el2, x0
#ifdef CONFIG_ARM64_VHE
/*
* Check for VHE being present. For the rest of the EL2 setup,
* x2 being non-zero indicates that we do have VHE, and that the
* kernel is intended to run at EL2.
* Check for VHE being present. x2 being non-zero indicates that we
* do have VHE, and that the kernel is intended to run at EL2.
*/
mrs x2, id_aa64mmfr1_el1
ubfx x2, x2, #ID_AA64MMFR1_VHE_SHIFT, #4
#else
mov x2, xzr
#endif
cbz x2, init_el2_nvhe
/* Hyp configuration. */
mov_q x0, HCR_HOST_NVHE_FLAGS
cbz x2, set_hcr
/*
* When VHE _is_ in use, EL1 will not be used in the host and
* requires no configuration, and all non-hyp-specific EL2 setup
* will be done via the _EL1 system register aliases in __cpu_setup.
*/
mov_q x0, HCR_HOST_VHE_FLAGS
set_hcr:
msr hcr_el2, x0
isb
/*
* Allow Non-secure EL1 and EL0 to access physical timer and counter.
* This is not necessary for VHE, since the host kernel runs in EL2,
* and EL0 accesses are configured in the later stage of boot process.
* Note that when HCR_EL2.E2H == 1, CNTHCTL_EL2 has the same bit layout
* as CNTKCTL_EL1, and CNTKCTL_EL1 accessing instructions are redefined
* to access CNTHCTL_EL2. This allows the kernel designed to run at EL1
* to transparently mess with the EL0 bits via CNTKCTL_EL1 access in
* EL2.
*/
cbnz x2, 1f
mrs x0, cnthctl_el2
orr x0, x0, #3 // Enable EL1 physical timers
msr cnthctl_el2, x0
1:
msr cntvoff_el2, xzr // Clear virtual offset
#ifdef CONFIG_ARM_GIC_V3
/* GICv3 system register access */
mrs x0, id_aa64pfr0_el1
ubfx x0, x0, #ID_AA64PFR0_GIC_SHIFT, #4
cbz x0, 3f
mrs_s x0, SYS_ICC_SRE_EL2
orr x0, x0, #ICC_SRE_EL2_SRE // Set ICC_SRE_EL2.SRE==1
orr x0, x0, #ICC_SRE_EL2_ENABLE // Set ICC_SRE_EL2.Enable==1
msr_s SYS_ICC_SRE_EL2, x0
isb // Make sure SRE is now set
mrs_s x0, SYS_ICC_SRE_EL2 // Read SRE back,
tbz x0, #0, 3f // and check that it sticks
msr_s SYS_ICH_HCR_EL2, xzr // Reset ICC_HCR_EL2 to defaults
3:
#endif
/* Populate ID registers. */
mrs x0, midr_el1
mrs x1, mpidr_el1
msr vpidr_el2, x0
msr vmpidr_el2, x1
#ifdef CONFIG_COMPAT
msr hstr_el2, xzr // Disable CP15 traps to EL2
#endif
/* EL2 debug */
mrs x1, id_aa64dfr0_el1
sbfx x0, x1, #ID_AA64DFR0_PMUVER_SHIFT, #4
cmp x0, #1
b.lt 4f // Skip if no PMU present
mrs x0, pmcr_el0 // Disable debug access traps
ubfx x0, x0, #11, #5 // to EL2 and allow access to
4:
csel x3, xzr, x0, lt // all PMU counters from EL1
/* Statistical profiling */
ubfx x0, x1, #ID_AA64DFR0_PMSVER_SHIFT, #4
cbz x0, 7f // Skip if SPE not present
cbnz x2, 6f // VHE?
mrs_s x4, SYS_PMBIDR_EL1 // If SPE available at EL2,
and x4, x4, #(1 << SYS_PMBIDR_EL1_P_SHIFT)
cbnz x4, 5f // then permit sampling of physical
mov x4, #(1 << SYS_PMSCR_EL2_PCT_SHIFT | \
1 << SYS_PMSCR_EL2_PA_SHIFT)
msr_s SYS_PMSCR_EL2, x4 // addresses and physical counter
5:
mov x1, #(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT)
orr x3, x3, x1 // If we don't have VHE, then
b 7f // use EL1&0 translation.
6: // For VHE, use EL2 translation
orr x3, x3, #MDCR_EL2_TPMS // and disable access from EL1
7:
msr mdcr_el2, x3 // Configure debug traps
/* LORegions */
mrs x1, id_aa64mmfr1_el1
ubfx x0, x1, #ID_AA64MMFR1_LOR_SHIFT, 4
cbz x0, 1f
msr_s SYS_LORC_EL1, xzr
1:
/* Stage-2 translation */
msr vttbr_el2, xzr
cbz x2, install_el2_stub
init_el2_state vhe
isb
mov_q x0, INIT_PSTATE_EL2
msr spsr_el2, x0
msr elr_el2, lr
mov w0, #BOOT_CPU_MODE_EL2
eret
SYM_INNER_LABEL(install_el2_stub, SYM_L_LOCAL)
SYM_INNER_LABEL(init_el2_nvhe, SYM_L_LOCAL)
/*
* When VHE is not in use, early init of EL2 and EL1 needs to be
* done here.
* When VHE _is_ in use, EL1 will not be used in the host and
* requires no configuration, and all non-hyp-specific EL2 setup
* will be done via the _EL1 system register aliases in __cpu_setup.
*/
mov_q x0, INIT_SCTLR_EL1_MMU_OFF
msr sctlr_el1, x0
/* Coprocessor traps. */
mov x0, #0x33ff
msr cptr_el2, x0 // Disable copro. traps to EL2
/* SVE register access */
mrs x1, id_aa64pfr0_el1
ubfx x1, x1, #ID_AA64PFR0_SVE_SHIFT, #4
cbz x1, 7f
bic x0, x0, #CPTR_EL2_TZ // Also disable SVE traps
msr cptr_el2, x0 // Disable copro. traps to EL2
mov_q x0, HCR_HOST_NVHE_FLAGS
msr hcr_el2, x0
isb
mov x1, #ZCR_ELx_LEN_MASK // SVE: Enable full vector
msr_s SYS_ZCR_EL2, x1 // length for EL1.
init_el2_state nvhe
/* Hypervisor stub */
7: adr_l x0, __hyp_stub_vectors
adr_l x0, __hyp_stub_vectors
msr vbar_el2, x0
isb
mov x0, #INIT_PSTATE_EL1
msr spsr_el2, x0
msr elr_el2, lr
mov w0, #BOOT_CPU_MODE_EL2
eret

View file

@ -64,13 +64,12 @@ __efistub__ctype = _ctype;
/* Alternative callbacks for init-time patching of nVHE hyp code. */
KVM_NVHE_ALIAS(kvm_patch_vector_branch);
KVM_NVHE_ALIAS(kvm_update_va_mask);
KVM_NVHE_ALIAS(kvm_update_kimg_phys_offset);
KVM_NVHE_ALIAS(kvm_get_kimage_voffset);
/* Global kernel state accessed by nVHE hyp code. */
KVM_NVHE_ALIAS(kvm_vgic_global_state);
/* Kernel constant needed to compute idmap addresses. */
KVM_NVHE_ALIAS(kimage_voffset);
/* Kernel symbols used to call panic() from nVHE hyp code (via ERET). */
KVM_NVHE_ALIAS(__hyp_panic_string);
KVM_NVHE_ALIAS(panic);
@ -78,9 +77,6 @@ KVM_NVHE_ALIAS(panic);
/* Vectors installed by hyp-init on reset HVC. */
KVM_NVHE_ALIAS(__hyp_stub_vectors);
/* IDMAP TCR_EL1.T0SZ as computed by the EL1 init code */
KVM_NVHE_ALIAS(idmap_t0sz);
/* Kernel symbol used by icache_is_vpipt(). */
KVM_NVHE_ALIAS(__icache_flags);
@ -103,6 +99,9 @@ KVM_NVHE_ALIAS(gic_nonsecure_priorities);
KVM_NVHE_ALIAS(__start___kvm_ex_table);
KVM_NVHE_ALIAS(__stop___kvm_ex_table);
/* Array containing bases of nVHE per-CPU memory regions. */
KVM_NVHE_ALIAS(kvm_arm_hyp_percpu_base);
#endif /* CONFIG_KVM */
#endif /* __ARM64_KERNEL_IMAGE_VARS_H */

View file

@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* Handle detection, reporting and mitigation of Spectre v1, v2 and v4, as
* Handle detection, reporting and mitigation of Spectre v1, v2, v3a and v4, as
* detailed at:
*
* https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability
@ -27,6 +27,7 @@
#include <asm/insn.h>
#include <asm/spectre.h>
#include <asm/traps.h>
#include <asm/virt.h>
/*
* We try to ensure that the mitigation state can never change as the result of
@ -171,72 +172,26 @@ bool has_spectre_v2(const struct arm64_cpu_capabilities *entry, int scope)
return true;
}
DEFINE_PER_CPU_READ_MOSTLY(struct bp_hardening_data, bp_hardening_data);
enum mitigation_state arm64_get_spectre_v2_state(void)
{
return spectre_v2_state;
}
#ifdef CONFIG_KVM
#include <asm/cacheflush.h>
#include <asm/kvm_asm.h>
atomic_t arm64_el2_vector_last_slot = ATOMIC_INIT(-1);
static void __copy_hyp_vect_bpi(int slot, const char *hyp_vecs_start,
const char *hyp_vecs_end)
{
void *dst = lm_alias(__bp_harden_hyp_vecs + slot * SZ_2K);
int i;
for (i = 0; i < SZ_2K; i += 0x80)
memcpy(dst + i, hyp_vecs_start, hyp_vecs_end - hyp_vecs_start);
__flush_icache_range((uintptr_t)dst, (uintptr_t)dst + SZ_2K);
}
DEFINE_PER_CPU_READ_MOSTLY(struct bp_hardening_data, bp_hardening_data);
static void install_bp_hardening_cb(bp_hardening_cb_t fn)
{
static DEFINE_RAW_SPINLOCK(bp_lock);
int cpu, slot = -1;
const char *hyp_vecs_start = __smccc_workaround_1_smc;
const char *hyp_vecs_end = __smccc_workaround_1_smc +
__SMCCC_WORKAROUND_1_SMC_SZ;
__this_cpu_write(bp_hardening_data.fn, fn);
/*
* Vinz Clortho takes the hyp_vecs start/end "keys" at
* the door when we're a guest. Skip the hyp-vectors work.
*/
if (!is_hyp_mode_available()) {
__this_cpu_write(bp_hardening_data.fn, fn);
if (!is_hyp_mode_available())
return;
}
raw_spin_lock(&bp_lock);
for_each_possible_cpu(cpu) {
if (per_cpu(bp_hardening_data.fn, cpu) == fn) {
slot = per_cpu(bp_hardening_data.hyp_vectors_slot, cpu);
break;
}
}
if (slot == -1) {
slot = atomic_inc_return(&arm64_el2_vector_last_slot);
BUG_ON(slot >= BP_HARDEN_EL2_SLOTS);
__copy_hyp_vect_bpi(slot, hyp_vecs_start, hyp_vecs_end);
}
__this_cpu_write(bp_hardening_data.hyp_vectors_slot, slot);
__this_cpu_write(bp_hardening_data.fn, fn);
raw_spin_unlock(&bp_lock);
__this_cpu_write(bp_hardening_data.slot, HYP_VECTOR_SPECTRE_DIRECT);
}
#else
static void install_bp_hardening_cb(bp_hardening_cb_t fn)
{
__this_cpu_write(bp_hardening_data.fn, fn);
}
#endif /* CONFIG_KVM */
static void call_smc_arch_workaround_1(void)
{
@ -317,6 +272,33 @@ void spectre_v2_enable_mitigation(const struct arm64_cpu_capabilities *__unused)
update_mitigation_state(&spectre_v2_state, state);
}
/*
* Spectre-v3a.
*
* Phew, there's not an awful lot to do here! We just instruct EL2 to use
* an indirect trampoline for the hyp vectors so that guests can't read
* VBAR_EL2 to defeat randomisation of the hypervisor VA layout.
*/
bool has_spectre_v3a(const struct arm64_cpu_capabilities *entry, int scope)
{
static const struct midr_range spectre_v3a_unsafe_list[] = {
MIDR_ALL_VERSIONS(MIDR_CORTEX_A57),
MIDR_ALL_VERSIONS(MIDR_CORTEX_A72),
{},
};
WARN_ON(scope != SCOPE_LOCAL_CPU || preemptible());
return is_midr_in_range_list(read_cpuid_id(), spectre_v3a_unsafe_list);
}
void spectre_v3a_enable_mitigation(const struct arm64_cpu_capabilities *__unused)
{
struct bp_hardening_data *data = this_cpu_ptr(&bp_hardening_data);
if (this_cpu_has_cap(ARM64_SPECTRE_V3A))
data->slot += HYP_VECTOR_INDIRECT;
}
/*
* Spectre v4.
*


@ -276,7 +276,7 @@ arch_initcall(reserve_memblock_reserved_regions);
u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
u64 cpu_logical_map(int cpu)
u64 cpu_logical_map(unsigned int cpu)
{
return __cpu_logical_map[cpu];
}


@ -30,6 +30,13 @@ jiffies = jiffies_64;
*(__kvm_ex_table) \
__stop___kvm_ex_table = .;
#define HYPERVISOR_DATA_SECTIONS \
HYP_SECTION_NAME(.data..ro_after_init) : { \
__hyp_data_ro_after_init_start = .; \
*(HYP_SECTION_NAME(.data..ro_after_init)) \
__hyp_data_ro_after_init_end = .; \
}
#define HYPERVISOR_PERCPU_SECTION \
. = ALIGN(PAGE_SIZE); \
HYP_SECTION_NAME(.data..percpu) : { \
@ -37,6 +44,7 @@ jiffies = jiffies_64;
}
#else /* CONFIG_KVM */
#define HYPERVISOR_EXTABLE
#define HYPERVISOR_DATA_SECTIONS
#define HYPERVISOR_PERCPU_SECTION
#endif
@ -232,6 +240,8 @@ SECTIONS
_sdata = .;
RW_DATA(L1_CACHE_BYTES, PAGE_SIZE, THREAD_ALIGN)
HYPERVISOR_DATA_SECTIONS
/*
* Data written with the MMU off but read with the MMU on requires
* cache lines to be invalidated, discarding up to a Cache Writeback


@ -13,10 +13,10 @@ obj-$(CONFIG_KVM) += hyp/
kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
$(KVM)/vfio.o $(KVM)/irqchip.o \
arm.o mmu.o mmio.o psci.o perf.o hypercalls.o pvtime.o \
inject_fault.o regmap.o va_layout.o handle_exit.o \
inject_fault.o va_layout.o handle_exit.o \
guest.o debug.o reset.o sys_regs.o \
vgic-sys-reg-v3.o fpsimd.o pmu.o \
aarch32.o arch_timer.o \
arch_timer.o \
vgic/vgic.o vgic/vgic-init.o \
vgic/vgic-irqfd.o vgic/vgic-v2.o \
vgic/vgic-v3.o vgic/vgic-v4.o \


@ -1,232 +0,0 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* (not much of an) Emulation layer for 32bit guests.
*
* Copyright (C) 2012,2013 - ARM Ltd
* Author: Marc Zyngier <marc.zyngier@arm.com>
*
* based on arch/arm/kvm/emulate.c
* Copyright (C) 2012 - Virtual Open Systems and Columbia University
* Author: Christoffer Dall <c.dall@virtualopensystems.com>
*/
#include <linux/bits.h>
#include <linux/kvm_host.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_hyp.h>
#define DFSR_FSC_EXTABT_LPAE 0x10
#define DFSR_FSC_EXTABT_nLPAE 0x08
#define DFSR_LPAE BIT(9)
/*
* Table taken from ARMv8 ARM DDI0487B-B, table G1-10.
*/
static const u8 return_offsets[8][2] = {
[0] = { 0, 0 }, /* Reset, unused */
[1] = { 4, 2 }, /* Undefined */
[2] = { 0, 0 }, /* SVC, unused */
[3] = { 4, 4 }, /* Prefetch abort */
[4] = { 8, 8 }, /* Data abort */
[5] = { 0, 0 }, /* HVC, unused */
[6] = { 4, 4 }, /* IRQ, unused */
[7] = { 4, 4 }, /* FIQ, unused */
};
static bool pre_fault_synchronize(struct kvm_vcpu *vcpu)
{
preempt_disable();
if (vcpu->arch.sysregs_loaded_on_cpu) {
kvm_arch_vcpu_put(vcpu);
return true;
}
preempt_enable();
return false;
}
static void post_fault_synchronize(struct kvm_vcpu *vcpu, bool loaded)
{
if (loaded) {
kvm_arch_vcpu_load(vcpu, smp_processor_id());
preempt_enable();
}
}
/*
* When an exception is taken, most CPSR fields are left unchanged in the
* handler. However, some are explicitly overridden (e.g. M[4:0]).
*
* The SPSR/SPSR_ELx layouts differ, and the below is intended to work with
* either format. Note: SPSR.J bit doesn't exist in SPSR_ELx, but this bit was
* obsoleted by the ARMv7 virtualization extensions and is RES0.
*
* For the SPSR layout seen from AArch32, see:
* - ARM DDI 0406C.d, page B1-1148
* - ARM DDI 0487E.a, page G8-6264
*
* For the SPSR_ELx layout for AArch32 seen from AArch64, see:
* - ARM DDI 0487E.a, page C5-426
*
* Here we manipulate the fields in order of the AArch32 SPSR_ELx layout, from
* MSB to LSB.
*/
static unsigned long get_except32_cpsr(struct kvm_vcpu *vcpu, u32 mode)
{
u32 sctlr = vcpu_cp15(vcpu, c1_SCTLR);
unsigned long old, new;
old = *vcpu_cpsr(vcpu);
new = 0;
new |= (old & PSR_AA32_N_BIT);
new |= (old & PSR_AA32_Z_BIT);
new |= (old & PSR_AA32_C_BIT);
new |= (old & PSR_AA32_V_BIT);
new |= (old & PSR_AA32_Q_BIT);
// CPSR.IT[7:0] are set to zero upon any exception
// See ARM DDI 0487E.a, section G1.12.3
// See ARM DDI 0406C.d, section B1.8.3
new |= (old & PSR_AA32_DIT_BIT);
// CPSR.SSBS is set to SCTLR.DSSBS upon any exception
// See ARM DDI 0487E.a, page G8-6244
if (sctlr & BIT(31))
new |= PSR_AA32_SSBS_BIT;
// CPSR.PAN is unchanged unless SCTLR.SPAN == 0b0
// SCTLR.SPAN is RES1 when ARMv8.1-PAN is not implemented
// See ARM DDI 0487E.a, page G8-6246
new |= (old & PSR_AA32_PAN_BIT);
if (!(sctlr & BIT(23)))
new |= PSR_AA32_PAN_BIT;
// SS does not exist in AArch32, so ignore
// CPSR.IL is set to zero upon any exception
// See ARM DDI 0487E.a, page G1-5527
new |= (old & PSR_AA32_GE_MASK);
// CPSR.IT[7:0] are set to zero upon any exception
// See prior comment above
// CPSR.E is set to SCTLR.EE upon any exception
// See ARM DDI 0487E.a, page G8-6245
// See ARM DDI 0406C.d, page B4-1701
if (sctlr & BIT(25))
new |= PSR_AA32_E_BIT;
// CPSR.A is unchanged upon an exception to Undefined, Supervisor
// CPSR.A is set upon an exception to other modes
// See ARM DDI 0487E.a, pages G1-5515 to G1-5516
// See ARM DDI 0406C.d, page B1-1182
new |= (old & PSR_AA32_A_BIT);
if (mode != PSR_AA32_MODE_UND && mode != PSR_AA32_MODE_SVC)
new |= PSR_AA32_A_BIT;
// CPSR.I is set upon any exception
// See ARM DDI 0487E.a, pages G1-5515 to G1-5516
// See ARM DDI 0406C.d, page B1-1182
new |= PSR_AA32_I_BIT;
// CPSR.F is set upon an exception to FIQ
// CPSR.F is unchanged upon an exception to other modes
// See ARM DDI 0487E.a, pages G1-5515 to G1-5516
// See ARM DDI 0406C.d, page B1-1182
new |= (old & PSR_AA32_F_BIT);
if (mode == PSR_AA32_MODE_FIQ)
new |= PSR_AA32_F_BIT;
// CPSR.T is set to SCTLR.TE upon any exception
// See ARM DDI 0487E.a, page G8-5514
// See ARM DDI 0406C.d, page B1-1181
if (sctlr & BIT(30))
new |= PSR_AA32_T_BIT;
new |= mode;
return new;
}
static void prepare_fault32(struct kvm_vcpu *vcpu, u32 mode, u32 vect_offset)
{
unsigned long spsr = *vcpu_cpsr(vcpu);
bool is_thumb = (spsr & PSR_AA32_T_BIT);
u32 return_offset = return_offsets[vect_offset >> 2][is_thumb];
u32 sctlr = vcpu_cp15(vcpu, c1_SCTLR);
*vcpu_cpsr(vcpu) = get_except32_cpsr(vcpu, mode);
/* Note: These now point to the banked copies */
vcpu_write_spsr(vcpu, host_spsr_to_spsr32(spsr));
*vcpu_reg32(vcpu, 14) = *vcpu_pc(vcpu) + return_offset;
/* Branch to exception vector */
if (sctlr & (1 << 13))
vect_offset += 0xffff0000;
else /* always have security exceptions */
vect_offset += vcpu_cp15(vcpu, c12_VBAR);
*vcpu_pc(vcpu) = vect_offset;
}
void kvm_inject_undef32(struct kvm_vcpu *vcpu)
{
bool loaded = pre_fault_synchronize(vcpu);
prepare_fault32(vcpu, PSR_AA32_MODE_UND, 4);
post_fault_synchronize(vcpu, loaded);
}
/*
* Modelled after TakeDataAbortException() and TakePrefetchAbortException
* pseudocode.
*/
static void inject_abt32(struct kvm_vcpu *vcpu, bool is_pabt,
unsigned long addr)
{
u32 vect_offset;
u32 *far, *fsr;
bool is_lpae;
bool loaded;
loaded = pre_fault_synchronize(vcpu);
if (is_pabt) {
vect_offset = 12;
far = &vcpu_cp15(vcpu, c6_IFAR);
fsr = &vcpu_cp15(vcpu, c5_IFSR);
} else { /* !iabt */
vect_offset = 16;
far = &vcpu_cp15(vcpu, c6_DFAR);
fsr = &vcpu_cp15(vcpu, c5_DFSR);
}
prepare_fault32(vcpu, PSR_AA32_MODE_ABT, vect_offset);
*far = addr;
/* Give the guest an IMPLEMENTATION DEFINED exception */
is_lpae = (vcpu_cp15(vcpu, c2_TTBCR) >> 31);
if (is_lpae) {
*fsr = DFSR_LPAE | DFSR_FSC_EXTABT_LPAE;
} else {
/* no need to shuffle FS[4] into DFSR[10] as its 0 */
*fsr = DFSR_FSC_EXTABT_nLPAE;
}
post_fault_synchronize(vcpu, loaded);
}
void kvm_inject_dabt32(struct kvm_vcpu *vcpu, unsigned long addr)
{
inject_abt32(vcpu, false, addr);
}
void kvm_inject_pabt32(struct kvm_vcpu *vcpu, unsigned long addr)
{
inject_abt32(vcpu, true, addr);
}


@ -19,6 +19,7 @@
#include <linux/kvm_irqfd.h>
#include <linux/irqbypass.h>
#include <linux/sched/stat.h>
#include <linux/psci.h>
#include <trace/events/kvm.h>
#define CREATE_TRACE_POINTS
@ -35,7 +36,6 @@
#include <asm/kvm_asm.h>
#include <asm/kvm_mmu.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_coproc.h>
#include <asm/sections.h>
#include <kvm/arm_hypercalls.h>
@ -46,10 +46,14 @@
__asm__(".arch_extension virt");
#endif
static enum kvm_mode kvm_mode = KVM_MODE_DEFAULT;
DEFINE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
/* The VMID used in the VTTBR */
static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
@ -61,6 +65,10 @@ static bool vgic_present;
static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
DEFINE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
extern u64 kvm_nvhe_sym(__cpu_logical_map)[NR_CPUS];
extern u32 kvm_nvhe_sym(kvm_host_psci_version);
extern struct psci_0_1_function_ids kvm_nvhe_sym(kvm_host_psci_0_1_function_ids);
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
{
return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
@ -102,7 +110,7 @@ static int kvm_arm_default_max_vcpus(void)
return vgic_present ? kvm_vgic_get_max_vcpus() : KVM_MAX_VCPUS;
}
static void set_default_csv2(struct kvm *kvm)
static void set_default_spectre(struct kvm *kvm)
{
/*
* The default is to expose CSV2 == 1 if the HW isn't affected.
@ -114,6 +122,8 @@ static void set_default_csv2(struct kvm *kvm)
*/
if (arm64_get_spectre_v2_state() == SPECTRE_UNAFFECTED)
kvm->arch.pfr0_csv2 = 1;
if (arm64_get_meltdown_state() == SPECTRE_UNAFFECTED)
kvm->arch.pfr0_csv3 = 1;
}
/**
@ -141,7 +151,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
/* The maximum number of VCPUs is limited by the host's GIC model */
kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
set_default_csv2(kvm);
set_default_spectre(kvm);
return ret;
out_free_stage2_pgd:
@ -198,6 +208,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_ARM_IRQ_LINE_LAYOUT_2:
case KVM_CAP_ARM_NISV_TO_USER:
case KVM_CAP_ARM_INJECT_EXT_DABT:
case KVM_CAP_SET_GUEST_DEBUG:
case KVM_CAP_VCPU_ATTRIBUTES:
r = 1;
break;
case KVM_CAP_ARM_SET_DEVICE_ADDR:
@ -229,10 +241,35 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_STEAL_TIME:
r = kvm_arm_pvtime_supported();
break;
default:
r = kvm_arch_vm_ioctl_check_extension(kvm, ext);
case KVM_CAP_ARM_EL1_32BIT:
r = cpus_have_const_cap(ARM64_HAS_32BIT_EL1);
break;
case KVM_CAP_GUEST_DEBUG_HW_BPS:
r = get_num_brps();
break;
case KVM_CAP_GUEST_DEBUG_HW_WPS:
r = get_num_wrps();
break;
case KVM_CAP_ARM_PMU_V3:
r = kvm_arm_support_pmu_v3();
break;
case KVM_CAP_ARM_INJECT_SERROR_ESR:
r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
break;
case KVM_CAP_ARM_VM_IPA_SIZE:
r = get_kvm_ipa_limit();
break;
case KVM_CAP_ARM_SVE:
r = system_supports_sve();
break;
case KVM_CAP_ARM_PTRAUTH_ADDRESS:
case KVM_CAP_ARM_PTRAUTH_GENERIC:
r = system_has_full_ptr_auth();
break;
default:
r = 0;
}
return r;
}
@ -1311,47 +1348,52 @@ static unsigned long nvhe_percpu_order(void)
return size ? get_order(size) : 0;
}
static int kvm_map_vectors(void)
/* A lookup table holding the hypervisor VA for each vector slot */
static void *hyp_spectre_vector_selector[BP_HARDEN_EL2_SLOTS];
static int __kvm_vector_slot2idx(enum arm64_hyp_spectre_vector slot)
{
/*
* SV2 = ARM64_SPECTRE_V2
* HEL2 = ARM64_HARDEN_EL2_VECTORS
*
* !SV2 + !HEL2 -> use direct vectors
* SV2 + !HEL2 -> use hardened vectors in place
* !SV2 + HEL2 -> allocate one vector slot and use exec mapping
* SV2 + HEL2 -> use hardened vectors and use exec mapping
*/
if (cpus_have_const_cap(ARM64_SPECTRE_V2)) {
__kvm_bp_vect_base = kvm_ksym_ref(__bp_harden_hyp_vecs);
__kvm_bp_vect_base = kern_hyp_va(__kvm_bp_vect_base);
}
if (cpus_have_const_cap(ARM64_HARDEN_EL2_VECTORS)) {
phys_addr_t vect_pa = __pa_symbol(__bp_harden_hyp_vecs);
unsigned long size = __BP_HARDEN_HYP_VECS_SZ;
/*
* Always allocate a spare vector slot, as we don't
* know yet which CPUs have a BP hardening slot that
* we can reuse.
*/
__kvm_harden_el2_vector_slot = atomic_inc_return(&arm64_el2_vector_last_slot);
BUG_ON(__kvm_harden_el2_vector_slot >= BP_HARDEN_EL2_SLOTS);
return create_hyp_exec_mappings(vect_pa, size,
&__kvm_bp_vect_base);
return slot - (slot != HYP_VECTOR_DIRECT);
}
static void kvm_init_vector_slot(void *base, enum arm64_hyp_spectre_vector slot)
{
int idx = __kvm_vector_slot2idx(slot);
hyp_spectre_vector_selector[slot] = base + (idx * SZ_2K);
}
static int kvm_init_vector_slots(void)
{
int err;
void *base;
base = kern_hyp_va(kvm_ksym_ref(__kvm_hyp_vector));
kvm_init_vector_slot(base, HYP_VECTOR_DIRECT);
base = kern_hyp_va(kvm_ksym_ref(__bp_harden_hyp_vecs));
kvm_init_vector_slot(base, HYP_VECTOR_SPECTRE_DIRECT);
if (!cpus_have_const_cap(ARM64_SPECTRE_V3A))
return 0;
if (!has_vhe()) {
err = create_hyp_exec_mappings(__pa_symbol(__bp_harden_hyp_vecs),
__BP_HARDEN_HYP_VECS_SZ, &base);
if (err)
return err;
}
kvm_init_vector_slot(base, HYP_VECTOR_INDIRECT);
kvm_init_vector_slot(base, HYP_VECTOR_SPECTRE_INDIRECT);
return 0;
}
static void cpu_init_hyp_mode(void)
{
phys_addr_t pgd_ptr;
unsigned long hyp_stack_ptr;
unsigned long vector_ptr;
unsigned long tpidr_el2;
struct kvm_nvhe_init_params *params = this_cpu_ptr_nvhe_sym(kvm_init_params);
struct arm_smccc_res res;
unsigned long tcr;
/* Switch from the HYP stub to our own HYP init vector */
__hyp_set_vectors(kvm_get_idmap_vector());
@ -1361,13 +1403,38 @@ static void cpu_init_hyp_mode(void)
* kernel's mapping to the linear mapping, and store it in tpidr_el2
* so that we can use adr_l to access per-cpu variables in EL2.
*/
tpidr_el2 = (unsigned long)this_cpu_ptr_nvhe_sym(__per_cpu_start) -
params->tpidr_el2 = (unsigned long)this_cpu_ptr_nvhe_sym(__per_cpu_start) -
(unsigned long)kvm_ksym_ref(CHOOSE_NVHE_SYM(__per_cpu_start));
pgd_ptr = kvm_mmu_get_httbr();
hyp_stack_ptr = __this_cpu_read(kvm_arm_hyp_stack_page) + PAGE_SIZE;
hyp_stack_ptr = kern_hyp_va(hyp_stack_ptr);
vector_ptr = (unsigned long)kern_hyp_va(kvm_ksym_ref(__kvm_hyp_host_vector));
params->mair_el2 = read_sysreg(mair_el1);
/*
* The ID map may be configured to use an extended virtual address
* range. This is only the case if system RAM is out of range for the
* currently configured page size and VA_BITS, in which case we will
* also need the extended virtual range for the HYP ID map, or we won't
* be able to enable the EL2 MMU.
*
* However, at EL2, there is only one TTBR register, and we can't switch
* between translation tables *and* update TCR_EL2.T0SZ at the same
* time. Bottom line: we need to use the extended range with *both* our
* translation tables.
*
* So use the same T0SZ value we use for the ID map.
*/
tcr = (read_sysreg(tcr_el1) & TCR_EL2_MASK) | TCR_EL2_RES1;
tcr &= ~TCR_T0SZ_MASK;
tcr |= (idmap_t0sz & GENMASK(TCR_TxSZ_WIDTH - 1, 0)) << TCR_T0SZ_OFFSET;
params->tcr_el2 = tcr;
params->stack_hyp_va = kern_hyp_va(__this_cpu_read(kvm_arm_hyp_stack_page) + PAGE_SIZE);
params->pgd_pa = kvm_mmu_get_httbr();
/*
* Flush the init params from the data cache because the struct will
* be read while the MMU is off.
*/
kvm_flush_dcache_to_poc(params, sizeof(*params));
/*
* Call initialization code, and switch to the full blown HYP code.
@ -1376,8 +1443,7 @@ static void cpu_init_hyp_mode(void)
* cpus_have_const_cap() wrapper.
*/
BUG_ON(!system_capabilities_finalized());
arm_smccc_1_1_hvc(KVM_HOST_SMCCC_FUNC(__kvm_hyp_init),
pgd_ptr, tpidr_el2, hyp_stack_ptr, vector_ptr, &res);
arm_smccc_1_1_hvc(KVM_HOST_SMCCC_FUNC(__kvm_hyp_init), virt_to_phys(params), &res);
WARN_ON(res.a0 != SMCCC_RET_SUCCESS);
/*
@ -1396,13 +1462,40 @@ static void cpu_hyp_reset(void)
__hyp_reset_vectors();
}
/*
* EL2 vectors can be mapped and rerouted in a number of ways,
* depending on the kernel configuration and CPU present:
*
* - If the CPU is affected by Spectre-v2, the hardening sequence is
* placed in one of the vector slots, which is executed before jumping
* to the real vectors.
*
* - If the CPU also has the ARM64_SPECTRE_V3A cap, the slot
* containing the hardening sequence is mapped next to the idmap page,
* and executed before jumping to the real vectors.
*
* - If the CPU only has the ARM64_SPECTRE_V3A cap, then an
* empty slot is selected, mapped next to the idmap page, and
* executed before jumping to the real vectors.
*
* Note that ARM64_SPECTRE_V3A is somewhat incompatible with
* VHE, as we don't have hypervisor-specific mappings. If the system
* is VHE and yet selects this capability, it will be ignored.
*/
static void cpu_set_hyp_vector(void)
{
struct bp_hardening_data *data = this_cpu_ptr(&bp_hardening_data);
void *vector = hyp_spectre_vector_selector[data->slot];
*this_cpu_ptr_hyp_sym(kvm_hyp_vector) = (unsigned long)vector;
}
static void cpu_hyp_reinit(void)
{
kvm_init_host_cpu_context(&this_cpu_ptr_hyp_sym(kvm_host_data)->host_ctxt);
cpu_hyp_reset();
*this_cpu_ptr_hyp_sym(kvm_hyp_vector) = (unsigned long)kvm_get_hyp_vector();
cpu_set_hyp_vector();
if (is_kernel_in_hyp_mode())
kvm_timer_init_vhe();
@ -1439,6 +1532,7 @@ static void _kvm_arch_hardware_disable(void *discard)
void kvm_arch_hardware_disable(void)
{
if (!is_protected_kvm_enabled())
_kvm_arch_hardware_disable(NULL);
}
@ -1482,10 +1576,12 @@ static struct notifier_block hyp_init_cpu_pm_nb = {
static void __init hyp_cpu_pm_init(void)
{
if (!is_protected_kvm_enabled())
cpu_pm_register_notifier(&hyp_init_cpu_pm_nb);
}
static void __init hyp_cpu_pm_exit(void)
{
if (!is_protected_kvm_enabled())
cpu_pm_unregister_notifier(&hyp_init_cpu_pm_nb);
}
#else
@ -1497,6 +1593,36 @@ static inline void hyp_cpu_pm_exit(void)
}
#endif
static void init_cpu_logical_map(void)
{
unsigned int cpu;
/*
* Copy the MPIDR <-> logical CPU ID mapping to hyp.
* Only copy the set of online CPUs whose features have been checked
* against the finalized system capabilities. The hypervisor will not
* allow any other CPUs from the `possible` set to boot.
*/
for_each_online_cpu(cpu)
kvm_nvhe_sym(__cpu_logical_map)[cpu] = cpu_logical_map(cpu);
}
static bool init_psci_relay(void)
{
/*
* If PSCI has not been initialized, protected KVM cannot install
* itself on newly booted CPUs.
*/
if (!psci_ops.get_version) {
kvm_err("Cannot initialize protected mode without PSCI\n");
return false;
}
kvm_nvhe_sym(kvm_host_psci_version) = psci_ops.get_version();
kvm_nvhe_sym(kvm_host_psci_0_1_function_ids) = get_psci_0_1_function_ids();
return true;
}
static int init_common_resources(void)
{
return kvm_set_ipa_limit();
@ -1541,9 +1667,10 @@ static int init_subsystems(void)
goto out;
kvm_perf_init();
kvm_coproc_table_init();
kvm_sys_reg_table_init();
out:
if (err || !is_protected_kvm_enabled())
on_each_cpu(_kvm_arch_hardware_disable, NULL, 1);
return err;
@ -1618,6 +1745,14 @@ static int init_hyp_mode(void)
goto out_err;
}
err = create_hyp_mappings(kvm_ksym_ref(__hyp_data_ro_after_init_start),
kvm_ksym_ref(__hyp_data_ro_after_init_end),
PAGE_HYP_RO);
if (err) {
kvm_err("Cannot map .hyp.data..ro_after_init section\n");
goto out_err;
}
err = create_hyp_mappings(kvm_ksym_ref(__start_rodata),
kvm_ksym_ref(__end_rodata), PAGE_HYP_RO);
if (err) {
@ -1632,12 +1767,6 @@ static int init_hyp_mode(void)
goto out_err;
}
err = kvm_map_vectors();
if (err) {
kvm_err("Cannot map vectors\n");
goto out_err;
}
/*
* Map the Hyp stack pages
*/
@ -1667,6 +1796,13 @@ static int init_hyp_mode(void)
}
}
if (is_protected_kvm_enabled()) {
init_cpu_logical_map();
if (!init_psci_relay())
goto out_err;
}
return 0;
out_err:
@ -1781,14 +1917,24 @@ int kvm_arch_init(void *opaque)
goto out_err;
}
err = kvm_init_vector_slots();
if (err) {
kvm_err("Cannot initialise vector slots\n");
goto out_err;
}
err = init_subsystems();
if (err)
goto out_hyp;
if (in_hyp_mode)
if (is_protected_kvm_enabled()) {
static_branch_enable(&kvm_protected_mode_initialized);
kvm_info("Protected nVHE mode initialized successfully\n");
} else if (in_hyp_mode) {
kvm_info("VHE mode initialized successfully\n");
else
} else {
kvm_info("Hyp mode initialized successfully\n");
}
return 0;
@ -1806,6 +1952,25 @@ void kvm_arch_exit(void)
kvm_perf_teardown();
}
static int __init early_kvm_mode_cfg(char *arg)
{
if (!arg)
return -EINVAL;
if (strcmp(arg, "protected") == 0) {
kvm_mode = KVM_MODE_PROTECTED;
return 0;
}
return -EINVAL;
}
early_param("kvm-arm.mode", early_kvm_mode_cfg);
enum kvm_mode kvm_get_mode(void)
{
return kvm_mode;
}
static int arm_init(void)
{
int rc = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
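
For reference, the "kvm-arm.mode" early parameter above is what selects protected mode from the kernel command line, e.g.:

	kvm-arm.mode=protected

Any other value makes early_kvm_mode_cfg() return -EINVAL, so kvm_mode stays at KVM_MODE_DEFAULT.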


@ -24,7 +24,6 @@
#include <asm/fpsimd.h>
#include <asm/kvm.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_coproc.h>
#include <asm/sigcontext.h>
#include "trace.h"
@ -252,10 +251,32 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
memcpy(addr, valp, KVM_REG_SIZE(reg->id));
if (*vcpu_cpsr(vcpu) & PSR_MODE32_BIT) {
int i;
int i, nr_reg;
for (i = 0; i < 16; i++)
*vcpu_reg32(vcpu, i) = (u32)*vcpu_reg32(vcpu, i);
switch (*vcpu_cpsr(vcpu)) {
/*
* Either we are dealing with user mode, and only the
* first 15 registers (+ PC) must be narrowed to 32bit.
* AArch32 r0-r14 conveniently map to AArch64 x0-x14.
*/
case PSR_AA32_MODE_USR:
case PSR_AA32_MODE_SYS:
nr_reg = 15;
break;
/*
* Otherwise, this is a privileged mode, and *all* the
* registers must be narrowed to 32bit.
*/
default:
nr_reg = 31;
break;
}
for (i = 0; i < nr_reg; i++)
vcpu_set_reg(vcpu, i, (u32)vcpu_get_reg(vcpu, i));
*vcpu_pc(vcpu) = (u32)*vcpu_pc(vcpu);
}
out:
return err;


@ -14,7 +14,6 @@
#include <asm/esr.h>
#include <asm/exception.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_coproc.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_mmu.h>
#include <asm/debug-monitors.h>
@ -61,7 +60,7 @@ static int handle_smc(struct kvm_vcpu *vcpu)
* otherwise return to the same address...
*/
vcpu_set_reg(vcpu, 0, ~0UL);
kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
kvm_incr_pc(vcpu);
return 1;
}
@ -100,7 +99,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
kvm_clear_request(KVM_REQ_UNHALT, vcpu);
}
kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
kvm_incr_pc(vcpu);
return 1;
}
@ -221,7 +220,7 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu)
* that fail their condition code check"
*/
if (!kvm_condition_valid(vcpu)) {
kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
kvm_incr_pc(vcpu);
handled = 1;
} else {
exit_handle_fn exit_handler;
@ -241,23 +240,6 @@ int handle_exit(struct kvm_vcpu *vcpu, int exception_index)
{
struct kvm_run *run = vcpu->run;
if (ARM_SERROR_PENDING(exception_index)) {
u8 esr_ec = ESR_ELx_EC(kvm_vcpu_get_esr(vcpu));
/*
* HVC/SMC already have an adjusted PC, which we need
* to correct in order to return to after having
* injected the SError.
*/
if (esr_ec == ESR_ELx_EC_HVC32 || esr_ec == ESR_ELx_EC_HVC64 ||
esr_ec == ESR_ELx_EC_SMC32 || esr_ec == ESR_ELx_EC_SMC64) {
u32 adj = kvm_vcpu_trap_il_is32bit(vcpu) ? 4 : 2;
*vcpu_pc(vcpu) -= adj;
}
return 1;
}
exception_index = ARM_EXCEPTION_CODE(exception_index);
switch (exception_index) {


@ -10,4 +10,4 @@ subdir-ccflags-y := -I$(incdir) \
-DDISABLE_BRANCH_PROFILING \
$(DISABLE_STACKLEAK_PLUGIN)
obj-$(CONFIG_KVM) += vhe/ nvhe/ pgtable.o smccc_wa.o
obj-$(CONFIG_KVM) += vhe/ nvhe/ pgtable.o


@ -123,13 +123,13 @@ static void kvm_adjust_itstate(struct kvm_vcpu *vcpu)
* kvm_skip_instr - skip a trapped instruction and proceed to the next
* @vcpu: The vcpu pointer
*/
void kvm_skip_instr32(struct kvm_vcpu *vcpu, bool is_wide_instr)
void kvm_skip_instr32(struct kvm_vcpu *vcpu)
{
u32 pc = *vcpu_pc(vcpu);
bool is_thumb;
is_thumb = !!(*vcpu_cpsr(vcpu) & PSR_AA32_T_BIT);
if (is_thumb && !is_wide_instr)
if (is_thumb && !kvm_vcpu_trap_il_is32bit(vcpu))
pc += 2;
else
pc += 4;


@ -0,0 +1,331 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* Fault injection for both 32 and 64bit guests.
*
* Copyright (C) 2012,2013 - ARM Ltd
* Author: Marc Zyngier <marc.zyngier@arm.com>
*
* Based on arch/arm/kvm/emulate.c
* Copyright (C) 2012 - Virtual Open Systems and Columbia University
* Author: Christoffer Dall <c.dall@virtualopensystems.com>
*/
#include <hyp/adjust_pc.h>
#include <linux/kvm_host.h>
#include <asm/kvm_emulate.h>
#if !defined (__KVM_NVHE_HYPERVISOR__) && !defined (__KVM_VHE_HYPERVISOR__)
#error Hypervisor code only!
#endif
static inline u64 __vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
{
u64 val;
if (__vcpu_read_sys_reg_from_cpu(reg, &val))
return val;
return __vcpu_sys_reg(vcpu, reg);
}
static inline void __vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
{
if (__vcpu_write_sys_reg_to_cpu(val, reg))
return;
__vcpu_sys_reg(vcpu, reg) = val;
}
static void __vcpu_write_spsr(struct kvm_vcpu *vcpu, u64 val)
{
write_sysreg_el1(val, SYS_SPSR);
}
static void __vcpu_write_spsr_abt(struct kvm_vcpu *vcpu, u64 val)
{
if (has_vhe())
write_sysreg(val, spsr_abt);
else
vcpu->arch.ctxt.spsr_abt = val;
}
static void __vcpu_write_spsr_und(struct kvm_vcpu *vcpu, u64 val)
{
if (has_vhe())
write_sysreg(val, spsr_und);
else
vcpu->arch.ctxt.spsr_und = val;
}
/*
* This performs the exception entry at a given EL (@target_mode), stashing PC
* and PSTATE into ELR and SPSR respectively, and computes the new PC/PSTATE.
* The EL passed to this function *must* be a non-secure, privileged mode with
* bit 0 being set (PSTATE.SP == 1).
*
* When an exception is taken, most PSTATE fields are left unchanged in the
* handler. However, some are explicitly overridden (e.g. M[4:0]). Luckily all
* of the inherited bits have the same position in the AArch64/AArch32 SPSR_ELx
* layouts, so we don't need to shuffle these for exceptions from AArch32 EL0.
*
* For the SPSR_ELx layout for AArch64, see ARM DDI 0487E.a page C5-429.
* For the SPSR_ELx layout for AArch32, see ARM DDI 0487E.a page C5-426.
*
* Here we manipulate the fields in order of the AArch64 SPSR_ELx layout, from
* MSB to LSB.
*/
static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
enum exception_type type)
{
unsigned long sctlr, vbar, old, new, mode;
u64 exc_offset;
mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
if (mode == target_mode)
exc_offset = CURRENT_EL_SP_ELx_VECTOR;
else if ((mode | PSR_MODE_THREAD_BIT) == target_mode)
exc_offset = CURRENT_EL_SP_EL0_VECTOR;
else if (!(mode & PSR_MODE32_BIT))
exc_offset = LOWER_EL_AArch64_VECTOR;
else
exc_offset = LOWER_EL_AArch32_VECTOR;
switch (target_mode) {
case PSR_MODE_EL1h:
vbar = __vcpu_read_sys_reg(vcpu, VBAR_EL1);
sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL1);
__vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
break;
default:
/* Don't do that */
BUG();
}
*vcpu_pc(vcpu) = vbar + exc_offset + type;
old = *vcpu_cpsr(vcpu);
new = 0;
new |= (old & PSR_N_BIT);
new |= (old & PSR_Z_BIT);
new |= (old & PSR_C_BIT);
new |= (old & PSR_V_BIT);
// TODO: TCO (if/when ARMv8.5-MemTag is exposed to guests)
new |= (old & PSR_DIT_BIT);
// PSTATE.UAO is set to zero upon any exception to AArch64
// See ARM DDI 0487E.a, page D5-2579.
// PSTATE.PAN is unchanged unless SCTLR_ELx.SPAN == 0b0
// SCTLR_ELx.SPAN is RES1 when ARMv8.1-PAN is not implemented
// See ARM DDI 0487E.a, page D5-2578.
new |= (old & PSR_PAN_BIT);
if (!(sctlr & SCTLR_EL1_SPAN))
new |= PSR_PAN_BIT;
// PSTATE.SS is set to zero upon any exception to AArch64
// See ARM DDI 0487E.a, page D2-2452.
// PSTATE.IL is set to zero upon any exception to AArch64
// See ARM DDI 0487E.a, page D1-2306.
// PSTATE.SSBS is set to SCTLR_ELx.DSSBS upon any exception to AArch64
// See ARM DDI 0487E.a, page D13-3258
if (sctlr & SCTLR_ELx_DSSBS)
new |= PSR_SSBS_BIT;
// PSTATE.BTYPE is set to zero upon any exception to AArch64
// See ARM DDI 0487E.a, pages D1-2293 to D1-2294.
new |= PSR_D_BIT;
new |= PSR_A_BIT;
new |= PSR_I_BIT;
new |= PSR_F_BIT;
new |= target_mode;
*vcpu_cpsr(vcpu) = new;
__vcpu_write_spsr(vcpu, old);
}
/*
* When an exception is taken, most CPSR fields are left unchanged in the
* handler. However, some are explicitly overridden (e.g. M[4:0]).
*
* The SPSR/SPSR_ELx layouts differ, and the below is intended to work with
* either format. Note: SPSR.J bit doesn't exist in SPSR_ELx, but this bit was
* obsoleted by the ARMv7 virtualization extensions and is RES0.
*
* For the SPSR layout seen from AArch32, see:
* - ARM DDI 0406C.d, page B1-1148
* - ARM DDI 0487E.a, page G8-6264
*
* For the SPSR_ELx layout for AArch32 seen from AArch64, see:
* - ARM DDI 0487E.a, page C5-426
*
* Here we manipulate the fields in order of the AArch32 SPSR_ELx layout, from
* MSB to LSB.
*/
static unsigned long get_except32_cpsr(struct kvm_vcpu *vcpu, u32 mode)
{
u32 sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL1);
unsigned long old, new;
old = *vcpu_cpsr(vcpu);
new = 0;
new |= (old & PSR_AA32_N_BIT);
new |= (old & PSR_AA32_Z_BIT);
new |= (old & PSR_AA32_C_BIT);
new |= (old & PSR_AA32_V_BIT);
new |= (old & PSR_AA32_Q_BIT);
// CPSR.IT[7:0] are set to zero upon any exception
// See ARM DDI 0487E.a, section G1.12.3
// See ARM DDI 0406C.d, section B1.8.3
new |= (old & PSR_AA32_DIT_BIT);
// CPSR.SSBS is set to SCTLR.DSSBS upon any exception
// See ARM DDI 0487E.a, page G8-6244
if (sctlr & BIT(31))
new |= PSR_AA32_SSBS_BIT;
// CPSR.PAN is unchanged unless SCTLR.SPAN == 0b0
// SCTLR.SPAN is RES1 when ARMv8.1-PAN is not implemented
// See ARM DDI 0487E.a, page G8-6246
new |= (old & PSR_AA32_PAN_BIT);
if (!(sctlr & BIT(23)))
new |= PSR_AA32_PAN_BIT;
// SS does not exist in AArch32, so ignore
// CPSR.IL is set to zero upon any exception
// See ARM DDI 0487E.a, page G1-5527
new |= (old & PSR_AA32_GE_MASK);
// CPSR.IT[7:0] are set to zero upon any exception
// See prior comment above
// CPSR.E is set to SCTLR.EE upon any exception
// See ARM DDI 0487E.a, page G8-6245
// See ARM DDI 0406C.d, page B4-1701
if (sctlr & BIT(25))
new |= PSR_AA32_E_BIT;
// CPSR.A is unchanged upon an exception to Undefined, Supervisor
// CPSR.A is set upon an exception to other modes
// See ARM DDI 0487E.a, pages G1-5515 to G1-5516
// See ARM DDI 0406C.d, page B1-1182
new |= (old & PSR_AA32_A_BIT);
if (mode != PSR_AA32_MODE_UND && mode != PSR_AA32_MODE_SVC)
new |= PSR_AA32_A_BIT;
// CPSR.I is set upon any exception
// See ARM DDI 0487E.a, pages G1-5515 to G1-5516
// See ARM DDI 0406C.d, page B1-1182
new |= PSR_AA32_I_BIT;
// CPSR.F is set upon an exception to FIQ
// CPSR.F is unchanged upon an exception to other modes
// See ARM DDI 0487E.a, pages G1-5515 to G1-5516
// See ARM DDI 0406C.d, page B1-1182
new |= (old & PSR_AA32_F_BIT);
if (mode == PSR_AA32_MODE_FIQ)
new |= PSR_AA32_F_BIT;
// CPSR.T is set to SCTLR.TE upon any exception
// See ARM DDI 0487E.a, page G8-5514
// See ARM DDI 0406C.d, page B1-1181
if (sctlr & BIT(30))
new |= PSR_AA32_T_BIT;
new |= mode;
return new;
}
/*
* Table taken from ARMv8 ARM DDI0487B-B, table G1-10.
*/
static const u8 return_offsets[8][2] = {
[0] = { 0, 0 }, /* Reset, unused */
[1] = { 4, 2 }, /* Undefined */
[2] = { 0, 0 }, /* SVC, unused */
[3] = { 4, 4 }, /* Prefetch abort */
[4] = { 8, 8 }, /* Data abort */
[5] = { 0, 0 }, /* HVC, unused */
[6] = { 4, 4 }, /* IRQ, unused */
[7] = { 4, 4 }, /* FIQ, unused */
};
static void enter_exception32(struct kvm_vcpu *vcpu, u32 mode, u32 vect_offset)
{
unsigned long spsr = *vcpu_cpsr(vcpu);
bool is_thumb = (spsr & PSR_AA32_T_BIT);
u32 sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL1);
u32 return_address;
*vcpu_cpsr(vcpu) = get_except32_cpsr(vcpu, mode);
return_address = *vcpu_pc(vcpu);
return_address += return_offsets[vect_offset >> 2][is_thumb];
/* KVM only enters the ABT and UND modes, so only deal with those */
switch(mode) {
case PSR_AA32_MODE_ABT:
__vcpu_write_spsr_abt(vcpu, host_spsr_to_spsr32(spsr));
vcpu_gp_regs(vcpu)->compat_lr_abt = return_address;
break;
case PSR_AA32_MODE_UND:
__vcpu_write_spsr_und(vcpu, host_spsr_to_spsr32(spsr));
vcpu_gp_regs(vcpu)->compat_lr_und = return_address;
break;
}
/* Branch to exception vector */
if (sctlr & (1 << 13))
vect_offset += 0xffff0000;
else /* always have security exceptions */
vect_offset += __vcpu_read_sys_reg(vcpu, VBAR_EL1);
*vcpu_pc(vcpu) = vect_offset;
}
void kvm_inject_exception(struct kvm_vcpu *vcpu)
{
if (vcpu_el1_is_32bit(vcpu)) {
switch (vcpu->arch.flags & KVM_ARM64_EXCEPT_MASK) {
case KVM_ARM64_EXCEPT_AA32_UND:
enter_exception32(vcpu, PSR_AA32_MODE_UND, 4);
break;
case KVM_ARM64_EXCEPT_AA32_IABT:
enter_exception32(vcpu, PSR_AA32_MODE_ABT, 12);
break;
case KVM_ARM64_EXCEPT_AA32_DABT:
enter_exception32(vcpu, PSR_AA32_MODE_ABT, 16);
break;
default:
/* Err... */
break;
}
} else {
switch (vcpu->arch.flags & KVM_ARM64_EXCEPT_MASK) {
case (KVM_ARM64_EXCEPT_AA64_ELx_SYNC |
KVM_ARM64_EXCEPT_AA64_EL1):
enter_exception64(vcpu, PSR_MODE_EL1h, except_type_sync);
break;
default:
/*
* Only EL1_SYNC makes sense so far, EL2_{SYNC,IRQ}
* will be implemented at some point. Everything
* else gets silently ignored.
*/
break;
}
}
}
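
A minimal sketch of how the EL1 side is expected to request one of these injections, using only the flag names visible in this series (the helper name below is invented for illustration and is not part of the diff): the caller records the request in vcpu->arch.flags and lets the hyp entry code deliver it on the next guest entry via kvm_inject_exception().

static void example_request_el1_sync_exception(struct kvm_vcpu *vcpu)
{
	/* Ask for a synchronous exception to be taken to AArch64 EL1 */
	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_EL1 |
			     KVM_ARM64_EXCEPT_AA64_ELx_SYNC |
			     KVM_ARM64_PENDING_EXCEPTION);
}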


@ -13,6 +13,7 @@
#include <asm/kvm_arm.h>
#include <asm/kvm_asm.h>
#include <asm/mmu.h>
#include <asm/spectre.h>
.macro save_caller_saved_regs_vect
/* x0 and x1 were saved in the vector entry */
@ -187,21 +188,29 @@ SYM_CODE_START(__kvm_hyp_vector)
valid_vect el1_error // Error 32-bit EL1
SYM_CODE_END(__kvm_hyp_vector)
.macro hyp_ventry
.macro spectrev2_smccc_wa1_smc
sub sp, sp, #(8 * 4)
stp x2, x3, [sp, #(8 * 0)]
stp x0, x1, [sp, #(8 * 2)]
mov w0, #ARM_SMCCC_ARCH_WORKAROUND_1
smc #0
ldp x2, x3, [sp, #(8 * 0)]
add sp, sp, #(8 * 2)
.endm
.macro hyp_ventry indirect, spectrev2
.align 7
1: esb
.rept 26
nop
.endr
/*
* The default sequence is to directly branch to the KVM vectors,
* using the computed offset. This applies for VHE as well as
* !ARM64_HARDEN_EL2_VECTORS. The first vector must always run the preamble.
.if \spectrev2 != 0
spectrev2_smccc_wa1_smc
.else
stp x0, x1, [sp, #-16]!
.endif
.if \indirect != 0
alternative_cb kvm_patch_vector_branch
/*
* For ARM64_SPECTRE_V3A configurations, these NOPs get replaced with:
*
* For ARM64_HARDEN_EL2_VECTORS configurations, this gets replaced
* with:
*
* stp x0, x1, [sp, #-16]!
* movz x0, #(addr & 0xffff)
* movk x0, #((addr >> 16) & 0xffff), lsl #16
* movk x0, #((addr >> 32) & 0xffff), lsl #32
@ -211,28 +220,28 @@ SYM_CODE_END(__kvm_hyp_vector)
* addr = kern_hyp_va(__kvm_hyp_vector) + vector-offset + KVM_VECTOR_PREAMBLE.
* See kvm_patch_vector_branch for details.
*/
alternative_cb kvm_patch_vector_branch
stp x0, x1, [sp, #-16]!
nop
nop
nop
nop
alternative_cb_end
.endif
b __kvm_hyp_vector + (1b - 0b + KVM_VECTOR_PREAMBLE)
nop
nop
nop
alternative_cb_end
.endm
.macro generate_vectors
.macro generate_vectors indirect, spectrev2
0:
.rept 16
hyp_ventry
hyp_ventry \indirect, \spectrev2
.endr
.org 0b + SZ_2K // Safety measure
.endm
.align 11
SYM_CODE_START(__bp_harden_hyp_vecs)
.rept BP_HARDEN_EL2_SLOTS
generate_vectors
.endr
generate_vectors indirect = 0, spectrev2 = 1 // HYP_VECTOR_SPECTRE_DIRECT
generate_vectors indirect = 1, spectrev2 = 0 // HYP_VECTOR_INDIRECT
generate_vectors indirect = 1, spectrev2 = 1 // HYP_VECTOR_SPECTRE_INDIRECT
1: .org __bp_harden_hyp_vecs + __BP_HARDEN_HYP_VECS_SZ
.org 1b
SYM_CODE_END(__bp_harden_hyp_vecs)


@ -0,0 +1,62 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* Guest PC manipulation helpers
*
* Copyright (C) 2012,2013 - ARM Ltd
* Copyright (C) 2020 - Google LLC
* Author: Marc Zyngier <maz@kernel.org>
*/
#ifndef __ARM64_KVM_HYP_ADJUST_PC_H__
#define __ARM64_KVM_HYP_ADJUST_PC_H__
#include <asm/kvm_emulate.h>
#include <asm/kvm_host.h>
void kvm_inject_exception(struct kvm_vcpu *vcpu);
static inline void kvm_skip_instr(struct kvm_vcpu *vcpu)
{
if (vcpu_mode_is_32bit(vcpu)) {
kvm_skip_instr32(vcpu);
} else {
*vcpu_pc(vcpu) += 4;
*vcpu_cpsr(vcpu) &= ~PSR_BTYPE_MASK;
}
/* advance the singlestep state machine */
*vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS;
}
/*
* Skip an instruction which has been emulated at hyp while most guest sysregs
* are live.
*/
static inline void __kvm_skip_instr(struct kvm_vcpu *vcpu)
{
*vcpu_pc(vcpu) = read_sysreg_el2(SYS_ELR);
vcpu_gp_regs(vcpu)->pstate = read_sysreg_el2(SYS_SPSR);
kvm_skip_instr(vcpu);
write_sysreg_el2(vcpu_gp_regs(vcpu)->pstate, SYS_SPSR);
write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR);
}
/*
* Adjust the guest PC on entry, depending on flags provided by EL1
* for the purpose of emulation (MMIO, sysreg) or exception injection.
*/
static inline void __adjust_pc(struct kvm_vcpu *vcpu)
{
if (vcpu->arch.flags & KVM_ARM64_PENDING_EXCEPTION) {
kvm_inject_exception(vcpu);
vcpu->arch.flags &= ~(KVM_ARM64_PENDING_EXCEPTION |
KVM_ARM64_EXCEPT_MASK);
} else if (vcpu->arch.flags & KVM_ARM64_INCREMENT_PC) {
kvm_skip_instr(vcpu);
vcpu->arch.flags &= ~KVM_ARM64_INCREMENT_PC;
}
}
#endif
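
This also explains the kvm_incr_pc() calls seen in handle_exit.c earlier: a minimal sketch of that helper, under the assumption that it only sets the flag consumed by __adjust_pc() above (its actual definition lives in a header outside this excerpt):

static inline void kvm_incr_pc(struct kvm_vcpu *vcpu)
{
	/* Defer the PC increment to __adjust_pc() on the next guest entry */
	vcpu->arch.flags |= KVM_ARM64_INCREMENT_PC;
}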


@ -7,6 +7,8 @@
#ifndef __ARM64_KVM_HYP_SWITCH_H__
#define __ARM64_KVM_HYP_SWITCH_H__
#include <hyp/adjust_pc.h>
#include <linux/arm-smccc.h>
#include <linux/kvm_host.h>
#include <linux/types.h>
@ -409,6 +411,21 @@ static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
if (ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ)
vcpu->arch.fault.esr_el2 = read_sysreg_el2(SYS_ESR);
if (ARM_SERROR_PENDING(*exit_code)) {
u8 esr_ec = kvm_vcpu_trap_get_class(vcpu);
/*
* HVC calls already have an adjusted PC, which we need to
* correct in order to return to after having injected
* the SError.
*
* SMC, on the other hand, is *trapped*, meaning its
* preferred return address is the SMC itself.
*/
if (esr_ec == ESR_ELx_EC_HVC32 || esr_ec == ESR_ELx_EC_HVC64)
write_sysreg_el2(read_sysreg_el2(SYS_ELR) - 4, SYS_ELR);
}
/*
* We're using the raw exception code in order to only process
* the trap if no SError is pending. We will come back to the


@ -0,0 +1,18 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Trap handler helpers.
*
* Copyright (C) 2020 - Google LLC
* Author: Marc Zyngier <maz@kernel.org>
*/
#ifndef __ARM64_KVM_NVHE_TRAP_HANDLER_H__
#define __ARM64_KVM_NVHE_TRAP_HANDLER_H__
#include <asm/kvm_host.h>
#define cpu_reg(ctxt, r) (ctxt)->regs.regs[r]
#define DECLARE_REG(type, name, ctxt, reg) \
type name = (type)cpu_reg(ctxt, (reg))
#endif /* __ARM64_KVM_NVHE_TRAP_HANDLER_H__ */
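
In other words, DECLARE_REG just pulls a typed argument out of the saved host context. The first use in hyp-main.c further down expands to roughly:

	/* DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1) becomes: */
	struct kvm_vcpu *vcpu = (struct kvm_vcpu *)(host_ctxt)->regs.regs[1];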


@ -6,9 +6,10 @@
asflags-y := -D__KVM_NVHE_HYPERVISOR__
ccflags-y := -D__KVM_NVHE_HYPERVISOR__
obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o hyp-main.o
obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
hyp-main.o hyp-smp.o psci-relay.o
obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
../fpsimd.o ../hyp-entry.o
../fpsimd.o ../hyp-entry.o ../exception.o
##
## Build rules for compiling nVHE hyp code


@ -13,8 +13,6 @@
.text
SYM_FUNC_START(__host_exit)
stp x0, x1, [sp, #-16]!
get_host_ctxt x0, x1
/* Store the host regs x2 and x3 */
@ -41,6 +39,7 @@ SYM_FUNC_START(__host_exit)
bl handle_trap
/* Restore host regs x0-x17 */
__host_enter_restore_full:
ldp x0, x1, [x29, #CPU_XREG_OFFSET(0)]
ldp x2, x3, [x29, #CPU_XREG_OFFSET(2)]
ldp x4, x5, [x29, #CPU_XREG_OFFSET(4)]
@ -63,6 +62,14 @@ __host_enter_without_restoring:
sb
SYM_FUNC_END(__host_exit)
/*
* void __noreturn __host_enter(struct kvm_cpu_context *host_ctxt);
*/
SYM_FUNC_START(__host_enter)
mov x29, x0
b __host_enter_restore_full
SYM_FUNC_END(__host_enter)
/*
* void __noreturn __hyp_do_panic(bool restore_host, u64 spsr, u64 elr, u64 par);
*/
@ -99,13 +106,15 @@ SYM_FUNC_END(__hyp_do_panic)
mrs x0, esr_el2
lsr x0, x0, #ESR_ELx_EC_SHIFT
cmp x0, #ESR_ELx_EC_HVC64
ldp x0, x1, [sp], #16
b.ne __host_exit
ldp x0, x1, [sp] // Don't fixup the stack yet
/* Check for a stub HVC call */
cmp x0, #HVC_STUB_HCALL_NR
b.hs __host_exit
add sp, sp, #16
/*
* Compute the idmap address of __kvm_handle_stub_hvc and
* jump there. Since we use kimage_voffset, do not use the
@ -115,10 +124,7 @@ SYM_FUNC_END(__hyp_do_panic)
* Preserve x0-x4, which may contain stub parameters.
*/
ldr x5, =__kvm_handle_stub_hvc
ldr_l x6, kimage_voffset
/* x5 = __pa(x5) */
sub x5, x5, x6
kimg_pa x5, x6
br x5
.L__vect_end\@:
.if ((.L__vect_end\@ - .L__vect_start\@) > 0x80)
@ -183,3 +189,41 @@ SYM_CODE_START(__kvm_hyp_host_vector)
invalid_host_el1_vect // FIQ 32-bit EL1
invalid_host_el1_vect // Error 32-bit EL1
SYM_CODE_END(__kvm_hyp_host_vector)
/*
* Forward SMC with arguments in struct kvm_cpu_context, and
* store the result into the same struct. Assumes SMCCC 1.2 or older.
*
* x0: struct kvm_cpu_context*
*/
SYM_CODE_START(__kvm_hyp_host_forward_smc)
/*
* Use x18 to keep the pointer to the host context because
* x18 is callee-saved in SMCCC but not in AAPCS64.
*/
mov x18, x0
ldp x0, x1, [x18, #CPU_XREG_OFFSET(0)]
ldp x2, x3, [x18, #CPU_XREG_OFFSET(2)]
ldp x4, x5, [x18, #CPU_XREG_OFFSET(4)]
ldp x6, x7, [x18, #CPU_XREG_OFFSET(6)]
ldp x8, x9, [x18, #CPU_XREG_OFFSET(8)]
ldp x10, x11, [x18, #CPU_XREG_OFFSET(10)]
ldp x12, x13, [x18, #CPU_XREG_OFFSET(12)]
ldp x14, x15, [x18, #CPU_XREG_OFFSET(14)]
ldp x16, x17, [x18, #CPU_XREG_OFFSET(16)]
smc #0
stp x0, x1, [x18, #CPU_XREG_OFFSET(0)]
stp x2, x3, [x18, #CPU_XREG_OFFSET(2)]
stp x4, x5, [x18, #CPU_XREG_OFFSET(4)]
stp x6, x7, [x18, #CPU_XREG_OFFSET(6)]
stp x8, x9, [x18, #CPU_XREG_OFFSET(8)]
stp x10, x11, [x18, #CPU_XREG_OFFSET(10)]
stp x12, x13, [x18, #CPU_XREG_OFFSET(12)]
stp x14, x15, [x18, #CPU_XREG_OFFSET(14)]
stp x16, x17, [x18, #CPU_XREG_OFFSET(16)]
ret
SYM_CODE_END(__kvm_hyp_host_forward_smc)


@ -9,6 +9,7 @@
#include <asm/alternative.h>
#include <asm/assembler.h>
#include <asm/el2_setup.h>
#include <asm/kvm_arm.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_mmu.h>
@ -47,10 +48,7 @@ __invalid:
/*
* x0: SMCCC function ID
* x1: HYP pgd
* x2: per-CPU offset
* x3: HYP stack
* x4: HYP vectors
* x1: struct kvm_nvhe_init_params PA
*/
__do_hyp_init:
/* Check for a stub HVC call */
@ -71,48 +69,53 @@ __do_hyp_init:
mov x0, #SMCCC_RET_NOT_SUPPORTED
eret
1:
/* Set tpidr_el2 for use by HYP to free a register */
msr tpidr_el2, x2
1: mov x0, x1
mov x4, lr
bl ___kvm_hyp_init
mov lr, x4
phys_to_ttbr x0, x1
alternative_if ARM64_HAS_CNP
orr x0, x0, #TTBR_CNP_BIT
alternative_else_nop_endif
msr ttbr0_el2, x0
/* Hello, World! */
mov x0, #SMCCC_RET_SUCCESS
eret
SYM_CODE_END(__kvm_hyp_init)
mrs x0, tcr_el1
mov_q x1, TCR_EL2_MASK
and x0, x0, x1
mov x1, #TCR_EL2_RES1
orr x0, x0, x1
/*
* The ID map may be configured to use an extended virtual address
* range. This is only the case if system RAM is out of range for the
* currently configured page size and VA_BITS, in which case we will
* also need the extended virtual range for the HYP ID map, or we won't
* be able to enable the EL2 MMU.
/*
* Initialize the hypervisor in EL2.
*
* However, at EL2, there is only one TTBR register, and we can't switch
* between translation tables *and* update TCR_EL2.T0SZ at the same
* time. Bottom line: we need to use the extended range with *both* our
* translation tables.
* Only uses x0..x3 so as to not clobber callee-saved SMCCC registers
* and leave x4 for the caller.
*
* So use the same T0SZ value we use for the ID map.
* x0: struct kvm_nvhe_init_params PA
*/
ldr_l x1, idmap_t0sz
bfi x0, x1, TCR_T0SZ_OFFSET, TCR_TxSZ_WIDTH
SYM_CODE_START_LOCAL(___kvm_hyp_init)
alternative_if ARM64_KVM_PROTECTED_MODE
mov_q x1, HCR_HOST_NVHE_PROTECTED_FLAGS
msr hcr_el2, x1
alternative_else_nop_endif
ldr x1, [x0, #NVHE_INIT_TPIDR_EL2]
msr tpidr_el2, x1
ldr x1, [x0, #NVHE_INIT_STACK_HYP_VA]
mov sp, x1
ldr x1, [x0, #NVHE_INIT_MAIR_EL2]
msr mair_el2, x1
ldr x1, [x0, #NVHE_INIT_PGD_PA]
phys_to_ttbr x2, x1
alternative_if ARM64_HAS_CNP
orr x2, x2, #TTBR_CNP_BIT
alternative_else_nop_endif
msr ttbr0_el2, x2
/*
* Set the PS bits in TCR_EL2.
*/
tcr_compute_pa_size x0, #TCR_EL2_PS_SHIFT, x1, x2
ldr x1, [x0, #NVHE_INIT_TCR_EL2]
tcr_compute_pa_size x1, #TCR_EL2_PS_SHIFT, x2, x3
msr tcr_el2, x1
msr tcr_el2, x0
mrs x0, mair_el1
msr mair_el2, x0
isb
/* Invalidate the stale TLBs from Bootloader */
@ -134,14 +137,70 @@ alternative_else_nop_endif
msr sctlr_el2, x0
isb
/* Set the stack and new vectors */
mov sp, x3
msr vbar_el2, x4
/* Set the host vector */
ldr x0, =__kvm_hyp_host_vector
kimg_hyp_va x0, x1
msr vbar_el2, x0
/* Hello, World! */
mov x0, #SMCCC_RET_SUCCESS
eret
SYM_CODE_END(__kvm_hyp_init)
ret
SYM_CODE_END(___kvm_hyp_init)
/*
* PSCI CPU_ON entry point
*
* x0: struct kvm_nvhe_init_params PA
*/
SYM_CODE_START(kvm_hyp_cpu_entry)
mov x1, #1 // is_cpu_on = true
b __kvm_hyp_init_cpu
SYM_CODE_END(kvm_hyp_cpu_entry)
/*
* PSCI CPU_SUSPEND / SYSTEM_SUSPEND entry point
*
* x0: struct kvm_nvhe_init_params PA
*/
SYM_CODE_START(kvm_hyp_cpu_resume)
mov x1, #0 // is_cpu_on = false
b __kvm_hyp_init_cpu
SYM_CODE_END(kvm_hyp_cpu_resume)
/*
* Common code for CPU entry points. Initializes EL2 state and
* installs the hypervisor before handing over to a C handler.
*
* x0: struct kvm_nvhe_init_params PA
* x1: bool is_cpu_on
*/
SYM_CODE_START_LOCAL(__kvm_hyp_init_cpu)
mov x28, x0 // Stash arguments
mov x29, x1
/* Check that the core was booted in EL2. */
mrs x0, CurrentEL
cmp x0, #CurrentEL_EL2
b.eq 2f
/* The core booted in EL1. KVM cannot be initialized on it. */
1: wfe
wfi
b 1b
2: msr SPsel, #1 // We want to use SP_EL{1,2}
/* Initialize EL2 CPU state to sane values. */
init_el2_state nvhe // Clobbers x0..x2
/* Enable MMU, set vectors and stack. */
mov x0, x28
bl ___kvm_hyp_init // Clobbers x0..x3
/* Leave idmap. */
mov x0, x29
ldr x1, =kvm_host_psci_cpu_entry
kimg_hyp_va x1, x2
br x1
SYM_CODE_END(__kvm_hyp_init_cpu)
SYM_CODE_START(__kvm_handle_stub_hvc)
cmp x0, #HVC_SOFT_RESTART
@ -176,6 +235,11 @@ reset:
msr sctlr_el2, x5
isb
alternative_if ARM64_KVM_PROTECTED_MODE
mov_q x5, HCR_HOST_NVHE_FLAGS
msr hcr_el2, x5
alternative_else_nop_endif
/* Install stub vectors */
adr_l x5, __hyp_stub_vectors
msr vbar_el2, x5


@ -12,106 +12,183 @@
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmu.h>
#include <kvm/arm_hypercalls.h>
#include <nvhe/trap_handler.h>
static void handle_host_hcall(unsigned long func_id,
struct kvm_cpu_context *host_ctxt)
DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
void __kvm_hyp_host_forward_smc(struct kvm_cpu_context *host_ctxt);
static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
{
unsigned long ret = 0;
DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
switch (func_id) {
case KVM_HOST_SMCCC_FUNC(__kvm_vcpu_run): {
unsigned long r1 = host_ctxt->regs.regs[1];
struct kvm_vcpu *vcpu = (struct kvm_vcpu *)r1;
cpu_reg(host_ctxt, 1) = __kvm_vcpu_run(kern_hyp_va(vcpu));
}
ret = __kvm_vcpu_run(kern_hyp_va(vcpu));
break;
}
case KVM_HOST_SMCCC_FUNC(__kvm_flush_vm_context):
static void handle___kvm_flush_vm_context(struct kvm_cpu_context *host_ctxt)
{
__kvm_flush_vm_context();
break;
case KVM_HOST_SMCCC_FUNC(__kvm_tlb_flush_vmid_ipa): {
unsigned long r1 = host_ctxt->regs.regs[1];
struct kvm_s2_mmu *mmu = (struct kvm_s2_mmu *)r1;
phys_addr_t ipa = host_ctxt->regs.regs[2];
int level = host_ctxt->regs.regs[3];
}
static void handle___kvm_tlb_flush_vmid_ipa(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct kvm_s2_mmu *, mmu, host_ctxt, 1);
DECLARE_REG(phys_addr_t, ipa, host_ctxt, 2);
DECLARE_REG(int, level, host_ctxt, 3);
__kvm_tlb_flush_vmid_ipa(kern_hyp_va(mmu), ipa, level);
break;
}
case KVM_HOST_SMCCC_FUNC(__kvm_tlb_flush_vmid): {
unsigned long r1 = host_ctxt->regs.regs[1];
struct kvm_s2_mmu *mmu = (struct kvm_s2_mmu *)r1;
}
static void handle___kvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct kvm_s2_mmu *, mmu, host_ctxt, 1);
__kvm_tlb_flush_vmid(kern_hyp_va(mmu));
break;
}
case KVM_HOST_SMCCC_FUNC(__kvm_tlb_flush_local_vmid): {
unsigned long r1 = host_ctxt->regs.regs[1];
struct kvm_s2_mmu *mmu = (struct kvm_s2_mmu *)r1;
}
static void handle___kvm_tlb_flush_local_vmid(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct kvm_s2_mmu *, mmu, host_ctxt, 1);
__kvm_tlb_flush_local_vmid(kern_hyp_va(mmu));
break;
}
case KVM_HOST_SMCCC_FUNC(__kvm_timer_set_cntvoff): {
u64 cntvoff = host_ctxt->regs.regs[1];
}
__kvm_timer_set_cntvoff(cntvoff);
break;
}
case KVM_HOST_SMCCC_FUNC(__kvm_enable_ssbs):
__kvm_enable_ssbs();
break;
case KVM_HOST_SMCCC_FUNC(__vgic_v3_get_ich_vtr_el2):
ret = __vgic_v3_get_ich_vtr_el2();
break;
case KVM_HOST_SMCCC_FUNC(__vgic_v3_read_vmcr):
ret = __vgic_v3_read_vmcr();
break;
case KVM_HOST_SMCCC_FUNC(__vgic_v3_write_vmcr): {
u32 vmcr = host_ctxt->regs.regs[1];
static void handle___kvm_timer_set_cntvoff(struct kvm_cpu_context *host_ctxt)
{
__kvm_timer_set_cntvoff(cpu_reg(host_ctxt, 1));
}
__vgic_v3_write_vmcr(vmcr);
break;
}
case KVM_HOST_SMCCC_FUNC(__vgic_v3_init_lrs):
static void handle___kvm_enable_ssbs(struct kvm_cpu_context *host_ctxt)
{
u64 tmp;
tmp = read_sysreg_el2(SYS_SCTLR);
tmp |= SCTLR_ELx_DSSBS;
write_sysreg_el2(tmp, SYS_SCTLR);
}
static void handle___vgic_v3_get_ich_vtr_el2(struct kvm_cpu_context *host_ctxt)
{
cpu_reg(host_ctxt, 1) = __vgic_v3_get_ich_vtr_el2();
}
static void handle___vgic_v3_read_vmcr(struct kvm_cpu_context *host_ctxt)
{
cpu_reg(host_ctxt, 1) = __vgic_v3_read_vmcr();
}
static void handle___vgic_v3_write_vmcr(struct kvm_cpu_context *host_ctxt)
{
__vgic_v3_write_vmcr(cpu_reg(host_ctxt, 1));
}
static void handle___vgic_v3_init_lrs(struct kvm_cpu_context *host_ctxt)
{
__vgic_v3_init_lrs();
break;
case KVM_HOST_SMCCC_FUNC(__kvm_get_mdcr_el2):
ret = __kvm_get_mdcr_el2();
break;
case KVM_HOST_SMCCC_FUNC(__vgic_v3_save_aprs): {
unsigned long r1 = host_ctxt->regs.regs[1];
struct vgic_v3_cpu_if *cpu_if = (struct vgic_v3_cpu_if *)r1;
}
static void handle___kvm_get_mdcr_el2(struct kvm_cpu_context *host_ctxt)
{
cpu_reg(host_ctxt, 1) = __kvm_get_mdcr_el2();
}
static void handle___vgic_v3_save_aprs(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
__vgic_v3_save_aprs(kern_hyp_va(cpu_if));
break;
}
case KVM_HOST_SMCCC_FUNC(__vgic_v3_restore_aprs): {
unsigned long r1 = host_ctxt->regs.regs[1];
struct vgic_v3_cpu_if *cpu_if = (struct vgic_v3_cpu_if *)r1;
}
static void handle___vgic_v3_restore_aprs(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct vgic_v3_cpu_if *, cpu_if, host_ctxt, 1);
__vgic_v3_restore_aprs(kern_hyp_va(cpu_if));
break;
}
default:
/* Invalid host HVC. */
host_ctxt->regs.regs[0] = SMCCC_RET_NOT_SUPPORTED;
return;
}
}
host_ctxt->regs.regs[0] = SMCCC_RET_SUCCESS;
host_ctxt->regs.regs[1] = ret;
typedef void (*hcall_t)(struct kvm_cpu_context *);
#define HANDLE_FUNC(x) [__KVM_HOST_SMCCC_FUNC_##x] = kimg_fn_ptr(handle_##x)
static const hcall_t *host_hcall[] = {
HANDLE_FUNC(__kvm_vcpu_run),
HANDLE_FUNC(__kvm_flush_vm_context),
HANDLE_FUNC(__kvm_tlb_flush_vmid_ipa),
HANDLE_FUNC(__kvm_tlb_flush_vmid),
HANDLE_FUNC(__kvm_tlb_flush_local_vmid),
HANDLE_FUNC(__kvm_timer_set_cntvoff),
HANDLE_FUNC(__kvm_enable_ssbs),
HANDLE_FUNC(__vgic_v3_get_ich_vtr_el2),
HANDLE_FUNC(__vgic_v3_read_vmcr),
HANDLE_FUNC(__vgic_v3_write_vmcr),
HANDLE_FUNC(__vgic_v3_init_lrs),
HANDLE_FUNC(__kvm_get_mdcr_el2),
HANDLE_FUNC(__vgic_v3_save_aprs),
HANDLE_FUNC(__vgic_v3_restore_aprs),
};
static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(unsigned long, id, host_ctxt, 0);
const hcall_t *kfn;
hcall_t hfn;
id -= KVM_HOST_SMCCC_ID(0);
if (unlikely(id >= ARRAY_SIZE(host_hcall)))
goto inval;
kfn = host_hcall[id];
if (unlikely(!kfn))
goto inval;
cpu_reg(host_ctxt, 0) = SMCCC_RET_SUCCESS;
hfn = kimg_fn_hyp_va(kfn);
hfn(host_ctxt);
return;
inval:
cpu_reg(host_ctxt, 0) = SMCCC_RET_NOT_SUPPORTED;
}
static void default_host_smc_handler(struct kvm_cpu_context *host_ctxt)
{
__kvm_hyp_host_forward_smc(host_ctxt);
}
static void skip_host_instruction(void)
{
write_sysreg_el2(read_sysreg_el2(SYS_ELR) + 4, SYS_ELR);
}
static void handle_host_smc(struct kvm_cpu_context *host_ctxt)
{
bool handled;
handled = kvm_host_psci_handler(host_ctxt);
if (!handled)
default_host_smc_handler(host_ctxt);
/*
* Unlike HVC, the return address of an SMC is the instruction's PC.
* Move the return address past the instruction.
*/
skip_host_instruction();
}
void handle_trap(struct kvm_cpu_context *host_ctxt)
{
u64 esr = read_sysreg_el2(SYS_ESR);
unsigned long func_id;
if (ESR_ELx_EC(esr) != ESR_ELx_EC_HVC64)
switch (ESR_ELx_EC(esr)) {
case ESR_ELx_EC_HVC64:
handle_host_hcall(host_ctxt);
break;
case ESR_ELx_EC_SMC64:
handle_host_smc(host_ctxt);
break;
default:
hyp_panic();
func_id = host_ctxt->regs.regs[0];
handle_host_hcall(func_id, host_ctxt);
}
}
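
To illustrate the new table-driven dispatch (the handler and function ID below are invented for the example and are not part of the series): adding a host hypercall now means writing one handler that operates on the saved host context and registering it in host_hcall[].

static void handle___kvm_example(struct kvm_cpu_context *host_ctxt)
{
	DECLARE_REG(u64, arg, host_ctxt, 1);	/* argument passed in x1 */

	cpu_reg(host_ctxt, 1) = arg + 1;	/* result handed back in x1 */
}

/*
 * ...plus one entry in host_hcall[]:
 *	HANDLE_FUNC(__kvm_example),
 * assuming a matching __KVM_HOST_SMCCC_FUNC___kvm_example ID exists on the
 * host side. handle_host_hcall() then indexes the table with the SMCCC
 * function ID and reports SMCCC_RET_SUCCESS in x0.
 */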


@ -0,0 +1,40 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright (C) 2020 - Google LLC
* Author: David Brazdil <dbrazdil@google.com>
*/
#include <asm/kvm_asm.h>
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmu.h>
/*
* nVHE copy of data structures tracking available CPU cores.
* Only entries for CPUs that were online at KVM init are populated.
* Other CPUs should not be allowed to boot because their features were
* not checked against the finalized system capabilities.
*/
u64 __ro_after_init __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
u64 cpu_logical_map(unsigned int cpu)
{
if (cpu >= ARRAY_SIZE(__cpu_logical_map))
hyp_panic();
return __cpu_logical_map[cpu];
}
unsigned long __hyp_per_cpu_offset(unsigned int cpu)
{
unsigned long *cpu_base_array;
unsigned long this_cpu_base;
unsigned long elf_base;
if (cpu >= ARRAY_SIZE(kvm_arm_hyp_percpu_base))
hyp_panic();
cpu_base_array = (unsigned long *)hyp_symbol_addr(kvm_arm_hyp_percpu_base);
this_cpu_base = kern_hyp_va(cpu_base_array[cpu]);
elf_base = (unsigned long)hyp_symbol_addr(__per_cpu_start);
return this_cpu_base - elf_base;
}


@ -21,4 +21,5 @@ SECTIONS {
HYP_SECTION_NAME(.data..percpu) : {
PERCPU_INPUT(L1_CACHE_BYTES)
}
HYP_SECTION(.data..ro_after_init)
}


@ -0,0 +1,324 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright (C) 2020 - Google LLC
* Author: David Brazdil <dbrazdil@google.com>
*/
#include <asm/kvm_asm.h>
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmu.h>
#include <kvm/arm_hypercalls.h>
#include <linux/arm-smccc.h>
#include <linux/kvm_host.h>
#include <linux/psci.h>
#include <kvm/arm_psci.h>
#include <uapi/linux/psci.h>
#include <nvhe/trap_handler.h>
void kvm_hyp_cpu_entry(unsigned long r0);
void kvm_hyp_cpu_resume(unsigned long r0);
void __noreturn __host_enter(struct kvm_cpu_context *host_ctxt);
/* Config options set by the host. */
__ro_after_init u32 kvm_host_psci_version;
__ro_after_init struct psci_0_1_function_ids kvm_host_psci_0_1_function_ids;
__ro_after_init s64 hyp_physvirt_offset;
#define __hyp_pa(x) ((phys_addr_t)((x)) + hyp_physvirt_offset)
#define INVALID_CPU_ID UINT_MAX
struct psci_boot_args {
atomic_t lock;
unsigned long pc;
unsigned long r0;
};
#define PSCI_BOOT_ARGS_UNLOCKED 0
#define PSCI_BOOT_ARGS_LOCKED 1
#define PSCI_BOOT_ARGS_INIT \
((struct psci_boot_args){ \
.lock = ATOMIC_INIT(PSCI_BOOT_ARGS_UNLOCKED), \
})
static DEFINE_PER_CPU(struct psci_boot_args, cpu_on_args) = PSCI_BOOT_ARGS_INIT;
static DEFINE_PER_CPU(struct psci_boot_args, suspend_args) = PSCI_BOOT_ARGS_INIT;
static u64 get_psci_func_id(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(u64, func_id, host_ctxt, 0);
return func_id;
}
static bool is_psci_0_1_call(u64 func_id)
{
return (func_id == kvm_host_psci_0_1_function_ids.cpu_suspend) ||
(func_id == kvm_host_psci_0_1_function_ids.cpu_on) ||
(func_id == kvm_host_psci_0_1_function_ids.cpu_off) ||
(func_id == kvm_host_psci_0_1_function_ids.migrate);
}
static bool is_psci_0_2_call(u64 func_id)
{
/* SMCCC reserves IDs 0x00-1F with the given 32/64-bit base for PSCI. */
return (PSCI_0_2_FN(0) <= func_id && func_id <= PSCI_0_2_FN(31)) ||
(PSCI_0_2_FN64(0) <= func_id && func_id <= PSCI_0_2_FN64(31));
}
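
The range check relies on the SMCCC/PSCI numbering from include/uapi/linux/psci.h: PSCI_0_2_FN(n) is 0x84000000 + n and PSCI_0_2_FN64(n) is 0xC4000000 + n, with function numbers 0..31 reserved for PSCI. A minimal user-space sketch of the same classification (the example IDs below were chosen for illustration only):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PSCI_0_2_FN(n)		(0x84000000u + (n))	/* SMC32 PSCI base */
#define PSCI_0_2_FN64(n)	(0xC4000000u + (n))	/* SMC64 PSCI base */

static bool is_psci_0_2_call(uint64_t func_id)
{
	return (PSCI_0_2_FN(0) <= func_id && func_id <= PSCI_0_2_FN(31)) ||
	       (PSCI_0_2_FN64(0) <= func_id && func_id <= PSCI_0_2_FN64(31));
}

int main(void)
{
	const uint64_t ids[] = {
		0x84000000,	/* PSCI_VERSION */
		0xC4000003,	/* CPU_ON, SMC64 calling convention */
		0x84000020,	/* function number 32: outside the PSCI range */
		0x80008000,	/* unrelated SMCCC ID */
	};

	for (unsigned int i = 0; i < sizeof(ids) / sizeof(ids[0]); i++)
		printf("%#010llx -> %s\n", (unsigned long long)ids[i],
		       is_psci_0_2_call(ids[i]) ? "PSCI" : "not PSCI");
	return 0;
}
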
static bool is_psci_call(u64 func_id)
{
switch (kvm_host_psci_version) {
case PSCI_VERSION(0, 1):
return is_psci_0_1_call(func_id);
default:
return is_psci_0_2_call(func_id);
}
}
static unsigned long psci_call(unsigned long fn, unsigned long arg0,
unsigned long arg1, unsigned long arg2)
{
struct arm_smccc_res res;
arm_smccc_1_1_smc(fn, arg0, arg1, arg2, &res);
return res.a0;
}
static unsigned long psci_forward(struct kvm_cpu_context *host_ctxt)
{
return psci_call(cpu_reg(host_ctxt, 0), cpu_reg(host_ctxt, 1),
cpu_reg(host_ctxt, 2), cpu_reg(host_ctxt, 3));
}
static __noreturn unsigned long psci_forward_noreturn(struct kvm_cpu_context *host_ctxt)
{
psci_forward(host_ctxt);
hyp_panic(); /* unreachable */
}
static unsigned int find_cpu_id(u64 mpidr)
{
unsigned int i;
/* Reject invalid MPIDRs */
if (mpidr & ~MPIDR_HWID_BITMASK)
return INVALID_CPU_ID;
for (i = 0; i < NR_CPUS; i++) {
if (cpu_logical_map(i) == mpidr)
return i;
}
return INVALID_CPU_ID;
}
static __always_inline bool try_acquire_boot_args(struct psci_boot_args *args)
{
return atomic_cmpxchg_acquire(&args->lock,
PSCI_BOOT_ARGS_UNLOCKED,
PSCI_BOOT_ARGS_LOCKED) ==
PSCI_BOOT_ARGS_UNLOCKED;
}
static __always_inline void release_boot_args(struct psci_boot_args *args)
{
atomic_set_release(&args->lock, PSCI_BOOT_ARGS_UNLOCKED);
}
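
The lock here is a simple try-acquire: a cmpxchg with acquire ordering claims the per-CPU boot-args slot, and a release store hands it back, either by the caller on error or by the target CPU once it has consumed the arguments. Roughly the same pattern expressed with portable C11 atomics, as a stand-alone sketch:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define BOOT_ARGS_UNLOCKED	0
#define BOOT_ARGS_LOCKED	1

struct boot_args {
	atomic_int lock;
	unsigned long pc;
	unsigned long r0;
};

static bool try_acquire_boot_args(struct boot_args *args)
{
	int expected = BOOT_ARGS_UNLOCKED;

	/* succeeds only if the slot was free; acquire pairs with the release below */
	return atomic_compare_exchange_strong_explicit(&args->lock, &expected,
						       BOOT_ARGS_LOCKED,
						       memory_order_acquire,
						       memory_order_relaxed);
}

static void release_boot_args(struct boot_args *args)
{
	atomic_store_explicit(&args->lock, BOOT_ARGS_UNLOCKED, memory_order_release);
}

int main(void)
{
	struct boot_args args = { .lock = BOOT_ARGS_UNLOCKED };

	printf("first acquire:  %d\n", try_acquire_boot_args(&args));	/* 1 */
	printf("second acquire: %d\n", try_acquire_boot_args(&args));	/* 0, "already on" */
	release_boot_args(&args);
	printf("after release:  %d\n", try_acquire_boot_args(&args));	/* 1 again */
	return 0;
}
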
static int psci_cpu_on(u64 func_id, struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(u64, mpidr, host_ctxt, 1);
DECLARE_REG(unsigned long, pc, host_ctxt, 2);
DECLARE_REG(unsigned long, r0, host_ctxt, 3);
unsigned int cpu_id;
struct psci_boot_args *boot_args;
struct kvm_nvhe_init_params *init_params;
int ret;
/*
* Find the logical CPU ID for the given MPIDR. The search set is
* the set of CPUs that were online at the point of KVM initialization.
* Booting other CPUs is rejected because their cpufeatures were not
* checked against the finalized capabilities. This could be relaxed
* by doing the feature checks in hyp.
*/
cpu_id = find_cpu_id(mpidr);
if (cpu_id == INVALID_CPU_ID)
return PSCI_RET_INVALID_PARAMS;
boot_args = per_cpu_ptr(hyp_symbol_addr(cpu_on_args), cpu_id);
init_params = per_cpu_ptr(hyp_symbol_addr(kvm_init_params), cpu_id);
/* Check if the target CPU is already being booted. */
if (!try_acquire_boot_args(boot_args))
return PSCI_RET_ALREADY_ON;
boot_args->pc = pc;
boot_args->r0 = r0;
wmb();
ret = psci_call(func_id, mpidr,
__hyp_pa(hyp_symbol_addr(kvm_hyp_cpu_entry)),
__hyp_pa(init_params));
/* If successful, the lock will be released by the target CPU. */
if (ret != PSCI_RET_SUCCESS)
release_boot_args(boot_args);
return ret;
}
static int psci_cpu_suspend(u64 func_id, struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(u64, power_state, host_ctxt, 1);
DECLARE_REG(unsigned long, pc, host_ctxt, 2);
DECLARE_REG(unsigned long, r0, host_ctxt, 3);
struct psci_boot_args *boot_args;
struct kvm_nvhe_init_params *init_params;
boot_args = this_cpu_ptr(hyp_symbol_addr(suspend_args));
init_params = this_cpu_ptr(hyp_symbol_addr(kvm_init_params));
/*
* No need to acquire a lock before writing to boot_args because a core
* can only suspend itself. Racy CPU_ON calls use a separate struct.
*/
boot_args->pc = pc;
boot_args->r0 = r0;
/*
* Will either return if shallow sleep state, or wake up into the entry
* point if it is a deep sleep state.
*/
return psci_call(func_id, power_state,
__hyp_pa(hyp_symbol_addr(kvm_hyp_cpu_resume)),
__hyp_pa(init_params));
}
static int psci_system_suspend(u64 func_id, struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(unsigned long, pc, host_ctxt, 1);
DECLARE_REG(unsigned long, r0, host_ctxt, 2);
struct psci_boot_args *boot_args;
struct kvm_nvhe_init_params *init_params;
boot_args = this_cpu_ptr(hyp_symbol_addr(suspend_args));
init_params = this_cpu_ptr(hyp_symbol_addr(kvm_init_params));
/*
* No need to acquire a lock before writing to boot_args because a core
* can only suspend itself. Racy CPU_ON calls use a separate struct.
*/
boot_args->pc = pc;
boot_args->r0 = r0;
/* Will only return on error. */
return psci_call(func_id,
__hyp_pa(hyp_symbol_addr(kvm_hyp_cpu_resume)),
__hyp_pa(init_params), 0);
}
asmlinkage void __noreturn kvm_host_psci_cpu_entry(bool is_cpu_on)
{
struct psci_boot_args *boot_args;
struct kvm_cpu_context *host_ctxt;
host_ctxt = &this_cpu_ptr(hyp_symbol_addr(kvm_host_data))->host_ctxt;
if (is_cpu_on)
boot_args = this_cpu_ptr(hyp_symbol_addr(cpu_on_args));
else
boot_args = this_cpu_ptr(hyp_symbol_addr(suspend_args));
cpu_reg(host_ctxt, 0) = boot_args->r0;
write_sysreg_el2(boot_args->pc, SYS_ELR);
if (is_cpu_on)
release_boot_args(boot_args);
__host_enter(host_ctxt);
}
static unsigned long psci_0_1_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
{
if ((func_id == kvm_host_psci_0_1_function_ids.cpu_off) ||
(func_id == kvm_host_psci_0_1_function_ids.migrate))
return psci_forward(host_ctxt);
else if (func_id == kvm_host_psci_0_1_function_ids.cpu_on)
return psci_cpu_on(func_id, host_ctxt);
else if (func_id == kvm_host_psci_0_1_function_ids.cpu_suspend)
return psci_cpu_suspend(func_id, host_ctxt);
else
return PSCI_RET_NOT_SUPPORTED;
}
static unsigned long psci_0_2_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
{
switch (func_id) {
case PSCI_0_2_FN_PSCI_VERSION:
case PSCI_0_2_FN_CPU_OFF:
case PSCI_0_2_FN64_AFFINITY_INFO:
case PSCI_0_2_FN64_MIGRATE:
case PSCI_0_2_FN_MIGRATE_INFO_TYPE:
case PSCI_0_2_FN64_MIGRATE_INFO_UP_CPU:
return psci_forward(host_ctxt);
case PSCI_0_2_FN_SYSTEM_OFF:
case PSCI_0_2_FN_SYSTEM_RESET:
psci_forward_noreturn(host_ctxt);
unreachable();
case PSCI_0_2_FN64_CPU_SUSPEND:
return psci_cpu_suspend(func_id, host_ctxt);
case PSCI_0_2_FN64_CPU_ON:
return psci_cpu_on(func_id, host_ctxt);
default:
return PSCI_RET_NOT_SUPPORTED;
}
}
static unsigned long psci_1_0_handler(u64 func_id, struct kvm_cpu_context *host_ctxt)
{
switch (func_id) {
case PSCI_1_0_FN_PSCI_FEATURES:
case PSCI_1_0_FN_SET_SUSPEND_MODE:
case PSCI_1_1_FN64_SYSTEM_RESET2:
return psci_forward(host_ctxt);
case PSCI_1_0_FN64_SYSTEM_SUSPEND:
return psci_system_suspend(func_id, host_ctxt);
default:
return psci_0_2_handler(func_id, host_ctxt);
}
}
bool kvm_host_psci_handler(struct kvm_cpu_context *host_ctxt)
{
u64 func_id = get_psci_func_id(host_ctxt);
unsigned long ret;
if (!is_psci_call(func_id))
return false;
switch (kvm_host_psci_version) {
case PSCI_VERSION(0, 1):
ret = psci_0_1_handler(func_id, host_ctxt);
break;
case PSCI_VERSION(0, 2):
ret = psci_0_2_handler(func_id, host_ctxt);
break;
default:
ret = psci_1_0_handler(func_id, host_ctxt);
break;
}
cpu_reg(host_ctxt, 0) = ret;
cpu_reg(host_ctxt, 1) = 0;
cpu_reg(host_ctxt, 2) = 0;
cpu_reg(host_ctxt, 3) = 0;
return true;
}

View file

@ -4,6 +4,7 @@
* Author: Marc Zyngier <marc.zyngier@arm.com>
*/
#include <hyp/adjust_pc.h>
#include <hyp/switch.h>
#include <hyp/sysreg-sr.h>
@ -96,6 +97,9 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu)
mdcr_el2 |= MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT;
write_sysreg(mdcr_el2, mdcr_el2);
if (is_protected_kvm_enabled())
write_sysreg(HCR_HOST_NVHE_PROTECTED_FLAGS, hcr_el2);
else
write_sysreg(HCR_HOST_NVHE_FLAGS, hcr_el2);
write_sysreg(CPTR_EL2_DEFAULT, cptr_el2);
write_sysreg(__kvm_hyp_host_vector, vbar_el2);
@ -189,6 +193,8 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
__sysreg_save_state_nvhe(host_ctxt);
__adjust_pc(vcpu);
/*
* We must restore the 32-bit state before the sysregs, thanks
* to erratum #852523 (Cortex-A57) or #853709 (Cortex-A72).

View file

@ -33,14 +33,3 @@ void __sysreg_restore_state_nvhe(struct kvm_cpu_context *ctxt)
__sysreg_restore_user_state(ctxt);
__sysreg_restore_el2_return_state(ctxt);
}
void __kvm_enable_ssbs(void)
{
u64 tmp;
asm volatile(
"mrs %0, sctlr_el2\n"
"orr %0, %0, %1\n"
"msr sctlr_el2, %0"
: "=&r" (tmp) : "L" (SCTLR_ELx_DSSBS));
}

View file

@ -1,32 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright (C) 2015-2018 - ARM Ltd
* Author: Marc Zyngier <marc.zyngier@arm.com>
*/
#include <linux/arm-smccc.h>
#include <linux/linkage.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_mmu.h>
/*
* This is not executed directly and is instead copied into the vectors
* by install_bp_hardening_cb().
*/
.data
.pushsection .rodata
.global __smccc_workaround_1_smc
SYM_DATA_START(__smccc_workaround_1_smc)
esb
sub sp, sp, #(8 * 4)
stp x2, x3, [sp, #(8 * 0)]
stp x0, x1, [sp, #(8 * 2)]
mov w0, #ARM_SMCCC_ARCH_WORKAROUND_1
smc #0
ldp x2, x3, [sp, #(8 * 0)]
ldp x0, x1, [sp, #(8 * 2)]
add sp, sp, #(8 * 4)
1: .org __smccc_workaround_1_smc + __SMCCC_WORKAROUND_1_SMC_SZ
.org 1b
SYM_DATA_END(__smccc_workaround_1_smc)

View file

@ -4,6 +4,8 @@
* Author: Marc Zyngier <marc.zyngier@arm.com>
*/
#include <hyp/adjust_pc.h>
#include <linux/compiler.h>
#include <linux/irqchip/arm-gic.h>
#include <linux/kvm_host.h>

View file

@ -4,6 +4,8 @@
* Author: Marc Zyngier <marc.zyngier@arm.com>
*/
#include <hyp/adjust_pc.h>
#include <linux/compiler.h>
#include <linux/irqchip/arm-gic-v3.h>
#include <linux/kvm_host.h>

View file

@ -8,4 +8,4 @@ ccflags-y := -D__KVM_VHE_HYPERVISOR__
obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o
obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
../fpsimd.o ../hyp-entry.o
../fpsimd.o ../hyp-entry.o ../exception.o

View file

@ -4,6 +4,7 @@
* Author: Marc Zyngier <marc.zyngier@arm.com>
*/
#include <hyp/adjust_pc.h>
#include <hyp/switch.h>
#include <linux/arm-smccc.h>
@ -133,6 +134,8 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
__load_guest_stage2(vcpu->arch.hw_mmu);
__activate_traps(vcpu);
__adjust_pc(vcpu);
sysreg_restore_guest_state_vhe(guest_ctxt);
__debug_switch_to_guest(vcpu);

View file

@ -14,119 +14,15 @@
#include <asm/kvm_emulate.h>
#include <asm/esr.h>
#define CURRENT_EL_SP_EL0_VECTOR 0x0
#define CURRENT_EL_SP_ELx_VECTOR 0x200
#define LOWER_EL_AArch64_VECTOR 0x400
#define LOWER_EL_AArch32_VECTOR 0x600
enum exception_type {
except_type_sync = 0,
except_type_irq = 0x80,
except_type_fiq = 0x100,
except_type_serror = 0x180,
};
/*
* This performs the exception entry at a given EL (@target_mode), stashing PC
* and PSTATE into ELR and SPSR respectively, and compute the new PC/PSTATE.
* The EL passed to this function *must* be a non-secure, privileged mode with
* bit 0 being set (PSTATE.SP == 1).
*
* When an exception is taken, most PSTATE fields are left unchanged in the
* handler. However, some are explicitly overridden (e.g. M[4:0]). Luckily all
* of the inherited bits have the same position in the AArch64/AArch32 SPSR_ELx
* layouts, so we don't need to shuffle these for exceptions from AArch32 EL0.
*
* For the SPSR_ELx layout for AArch64, see ARM DDI 0487E.a page C5-429.
* For the SPSR_ELx layout for AArch32, see ARM DDI 0487E.a page C5-426.
*
* Here we manipulate the fields in order of the AArch64 SPSR_ELx layout, from
* MSB to LSB.
*/
static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
enum exception_type type)
{
unsigned long sctlr, vbar, old, new, mode;
u64 exc_offset;
mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
if (mode == target_mode)
exc_offset = CURRENT_EL_SP_ELx_VECTOR;
else if ((mode | PSR_MODE_THREAD_BIT) == target_mode)
exc_offset = CURRENT_EL_SP_EL0_VECTOR;
else if (!(mode & PSR_MODE32_BIT))
exc_offset = LOWER_EL_AArch64_VECTOR;
else
exc_offset = LOWER_EL_AArch32_VECTOR;
switch (target_mode) {
case PSR_MODE_EL1h:
vbar = vcpu_read_sys_reg(vcpu, VBAR_EL1);
sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL1);
vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
break;
default:
/* Don't do that */
BUG();
}
*vcpu_pc(vcpu) = vbar + exc_offset + type;
old = *vcpu_cpsr(vcpu);
new = 0;
new |= (old & PSR_N_BIT);
new |= (old & PSR_Z_BIT);
new |= (old & PSR_C_BIT);
new |= (old & PSR_V_BIT);
// TODO: TCO (if/when ARMv8.5-MemTag is exposed to guests)
new |= (old & PSR_DIT_BIT);
// PSTATE.UAO is set to zero upon any exception to AArch64
// See ARM DDI 0487E.a, page D5-2579.
// PSTATE.PAN is unchanged unless SCTLR_ELx.SPAN == 0b0
// SCTLR_ELx.SPAN is RES1 when ARMv8.1-PAN is not implemented
// See ARM DDI 0487E.a, page D5-2578.
new |= (old & PSR_PAN_BIT);
if (!(sctlr & SCTLR_EL1_SPAN))
new |= PSR_PAN_BIT;
// PSTATE.SS is set to zero upon any exception to AArch64
// See ARM DDI 0487E.a, page D2-2452.
// PSTATE.IL is set to zero upon any exception to AArch64
// See ARM DDI 0487E.a, page D1-2306.
// PSTATE.SSBS is set to SCTLR_ELx.DSSBS upon any exception to AArch64
// See ARM DDI 0487E.a, page D13-3258
if (sctlr & SCTLR_ELx_DSSBS)
new |= PSR_SSBS_BIT;
// PSTATE.BTYPE is set to zero upon any exception to AArch64
// See ARM DDI 0487E.a, pages D1-2293 to D1-2294.
new |= PSR_D_BIT;
new |= PSR_A_BIT;
new |= PSR_I_BIT;
new |= PSR_F_BIT;
new |= target_mode;
*vcpu_cpsr(vcpu) = new;
vcpu_write_spsr(vcpu, old);
}
static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr)
{
unsigned long cpsr = *vcpu_cpsr(vcpu);
bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
u32 esr = 0;
enter_exception64(vcpu, PSR_MODE_EL1h, except_type_sync);
vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_EL1 |
KVM_ARM64_EXCEPT_AA64_ELx_SYNC |
KVM_ARM64_PENDING_EXCEPTION);
vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
@ -156,7 +52,9 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
{
u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
enter_exception64(vcpu, PSR_MODE_EL1h, except_type_sync);
vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_EL1 |
KVM_ARM64_EXCEPT_AA64_ELx_SYNC |
KVM_ARM64_PENDING_EXCEPTION);
/*
* Build an unknown exception, depending on the instruction
@ -168,6 +66,53 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
}
#define DFSR_FSC_EXTABT_LPAE 0x10
#define DFSR_FSC_EXTABT_nLPAE 0x08
#define DFSR_LPAE BIT(9)
#define TTBCR_EAE BIT(31)
static void inject_undef32(struct kvm_vcpu *vcpu)
{
vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA32_UND |
KVM_ARM64_PENDING_EXCEPTION);
}
/*
* Modelled after TakeDataAbortException() and TakePrefetchAbortException
* pseudocode.
*/
static void inject_abt32(struct kvm_vcpu *vcpu, bool is_pabt, u32 addr)
{
u64 far;
u32 fsr;
/* Give the guest an IMPLEMENTATION DEFINED exception */
if (vcpu_read_sys_reg(vcpu, TCR_EL1) & TTBCR_EAE) {
fsr = DFSR_LPAE | DFSR_FSC_EXTABT_LPAE;
} else {
/* no need to shuffle FS[4] into DFSR[10] as it's 0 */
fsr = DFSR_FSC_EXTABT_nLPAE;
}
far = vcpu_read_sys_reg(vcpu, FAR_EL1);
if (is_pabt) {
vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA32_IABT |
KVM_ARM64_PENDING_EXCEPTION);
far &= GENMASK(31, 0);
far |= (u64)addr << 32;
vcpu_write_sys_reg(vcpu, fsr, IFSR32_EL2);
} else { /* !iabt */
vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA32_DABT |
KVM_ARM64_PENDING_EXCEPTION);
far &= GENMASK(63, 32);
far |= addr;
vcpu_write_sys_reg(vcpu, fsr, ESR_EL1);
}
vcpu_write_sys_reg(vcpu, far, FAR_EL1);
}
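
inject_abt32() folds the two 32-bit AArch32 fault address registers into the single 64-bit FAR_EL1: the DFAR value occupies bits [31:0] and the IFAR value bits [63:32], which is why each path preserves one half while rewriting the other. A small stand-alone illustration of that packing:

#include <stdint.h>
#include <stdio.h>

#define GENMASK64(h, l)	((~0ULL << (l)) & (~0ULL >> (63 - (h))))

/* data abort: DFAR lives in FAR_EL1[31:0], the IFAR half is preserved */
static uint64_t set_dfar(uint64_t far, uint32_t addr)
{
	far &= GENMASK64(63, 32);
	return far | addr;
}

/* prefetch abort: IFAR lives in FAR_EL1[63:32], the DFAR half is preserved */
static uint64_t set_ifar(uint64_t far, uint32_t addr)
{
	far &= GENMASK64(31, 0);
	return far | ((uint64_t)addr << 32);
}

int main(void)
{
	uint64_t far = 0;

	far = set_dfar(far, 0x1000);	/* data abort at 0x1000     */
	far = set_ifar(far, 0x2000);	/* prefetch abort at 0x2000 */
	printf("FAR_EL1 = %#018llx\n", (unsigned long long)far);	/* 0x0000200000001000 */
	return 0;
}
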
/**
* kvm_inject_dabt - inject a data abort into the guest
* @vcpu: The VCPU to receive the data abort
@ -179,7 +124,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
{
if (vcpu_el1_is_32bit(vcpu))
kvm_inject_dabt32(vcpu, addr);
inject_abt32(vcpu, false, addr);
else
inject_abt64(vcpu, false, addr);
}
@ -195,7 +140,7 @@ void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr)
{
if (vcpu_el1_is_32bit(vcpu))
kvm_inject_pabt32(vcpu, addr);
inject_abt32(vcpu, true, addr);
else
inject_abt64(vcpu, true, addr);
}
@ -210,7 +155,7 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr)
void kvm_inject_undefined(struct kvm_vcpu *vcpu)
{
if (vcpu_el1_is_32bit(vcpu))
kvm_inject_undef32(vcpu);
inject_undef32(vcpu);
else
inject_undef64(vcpu);
}

View file

@ -115,7 +115,7 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu)
* The MMIO instruction is emulated and should not be re-executed
* in the guest.
*/
kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
kvm_incr_pc(vcpu);
return 0;
}

View file

@ -1023,7 +1023,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
* cautious, and skip the instruction.
*/
if (kvm_is_error_hva(hva) && kvm_vcpu_dabt_is_cm(vcpu)) {
kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
kvm_incr_pc(vcpu);
ret = 1;
goto out_unlock;
}

View file

@ -384,7 +384,7 @@ static void kvm_pmu_update_state(struct kvm_vcpu *vcpu)
struct kvm_pmu *pmu = &vcpu->arch.pmu;
bool overflow;
if (!kvm_arm_pmu_v3_ready(vcpu))
if (!kvm_vcpu_has_pmu(vcpu))
return;
overflow = !!kvm_pmu_overflow_status(vcpu);
@ -825,9 +825,12 @@ bool kvm_arm_support_pmu_v3(void)
int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu)
{
if (!vcpu->arch.pmu.created)
if (!kvm_vcpu_has_pmu(vcpu))
return 0;
if (!vcpu->arch.pmu.created)
return -EINVAL;
/*
* A valid interrupt configuration for the PMU is either to have a
* properly configured interrupt number and using an in-kernel
@ -835,9 +838,6 @@ int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu)
*/
if (irqchip_in_kernel(vcpu->kvm)) {
int irq = vcpu->arch.pmu.irq_num;
if (!kvm_arm_pmu_irq_initialized(vcpu))
return -EINVAL;
/*
* If we are using an in-kernel vgic, at this point we know
* the vgic will be initialized, so we can check the PMU irq
@ -851,7 +851,6 @@ int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu)
}
kvm_pmu_vcpu_reset(vcpu);
vcpu->arch.pmu.ready = true;
return 0;
}
@ -913,8 +912,7 @@ static bool pmu_irq_is_valid(struct kvm *kvm, int irq)
int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
{
if (!kvm_arm_support_pmu_v3() ||
!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
if (!kvm_vcpu_has_pmu(vcpu))
return -ENODEV;
if (vcpu->arch.pmu.created)
@ -1015,7 +1013,7 @@ int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
if (!irqchip_in_kernel(vcpu->kvm))
return -EINVAL;
if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
if (!kvm_vcpu_has_pmu(vcpu))
return -ENODEV;
if (!kvm_arm_pmu_irq_initialized(vcpu))
@ -1035,8 +1033,7 @@ int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
case KVM_ARM_VCPU_PMU_V3_IRQ:
case KVM_ARM_VCPU_PMU_V3_INIT:
case KVM_ARM_VCPU_PMU_V3_FILTER:
if (kvm_arm_support_pmu_v3() &&
test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
if (kvm_vcpu_has_pmu(vcpu))
return 0;
}

View file

@ -53,7 +53,6 @@ gpa_t kvm_init_stolen_time(struct kvm_vcpu *vcpu)
struct pvclock_vcpu_stolen_time init_values = {};
struct kvm *kvm = vcpu->kvm;
u64 base = vcpu->arch.steal.base;
int idx;
if (base == GPA_INVALID)
return base;
@ -63,10 +62,7 @@ gpa_t kvm_init_stolen_time(struct kvm_vcpu *vcpu)
* the feature enabled.
*/
vcpu->arch.steal.last_steal = current->sched_info.run_delay;
idx = srcu_read_lock(&kvm->srcu);
kvm_write_guest(kvm, base, &init_values, sizeof(init_values));
srcu_read_unlock(&kvm->srcu, idx);
kvm_write_guest_lock(kvm, base, &init_values, sizeof(init_values));
return base;
}

View file

@ -1,224 +0,0 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
* Copyright (C) 2012,2013 - ARM Ltd
* Author: Marc Zyngier <marc.zyngier@arm.com>
*
* Derived from arch/arm/kvm/emulate.c:
* Copyright (C) 2012 - Virtual Open Systems and Columbia University
* Author: Christoffer Dall <c.dall@virtualopensystems.com>
*/
#include <linux/mm.h>
#include <linux/kvm_host.h>
#include <asm/kvm_emulate.h>
#include <asm/ptrace.h>
#define VCPU_NR_MODES 6
#define REG_OFFSET(_reg) \
(offsetof(struct user_pt_regs, _reg) / sizeof(unsigned long))
#define USR_REG_OFFSET(R) REG_OFFSET(compat_usr(R))
static const unsigned long vcpu_reg_offsets[VCPU_NR_MODES][16] = {
/* USR Registers */
{
USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
USR_REG_OFFSET(12), USR_REG_OFFSET(13), USR_REG_OFFSET(14),
REG_OFFSET(pc)
},
/* FIQ Registers */
{
USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
USR_REG_OFFSET(6), USR_REG_OFFSET(7),
REG_OFFSET(compat_r8_fiq), /* r8 */
REG_OFFSET(compat_r9_fiq), /* r9 */
REG_OFFSET(compat_r10_fiq), /* r10 */
REG_OFFSET(compat_r11_fiq), /* r11 */
REG_OFFSET(compat_r12_fiq), /* r12 */
REG_OFFSET(compat_sp_fiq), /* r13 */
REG_OFFSET(compat_lr_fiq), /* r14 */
REG_OFFSET(pc)
},
/* IRQ Registers */
{
USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
USR_REG_OFFSET(12),
REG_OFFSET(compat_sp_irq), /* r13 */
REG_OFFSET(compat_lr_irq), /* r14 */
REG_OFFSET(pc)
},
/* SVC Registers */
{
USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
USR_REG_OFFSET(12),
REG_OFFSET(compat_sp_svc), /* r13 */
REG_OFFSET(compat_lr_svc), /* r14 */
REG_OFFSET(pc)
},
/* ABT Registers */
{
USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
USR_REG_OFFSET(12),
REG_OFFSET(compat_sp_abt), /* r13 */
REG_OFFSET(compat_lr_abt), /* r14 */
REG_OFFSET(pc)
},
/* UND Registers */
{
USR_REG_OFFSET(0), USR_REG_OFFSET(1), USR_REG_OFFSET(2),
USR_REG_OFFSET(3), USR_REG_OFFSET(4), USR_REG_OFFSET(5),
USR_REG_OFFSET(6), USR_REG_OFFSET(7), USR_REG_OFFSET(8),
USR_REG_OFFSET(9), USR_REG_OFFSET(10), USR_REG_OFFSET(11),
USR_REG_OFFSET(12),
REG_OFFSET(compat_sp_und), /* r13 */
REG_OFFSET(compat_lr_und), /* r14 */
REG_OFFSET(pc)
},
};
/*
* Return a pointer to the register number valid in the current mode of
* the virtual CPU.
*/
unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num)
{
unsigned long *reg_array = (unsigned long *)&vcpu->arch.ctxt.regs;
unsigned long mode = *vcpu_cpsr(vcpu) & PSR_AA32_MODE_MASK;
switch (mode) {
case PSR_AA32_MODE_USR ... PSR_AA32_MODE_SVC:
mode &= ~PSR_MODE32_BIT; /* 0 ... 3 */
break;
case PSR_AA32_MODE_ABT:
mode = 4;
break;
case PSR_AA32_MODE_UND:
mode = 5;
break;
case PSR_AA32_MODE_SYS:
mode = 0; /* SYS maps to USR */
break;
default:
BUG();
}
return reg_array + vcpu_reg_offsets[mode][reg_num];
}
/*
* Return the SPSR for the current mode of the virtual CPU.
*/
static int vcpu_spsr32_mode(const struct kvm_vcpu *vcpu)
{
unsigned long mode = *vcpu_cpsr(vcpu) & PSR_AA32_MODE_MASK;
switch (mode) {
case PSR_AA32_MODE_SVC: return KVM_SPSR_SVC;
case PSR_AA32_MODE_ABT: return KVM_SPSR_ABT;
case PSR_AA32_MODE_UND: return KVM_SPSR_UND;
case PSR_AA32_MODE_IRQ: return KVM_SPSR_IRQ;
case PSR_AA32_MODE_FIQ: return KVM_SPSR_FIQ;
default: BUG();
}
}
unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu)
{
int spsr_idx = vcpu_spsr32_mode(vcpu);
if (!vcpu->arch.sysregs_loaded_on_cpu) {
switch (spsr_idx) {
case KVM_SPSR_SVC:
return __vcpu_sys_reg(vcpu, SPSR_EL1);
case KVM_SPSR_ABT:
return vcpu->arch.ctxt.spsr_abt;
case KVM_SPSR_UND:
return vcpu->arch.ctxt.spsr_und;
case KVM_SPSR_IRQ:
return vcpu->arch.ctxt.spsr_irq;
case KVM_SPSR_FIQ:
return vcpu->arch.ctxt.spsr_fiq;
}
}
switch (spsr_idx) {
case KVM_SPSR_SVC:
return read_sysreg_el1(SYS_SPSR);
case KVM_SPSR_ABT:
return read_sysreg(spsr_abt);
case KVM_SPSR_UND:
return read_sysreg(spsr_und);
case KVM_SPSR_IRQ:
return read_sysreg(spsr_irq);
case KVM_SPSR_FIQ:
return read_sysreg(spsr_fiq);
default:
BUG();
}
}
void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v)
{
int spsr_idx = vcpu_spsr32_mode(vcpu);
if (!vcpu->arch.sysregs_loaded_on_cpu) {
switch (spsr_idx) {
case KVM_SPSR_SVC:
__vcpu_sys_reg(vcpu, SPSR_EL1) = v;
break;
case KVM_SPSR_ABT:
vcpu->arch.ctxt.spsr_abt = v;
break;
case KVM_SPSR_UND:
vcpu->arch.ctxt.spsr_und = v;
break;
case KVM_SPSR_IRQ:
vcpu->arch.ctxt.spsr_irq = v;
break;
case KVM_SPSR_FIQ:
vcpu->arch.ctxt.spsr_fiq = v;
break;
}
return;
}
switch (spsr_idx) {
case KVM_SPSR_SVC:
write_sysreg_el1(v, SYS_SPSR);
break;
case KVM_SPSR_ABT:
write_sysreg(v, spsr_abt);
break;
case KVM_SPSR_UND:
write_sysreg(v, spsr_und);
break;
case KVM_SPSR_IRQ:
write_sysreg(v, spsr_irq);
break;
case KVM_SPSR_FIQ:
write_sysreg(v, spsr_fiq);
break;
}
}

View file

@ -25,7 +25,6 @@
#include <asm/ptrace.h>
#include <asm/kvm_arm.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_coproc.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_mmu.h>
#include <asm/virt.h>
@ -42,58 +41,6 @@ static u32 kvm_ipa_limit;
#define VCPU_RESET_PSTATE_SVC (PSR_AA32_MODE_SVC | PSR_AA32_A_BIT | \
PSR_AA32_I_BIT | PSR_AA32_F_BIT)
static bool system_has_full_ptr_auth(void)
{
return system_supports_address_auth() && system_supports_generic_auth();
}
/**
* kvm_arch_vm_ioctl_check_extension
*
* We currently assume that the number of HW registers is uniform
* across all CPUs (see cpuinfo_sanity_check).
*/
int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
{
int r;
switch (ext) {
case KVM_CAP_ARM_EL1_32BIT:
r = cpus_have_const_cap(ARM64_HAS_32BIT_EL1);
break;
case KVM_CAP_GUEST_DEBUG_HW_BPS:
r = get_num_brps();
break;
case KVM_CAP_GUEST_DEBUG_HW_WPS:
r = get_num_wrps();
break;
case KVM_CAP_ARM_PMU_V3:
r = kvm_arm_support_pmu_v3();
break;
case KVM_CAP_ARM_INJECT_SERROR_ESR:
r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
break;
case KVM_CAP_SET_GUEST_DEBUG:
case KVM_CAP_VCPU_ATTRIBUTES:
r = 1;
break;
case KVM_CAP_ARM_VM_IPA_SIZE:
r = kvm_ipa_limit;
break;
case KVM_CAP_ARM_SVE:
r = system_supports_sve();
break;
case KVM_CAP_ARM_PTRAUTH_ADDRESS:
case KVM_CAP_ARM_PTRAUTH_GENERIC:
r = system_has_full_ptr_auth();
break;
default:
r = 0;
}
return r;
}
unsigned int kvm_sve_max_vl;
int kvm_arm_init_sve(void)
@ -286,6 +233,10 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
pstate = VCPU_RESET_PSTATE_EL1;
}
if (kvm_vcpu_has_pmu(vcpu) && !kvm_arm_support_pmu_v3()) {
ret = -EINVAL;
goto out;
}
break;
}

View file

@ -20,7 +20,6 @@
#include <asm/debug-monitors.h>
#include <asm/esr.h>
#include <asm/kvm_arm.h>
#include <asm/kvm_coproc.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmu.h>
@ -64,87 +63,6 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
return false;
}
static bool __vcpu_read_sys_reg_from_cpu(int reg, u64 *val)
{
/*
* System registers listed in the switch are not saved on every
* exit from the guest but are only saved on vcpu_put.
*
* Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
* should never be listed below, because the guest cannot modify its
* own MPIDR_EL1 and MPIDR_EL1 is accessed for VCPU A from VCPU B's
* thread when emulating cross-VCPU communication.
*/
switch (reg) {
case CSSELR_EL1: *val = read_sysreg_s(SYS_CSSELR_EL1); break;
case SCTLR_EL1: *val = read_sysreg_s(SYS_SCTLR_EL12); break;
case CPACR_EL1: *val = read_sysreg_s(SYS_CPACR_EL12); break;
case TTBR0_EL1: *val = read_sysreg_s(SYS_TTBR0_EL12); break;
case TTBR1_EL1: *val = read_sysreg_s(SYS_TTBR1_EL12); break;
case TCR_EL1: *val = read_sysreg_s(SYS_TCR_EL12); break;
case ESR_EL1: *val = read_sysreg_s(SYS_ESR_EL12); break;
case AFSR0_EL1: *val = read_sysreg_s(SYS_AFSR0_EL12); break;
case AFSR1_EL1: *val = read_sysreg_s(SYS_AFSR1_EL12); break;
case FAR_EL1: *val = read_sysreg_s(SYS_FAR_EL12); break;
case MAIR_EL1: *val = read_sysreg_s(SYS_MAIR_EL12); break;
case VBAR_EL1: *val = read_sysreg_s(SYS_VBAR_EL12); break;
case CONTEXTIDR_EL1: *val = read_sysreg_s(SYS_CONTEXTIDR_EL12);break;
case TPIDR_EL0: *val = read_sysreg_s(SYS_TPIDR_EL0); break;
case TPIDRRO_EL0: *val = read_sysreg_s(SYS_TPIDRRO_EL0); break;
case TPIDR_EL1: *val = read_sysreg_s(SYS_TPIDR_EL1); break;
case AMAIR_EL1: *val = read_sysreg_s(SYS_AMAIR_EL12); break;
case CNTKCTL_EL1: *val = read_sysreg_s(SYS_CNTKCTL_EL12); break;
case ELR_EL1: *val = read_sysreg_s(SYS_ELR_EL12); break;
case PAR_EL1: *val = read_sysreg_par(); break;
case DACR32_EL2: *val = read_sysreg_s(SYS_DACR32_EL2); break;
case IFSR32_EL2: *val = read_sysreg_s(SYS_IFSR32_EL2); break;
case DBGVCR32_EL2: *val = read_sysreg_s(SYS_DBGVCR32_EL2); break;
default: return false;
}
return true;
}
static bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
{
/*
* System registers listed in the switch are not restored on every
* entry to the guest but are only restored on vcpu_load.
*
* Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
* should never be listed below, because the MPIDR should only be set
* once, before running the VCPU, and never changed later.
*/
switch (reg) {
case CSSELR_EL1: write_sysreg_s(val, SYS_CSSELR_EL1); break;
case SCTLR_EL1: write_sysreg_s(val, SYS_SCTLR_EL12); break;
case CPACR_EL1: write_sysreg_s(val, SYS_CPACR_EL12); break;
case TTBR0_EL1: write_sysreg_s(val, SYS_TTBR0_EL12); break;
case TTBR1_EL1: write_sysreg_s(val, SYS_TTBR1_EL12); break;
case TCR_EL1: write_sysreg_s(val, SYS_TCR_EL12); break;
case ESR_EL1: write_sysreg_s(val, SYS_ESR_EL12); break;
case AFSR0_EL1: write_sysreg_s(val, SYS_AFSR0_EL12); break;
case AFSR1_EL1: write_sysreg_s(val, SYS_AFSR1_EL12); break;
case FAR_EL1: write_sysreg_s(val, SYS_FAR_EL12); break;
case MAIR_EL1: write_sysreg_s(val, SYS_MAIR_EL12); break;
case VBAR_EL1: write_sysreg_s(val, SYS_VBAR_EL12); break;
case CONTEXTIDR_EL1: write_sysreg_s(val, SYS_CONTEXTIDR_EL12);break;
case TPIDR_EL0: write_sysreg_s(val, SYS_TPIDR_EL0); break;
case TPIDRRO_EL0: write_sysreg_s(val, SYS_TPIDRRO_EL0); break;
case TPIDR_EL1: write_sysreg_s(val, SYS_TPIDR_EL1); break;
case AMAIR_EL1: write_sysreg_s(val, SYS_AMAIR_EL12); break;
case CNTKCTL_EL1: write_sysreg_s(val, SYS_CNTKCTL_EL12); break;
case ELR_EL1: write_sysreg_s(val, SYS_ELR_EL12); break;
case PAR_EL1: write_sysreg_s(val, SYS_PAR_EL1); break;
case DACR32_EL2: write_sysreg_s(val, SYS_DACR32_EL2); break;
case IFSR32_EL2: write_sysreg_s(val, SYS_IFSR32_EL2); break;
case DBGVCR32_EL2: write_sysreg_s(val, SYS_DBGVCR32_EL2); break;
default: return false;
}
return true;
}
u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
{
u64 val = 0x8badf00d8badf00d;
@ -169,7 +87,7 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
static u32 cache_levels;
/* CSSELR values; used to index KVM_REG_ARM_DEMUX_ID_CCSIDR */
#define CSSELR_MAX 12
#define CSSELR_MAX 14
/* Which cache CCSIDR represents depends on CSSELR value. */
static u32 get_ccsidr(u32 csselr)
@ -209,6 +127,24 @@ static bool access_dcsw(struct kvm_vcpu *vcpu,
return true;
}
static void get_access_mask(const struct sys_reg_desc *r, u64 *mask, u64 *shift)
{
switch (r->aarch32_map) {
case AA32_LO:
*mask = GENMASK_ULL(31, 0);
*shift = 0;
break;
case AA32_HI:
*mask = GENMASK_ULL(63, 32);
*shift = 32;
break;
default:
*mask = GENMASK_ULL(63, 0);
*shift = 0;
break;
}
}
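
get_access_mask() reduces every AArch32 alias to a (mask, shift) pair: AA32_LO selects bits [31:0] of the 64-bit register, AA32_HI bits [63:32], and anything else the whole register. The accessors below then perform a read-modify-write with that pair. A user-space sketch of the same insertion logic (function and register names here are illustrative only):

#include <stdint.h>
#include <stdio.h>

enum aa32_map { AA32_ZEROHIGH, AA32_LO, AA32_HI };

static void get_access_mask(enum aa32_map map, uint64_t *mask, uint64_t *shift)
{
	switch (map) {
	case AA32_LO:
		*mask = 0x00000000ffffffffULL;
		*shift = 0;
		break;
	case AA32_HI:
		*mask = 0xffffffff00000000ULL;
		*shift = 32;
		break;
	default:
		*mask = ~0ULL;
		*shift = 0;
		break;
	}
}

/* read-modify-write of a 64-bit register through a 32-bit AArch32 alias */
static uint64_t write_aa32_alias(uint64_t reg, enum aa32_map map, uint32_t val)
{
	uint64_t mask, shift;

	get_access_mask(map, &mask, &shift);
	reg &= ~mask;
	reg |= ((uint64_t)val & (mask >> shift)) << shift;
	return reg;
}

int main(void)
{
	uint64_t mair = 0;

	mair = write_aa32_alias(mair, AA32_LO, 0x11111111);	/* PRRR/MAIR0 */
	mair = write_aa32_alias(mair, AA32_HI, 0x22222222);	/* NMRR/MAIR1 */
	printf("MAIR_EL1 = %#018llx\n", (unsigned long long)mair);	/* 0x2222222211111111 */
	return 0;
}
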
/*
* Generic accessor for VM registers. Only called as long as HCR_TVM
* is set. If the guest enables the MMU, we stop trapping the VM
@ -219,26 +155,21 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *r)
{
bool was_enabled = vcpu_has_cache_enabled(vcpu);
u64 val;
int reg = r->reg;
u64 val, mask, shift;
BUG_ON(!p->is_write);
/* See the 32bit mapping in kvm_host.h */
if (p->is_aarch32)
reg = r->reg / 2;
get_access_mask(r, &mask, &shift);
if (!p->is_aarch32 || !p->is_32bit) {
val = p->regval;
if (~mask) {
val = vcpu_read_sys_reg(vcpu, r->reg);
val &= ~mask;
} else {
val = vcpu_read_sys_reg(vcpu, reg);
if (r->reg % 2)
val = (p->regval << 32) | (u64)lower_32_bits(val);
else
val = ((u64)upper_32_bits(val) << 32) |
lower_32_bits(p->regval);
val = 0;
}
vcpu_write_sys_reg(vcpu, val, reg);
val |= (p->regval & (mask >> shift)) << shift;
vcpu_write_sys_reg(vcpu, val, r->reg);
kvm_toggle_cache(vcpu, was_enabled);
return true;
@ -248,17 +179,13 @@ static bool access_actlr(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
u64 mask, shift;
if (p->is_write)
return ignore_write(vcpu, p);
p->regval = vcpu_read_sys_reg(vcpu, ACTLR_EL1);
if (p->is_aarch32) {
if (r->Op2 & 2)
p->regval = upper_32_bits(p->regval);
else
p->regval = lower_32_bits(p->regval);
}
get_access_mask(r, &mask, &shift);
p->regval = (vcpu_read_sys_reg(vcpu, r->reg) & mask) >> shift;
return true;
}
@ -285,7 +212,7 @@ static bool access_gic_sgi(struct kvm_vcpu *vcpu,
* equivalent to ICC_SGI0R_EL1, as there is no "alternative" secure
* group.
*/
if (p->is_aarch32) {
if (p->Op0 == 0) { /* AArch32 */
switch (p->Op1) {
default: /* Keep GCC quiet */
case 0: /* ICC_SGI1R */
@ -296,7 +223,7 @@ static bool access_gic_sgi(struct kvm_vcpu *vcpu,
g1 = false;
break;
}
} else {
} else { /* AArch64 */
switch (p->Op2) {
default: /* Keep GCC quiet */
case 5: /* ICC_SGI1R_EL1 */
@ -438,26 +365,30 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
*/
static void reg_to_dbg(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *rd,
u64 *dbg_reg)
{
u64 val = p->regval;
u64 mask, shift, val;
if (p->is_32bit) {
val &= 0xffffffffUL;
val |= ((*dbg_reg >> 32) << 32);
}
get_access_mask(rd, &mask, &shift);
val = *dbg_reg;
val &= ~mask;
val |= (p->regval & (mask >> shift)) << shift;
*dbg_reg = val;
vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
}
static void dbg_to_reg(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *rd,
u64 *dbg_reg)
{
p->regval = *dbg_reg;
if (p->is_32bit)
p->regval &= 0xffffffffUL;
u64 mask, shift;
get_access_mask(rd, &mask, &shift);
p->regval = (*dbg_reg & mask) >> shift;
}
static bool trap_bvr(struct kvm_vcpu *vcpu,
@ -467,9 +398,9 @@ static bool trap_bvr(struct kvm_vcpu *vcpu,
u64 *dbg_reg = &vcpu->arch.vcpu_debug_state.dbg_bvr[rd->reg];
if (p->is_write)
reg_to_dbg(vcpu, p, dbg_reg);
reg_to_dbg(vcpu, p, rd, dbg_reg);
else
dbg_to_reg(vcpu, p, dbg_reg);
dbg_to_reg(vcpu, p, rd, dbg_reg);
trace_trap_reg(__func__, rd->reg, p->is_write, *dbg_reg);
@ -509,9 +440,9 @@ static bool trap_bcr(struct kvm_vcpu *vcpu,
u64 *dbg_reg = &vcpu->arch.vcpu_debug_state.dbg_bcr[rd->reg];
if (p->is_write)
reg_to_dbg(vcpu, p, dbg_reg);
reg_to_dbg(vcpu, p, rd, dbg_reg);
else
dbg_to_reg(vcpu, p, dbg_reg);
dbg_to_reg(vcpu, p, rd, dbg_reg);
trace_trap_reg(__func__, rd->reg, p->is_write, *dbg_reg);
@ -552,9 +483,9 @@ static bool trap_wvr(struct kvm_vcpu *vcpu,
u64 *dbg_reg = &vcpu->arch.vcpu_debug_state.dbg_wvr[rd->reg];
if (p->is_write)
reg_to_dbg(vcpu, p, dbg_reg);
reg_to_dbg(vcpu, p, rd, dbg_reg);
else
dbg_to_reg(vcpu, p, dbg_reg);
dbg_to_reg(vcpu, p, rd, dbg_reg);
trace_trap_reg(__func__, rd->reg, p->is_write,
vcpu->arch.vcpu_debug_state.dbg_wvr[rd->reg]);
@ -595,9 +526,9 @@ static bool trap_wcr(struct kvm_vcpu *vcpu,
u64 *dbg_reg = &vcpu->arch.vcpu_debug_state.dbg_wcr[rd->reg];
if (p->is_write)
reg_to_dbg(vcpu, p, dbg_reg);
reg_to_dbg(vcpu, p, rd, dbg_reg);
else
dbg_to_reg(vcpu, p, dbg_reg);
dbg_to_reg(vcpu, p, rd, dbg_reg);
trace_trap_reg(__func__, rd->reg, p->is_write, *dbg_reg);
@ -678,8 +609,9 @@ static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
static bool check_pmu_access_disabled(struct kvm_vcpu *vcpu, u64 flags)
{
u64 reg = __vcpu_sys_reg(vcpu, PMUSERENR_EL0);
bool enabled = (reg & flags) || vcpu_mode_priv(vcpu);
bool enabled = kvm_vcpu_has_pmu(vcpu);
enabled &= (reg & flags) || vcpu_mode_priv(vcpu);
if (!enabled)
kvm_inject_undefined(vcpu);
@ -711,9 +643,6 @@ static bool access_pmcr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
{
u64 val;
if (!kvm_arm_pmu_v3_ready(vcpu))
return trap_raz_wi(vcpu, p, r);
if (pmu_access_el0_disabled(vcpu))
return false;
@ -740,9 +669,6 @@ static bool access_pmcr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
static bool access_pmselr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
if (!kvm_arm_pmu_v3_ready(vcpu))
return trap_raz_wi(vcpu, p, r);
if (pmu_access_event_counter_el0_disabled(vcpu))
return false;
@ -761,9 +687,6 @@ static bool access_pmceid(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
{
u64 pmceid;
if (!kvm_arm_pmu_v3_ready(vcpu))
return trap_raz_wi(vcpu, p, r);
BUG_ON(p->is_write);
if (pmu_access_el0_disabled(vcpu))
@ -794,10 +717,7 @@ static bool access_pmu_evcntr(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
u64 idx;
if (!kvm_arm_pmu_v3_ready(vcpu))
return trap_raz_wi(vcpu, p, r);
u64 idx = ~0UL;
if (r->CRn == 9 && r->CRm == 13) {
if (r->Op2 == 2) {
@ -813,8 +733,6 @@ static bool access_pmu_evcntr(struct kvm_vcpu *vcpu,
return false;
idx = ARMV8_PMU_CYCLE_IDX;
} else {
return false;
}
} else if (r->CRn == 0 && r->CRm == 9) {
/* PMCCNTR */
@ -828,10 +746,11 @@ static bool access_pmu_evcntr(struct kvm_vcpu *vcpu,
return false;
idx = ((r->CRm & 3) << 3) | (r->Op2 & 7);
} else {
return false;
}
/* Catch any decoding mistake */
WARN_ON(idx == ~0UL);
if (!pmu_counter_idx_valid(vcpu, idx))
return false;
@ -852,9 +771,6 @@ static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
{
u64 idx, reg;
if (!kvm_arm_pmu_v3_ready(vcpu))
return trap_raz_wi(vcpu, p, r);
if (pmu_access_el0_disabled(vcpu))
return false;
@ -892,9 +808,6 @@ static bool access_pmcnten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
{
u64 val, mask;
if (!kvm_arm_pmu_v3_ready(vcpu))
return trap_raz_wi(vcpu, p, r);
if (pmu_access_el0_disabled(vcpu))
return false;
@ -923,13 +836,8 @@ static bool access_pminten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
{
u64 mask = kvm_pmu_valid_counter_mask(vcpu);
if (!kvm_arm_pmu_v3_ready(vcpu))
return trap_raz_wi(vcpu, p, r);
if (!vcpu_mode_priv(vcpu)) {
kvm_inject_undefined(vcpu);
if (check_pmu_access_disabled(vcpu, 0))
return false;
}
if (p->is_write) {
u64 val = p->regval & mask;
@ -952,9 +860,6 @@ static bool access_pmovs(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
{
u64 mask = kvm_pmu_valid_counter_mask(vcpu);
if (!kvm_arm_pmu_v3_ready(vcpu))
return trap_raz_wi(vcpu, p, r);
if (pmu_access_el0_disabled(vcpu))
return false;
@ -977,9 +882,6 @@ static bool access_pmswinc(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
{
u64 mask;
if (!kvm_arm_pmu_v3_ready(vcpu))
return trap_raz_wi(vcpu, p, r);
if (!p->is_write)
return read_from_write_only(vcpu, p, r);
@ -994,8 +896,10 @@ static bool access_pmswinc(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
static bool access_pmuserenr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
if (!kvm_arm_pmu_v3_ready(vcpu))
return trap_raz_wi(vcpu, p, r);
if (!kvm_vcpu_has_pmu(vcpu)) {
kvm_inject_undefined(vcpu);
return false;
}
if (p->is_write) {
if (!vcpu_mode_priv(vcpu)) {
@ -1122,6 +1026,8 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
val &= ~(0xfUL << ID_AA64PFR0_AMU_SHIFT);
val &= ~(0xfUL << ID_AA64PFR0_CSV2_SHIFT);
val |= ((u64)vcpu->kvm->arch.pfr0_csv2 << ID_AA64PFR0_CSV2_SHIFT);
val &= ~(0xfUL << ID_AA64PFR0_CSV3_SHIFT);
val |= ((u64)vcpu->kvm->arch.pfr0_csv3 << ID_AA64PFR0_CSV3_SHIFT);
} else if (id == SYS_ID_AA64PFR1_EL1) {
val &= ~(0xfUL << ID_AA64PFR1_MTE_SHIFT);
} else if (id == SYS_ID_AA64ISAR1_EL1 && !vcpu_has_ptrauth(vcpu)) {
@ -1130,10 +1036,15 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
(0xfUL << ID_AA64ISAR1_GPA_SHIFT) |
(0xfUL << ID_AA64ISAR1_GPI_SHIFT));
} else if (id == SYS_ID_AA64DFR0_EL1) {
u64 cap = 0;
/* Limit guests to PMUv3 for ARMv8.1 */
if (kvm_vcpu_has_pmu(vcpu))
cap = ID_AA64DFR0_PMUVER_8_1;
val = cpuid_feature_cap_perfmon_field(val,
ID_AA64DFR0_PMUVER_SHIFT,
ID_AA64DFR0_PMUVER_8_1);
cap);
} else if (id == SYS_ID_DFR0_EL1) {
/* Limit guests to PMUv3 for ARMv8.1 */
val = cpuid_feature_cap_perfmon_field(val,
@ -1209,9 +1120,9 @@ static int set_id_aa64pfr0_el1(struct kvm_vcpu *vcpu,
const struct kvm_one_reg *reg, void __user *uaddr)
{
const u64 id = sys_reg_to_index(rd);
u8 csv2, csv3;
int err;
u64 val;
u8 csv2;
err = reg_from_user(&val, uaddr, id);
if (err)
@ -1227,13 +1138,21 @@ static int set_id_aa64pfr0_el1(struct kvm_vcpu *vcpu,
(csv2 && arm64_get_spectre_v2_state() != SPECTRE_UNAFFECTED))
return -EINVAL;
/* We can only differ with CSV2, and anything else is an error */
/* Same thing for CSV3 */
csv3 = cpuid_feature_extract_unsigned_field(val, ID_AA64PFR0_CSV3_SHIFT);
if (csv3 > 1 ||
(csv3 && arm64_get_meltdown_state() != SPECTRE_UNAFFECTED))
return -EINVAL;
/* We can only differ with CSV[23], and anything else is an error */
val ^= read_id_reg(vcpu, rd, false);
val &= ~(0xFUL << ID_AA64PFR0_CSV2_SHIFT);
val &= ~((0xFUL << ID_AA64PFR0_CSV2_SHIFT) |
(0xFUL << ID_AA64PFR0_CSV3_SHIFT));
if (val)
return -EINVAL;
vcpu->kvm->arch.pfr0_csv2 = csv2;
vcpu->kvm->arch.pfr0_csv3 = csv3;
return 0;
}
@ -1327,10 +1246,6 @@ static bool access_csselr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
{
int reg = r->reg;
/* See the 32bit mapping in kvm_host.h */
if (p->is_aarch32)
reg = r->reg / 2;
if (p->is_write)
vcpu_write_sys_reg(vcpu, p->regval, reg);
else
@ -1801,57 +1716,18 @@ static bool trap_dbgidr(struct kvm_vcpu *vcpu,
}
}
static bool trap_debug32(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
if (p->is_write) {
vcpu_cp14(vcpu, r->reg) = p->regval;
vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
} else {
p->regval = vcpu_cp14(vcpu, r->reg);
}
return true;
}
/* AArch32 debug register mappings
/*
* AArch32 debug register mappings
*
* AArch32 DBGBVRn is mapped to DBGBVRn_EL1[31:0]
* AArch32 DBGBXVRn is mapped to DBGBVRn_EL1[63:32]
*
* All control registers and watchpoint value registers are mapped to
* the lower 32 bits of their AArch64 equivalents. We share the trap
* handlers with the above AArch64 code which checks what mode the
* system is in.
* None of the other registers share their location, so treat them as
* if they were 64bit.
*/
static bool trap_xvr(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *rd)
{
u64 *dbg_reg = &vcpu->arch.vcpu_debug_state.dbg_bvr[rd->reg];
if (p->is_write) {
u64 val = *dbg_reg;
val &= 0xffffffffUL;
val |= p->regval << 32;
*dbg_reg = val;
vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
} else {
p->regval = *dbg_reg >> 32;
}
trace_trap_reg(__func__, rd->reg, p->is_write, *dbg_reg);
return true;
}
#define DBG_BCR_BVR_WCR_WVR(n) \
/* DBGBVRn */ \
{ Op1( 0), CRn( 0), CRm((n)), Op2( 4), trap_bvr, NULL, n }, \
{ AA32(LO), Op1( 0), CRn( 0), CRm((n)), Op2( 4), trap_bvr, NULL, n }, \
/* DBGBCRn */ \
{ Op1( 0), CRn( 0), CRm((n)), Op2( 5), trap_bcr, NULL, n }, \
/* DBGWVRn */ \
@ -1860,7 +1736,7 @@ static bool trap_xvr(struct kvm_vcpu *vcpu,
{ Op1( 0), CRn( 0), CRm((n)), Op2( 7), trap_wcr, NULL, n }
#define DBGBXVR(n) \
{ Op1( 0), CRn( 1), CRm((n)), Op2( 1), trap_xvr, NULL, n }
{ AA32(HI), Op1( 0), CRn( 1), CRm((n)), Op2( 1), trap_bvr, NULL, n }
/*
* Trapped cp14 registers. We generally ignore most of the external
@ -1878,9 +1754,9 @@ static const struct sys_reg_desc cp14_regs[] = {
{ Op1( 0), CRn( 0), CRm( 1), Op2( 0), trap_raz_wi },
DBG_BCR_BVR_WCR_WVR(1),
/* DBGDCCINT */
{ Op1( 0), CRn( 0), CRm( 2), Op2( 0), trap_debug32, NULL, cp14_DBGDCCINT },
{ Op1( 0), CRn( 0), CRm( 2), Op2( 0), trap_debug_regs, NULL, MDCCINT_EL1 },
/* DBGDSCRext */
{ Op1( 0), CRn( 0), CRm( 2), Op2( 2), trap_debug32, NULL, cp14_DBGDSCRext },
{ Op1( 0), CRn( 0), CRm( 2), Op2( 2), trap_debug_regs, NULL, MDSCR_EL1 },
DBG_BCR_BVR_WCR_WVR(2),
/* DBGDTR[RT]Xint */
{ Op1( 0), CRn( 0), CRm( 3), Op2( 0), trap_raz_wi },
@ -1895,7 +1771,7 @@ static const struct sys_reg_desc cp14_regs[] = {
{ Op1( 0), CRn( 0), CRm( 6), Op2( 2), trap_raz_wi },
DBG_BCR_BVR_WCR_WVR(6),
/* DBGVCR */
{ Op1( 0), CRn( 0), CRm( 7), Op2( 0), trap_debug32, NULL, cp14_DBGVCR },
{ Op1( 0), CRn( 0), CRm( 7), Op2( 0), trap_debug_regs, NULL, DBGVCR32_EL2 },
DBG_BCR_BVR_WCR_WVR(7),
DBG_BCR_BVR_WCR_WVR(8),
DBG_BCR_BVR_WCR_WVR(9),
@ -1981,19 +1857,29 @@ static const struct sys_reg_desc cp14_64_regs[] = {
*/
static const struct sys_reg_desc cp15_regs[] = {
{ Op1( 0), CRn( 0), CRm( 0), Op2( 1), access_ctr },
{ Op1( 0), CRn( 1), CRm( 0), Op2( 0), access_vm_reg, NULL, c1_SCTLR },
{ Op1( 0), CRn( 1), CRm( 0), Op2( 1), access_actlr },
{ Op1( 0), CRn( 1), CRm( 0), Op2( 3), access_actlr },
{ Op1( 0), CRn( 2), CRm( 0), Op2( 0), access_vm_reg, NULL, c2_TTBR0 },
{ Op1( 0), CRn( 2), CRm( 0), Op2( 1), access_vm_reg, NULL, c2_TTBR1 },
{ Op1( 0), CRn( 2), CRm( 0), Op2( 2), access_vm_reg, NULL, c2_TTBCR },
{ Op1( 0), CRn( 3), CRm( 0), Op2( 0), access_vm_reg, NULL, c3_DACR },
{ Op1( 0), CRn( 5), CRm( 0), Op2( 0), access_vm_reg, NULL, c5_DFSR },
{ Op1( 0), CRn( 5), CRm( 0), Op2( 1), access_vm_reg, NULL, c5_IFSR },
{ Op1( 0), CRn( 5), CRm( 1), Op2( 0), access_vm_reg, NULL, c5_ADFSR },
{ Op1( 0), CRn( 5), CRm( 1), Op2( 1), access_vm_reg, NULL, c5_AIFSR },
{ Op1( 0), CRn( 6), CRm( 0), Op2( 0), access_vm_reg, NULL, c6_DFAR },
{ Op1( 0), CRn( 6), CRm( 0), Op2( 2), access_vm_reg, NULL, c6_IFAR },
{ Op1( 0), CRn( 1), CRm( 0), Op2( 0), access_vm_reg, NULL, SCTLR_EL1 },
/* ACTLR */
{ AA32(LO), Op1( 0), CRn( 1), CRm( 0), Op2( 1), access_actlr, NULL, ACTLR_EL1 },
/* ACTLR2 */
{ AA32(HI), Op1( 0), CRn( 1), CRm( 0), Op2( 3), access_actlr, NULL, ACTLR_EL1 },
{ Op1( 0), CRn( 2), CRm( 0), Op2( 0), access_vm_reg, NULL, TTBR0_EL1 },
{ Op1( 0), CRn( 2), CRm( 0), Op2( 1), access_vm_reg, NULL, TTBR1_EL1 },
/* TTBCR */
{ AA32(LO), Op1( 0), CRn( 2), CRm( 0), Op2( 2), access_vm_reg, NULL, TCR_EL1 },
/* TTBCR2 */
{ AA32(HI), Op1( 0), CRn( 2), CRm( 0), Op2( 3), access_vm_reg, NULL, TCR_EL1 },
{ Op1( 0), CRn( 3), CRm( 0), Op2( 0), access_vm_reg, NULL, DACR32_EL2 },
/* DFSR */
{ Op1( 0), CRn( 5), CRm( 0), Op2( 0), access_vm_reg, NULL, ESR_EL1 },
{ Op1( 0), CRn( 5), CRm( 0), Op2( 1), access_vm_reg, NULL, IFSR32_EL2 },
/* ADFSR */
{ Op1( 0), CRn( 5), CRm( 1), Op2( 0), access_vm_reg, NULL, AFSR0_EL1 },
/* AIFSR */
{ Op1( 0), CRn( 5), CRm( 1), Op2( 1), access_vm_reg, NULL, AFSR1_EL1 },
/* DFAR */
{ AA32(LO), Op1( 0), CRn( 6), CRm( 0), Op2( 0), access_vm_reg, NULL, FAR_EL1 },
/* IFAR */
{ AA32(HI), Op1( 0), CRn( 6), CRm( 0), Op2( 2), access_vm_reg, NULL, FAR_EL1 },
/*
* DC{C,I,CI}SW operations:
@ -2019,15 +1905,19 @@ static const struct sys_reg_desc cp15_regs[] = {
{ Op1( 0), CRn( 9), CRm(14), Op2( 2), access_pminten },
{ Op1( 0), CRn( 9), CRm(14), Op2( 3), access_pmovs },
{ Op1( 0), CRn(10), CRm( 2), Op2( 0), access_vm_reg, NULL, c10_PRRR },
{ Op1( 0), CRn(10), CRm( 2), Op2( 1), access_vm_reg, NULL, c10_NMRR },
{ Op1( 0), CRn(10), CRm( 3), Op2( 0), access_vm_reg, NULL, c10_AMAIR0 },
{ Op1( 0), CRn(10), CRm( 3), Op2( 1), access_vm_reg, NULL, c10_AMAIR1 },
/* PRRR/MAIR0 */
{ AA32(LO), Op1( 0), CRn(10), CRm( 2), Op2( 0), access_vm_reg, NULL, MAIR_EL1 },
/* NMRR/MAIR1 */
{ AA32(HI), Op1( 0), CRn(10), CRm( 2), Op2( 1), access_vm_reg, NULL, MAIR_EL1 },
/* AMAIR0 */
{ AA32(LO), Op1( 0), CRn(10), CRm( 3), Op2( 0), access_vm_reg, NULL, AMAIR_EL1 },
/* AMAIR1 */
{ AA32(HI), Op1( 0), CRn(10), CRm( 3), Op2( 1), access_vm_reg, NULL, AMAIR_EL1 },
/* ICC_SRE */
{ Op1( 0), CRn(12), CRm(12), Op2( 5), access_gic_sre },
{ Op1( 0), CRn(13), CRm( 0), Op2( 1), access_vm_reg, NULL, c13_CID },
{ Op1( 0), CRn(13), CRm( 0), Op2( 1), access_vm_reg, NULL, CONTEXTIDR_EL1 },
/* Arch Timers */
{ SYS_DESC(SYS_AARCH32_CNTP_TVAL), access_arch_timer },
@ -2102,14 +1992,14 @@ static const struct sys_reg_desc cp15_regs[] = {
{ Op1(1), CRn( 0), CRm( 0), Op2(0), access_ccsidr },
{ Op1(1), CRn( 0), CRm( 0), Op2(1), access_clidr },
{ Op1(2), CRn( 0), CRm( 0), Op2(0), access_csselr, NULL, c0_CSSELR },
{ Op1(2), CRn( 0), CRm( 0), Op2(0), access_csselr, NULL, CSSELR_EL1 },
};
static const struct sys_reg_desc cp15_64_regs[] = {
{ Op1( 0), CRn( 0), CRm( 2), Op2( 0), access_vm_reg, NULL, c2_TTBR0 },
{ Op1( 0), CRn( 0), CRm( 2), Op2( 0), access_vm_reg, NULL, TTBR0_EL1 },
{ Op1( 0), CRn( 0), CRm( 9), Op2( 0), access_pmu_evcntr },
{ Op1( 0), CRn( 0), CRm(12), Op2( 0), access_gic_sgi }, /* ICC_SGI1R */
{ Op1( 1), CRn( 0), CRm( 2), Op2( 0), access_vm_reg, NULL, c2_TTBR1 },
{ Op1( 1), CRn( 0), CRm( 2), Op2( 0), access_vm_reg, NULL, TTBR1_EL1 },
{ Op1( 1), CRn( 0), CRm(12), Op2( 0), access_gic_sgi }, /* ICC_ASGI1R */
{ Op1( 2), CRn( 0), CRm(12), Op2( 0), access_gic_sgi }, /* ICC_SGI0R */
{ SYS_DESC(SYS_AARCH32_CNTP_CVAL), access_arch_timer },
@ -2180,7 +2070,7 @@ static void perform_access(struct kvm_vcpu *vcpu,
/* Skip instruction if instructed so */
if (likely(r->access(vcpu, params, r)))
kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
kvm_incr_pc(vcpu);
}
/*
@ -2253,8 +2143,6 @@ static int kvm_handle_cp_64(struct kvm_vcpu *vcpu,
int Rt = kvm_vcpu_sys_get_rt(vcpu);
int Rt2 = (esr >> 10) & 0x1f;
params.is_aarch32 = true;
params.is_32bit = false;
params.CRm = (esr >> 1) & 0xf;
params.is_write = ((esr & 1) == 0);
@ -2304,8 +2192,6 @@ static int kvm_handle_cp_32(struct kvm_vcpu *vcpu,
u32 esr = kvm_vcpu_get_esr(vcpu);
int Rt = kvm_vcpu_sys_get_rt(vcpu);
params.is_aarch32 = true;
params.is_32bit = true;
params.CRm = (esr >> 1) & 0xf;
params.regval = vcpu_get_reg(vcpu, Rt);
params.is_write = ((esr & 1) == 0);
@ -2399,8 +2285,6 @@ int kvm_handle_sys_reg(struct kvm_vcpu *vcpu)
trace_kvm_handle_sys_reg(esr);
params.is_aarch32 = false;
params.is_32bit = false;
params.Op0 = (esr >> 20) & 3;
params.Op1 = (esr >> 14) & 0x7;
params.CRn = (esr >> 10) & 0xf;

View file

@ -19,14 +19,18 @@ struct sys_reg_params {
u8 Op2;
u64 regval;
bool is_write;
bool is_aarch32;
bool is_32bit; /* Only valid if is_aarch32 is true */
};
struct sys_reg_desc {
/* Sysreg string for debug */
const char *name;
enum {
AA32_ZEROHIGH,
AA32_LO,
AA32_HI,
} aarch32_map;
/* MRS/MSR instruction which accesses it. */
u8 Op0;
u8 Op1;
@ -153,6 +157,7 @@ const struct sys_reg_desc *find_reg_by_id(u64 id,
const struct sys_reg_desc table[],
unsigned int num);
#define AA32(_x) .aarch32_map = AA32_##_x
#define Op0(_x) .Op0 = _x
#define Op1(_x) .Op1 = _x
#define CRn(_x) .CRn = _x

View file

@ -11,6 +11,7 @@
#include <asm/debug-monitors.h>
#include <asm/insn.h>
#include <asm/kvm_mmu.h>
#include <asm/memory.h>
/*
* The LSB of the HYP VA tag
@ -22,6 +23,30 @@ static u8 tag_lsb;
static u64 tag_val;
static u64 va_mask;
/*
* Compute HYP VA by using the same computation as kern_hyp_va().
*/
static u64 __early_kern_hyp_va(u64 addr)
{
addr &= va_mask;
addr |= tag_val << tag_lsb;
return addr;
}
/*
* Store a hyp VA <-> PA offset into a hyp-owned variable.
*/
static void init_hyp_physvirt_offset(void)
{
extern s64 kvm_nvhe_sym(hyp_physvirt_offset);
u64 kern_va, hyp_va;
/* Compute the offset from the hyp VA and PA of a random symbol. */
kern_va = (u64)kvm_ksym_ref(__hyp_text_start);
hyp_va = __early_kern_hyp_va(kern_va);
CHOOSE_NVHE_SYM(hyp_physvirt_offset) = (s64)__pa(kern_va) - (s64)hyp_va;
}
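
Because the hyp VA of a linear-map address sits at a fixed distance from its physical address, computing (PA - hyp VA) once for a single symbol is enough: any other hyp VA can then be turned into a PA with a single addition, which is what the __hyp_pa() helper in the PSCI relay relies on. A stand-alone sketch with made-up addresses:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* made-up addresses for one hyp symbol */
	uint64_t sym_pa     = 0x0000000081230000ULL;	/* physical address */
	uint64_t sym_hyp_va = 0x0000a00081230000ULL;	/* the same symbol's hyp VA */
	int64_t  offset     = (int64_t)(sym_pa - sym_hyp_va);	/* hyp_physvirt_offset */

	/* any other hyp VA in the same mapping converts with one addition */
	uint64_t some_hyp_va = 0x0000a00081234560ULL;
	uint64_t some_pa     = some_hyp_va + (uint64_t)offset;	/* __hyp_pa() */

	printf("hyp VA %#llx -> PA %#llx\n",
	       (unsigned long long)some_hyp_va,
	       (unsigned long long)some_pa);	/* PA 0x81234560 */
	return 0;
}
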
/*
* We want to generate a hyp VA with the following format (with V ==
* vabits_actual):
@ -53,6 +78,8 @@ __init void kvm_compute_layout(void)
tag_val |= get_random_long() & GENMASK_ULL(vabits_actual - 2, tag_lsb);
}
tag_val >>= tag_lsb;
init_hyp_physvirt_offset();
}
static u32 compute_instruction(int n, u32 rd, u32 rn)
@ -131,28 +158,21 @@ void __init kvm_update_va_mask(struct alt_instr *alt,
}
}
void *__kvm_bp_vect_base;
int __kvm_harden_el2_vector_slot;
void kvm_patch_vector_branch(struct alt_instr *alt,
__le32 *origptr, __le32 *updptr, int nr_inst)
{
u64 addr;
u32 insn;
BUG_ON(nr_inst != 5);
BUG_ON(nr_inst != 4);
if (has_vhe() || !cpus_have_const_cap(ARM64_HARDEN_EL2_VECTORS)) {
WARN_ON_ONCE(cpus_have_const_cap(ARM64_HARDEN_EL2_VECTORS));
if (!cpus_have_const_cap(ARM64_SPECTRE_V3A) || WARN_ON_ONCE(has_vhe()))
return;
}
/*
* Compute HYP VA by using the same computation as kern_hyp_va()
*/
addr = (uintptr_t)kvm_ksym_ref(__kvm_hyp_vector);
addr &= va_mask;
addr |= tag_val << tag_lsb;
addr = __early_kern_hyp_va((u64)kvm_ksym_ref(__kvm_hyp_vector));
/* Use PC[10:7] to branch to the same vector in KVM */
addr |= ((u64)origptr & GENMASK_ULL(10, 7));
@ -163,15 +183,6 @@ void kvm_patch_vector_branch(struct alt_instr *alt,
*/
addr += KVM_VECTOR_PREAMBLE;
/* stp x0, x1, [sp, #-16]! */
insn = aarch64_insn_gen_load_store_pair(AARCH64_INSN_REG_0,
AARCH64_INSN_REG_1,
AARCH64_INSN_REG_SP,
-16,
AARCH64_INSN_VARIANT_64BIT,
AARCH64_INSN_LDST_STORE_PAIR_PRE_INDEX);
*updptr++ = cpu_to_le32(insn);
/* movz x0, #(addr & 0xffff) */
insn = aarch64_insn_gen_movewide(AARCH64_INSN_REG_0,
(u16)addr,
@ -201,3 +212,58 @@ void kvm_patch_vector_branch(struct alt_instr *alt,
AARCH64_INSN_BRANCH_NOLINK);
*updptr++ = cpu_to_le32(insn);
}
static void generate_mov_q(u64 val, __le32 *origptr, __le32 *updptr, int nr_inst)
{
u32 insn, oinsn, rd;
BUG_ON(nr_inst != 4);
/* Compute target register */
oinsn = le32_to_cpu(*origptr);
rd = aarch64_insn_decode_register(AARCH64_INSN_REGTYPE_RD, oinsn);
/* movz rd, #(val & 0xffff) */
insn = aarch64_insn_gen_movewide(rd,
(u16)val,
0,
AARCH64_INSN_VARIANT_64BIT,
AARCH64_INSN_MOVEWIDE_ZERO);
*updptr++ = cpu_to_le32(insn);
/* movk rd, #((val >> 16) & 0xffff), lsl #16 */
insn = aarch64_insn_gen_movewide(rd,
(u16)(val >> 16),
16,
AARCH64_INSN_VARIANT_64BIT,
AARCH64_INSN_MOVEWIDE_KEEP);
*updptr++ = cpu_to_le32(insn);
/* movk rd, #((val >> 32) & 0xffff), lsl #32 */
insn = aarch64_insn_gen_movewide(rd,
(u16)(val >> 32),
32,
AARCH64_INSN_VARIANT_64BIT,
AARCH64_INSN_MOVEWIDE_KEEP);
*updptr++ = cpu_to_le32(insn);
/* movk rd, #((val >> 48) & 0xffff), lsl #48 */
insn = aarch64_insn_gen_movewide(rd,
(u16)(val >> 48),
48,
AARCH64_INSN_VARIANT_64BIT,
AARCH64_INSN_MOVEWIDE_KEEP);
*updptr++ = cpu_to_le32(insn);
}
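
generate_mov_q() emits a movz followed by three movk instructions, loading a 64-bit constant 16 bits at a time: movz clears the destination and sets bits [15:0], and each movk overwrites only its own 16-bit slice. The effect can be modelled in plain C, as a stand-alone sketch (the example constant is arbitrary):

#include <stdint.h>
#include <stdio.h>

/* rebuild a 64-bit value the way the emitted movz/movk sequence does */
static uint64_t mov_q(uint64_t val)
{
	uint64_t rd;

	rd = (uint16_t)val;				/* movz rd, #v[15:0]           */
	rd &= ~(0xffffULL << 16);
	rd |= (uint64_t)(uint16_t)(val >> 16) << 16;	/* movk rd, #v[31:16], lsl #16 */
	rd &= ~(0xffffULL << 32);
	rd |= (uint64_t)(uint16_t)(val >> 32) << 32;	/* movk rd, #v[47:32], lsl #32 */
	rd &= ~(0xffffULL << 48);
	rd |= (uint64_t)(uint16_t)(val >> 48) << 48;	/* movk rd, #v[63:48], lsl #48 */
	return rd;
}

int main(void)
{
	uint64_t val = 0xffff800010a5c3f0ULL;	/* arbitrary example constant */

	printf("rebuilt %#llx, matches: %d\n",
	       (unsigned long long)mov_q(val), mov_q(val) == val);
	return 0;
}
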
void kvm_update_kimg_phys_offset(struct alt_instr *alt,
__le32 *origptr, __le32 *updptr, int nr_inst)
{
generate_mov_q(kimage_voffset + PHYS_OFFSET, origptr, updptr, nr_inst);
}
void kvm_get_kimage_voffset(struct alt_instr *alt,
__le32 *origptr, __le32 *updptr, int nr_inst)
{
generate_mov_q(kimage_voffset, origptr, updptr, nr_inst);
}

View file

@ -268,8 +268,6 @@ int vgic_v3_has_cpu_sysregs_attr(struct kvm_vcpu *vcpu, bool is_write, u64 id,
params.regval = *reg;
params.is_write = is_write;
params.is_aarch32 = false;
params.is_32bit = false;
if (find_reg_by_id(sysreg, &params, gic_v3_icc_reg_descs,
ARRAY_SIZE(gic_v3_icc_reg_descs)))
@ -288,8 +286,6 @@ int vgic_v3_cpu_sysregs_uaccess(struct kvm_vcpu *vcpu, bool is_write, u64 id,
if (is_write)
params.regval = *reg;
params.is_write = is_write;
params.is_aarch32 = false;
params.is_32bit = false;
r = find_reg_by_id(sysreg, &params, gic_v3_icc_reg_descs,
ARRAY_SIZE(gic_v3_icc_reg_descs));

View file

@ -353,6 +353,18 @@ int vgic_v4_load(struct kvm_vcpu *vcpu)
return err;
}
void vgic_v4_commit(struct kvm_vcpu *vcpu)
{
struct its_vpe *vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;
/*
* No need to wait for the vPE to be ready across a shallow guest
* exit, as only a vcpu_put will invalidate it.
*/
if (!vpe->ready)
its_commit_vpe(vpe);
}
static struct vgic_its *vgic_get_its(struct kvm *kvm,
struct kvm_kernel_irq_routing_entry *irq_entry)
{

View file

@ -915,6 +915,9 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
if (can_access_vgic_from_kernel())
vgic_restore_state(vcpu);
if (vgic_supports_direct_msis(vcpu->kvm))
vgic_v4_commit(vcpu);
}
void kvm_vgic_load(struct kvm_vcpu *vcpu)

View file

@ -459,6 +459,7 @@ struct kvm_vcpu_stat {
u64 diagnose_308;
u64 diagnose_500;
u64 diagnose_other;
u64 pfault_sync;
};
#define PGM_OPERATION 0x01

View file

@ -184,7 +184,7 @@ static int __import_wp_info(struct kvm_vcpu *vcpu,
if (wp_info->len < 0 || wp_info->len > MAX_WP_SIZE)
return -EINVAL;
wp_info->old_data = kmalloc(bp_data->len, GFP_KERNEL);
wp_info->old_data = kmalloc(bp_data->len, GFP_KERNEL_ACCOUNT);
if (!wp_info->old_data)
return -ENOMEM;
/* try to backup the original value */
@ -234,7 +234,7 @@ int kvm_s390_import_bp_data(struct kvm_vcpu *vcpu,
if (nr_wp > 0) {
wp_info = kmalloc_array(nr_wp,
sizeof(*wp_info),
GFP_KERNEL);
GFP_KERNEL_ACCOUNT);
if (!wp_info) {
ret = -ENOMEM;
goto error;
@ -243,7 +243,7 @@ int kvm_s390_import_bp_data(struct kvm_vcpu *vcpu,
if (nr_bp > 0) {
bp_info = kmalloc_array(nr_bp,
sizeof(*bp_info),
GFP_KERNEL);
GFP_KERNEL_ACCOUNT);
if (!bp_info) {
ret = -ENOMEM;
goto error;
@ -349,7 +349,7 @@ static struct kvm_hw_wp_info_arch *any_wp_changed(struct kvm_vcpu *vcpu)
if (!wp_info || !wp_info->old_data || wp_info->len <= 0)
continue;
temp = kmalloc(wp_info->len, GFP_KERNEL);
temp = kmalloc(wp_info->len, GFP_KERNEL_ACCOUNT);
if (!temp)
continue;

View file

@ -398,7 +398,7 @@ int handle_sthyi(struct kvm_vcpu *vcpu)
if (!kvm_s390_pv_cpu_is_protected(vcpu) && (addr & ~PAGE_MASK))
return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
sctns = (void *)get_zeroed_page(GFP_KERNEL);
sctns = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
if (!sctns)
return -ENOMEM;

View file

@ -1792,7 +1792,7 @@ struct kvm_s390_interrupt_info *kvm_s390_get_io_int(struct kvm *kvm,
goto out;
}
gisa_out:
tmp_inti = kzalloc(sizeof(*inti), GFP_KERNEL);
tmp_inti = kzalloc(sizeof(*inti), GFP_KERNEL_ACCOUNT);
if (tmp_inti) {
tmp_inti->type = KVM_S390_INT_IO(1, 0, 0, 0);
tmp_inti->io.io_int_word = isc_to_int_word(isc);
@ -2015,7 +2015,7 @@ int kvm_s390_inject_vm(struct kvm *kvm,
struct kvm_s390_interrupt_info *inti;
int rc;
inti = kzalloc(sizeof(*inti), GFP_KERNEL);
inti = kzalloc(sizeof(*inti), GFP_KERNEL_ACCOUNT);
if (!inti)
return -ENOMEM;
@ -2414,7 +2414,7 @@ static int enqueue_floating_irq(struct kvm_device *dev,
return -EINVAL;
while (len >= sizeof(struct kvm_s390_irq)) {
inti = kzalloc(sizeof(*inti), GFP_KERNEL);
inti = kzalloc(sizeof(*inti), GFP_KERNEL_ACCOUNT);
if (!inti)
return -ENOMEM;
@ -2462,7 +2462,7 @@ static int register_io_adapter(struct kvm_device *dev,
if (dev->kvm->arch.adapters[adapter_info.id] != NULL)
return -EINVAL;
adapter = kzalloc(sizeof(*adapter), GFP_KERNEL);
adapter = kzalloc(sizeof(*adapter), GFP_KERNEL_ACCOUNT);
if (!adapter)
return -ENOMEM;
@ -3290,7 +3290,7 @@ int kvm_s390_gib_init(u8 nisc)
goto out;
}
gib = (struct kvm_s390_gib *)get_zeroed_page(GFP_KERNEL | GFP_DMA);
gib = (struct kvm_s390_gib *)get_zeroed_page(GFP_KERNEL_ACCOUNT | GFP_DMA);
if (!gib) {
rc = -ENOMEM;
goto out;


@ -60,6 +60,7 @@
struct kvm_stats_debugfs_item debugfs_entries[] = {
VCPU_STAT("userspace_handled", exit_userspace),
VCPU_STAT("exit_null", exit_null),
VCPU_STAT("pfault_sync", pfault_sync),
VCPU_STAT("exit_validity", exit_validity),
VCPU_STAT("exit_stop_request", exit_stop_request),
VCPU_STAT("exit_external_request", exit_external_request),
@ -1254,7 +1255,7 @@ static int kvm_s390_set_processor(struct kvm *kvm, struct kvm_device_attr *attr)
ret = -EBUSY;
goto out;
}
proc = kzalloc(sizeof(*proc), GFP_KERNEL);
proc = kzalloc(sizeof(*proc), GFP_KERNEL_ACCOUNT);
if (!proc) {
ret = -ENOMEM;
goto out;
@ -1416,7 +1417,7 @@ static int kvm_s390_get_processor(struct kvm *kvm, struct kvm_device_attr *attr)
struct kvm_s390_vm_cpu_processor *proc;
int ret = 0;
proc = kzalloc(sizeof(*proc), GFP_KERNEL);
proc = kzalloc(sizeof(*proc), GFP_KERNEL_ACCOUNT);
if (!proc) {
ret = -ENOMEM;
goto out;
@ -1444,7 +1445,7 @@ static int kvm_s390_get_machine(struct kvm *kvm, struct kvm_device_attr *attr)
struct kvm_s390_vm_cpu_machine *mach;
int ret = 0;
mach = kzalloc(sizeof(*mach), GFP_KERNEL);
mach = kzalloc(sizeof(*mach), GFP_KERNEL_ACCOUNT);
if (!mach) {
ret = -ENOMEM;
goto out;
@ -1812,7 +1813,7 @@ static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
return -EINVAL;
keys = kvmalloc_array(args->count, sizeof(uint8_t), GFP_KERNEL);
keys = kvmalloc_array(args->count, sizeof(uint8_t), GFP_KERNEL_ACCOUNT);
if (!keys)
return -ENOMEM;
@ -1857,7 +1858,7 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
return -EINVAL;
keys = kvmalloc_array(args->count, sizeof(uint8_t), GFP_KERNEL);
keys = kvmalloc_array(args->count, sizeof(uint8_t), GFP_KERNEL_ACCOUNT);
if (!keys)
return -ENOMEM;
@ -2625,7 +2626,7 @@ static void sca_dispose(struct kvm *kvm)
int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
{
gfp_t alloc_flags = GFP_KERNEL;
gfp_t alloc_flags = GFP_KERNEL_ACCOUNT;
int i, rc;
char debug_name[16];
static unsigned long sca_offset;
@ -2670,7 +2671,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
BUILD_BUG_ON(sizeof(struct sie_page2) != 4096);
kvm->arch.sie_page2 =
(struct sie_page2 *) get_zeroed_page(GFP_KERNEL | GFP_DMA);
(struct sie_page2 *) get_zeroed_page(GFP_KERNEL_ACCOUNT | GFP_DMA);
if (!kvm->arch.sie_page2)
goto out_err;
@ -2900,7 +2901,7 @@ static int sca_switch_to_extended(struct kvm *kvm)
if (kvm->arch.use_esca)
return 0;
new_sca = alloc_pages_exact(sizeof(*new_sca), GFP_KERNEL|__GFP_ZERO);
new_sca = alloc_pages_exact(sizeof(*new_sca), GFP_KERNEL_ACCOUNT | __GFP_ZERO);
if (!new_sca)
return -ENOMEM;
@ -3133,7 +3134,7 @@ void kvm_s390_vcpu_unsetup_cmma(struct kvm_vcpu *vcpu)
int kvm_s390_vcpu_setup_cmma(struct kvm_vcpu *vcpu)
{
vcpu->arch.sie_block->cbrlo = get_zeroed_page(GFP_KERNEL);
vcpu->arch.sie_block->cbrlo = get_zeroed_page(GFP_KERNEL_ACCOUNT);
if (!vcpu->arch.sie_block->cbrlo)
return -ENOMEM;
return 0;
@ -3243,7 +3244,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
int rc;
BUILD_BUG_ON(sizeof(struct sie_page) != 4096);
sie_page = (struct sie_page *) get_zeroed_page(GFP_KERNEL);
sie_page = (struct sie_page *) get_zeroed_page(GFP_KERNEL_ACCOUNT);
if (!sie_page)
return -ENOMEM;
@ -4109,6 +4110,7 @@ static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
current->thread.gmap_pfault = 0;
if (kvm_arch_setup_async_pf(vcpu))
return 0;
vcpu->stat.pfault_sync++;
return kvm_arch_fault_in_page(vcpu, current->thread.gmap_addr, 1);
}
return vcpu_post_run_fault_in_sie(vcpu);


@ -879,7 +879,7 @@ static int handle_stsi(struct kvm_vcpu *vcpu)
switch (fc) {
case 1: /* same handling for 1 and 2 */
case 2:
mem = get_zeroed_page(GFP_KERNEL);
mem = get_zeroed_page(GFP_KERNEL_ACCOUNT);
if (!mem)
goto out_no_data;
if (stsi((void *) mem, fc, sel1, sel2))
@ -888,7 +888,7 @@ static int handle_stsi(struct kvm_vcpu *vcpu)
case 3:
if (sel1 != 2 || sel2 != 2)
goto out_no_data;
mem = get_zeroed_page(GFP_KERNEL);
mem = get_zeroed_page(GFP_KERNEL_ACCOUNT);
if (!mem)
goto out_no_data;
handle_stsi_3_2_2(vcpu, (void *) mem);


@ -60,7 +60,7 @@ int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu, u16 *rc, u16 *rrc)
if (kvm_s390_pv_cpu_get_handle(vcpu))
return -EINVAL;
vcpu->arch.pv.stor_base = __get_free_pages(GFP_KERNEL,
vcpu->arch.pv.stor_base = __get_free_pages(GFP_KERNEL_ACCOUNT,
get_order(uv_info.guest_cpu_stor_len));
if (!vcpu->arch.pv.stor_base)
return -ENOMEM;
@ -72,7 +72,7 @@ int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu, u16 *rc, u16 *rrc)
uvcb.stor_origin = (u64)vcpu->arch.pv.stor_base;
/* Alloc Secure Instruction Data Area Designation */
vcpu->arch.sie_block->sidad = __get_free_page(GFP_KERNEL | __GFP_ZERO);
vcpu->arch.sie_block->sidad = __get_free_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
if (!vcpu->arch.sie_block->sidad) {
free_pages(vcpu->arch.pv.stor_base,
get_order(uv_info.guest_cpu_stor_len));
@ -120,7 +120,7 @@ static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
struct kvm_memory_slot *memslot;
kvm->arch.pv.stor_var = NULL;
kvm->arch.pv.stor_base = __get_free_pages(GFP_KERNEL, get_order(base));
kvm->arch.pv.stor_base = __get_free_pages(GFP_KERNEL_ACCOUNT, get_order(base));
if (!kvm->arch.pv.stor_base)
return -ENOMEM;


@ -1234,7 +1234,7 @@ static struct vsie_page *get_vsie_page(struct kvm *kvm, unsigned long addr)
mutex_lock(&kvm->arch.vsie.mutex);
if (kvm->arch.vsie.page_count < nr_vcpus) {
page = alloc_page(GFP_KERNEL | __GFP_ZERO | GFP_DMA);
page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO | GFP_DMA);
if (!page) {
mutex_unlock(&kvm->arch.vsie.mutex);
return ERR_PTR(-ENOMEM);
@ -1336,7 +1336,7 @@ out_put:
void kvm_s390_vsie_init(struct kvm *kvm)
{
mutex_init(&kvm->arch.vsie.mutex);
INIT_RADIX_TREE(&kvm->arch.vsie.addr_to_page, GFP_KERNEL);
INIT_RADIX_TREE(&kvm->arch.vsie.addr_to_page, GFP_KERNEL_ACCOUNT);
}
/* Destroy the vsie data structures. To be called when a vm is destroyed. */


@ -2,7 +2,7 @@
/*
* KVM guest address space mapping code
*
* Copyright IBM Corp. 2007, 2016, 2018
* Copyright IBM Corp. 2007, 2020
* Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com>
* David Hildenbrand <david@redhat.com>
* Janosch Frank <frankja@linux.vnet.ibm.com>
@ -56,19 +56,19 @@ static struct gmap *gmap_alloc(unsigned long limit)
atype = _ASCE_TYPE_REGION1;
etype = _REGION1_ENTRY_EMPTY;
}
gmap = kzalloc(sizeof(struct gmap), GFP_KERNEL);
gmap = kzalloc(sizeof(struct gmap), GFP_KERNEL_ACCOUNT);
if (!gmap)
goto out;
INIT_LIST_HEAD(&gmap->crst_list);
INIT_LIST_HEAD(&gmap->children);
INIT_LIST_HEAD(&gmap->pt_list);
INIT_RADIX_TREE(&gmap->guest_to_host, GFP_KERNEL);
INIT_RADIX_TREE(&gmap->host_to_guest, GFP_ATOMIC);
INIT_RADIX_TREE(&gmap->host_to_rmap, GFP_ATOMIC);
INIT_RADIX_TREE(&gmap->guest_to_host, GFP_KERNEL_ACCOUNT);
INIT_RADIX_TREE(&gmap->host_to_guest, GFP_ATOMIC | __GFP_ACCOUNT);
INIT_RADIX_TREE(&gmap->host_to_rmap, GFP_ATOMIC | __GFP_ACCOUNT);
spin_lock_init(&gmap->guest_table_lock);
spin_lock_init(&gmap->shadow_lock);
refcount_set(&gmap->ref_count, 1);
page = alloc_pages(GFP_KERNEL, CRST_ALLOC_ORDER);
page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
if (!page)
goto out_free;
page->index = 0;
@ -309,7 +309,7 @@ static int gmap_alloc_table(struct gmap *gmap, unsigned long *table,
unsigned long *new;
/* since we dont free the gmap table until gmap_free we can unlock */
page = alloc_pages(GFP_KERNEL, CRST_ALLOC_ORDER);
page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
if (!page)
return -ENOMEM;
new = (unsigned long *) page_to_phys(page);
@ -594,7 +594,7 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
if (pmd_large(*pmd) && !gmap->mm->context.allow_gmap_hpage_1m)
return -EFAULT;
/* Link gmap segment table entry location to page table. */
rc = radix_tree_preload(GFP_KERNEL);
rc = radix_tree_preload(GFP_KERNEL_ACCOUNT);
if (rc)
return rc;
ptl = pmd_lock(mm, pmd);
@ -1218,11 +1218,11 @@ static int gmap_protect_rmap(struct gmap *sg, unsigned long raddr,
vmaddr = __gmap_translate(parent, paddr);
if (IS_ERR_VALUE(vmaddr))
return vmaddr;
rmap = kzalloc(sizeof(*rmap), GFP_KERNEL);
rmap = kzalloc(sizeof(*rmap), GFP_KERNEL_ACCOUNT);
if (!rmap)
return -ENOMEM;
rmap->raddr = raddr;
rc = radix_tree_preload(GFP_KERNEL);
rc = radix_tree_preload(GFP_KERNEL_ACCOUNT);
if (rc) {
kfree(rmap);
return rc;
@ -1741,7 +1741,7 @@ int gmap_shadow_r2t(struct gmap *sg, unsigned long saddr, unsigned long r2t,
BUG_ON(!gmap_is_shadow(sg));
/* Allocate a shadow region second table */
page = alloc_pages(GFP_KERNEL, CRST_ALLOC_ORDER);
page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
if (!page)
return -ENOMEM;
page->index = r2t & _REGION_ENTRY_ORIGIN;
@ -1825,7 +1825,7 @@ int gmap_shadow_r3t(struct gmap *sg, unsigned long saddr, unsigned long r3t,
BUG_ON(!gmap_is_shadow(sg));
/* Allocate a shadow region second table */
page = alloc_pages(GFP_KERNEL, CRST_ALLOC_ORDER);
page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
if (!page)
return -ENOMEM;
page->index = r3t & _REGION_ENTRY_ORIGIN;
@ -1909,7 +1909,7 @@ int gmap_shadow_sgt(struct gmap *sg, unsigned long saddr, unsigned long sgt,
BUG_ON(!gmap_is_shadow(sg) || (sgt & _REGION3_ENTRY_LARGE));
/* Allocate a shadow segment table */
page = alloc_pages(GFP_KERNEL, CRST_ALLOC_ORDER);
page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
if (!page)
return -ENOMEM;
page->index = sgt & _REGION_ENTRY_ORIGIN;
@ -2116,7 +2116,7 @@ int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte)
parent = sg->parent;
prot = (pte_val(pte) & _PAGE_PROTECT) ? PROT_READ : PROT_WRITE;
rmap = kzalloc(sizeof(*rmap), GFP_KERNEL);
rmap = kzalloc(sizeof(*rmap), GFP_KERNEL_ACCOUNT);
if (!rmap)
return -ENOMEM;
rmap->raddr = (saddr & PAGE_MASK) | _SHADOW_RMAP_PGTABLE;
@ -2128,7 +2128,7 @@ int gmap_shadow_page(struct gmap *sg, unsigned long saddr, pte_t pte)
rc = vmaddr;
break;
}
rc = radix_tree_preload(GFP_KERNEL);
rc = radix_tree_preload(GFP_KERNEL_ACCOUNT);
if (rc)
break;
rc = -EAGAIN;


@ -237,6 +237,7 @@
#define X86_FEATURE_VMCALL ( 8*32+18) /* "" Hypervisor supports the VMCALL instruction */
#define X86_FEATURE_VMW_VMMCALL ( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */
#define X86_FEATURE_SEV_ES ( 8*32+20) /* AMD Secure Encrypted Virtualization - Encrypted State */
#define X86_FEATURE_VM_PAGE_FLUSH ( 8*32+21) /* "" VM Page Flush MSR is supported */
/* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
#define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
@ -376,6 +377,7 @@
#define X86_FEATURE_TSXLDTRK (18*32+16) /* TSX Suspend Load Address Tracking */
#define X86_FEATURE_PCONFIG (18*32+18) /* Intel PCONFIG */
#define X86_FEATURE_ARCH_LBR (18*32+19) /* Intel ARCH LBR */
#define X86_FEATURE_AVX512_FP16 (18*32+23) /* AVX512 FP16 */
#define X86_FEATURE_SPEC_CTRL (18*32+26) /* "" Speculation Control (IBRS + IBPB) */
#define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */
#define X86_FEATURE_FLUSH_L1D (18*32+28) /* Flush L1D cache */
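
X86_FEATURE_AVX512_FP16 sits in feature word 18, which maps to CPUID.(EAX=7,ECX=0):EDX, bit 23. A rough userspace check (not part of this series) using the compiler-provided <cpuid.h> helper:

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
            unsigned int eax, ebx, ecx, edx;

            /* CPUID.(EAX=7,ECX=0):EDX bit 23 advertises AVX512-FP16. */
            if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) &&
                (edx & (1u << 23)))
                    puts("AVX512-FP16 supported");
            else
                    puts("AVX512-FP16 not supported");
            return 0;
    }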


@ -614,6 +614,7 @@ struct kvm_vcpu_arch {
struct kvm_pio_request pio;
void *pio_data;
void *guest_ins_data;
u8 event_exit_inst_len;
@ -805,6 +806,9 @@ struct kvm_vcpu_arch {
*/
bool enforce;
} pv_cpuid;
/* Protected Guests */
bool guest_state_protected;
};
struct kvm_lpage_info {
@ -1088,7 +1092,7 @@ struct kvm_x86_ops {
void (*hardware_disable)(void);
void (*hardware_unsetup)(void);
bool (*cpu_has_accelerated_tpr)(void);
bool (*has_emulated_msr)(u32 index);
bool (*has_emulated_msr)(struct kvm *kvm, u32 index);
void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu);
unsigned int vm_size;
@ -1115,7 +1119,8 @@ struct kvm_x86_ops {
struct kvm_segment *var, int seg);
void (*get_cs_db_l_bits)(struct kvm_vcpu *vcpu, int *db, int *l);
void (*set_cr0)(struct kvm_vcpu *vcpu, unsigned long cr0);
int (*set_cr4)(struct kvm_vcpu *vcpu, unsigned long cr4);
bool (*is_valid_cr4)(struct kvm_vcpu *vcpu, unsigned long cr0);
void (*set_cr4)(struct kvm_vcpu *vcpu, unsigned long cr4);
int (*set_efer)(struct kvm_vcpu *vcpu, u64 efer);
void (*get_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
void (*set_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
@ -1231,6 +1236,7 @@ struct kvm_x86_ops {
void (*enable_log_dirty_pt_masked)(struct kvm *kvm,
struct kvm_memory_slot *slot,
gfn_t offset, unsigned long mask);
int (*cpu_dirty_log_size)(void);
/* pmu operations of sub-arch */
const struct kvm_pmu_ops *pmu_ops;
@ -1280,6 +1286,7 @@ struct kvm_x86_ops {
void (*migrate_timers)(struct kvm_vcpu *vcpu);
void (*msr_filter_changed)(struct kvm_vcpu *vcpu);
int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
};
struct kvm_x86_nested_ops {
@ -1470,6 +1477,10 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
int reason, bool has_error_code, u32 error_code);
void kvm_free_guest_fpu(struct kvm_vcpu *vcpu);
void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned long cr0);
void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4);
int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
@ -1696,7 +1707,8 @@ void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu);
int kvm_is_in_guest(void);
int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size);
void __user *__x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa,
u32 size);
bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu);
bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu);
@ -1743,4 +1755,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
#define GET_SMSTATE(type, buf, offset) \
(*(type *)((buf) + (offset) - 0x7e00))
int kvm_cpu_dirty_log_size(void);
#endif /* _ASM_X86_KVM_HOST_H */


@ -472,6 +472,7 @@
#define MSR_AMD64_ICIBSEXTDCTL 0xc001103c
#define MSR_AMD64_IBSOPDATA4 0xc001103d
#define MSR_AMD64_IBS_REG_COUNT_MAX 8 /* includes MSR_AMD64_IBSBRTARGET */
#define MSR_AMD64_VM_PAGE_FLUSH 0xc001011e
#define MSR_AMD64_SEV_ES_GHCB 0xc0010130
#define MSR_AMD64_SEV 0xc0010131
#define MSR_AMD64_SEV_ENABLED_BIT 0


@ -98,6 +98,16 @@ enum {
INTERCEPT_MWAIT_COND,
INTERCEPT_XSETBV,
INTERCEPT_RDPRU,
TRAP_EFER_WRITE,
TRAP_CR0_WRITE,
TRAP_CR1_WRITE,
TRAP_CR2_WRITE,
TRAP_CR3_WRITE,
TRAP_CR4_WRITE,
TRAP_CR5_WRITE,
TRAP_CR6_WRITE,
TRAP_CR7_WRITE,
TRAP_CR8_WRITE,
/* Byte offset 014h (word 5) */
INTERCEPT_INVLPGB = 160,
INTERCEPT_INVLPGB_ILLEGAL,
@ -130,7 +140,7 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
u32 exit_int_info_err;
u64 nested_ctl;
u64 avic_vapic_bar;
u8 reserved_4[8];
u64 ghcb_gpa;
u32 event_inj;
u32 event_inj_err;
u64 nested_cr3;
@ -144,6 +154,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
u8 reserved_6[8]; /* Offset 0xe8 */
u64 avic_logical_id; /* Offset 0xf0 */
u64 avic_physical_id; /* Offset 0xf8 */
u8 reserved_7[8];
u64 vmsa_pa; /* Used for an SEV-ES guest */
};
@ -178,7 +190,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define LBR_CTL_ENABLE_MASK BIT_ULL(0)
#define VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK BIT_ULL(1)
#define SVM_INTERRUPT_SHADOW_MASK 1
#define SVM_INTERRUPT_SHADOW_MASK BIT_ULL(0)
#define SVM_GUEST_INTERRUPT_MASK BIT_ULL(1)
#define SVM_IOIO_STR_SHIFT 2
#define SVM_IOIO_REP_SHIFT 3
@ -197,6 +210,7 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define SVM_NESTED_CTL_NP_ENABLE BIT(0)
#define SVM_NESTED_CTL_SEV_ENABLE BIT(1)
#define SVM_NESTED_CTL_SEV_ES_ENABLE BIT(2)
struct vmcb_seg {
u16 selector;
@ -220,7 +234,8 @@ struct vmcb_save_area {
u8 cpl;
u8 reserved_2[4];
u64 efer;
u8 reserved_3[112];
u8 reserved_3[104];
u64 xss; /* Valid for SEV-ES only */
u64 cr4;
u64 cr3;
u64 cr0;
@ -251,9 +266,12 @@ struct vmcb_save_area {
/*
* The following part of the save area is valid only for
* SEV-ES guests when referenced through the GHCB.
* SEV-ES guests when referenced through the GHCB or for
* saving to the host save area.
*/
u8 reserved_7[104];
u8 reserved_7[80];
u32 pkru;
u8 reserved_7a[20];
u64 reserved_8; /* rax already available at 0x01f8 */
u64 rcx;
u64 rdx;
@ -294,7 +312,7 @@ struct ghcb {
#define EXPECTED_VMCB_SAVE_AREA_SIZE 1032
#define EXPECTED_VMCB_CONTROL_AREA_SIZE 256
#define EXPECTED_VMCB_CONTROL_AREA_SIZE 272
#define EXPECTED_GHCB_SIZE PAGE_SIZE
static inline void __unused_size_checks(void)
@ -379,6 +397,16 @@ struct vmcb {
(unsigned long *)&ghcb->save.valid_bitmap); \
} \
\
static inline u64 ghcb_get_##field(struct ghcb *ghcb) \
{ \
return ghcb->save.field; \
} \
\
static inline u64 ghcb_get_##field##_if_valid(struct ghcb *ghcb) \
{ \
return ghcb_##field##_is_valid(ghcb) ? ghcb->save.field : 0; \
} \
\
static inline void ghcb_set_##field(struct ghcb *ghcb, u64 value) \
{ \
__set_bit(GHCB_BITMAP_IDX(field), \

View file

@ -113,6 +113,7 @@
#define VMX_MISC_PREEMPTION_TIMER_RATE_MASK 0x0000001f
#define VMX_MISC_SAVE_EFER_LMA 0x00000020
#define VMX_MISC_ACTIVITY_HLT 0x00000040
#define VMX_MISC_ACTIVITY_WAIT_SIPI 0x00000100
#define VMX_MISC_ZERO_LEN_INS 0x40000000
#define VMX_MISC_MSR_LIST_MULTIPLIER 512


@ -12,6 +12,7 @@
#define KVM_PIO_PAGE_OFFSET 1
#define KVM_COALESCED_MMIO_PAGE_OFFSET 2
#define KVM_DIRTY_LOG_PAGE_OFFSET 64
#define DE_VECTOR 0
#define DB_VECTOR 1
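
KVM_DIRTY_LOG_PAGE_OFFSET is where the new per-vCPU dirty ring is mmapped on the vCPU file descriptor. A rough userspace harvesting sketch, assuming the ring was enabled with KVM_ENABLE_CAP(KVM_CAP_DIRTY_LOG_RING) on the VM before vCPU creation; names follow the 5.11 uapi headers, and real code needs acquire/release ordering on the flags field:

    #include <linux/kvm.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>

    /* ring_bytes is the size passed to KVM_ENABLE_CAP(KVM_CAP_DIRTY_LOG_RING). */
    static struct kvm_dirty_gfn *map_dirty_ring(int vcpu_fd, size_t ring_bytes,
                                                long page_size)
    {
            return mmap(NULL, ring_bytes, PROT_READ | PROT_WRITE, MAP_SHARED,
                        vcpu_fd, page_size * KVM_DIRTY_LOG_PAGE_OFFSET);
    }

    /* Collect dirty gfns published by KVM and hand the slots back for reuse. */
    static void harvest_dirty_ring(int vm_fd, struct kvm_dirty_gfn *ring,
                                   uint32_t nr_entries, uint32_t *next)
    {
            while (ring[*next % nr_entries].flags & KVM_DIRTY_GFN_F_DIRTY) {
                    struct kvm_dirty_gfn *e = &ring[*next % nr_entries];

                    /* e->slot is the memslot id, e->offset the page within it. */
                    e->flags = KVM_DIRTY_GFN_F_RESET;
                    (*next)++;
            }
            ioctl(vm_fd, KVM_RESET_DIRTY_RINGS, 0); /* let KVM recycle the entries */
    }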


@ -77,10 +77,28 @@
#define SVM_EXIT_MWAIT_COND 0x08c
#define SVM_EXIT_XSETBV 0x08d
#define SVM_EXIT_RDPRU 0x08e
#define SVM_EXIT_EFER_WRITE_TRAP 0x08f
#define SVM_EXIT_CR0_WRITE_TRAP 0x090
#define SVM_EXIT_CR1_WRITE_TRAP 0x091
#define SVM_EXIT_CR2_WRITE_TRAP 0x092
#define SVM_EXIT_CR3_WRITE_TRAP 0x093
#define SVM_EXIT_CR4_WRITE_TRAP 0x094
#define SVM_EXIT_CR5_WRITE_TRAP 0x095
#define SVM_EXIT_CR6_WRITE_TRAP 0x096
#define SVM_EXIT_CR7_WRITE_TRAP 0x097
#define SVM_EXIT_CR8_WRITE_TRAP 0x098
#define SVM_EXIT_CR9_WRITE_TRAP 0x099
#define SVM_EXIT_CR10_WRITE_TRAP 0x09a
#define SVM_EXIT_CR11_WRITE_TRAP 0x09b
#define SVM_EXIT_CR12_WRITE_TRAP 0x09c
#define SVM_EXIT_CR13_WRITE_TRAP 0x09d
#define SVM_EXIT_CR14_WRITE_TRAP 0x09e
#define SVM_EXIT_CR15_WRITE_TRAP 0x09f
#define SVM_EXIT_INVPCID 0x0a2
#define SVM_EXIT_NPF 0x400
#define SVM_EXIT_AVIC_INCOMPLETE_IPI 0x401
#define SVM_EXIT_AVIC_UNACCELERATED_ACCESS 0x402
#define SVM_EXIT_VMGEXIT 0x403
/* SEV-ES software-defined VMGEXIT events */
#define SVM_VMGEXIT_MMIO_READ 0x80000001
@ -183,10 +201,20 @@
{ SVM_EXIT_MONITOR, "monitor" }, \
{ SVM_EXIT_MWAIT, "mwait" }, \
{ SVM_EXIT_XSETBV, "xsetbv" }, \
{ SVM_EXIT_EFER_WRITE_TRAP, "write_efer_trap" }, \
{ SVM_EXIT_CR0_WRITE_TRAP, "write_cr0_trap" }, \
{ SVM_EXIT_CR4_WRITE_TRAP, "write_cr4_trap" }, \
{ SVM_EXIT_CR8_WRITE_TRAP, "write_cr8_trap" }, \
{ SVM_EXIT_INVPCID, "invpcid" }, \
{ SVM_EXIT_NPF, "npf" }, \
{ SVM_EXIT_AVIC_INCOMPLETE_IPI, "avic_incomplete_ipi" }, \
{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS, "avic_unaccelerated_access" }, \
{ SVM_EXIT_VMGEXIT, "vmgexit" }, \
{ SVM_VMGEXIT_MMIO_READ, "vmgexit_mmio_read" }, \
{ SVM_VMGEXIT_MMIO_WRITE, "vmgexit_mmio_write" }, \
{ SVM_VMGEXIT_NMI_COMPLETE, "vmgexit_nmi_complete" }, \
{ SVM_VMGEXIT_AP_HLT_LOOP, "vmgexit_ap_hlt_loop" }, \
{ SVM_VMGEXIT_AP_JUMP_TABLE, "vmgexit_ap_jump_table" }, \
{ SVM_EXIT_ERR, "invalid_guest_state" }


@ -32,6 +32,7 @@
#define EXIT_REASON_EXTERNAL_INTERRUPT 1
#define EXIT_REASON_TRIPLE_FAULT 2
#define EXIT_REASON_INIT_SIGNAL 3
#define EXIT_REASON_SIPI_SIGNAL 4
#define EXIT_REASON_INTERRUPT_WINDOW 7
#define EXIT_REASON_NMI_WINDOW 8
@ -94,6 +95,7 @@
{ EXIT_REASON_EXTERNAL_INTERRUPT, "EXTERNAL_INTERRUPT" }, \
{ EXIT_REASON_TRIPLE_FAULT, "TRIPLE_FAULT" }, \
{ EXIT_REASON_INIT_SIGNAL, "INIT_SIGNAL" }, \
{ EXIT_REASON_SIPI_SIGNAL, "SIPI_SIGNAL" }, \
{ EXIT_REASON_INTERRUPT_WINDOW, "INTERRUPT_WINDOW" }, \
{ EXIT_REASON_NMI_WINDOW, "NMI_WINDOW" }, \
{ EXIT_REASON_TASK_SWITCH, "TASK_SWITCH" }, \


@ -69,6 +69,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_CQM_MBM_TOTAL, X86_FEATURE_CQM_LLC },
{ X86_FEATURE_CQM_MBM_LOCAL, X86_FEATURE_CQM_LLC },
{ X86_FEATURE_AVX512_BF16, X86_FEATURE_AVX512VL },
{ X86_FEATURE_AVX512_FP16, X86_FEATURE_AVX512BW },
{ X86_FEATURE_ENQCMD, X86_FEATURE_XSAVES },
{ X86_FEATURE_PER_THREAD_MBA, X86_FEATURE_MBA },
{}


@ -44,6 +44,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_SEV, CPUID_EAX, 1, 0x8000001f, 0 },
{ X86_FEATURE_SEV_ES, CPUID_EAX, 3, 0x8000001f, 0 },
{ X86_FEATURE_SME_COHERENT, CPUID_EAX, 10, 0x8000001f, 0 },
{ X86_FEATURE_VM_PAGE_FLUSH, CPUID_EAX, 2, 0x8000001f, 0 },
{ 0, 0, 0, 0, 0 }
};


@ -501,12 +501,12 @@ static bool vmware_sev_es_hcall_finish(struct ghcb *ghcb, struct pt_regs *regs)
ghcb_rbp_is_valid(ghcb)))
return false;
regs->bx = ghcb->save.rbx;
regs->cx = ghcb->save.rcx;
regs->dx = ghcb->save.rdx;
regs->si = ghcb->save.rsi;
regs->di = ghcb->save.rdi;
regs->bp = ghcb->save.rbp;
regs->bx = ghcb_get_rbx(ghcb);
regs->cx = ghcb_get_rcx(ghcb);
regs->dx = ghcb_get_rdx(ghcb);
regs->si = ghcb_get_rsi(ghcb);
regs->di = ghcb_get_rdi(ghcb);
regs->bp = ghcb_get_rbp(ghcb);
return true;
}
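
The ghcb_get_rbx()/ghcb_get_rcx() calls above come from the accessor-generating macro added in the svm.h hunk earlier; for the rbx field the generated helpers expand to roughly the following (ghcb_rbx_is_valid() is the pre-existing valid-bitmap test from the same header):

    static inline u64 ghcb_get_rbx(struct ghcb *ghcb)
    {
            return ghcb->save.rbx;
    }

    static inline u64 ghcb_get_rbx_if_valid(struct ghcb *ghcb)
    {
            /* 0 unless the guest marked rbx valid in the GHCB bitmap. */
            return ghcb_rbx_is_valid(ghcb) ? ghcb->save.rbx : 0;
    }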


@ -44,7 +44,6 @@ static int __init parse_no_kvmclock_vsyscall(char *arg)
early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall);
/* Aligned to page sizes to match whats mapped via vsyscalls to userspace */
#define HV_CLOCK_SIZE (sizeof(struct pvclock_vsyscall_time_info) * NR_CPUS)
#define HVC_BOOT_ARRAY_SIZE \
(PAGE_SIZE / sizeof(struct pvclock_vsyscall_time_info))


@ -100,7 +100,8 @@ config KVM_AMD_SEV
depends on KVM_AMD && X86_64
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
help
Provides support for launching Encrypted VMs on AMD processors.
Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
with Encrypted State (SEV-ES) on AMD processors.
config KVM_MMU_AUDIT
bool "Audit KVM MMU"


@ -10,7 +10,8 @@ endif
KVM := ../../../virt/kvm
kvm-y += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
$(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o
$(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o \
$(KVM)/dirty_ring.o
kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
kvm-y += x86.o emulate.o i8259.o irq.o lapic.o \


@ -146,6 +146,7 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
MSR_IA32_MISC_ENABLE_MWAIT);
}
}
EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);
static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
{
@ -418,7 +419,7 @@ void kvm_set_cpu_caps(void)
F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
F(SERIALIZE) | F(TSXLDTRK)
F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16)
);
/* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */


@ -264,6 +264,20 @@ static inline int guest_cpuid_stepping(struct kvm_vcpu *vcpu)
return x86_stepping(best->eax);
}
static inline bool guest_has_spec_ctrl_msr(struct kvm_vcpu *vcpu)
{
return (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
guest_cpuid_has(vcpu, X86_FEATURE_AMD_STIBP) ||
guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBRS) ||
guest_cpuid_has(vcpu, X86_FEATURE_AMD_SSBD));
}
static inline bool guest_has_pred_cmd_msr(struct kvm_vcpu *vcpu)
{
return (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBPB));
}
static inline bool supports_cpuid_fault(struct kvm_vcpu *vcpu)
{
return vcpu->arch.msr_platform_info & MSR_PLATFORM_INFO_CPUID_FAULT;


@ -1951,7 +1951,7 @@ int kvm_vm_ioctl_hv_eventfd(struct kvm *kvm, struct kvm_hyperv_eventfd *args)
return kvm_hv_eventfd_assign(kvm, args->conn_id, args->fd);
}
int kvm_vcpu_ioctl_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
struct kvm_cpuid_entry2 __user *entries)
{
uint16_t evmcs_ver = 0;
@ -2037,7 +2037,7 @@ int kvm_vcpu_ioctl_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
* Direct Synthetic timers only make sense with in-kernel
* LAPIC
*/
if (lapic_in_kernel(vcpu))
if (!vcpu || lapic_in_kernel(vcpu))
ent->edx |= HV_STIMER_DIRECT_MODE_AVAILABLE;
break;
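
The !vcpu check added above lets the renamed kvm_get_hv_cpuid() also back the system-scoped KVM_GET_SUPPORTED_HV_CPUID ioctl (advertised through KVM_CAP_SYS_HYPERV_CPUID), where no vCPU exists yet. A rough userspace sketch, assuming 64 entries is enough for the reply:

    #include <fcntl.h>
    #include <linux/kvm.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>

    int main(void)
    {
            struct kvm_cpuid2 *cpuid;
            unsigned int i;
            int kvm = open("/dev/kvm", O_RDWR);

            if (kvm < 0 ||
                ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_SYS_HYPERV_CPUID) <= 0)
                    return 1; /* system-scoped variant not supported */

            cpuid = calloc(1, sizeof(*cpuid) + 64 * sizeof(struct kvm_cpuid_entry2));
            cpuid->nent = 64;
            if (ioctl(kvm, KVM_GET_SUPPORTED_HV_CPUID, cpuid))
                    return 1;

            for (i = 0; i < cpuid->nent; i++)
                    printf("leaf %#x: eax=%#x ebx=%#x ecx=%#x edx=%#x\n",
                           cpuid->entries[i].function, cpuid->entries[i].eax,
                           cpuid->entries[i].ebx, cpuid->entries[i].ecx,
                           cpuid->entries[i].edx);
            return 0;
    }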


@ -126,7 +126,7 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
void kvm_hv_init_vm(struct kvm *kvm);
void kvm_hv_destroy_vm(struct kvm *kvm);
int kvm_vm_ioctl_hv_eventfd(struct kvm *kvm, struct kvm_hyperv_eventfd *args);
int kvm_vcpu_ioctl_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
struct kvm_cpuid_entry2 __user *entries);
#endif


@ -9,34 +9,6 @@
(X86_CR4_PVI | X86_CR4_DE | X86_CR4_PCE | X86_CR4_OSFXSR \
| X86_CR4_OSXMMEXCPT | X86_CR4_PGE | X86_CR4_TSD | X86_CR4_FSGSBASE)
#define BUILD_KVM_GPR_ACCESSORS(lname, uname) \
static __always_inline unsigned long kvm_##lname##_read(struct kvm_vcpu *vcpu)\
{ \
return vcpu->arch.regs[VCPU_REGS_##uname]; \
} \
static __always_inline void kvm_##lname##_write(struct kvm_vcpu *vcpu, \
unsigned long val) \
{ \
vcpu->arch.regs[VCPU_REGS_##uname] = val; \
}
BUILD_KVM_GPR_ACCESSORS(rax, RAX)
BUILD_KVM_GPR_ACCESSORS(rbx, RBX)
BUILD_KVM_GPR_ACCESSORS(rcx, RCX)
BUILD_KVM_GPR_ACCESSORS(rdx, RDX)
BUILD_KVM_GPR_ACCESSORS(rbp, RBP)
BUILD_KVM_GPR_ACCESSORS(rsi, RSI)
BUILD_KVM_GPR_ACCESSORS(rdi, RDI)
#ifdef CONFIG_X86_64
BUILD_KVM_GPR_ACCESSORS(r8, R8)
BUILD_KVM_GPR_ACCESSORS(r9, R9)
BUILD_KVM_GPR_ACCESSORS(r10, R10)
BUILD_KVM_GPR_ACCESSORS(r11, R11)
BUILD_KVM_GPR_ACCESSORS(r12, R12)
BUILD_KVM_GPR_ACCESSORS(r13, R13)
BUILD_KVM_GPR_ACCESSORS(r14, R14)
BUILD_KVM_GPR_ACCESSORS(r15, R15)
#endif
static inline bool kvm_register_is_available(struct kvm_vcpu *vcpu,
enum kvm_reg reg)
{
@ -62,6 +34,35 @@ static inline void kvm_register_mark_dirty(struct kvm_vcpu *vcpu,
__set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
}
#define BUILD_KVM_GPR_ACCESSORS(lname, uname) \
static __always_inline unsigned long kvm_##lname##_read(struct kvm_vcpu *vcpu)\
{ \
return vcpu->arch.regs[VCPU_REGS_##uname]; \
} \
static __always_inline void kvm_##lname##_write(struct kvm_vcpu *vcpu, \
unsigned long val) \
{ \
vcpu->arch.regs[VCPU_REGS_##uname] = val; \
kvm_register_mark_dirty(vcpu, VCPU_REGS_##uname); \
}
BUILD_KVM_GPR_ACCESSORS(rax, RAX)
BUILD_KVM_GPR_ACCESSORS(rbx, RBX)
BUILD_KVM_GPR_ACCESSORS(rcx, RCX)
BUILD_KVM_GPR_ACCESSORS(rdx, RDX)
BUILD_KVM_GPR_ACCESSORS(rbp, RBP)
BUILD_KVM_GPR_ACCESSORS(rsi, RSI)
BUILD_KVM_GPR_ACCESSORS(rdi, RDI)
#ifdef CONFIG_X86_64
BUILD_KVM_GPR_ACCESSORS(r8, R8)
BUILD_KVM_GPR_ACCESSORS(r9, R9)
BUILD_KVM_GPR_ACCESSORS(r10, R10)
BUILD_KVM_GPR_ACCESSORS(r11, R11)
BUILD_KVM_GPR_ACCESSORS(r12, R12)
BUILD_KVM_GPR_ACCESSORS(r13, R13)
BUILD_KVM_GPR_ACCESSORS(r14, R14)
BUILD_KVM_GPR_ACCESSORS(r15, R15)
#endif
static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, int reg)
{
if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_REGS))


@ -2843,14 +2843,35 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
{
struct kvm_lapic *apic = vcpu->arch.apic;
u8 sipi_vector;
int r;
unsigned long pe;
if (!lapic_in_kernel(vcpu) || !apic->pending_events)
if (!lapic_in_kernel(vcpu))
return;
/*
* Read pending events before calling the check_events
* callback.
*/
pe = smp_load_acquire(&apic->pending_events);
if (!pe)
return;
if (is_guest_mode(vcpu)) {
r = kvm_x86_ops.nested_ops->check_events(vcpu);
if (r < 0)
return;
/*
* If an event has happened and caused a vmexit,
* we know INITs are latched and therefore
* we will not incorrectly deliver an APIC
* event instead of a vmexit.
*/
}
/*
* INITs are latched while CPU is in specific states
* (SMM, VMX non-root mode, SVM with GIF=0).
* (SMM, VMX root mode, SVM with GIF=0).
* Because a CPU cannot be in these states immediately
* after it has processed an INIT signal (and thus in
* KVM_MP_STATE_INIT_RECEIVED state), just eat SIPIs
@ -2858,27 +2879,29 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
*/
if (kvm_vcpu_latch_init(vcpu)) {
WARN_ON_ONCE(vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED);
if (test_bit(KVM_APIC_SIPI, &apic->pending_events))
if (test_bit(KVM_APIC_SIPI, &pe))
clear_bit(KVM_APIC_SIPI, &apic->pending_events);
return;
}
pe = xchg(&apic->pending_events, 0);
if (test_bit(KVM_APIC_INIT, &pe)) {
clear_bit(KVM_APIC_INIT, &apic->pending_events);
kvm_vcpu_reset(vcpu, true);
if (kvm_vcpu_is_bsp(apic->vcpu))
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
else
vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
}
if (test_bit(KVM_APIC_SIPI, &pe) &&
vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
if (test_bit(KVM_APIC_SIPI, &pe)) {
clear_bit(KVM_APIC_SIPI, &apic->pending_events);
if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
/* evaluate pending_events before reading the vector */
smp_rmb();
sipi_vector = apic->sipi_vector;
kvm_vcpu_deliver_sipi_vector(vcpu, sipi_vector);
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
}
}
}
void kvm_lapic_init(void)


@ -820,7 +820,7 @@ gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn,
slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
return NULL;
if (no_dirty_log && slot->dirty_bitmap)
if (no_dirty_log && kvm_slot_dirty_track_enabled(slot))
return NULL;
return slot;
@ -1289,6 +1289,14 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
}
int kvm_cpu_dirty_log_size(void)
{
if (kvm_x86_ops.cpu_dirty_log_size)
return kvm_x86_ops.cpu_dirty_log_size();
return 0;
}
bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
struct kvm_memory_slot *slot, u64 gfn)
{


@ -381,6 +381,35 @@ TRACE_EVENT(
)
);
TRACE_EVENT(
kvm_tdp_mmu_spte_changed,
TP_PROTO(int as_id, gfn_t gfn, int level, u64 old_spte, u64 new_spte),
TP_ARGS(as_id, gfn, level, old_spte, new_spte),
TP_STRUCT__entry(
__field(u64, gfn)
__field(u64, old_spte)
__field(u64, new_spte)
/* Level cannot be larger than 5 on x86, so it fits in a u8. */
__field(u8, level)
/* as_id can only be 0 or 1 x86, so it fits in a u8. */
__field(u8, as_id)
),
TP_fast_assign(
__entry->gfn = gfn;
__entry->old_spte = old_spte;
__entry->new_spte = new_spte;
__entry->level = level;
__entry->as_id = as_id;
),
TP_printk("as id %d gfn %llx level %d old_spte %llx new_spte %llx",
__entry->as_id, __entry->gfn, __entry->level,
__entry->old_spte, __entry->new_spte
)
);
#endif /* _TRACE_KVMMMU_H */
#undef TRACE_INCLUDE_PATH
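
TRACE_EVENT(kvm_tdp_mmu_spte_changed, ...) generates a trace_kvm_tdp_mmu_spte_changed() call for the TDP MMU to fire whenever a shadow PTE changes; the event then shows up under the kvmmmu trace system in tracefs. A call site would look roughly like this (illustrative, not a line from this diff):

    /* In the TDP MMU's SPTE update path (sketch): */
    trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte);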

Some files were not shown because too many files have changed in this diff.