Skip to content
api.rst 210 KiB
Newer Older
7.21 KVM_CAP_X86_USER_SPACE_MSR
-------------------------------

:Architectures: x86
:Target: VM
:Parameters: args[0] contains the mask of KVM_MSR_EXIT_REASON_* events to report
:Returns: 0 on success; -1 on error

This capability enables trapping of #GP invoking RDMSR and WRMSR instructions
into user space.

When a guest requests to read or write an MSR, KVM may not implement all MSRs
that are relevant to a respective system. It also does not differentiate by
CPU type.

To allow more fine grained control over MSR handling, user space may enable
this capability. With it enabled, MSR accesses that match the mask specified in
args[0] and trigger a #GP event inside the guest by KVM will instead trigger
KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit notifications which user space
can then handle to implement model specific MSR handling and/or user notifications
to inform a user that an MSR was not handled.

======================

This section lists capabilities that give information about other
features of the KVM implementation.

8.1 KVM_CAP_PPC_HWRNG
---------------------

This capability, if KVM_CHECK_EXTENSION indicates that it is
available, means that the kernel has an implementation of the
H_RANDOM hypercall backed by a hardware random-number generator.
If present, the kernel H_RANDOM handler can be enabled for guest use
with the KVM_CAP_PPC_ENABLE_HCALL capability.
------------------------

:Architectures: x86

This capability, if KVM_CHECK_EXTENSION indicates that it is
available, means that the kernel has an implementation of the
Hyper-V Synthetic interrupt controller(SynIC). Hyper-V SynIC is
used to support Windows Hyper-V based guest paravirt drivers(VMBus).

In order to use SynIC, it has to be activated by setting this
capability via KVM_ENABLE_CAP ioctl on the vcpu fd. Note that this
will disable the use of APIC hardware virtualization even if supported
by the CPU, as it's incompatible with SynIC auto-EOI behavior.
-------------------------

This capability, if KVM_CHECK_EXTENSION indicates that it is
available, means that the kernel can support guests using the
radix MMU defined in Power ISA V3.00 (as implemented in the POWER9
processor).

8.4 KVM_CAP_PPC_HASH_MMU_V3
---------------------------

This capability, if KVM_CHECK_EXTENSION indicates that it is
available, means that the kernel can support guests using the
hashed page table MMU defined in Power ISA V3.00 (as implemented in
the POWER9 processor), including in-memory segment tables.

8.5 KVM_CAP_MIPS_VZ
:Architectures: mips

This capability, if KVM_CHECK_EXTENSION on the main kvm handle indicates that
it is available, means that full hardware assisted virtualization capabilities
of the hardware are available for use through KVM. An appropriate
KVM_VM_MIPS_* type must be passed to KVM_CREATE_VM to create a VM which
utilises it.

If KVM_CHECK_EXTENSION on a kvm VM handle indicates that this capability is
available, it means that the VM is using full hardware assisted virtualization
capabilities of the hardware. This is useful to check after creating a VM with
KVM_VM_MIPS_DEFAULT.

The value returned by KVM_CHECK_EXTENSION should be compared against known
values (see below). All other values are reserved. This is to allow for the
possibility of other hardware assisted virtualization implementations which
may be incompatible with the MIPS VZ ASE.

==  ==========================================================================
 0  The trap & emulate implementation is in use to run guest code in user
    mode. Guest virtual memory segments are rearranged to fit the guest in the
    user mode address space.

 1  The MIPS VZ ASE is in use, providing full hardware assisted
    virtualization, including standard guest virtual memory segments.
==  ==========================================================================

8.6 KVM_CAP_MIPS_TE
:Architectures: mips

This capability, if KVM_CHECK_EXTENSION on the main kvm handle indicates that
it is available, means that the trap & emulate implementation is available to
run guest code in user mode, even if KVM_CAP_MIPS_VZ indicates that hardware
assisted virtualisation is also available. KVM_VM_MIPS_TE (0) must be passed
to KVM_CREATE_VM to create a VM which utilises it.

If KVM_CHECK_EXTENSION on a kvm VM handle indicates that this capability is
available, it means that the VM is using trap & emulate.

8.7 KVM_CAP_MIPS_64BIT
----------------------
:Architectures: mips

This capability indicates the supported architecture type of the guest, i.e. the
supported register and address width.

The values returned when this capability is checked by KVM_CHECK_EXTENSION on a
kvm VM handle correspond roughly to the CP0_Config.AT register field, and should
be checked specifically against known values (see below). All other values are
reserved.

==  ========================================================================
 0  MIPS32 or microMIPS32.
    Both registers and addresses are 32-bits wide.
    It will only be possible to run 32-bit guest code.

 1  MIPS64 or microMIPS64 with access only to 32-bit compatibility segments.
    Registers are 64-bits wide, but addresses are 32-bits wide.
    64-bit guest code may run but cannot access MIPS64 memory segments.
    It will also be possible to run 32-bit guest code.

 2  MIPS64 or microMIPS64 with access to all address segments.
    Both registers and addresses are 64-bits wide.
    It will be possible to run 64-bit or 32-bit guest code.
==  ========================================================================
8.9 KVM_CAP_ARM_USER_IRQ
------------------------

:Architectures: arm, arm64

This capability, if KVM_CHECK_EXTENSION indicates that it is available, means
that if userspace creates a VM without an in-kernel interrupt controller, it
will be notified of changes to the output level of in-kernel emulated devices,
which can generate virtual interrupts, presented to the VM.
For such VMs, on every return to userspace, the kernel
updates the vcpu's run->s.regs.device_irq_level field to represent the actual
output level of the device.

Whenever kvm detects a change in the device output level, kvm guarantees at
least one return to userspace before running the VM.  This exit could either
be a KVM_EXIT_INTR or any other exit event, like KVM_EXIT_MMIO. This way,
userspace can always sample the device output level and re-compute the state of
the userspace interrupt controller.  Userspace should always check the state
of run->s.regs.device_irq_level on every kvm exit.
The value in run->s.regs.device_irq_level can represent both level and edge
triggered interrupt signals, depending on the device.  Edge triggered interrupt
signals will exit to userspace with the bit in run->s.regs.device_irq_level
set exactly once per edge signal.

The field run->s.regs.device_irq_level is available independent of
run->kvm_valid_regs or run->kvm_dirty_regs bits.

If KVM_CAP_ARM_USER_IRQ is supported, the KVM_CHECK_EXTENSION ioctl returns a
number larger than 0 indicating the version of this capability is implemented
and thereby which bits in run->s.regs.device_irq_level can signal values.
Currently the following bits are defined for the device_irq_level bitmap::

  KVM_CAP_ARM_USER_IRQ >= 1:

    KVM_ARM_DEV_EL1_VTIMER -  EL1 virtual timer
    KVM_ARM_DEV_EL1_PTIMER -  EL1 physical timer
    KVM_ARM_DEV_PMU        -  ARM PMU overflow interrupt signal

Future versions of kvm may implement additional events. These will get
indicated by returning a higher number from KVM_CHECK_EXTENSION and will be
listed above.
-----------------------------

Querying this capability returns a bitmap indicating the possible
virtual SMT modes that can be set using KVM_CAP_PPC_SMT.  If bit N
(counting from the right) is set, then a virtual SMT mode of 2^N is
available.

8.11 KVM_CAP_HYPERV_SYNIC2
--------------------------

This capability enables a newer version of Hyper-V Synthetic interrupt
controller (SynIC).  The only difference with KVM_CAP_HYPERV_SYNIC is that KVM
doesn't clear SynIC message and event flags pages when they are enabled by
writing to the respective MSRs.

8.12 KVM_CAP_HYPERV_VP_INDEX
----------------------------

This capability indicates that userspace can load HV_X64_MSR_VP_INDEX msr.  Its
value is used to denote the target vcpu for a SynIC interrupt.  For
compatibilty, KVM initializes this msr to KVM's internal vcpu index.  When this
capability is absent, userspace can still query this msr's value.
-------------------------------
:Architectures: s390
:Parameters: none

This capability indicates if the flic device will be able to get/set the
AIS states for migration via the KVM_DEV_FLIC_AISM_ALL attribute and allows
to discover this without having to create a flic device.
---------------------
:Architectures: s390

This capability indicates that the PSW is exposed via the kvm_run structure.

8.15 KVM_CAP_S390_GMAP
----------------------
:Architectures: s390

This capability indicates that the user space memory used as guest mapping can
be anywhere in the user memory address space, as long as the memory slots are
aligned and sized to a segment (1MB) boundary.

8.16 KVM_CAP_S390_COW
---------------------
:Architectures: s390

This capability indicates that the user space memory used as guest mapping can
use copy-on-write semantics as well as dirty pages tracking via read-only page
tables.

8.17 KVM_CAP_S390_BPB
---------------------
:Architectures: s390

This capability indicates that kvm will implement the interfaces to handle
reset, migration and nested KVM for branch prediction blocking. The stfle
facility 82 should not be provided to the guest without this capability.
8.18 KVM_CAP_HYPERV_TLBFLUSH
----------------------------

This capability indicates that KVM supports paravirtualized Hyper-V TLB Flush
hypercalls:
HvFlushVirtualAddressSpace, HvFlushVirtualAddressSpaceEx,
HvFlushVirtualAddressList, HvFlushVirtualAddressListEx.
8.19 KVM_CAP_ARM_INJECT_SERROR_ESR
----------------------------------
:Architectures: arm, arm64

This capability indicates that userspace can specify (via the
KVM_SET_VCPU_EVENTS ioctl) the syndrome value reported to the guest when it
takes a virtual SError interrupt exception.
If KVM advertises this capability, userspace can only specify the ISS field for
the ESR syndrome. Other parts of the ESR, such as the EC are generated by the
CPU when the exception is taken. If this virtual SError is taken to EL1 using
AArch64, this value will be reported in the ISS field of ESR_ELx.

See KVM_CAP_VCPU_EVENTS for more details.
8.20 KVM_CAP_HYPERV_SEND_IPI
----------------------------

This capability indicates that KVM supports paravirtualized Hyper-V IPI send
hypercalls:
HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx.
8.21 KVM_CAP_HYPERV_DIRECT_TLBFLUSH
-----------------------------------
:Architectures: x86

This capability indicates that KVM running on top of Hyper-V hypervisor
enables Direct TLB flush for its guests meaning that TLB flush
hypercalls are handled by Level 0 hypervisor (Hyper-V) bypassing KVM.
Due to the different ABI for hypercall parameters between Hyper-V and
KVM, enabling this capability effectively disables all hypercall
handling by KVM (as some KVM hypercall may be mistakenly treated as TLB
flush hypercalls by Hyper-V) so userspace should disable KVM identification
in CPUID and only exposes Hyper-V identification. In this case, guest
thinks it's running on Hyper-V and only use Hyper-V hypercalls.

8.22 KVM_CAP_S390_VCPU_RESETS
-----------------------------
:Architectures: s390

This capability indicates that the KVM_S390_NORMAL_RESET and
KVM_S390_CLEAR_RESET ioctls are available.

8.23 KVM_CAP_S390_PROTECTED
---------------------------
:Architectures: s390

This capability indicates that the Ultravisor has been initialized and
KVM can therefore start protected VMs.
This capability governs the KVM_S390_PV_COMMAND ioctl and the
KVM_MP_STATE_LOAD MP_STATE. KVM_SET_MP_STATE can fail for protected
guests when the state change is invalid.

8.24 KVM_CAP_STEAL_TIME
-----------------------

:Architectures: arm64, x86

This capability indicates that KVM supports steal time accounting.
When steal time accounting is supported it may be enabled with
architecture-specific interfaces.  This capability and the architecture-
specific interfaces must be consistent, i.e. if one says the feature
is supported, than the other should as well and vice versa.  For arm64
see Documentation/virt/kvm/devices/vcpu.rst "KVM_ARM_VCPU_PVTIME_CTRL".
For x86 see Documentation/virt/kvm/msr.rst "MSR_KVM_STEAL_TIME".

8.25 KVM_CAP_S390_DIAG318
-------------------------

:Architectures: s390

This capability enables a guest to set information about its control program
(i.e. guest kernel type and version). The information is helpful during
system/firmware service events, providing additional data about the guest
environments running on the machine.

The information is associated with the DIAGNOSE 0x318 instruction, which sets
an 8-byte value consisting of a one-byte Control Program Name Code (CPNC) and
a 7-byte Control Program Version Code (CPVC). The CPNC determines what
environment the control program is running in (e.g. Linux, z/VM...), and the
CPVC is used for information specific to OS (e.g. Linux version, Linux
distribution...)

If this capability is available, then the CPNC and CPVC can be synchronized
between KVM and userspace via the sync regs mechanism (KVM_SYNC_DIAG318).

8.26 KVM_CAP_X86_USER_SPACE_MSR
-------------------------------

:Architectures: x86

This capability indicates that KVM supports deflection of MSR reads and
writes to user space. It can be enabled on a VM level. If enabled, MSR
accesses that would usually trigger a #GP by KVM into the guest will
instead get bounced to user space through the KVM_EXIT_X86_RDMSR and
KVM_EXIT_X86_WRMSR exit notifications.
8.27 KVM_X86_SET_MSR_FILTER
---------------------------

:Architectures: x86

This capability indicates that KVM supports that accesses to user defined MSRs
may be rejected. With this capability exposed, KVM exports new VM ioctl
KVM_X86_SET_MSR_FILTER which user space can call to specify bitmaps of MSR
ranges that KVM should reject access to.

In combination with KVM_CAP_X86_USER_SPACE_MSR, this allows user space to
trap and emulate MSRs that are outside of the scope of KVM as well as
limit the attack surface on KVM's MSR emulation code.
8.28 KVM_CAP_ENFORCE_PV_CPUID
-----------------------------

Architectures: x86

When enabled, KVM will disable paravirtual features provided to the
guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf
(0x40000001). Otherwise, a guest may use the paravirtual features
regardless of what has actually been exposed through the CPUID leaf.