machine check handling routine. Without this capability KVM will
branch to guests' 0x200 interrupt vector.
7.13 KVM_CAP_X86_DISABLE_EXITS
Architectures: x86
Parameters: args[0] defines which exits are disabled
Returns: 0 on success, -EINVAL when args[0] contains invalid exits
Valid bits in args[0] are
#define KVM_X86_DISABLE_EXITS_MWAIT (1 << 0)
#define KVM_X86_DISABLE_EXITS_HLT (1 << 1)
#define KVM_X86_DISABLE_EXITS_PAUSE (1 << 2)
#define KVM_X86_DISABLE_EXITS_CSTATE (1 << 3)
Enabling this capability on a VM provides userspace with a way to no
longer intercept some instructions for improved latency in some
workloads, and is suggested when vCPUs are associated with dedicated
physical CPUs. More bits can be added in the future; userspace can
just pass the KVM_CHECK_EXTENSION result to KVM_ENABLE_CAP to disable
all such vmexits.
Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits.
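For illustration, the following is a minimal sketch in C of the suggested
usage, assuming kvm_fd and vm_fd are already-open file descriptors for
/dev/kvm and the VM (both names are placeholders, not part of the API):

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Sketch only: disable every exit that this kernel can disable. */
  static int disable_all_supported_exits(int kvm_fd, int vm_fd)
  {
      int bits = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_X86_DISABLE_EXITS);
      struct kvm_enable_cap cap = {
          .cap = KVM_CAP_X86_DISABLE_EXITS,
          .args[0] = bits,        /* pass the CHECK_EXTENSION result back */
      };

      if (bits <= 0)
          return bits;            /* capability absent or no valid bits */
      return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
  }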
7.14 KVM_CAP_S390_HPAGE_1M
Architectures: s390
Parameters: none
Returns: 0 on success, -EINVAL if hpage module parameter was not set
or cmma is enabled, or the VM has the KVM_VM_S390_UCONTROL
flag set
With this capability the KVM support for memory backing with 1m pages
through hugetlbfs can be enabled for a VM. After the capability is
enabled, cmma can't be enabled anymore and pfmfi and the storage key
interpretation are disabled. If cmma has already been enabled or the
hpage module parameter is not set to 1, -EINVAL is returned.
While it is generally possible to create a huge page backed VM without
this capability, the VM will not be able to run.
7.15 KVM_CAP_MSR_PLATFORM_INFO
Architectures: x86
Parameters: args[0] whether feature should be enabled or not
With this capability, a guest may read the MSR_PLATFORM_INFO MSR. Otherwise,
a #GP would be raised when the guest tries to access it. Currently, this
capability does not enable write permissions of this MSR for the guest.
7.16 KVM_CAP_PPC_NESTED_HV
Architectures: ppc
Parameters: none
Returns: 0 on success, -EINVAL when the implementation doesn't support
nested-HV virtualization.
HV-KVM on POWER9 and later systems allows for "nested-HV"
virtualization, which provides a way for a guest VM to run guests that
can run using the CPU's supervisor mode (privileged non-hypervisor
state). Enabling this capability on a VM depends on the CPU having
the necessary functionality and on the facility being enabled with a
kvm-hv module parameter.
7.17 KVM_CAP_EXCEPTION_PAYLOAD
Architectures: x86
Parameters: args[0] whether feature should be enabled or not
With this capability enabled, CR2 will not be modified prior to the
emulated VM-exit when L1 intercepts a #PF exception that occurs in
L2. Similarly, for kvm-intel only, DR6 will not be modified prior to
the emulated VM-exit when L1 intercepts a #DB exception that occurs in
L2. As a result, when KVM_GET_VCPU_EVENTS reports a pending #PF (or
#DB) exception for L2, exception.has_payload will be set and the
faulting address (or the new DR6 bits*) will be reported in the
exception_payload field. Similarly, when userspace injects a #PF (or
#DB) into L2 using KVM_SET_VCPU_EVENTS, it is expected to set
exception.has_payload and to put the faulting address (or the new DR6
bits*) in the exception_payload field.
This capability also enables exception.pending in struct
kvm_vcpu_events, which allows userspace to distinguish between pending
and injected exceptions.
* For the new DR6 bits, note that bit 16 is set iff the #DB exception
will clear DR6.RTM.
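As a hedged sketch only, the injection path described above might look
roughly as follows in C; vcpu_fd, fault_addr and error_code are placeholder
inputs, and KVM_CAP_EXCEPTION_PAYLOAD is assumed to have been enabled on the
VM beforehand:

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Sketch only: inject a #PF (vector 14) into the vcpu, carrying the
   * faulting address in the exception payload instead of writing CR2. */
  static int inject_pf_with_payload(int vcpu_fd, __u64 fault_addr,
                                    __u32 error_code)
  {
      struct kvm_vcpu_events events;

      if (ioctl(vcpu_fd, KVM_GET_VCPU_EVENTS, &events) < 0)
          return -1;

      events.exception.pending = 1;
      events.exception.injected = 0;
      events.exception.nr = 14;                 /* #PF */
      events.exception.has_error_code = 1;
      events.exception.error_code = error_code;
      events.exception_has_payload = 1;
      events.exception_payload = fault_addr;    /* would-be CR2 value */
      events.flags |= KVM_VCPUEVENT_VALID_PAYLOAD;

      return ioctl(vcpu_fd, KVM_SET_VCPU_EVENTS, &events);
  }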
7.18 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
Architectures: x86, arm, arm64, mips
Parameters: args[0] whether feature should be enabled or not
With this capability enabled, KVM_GET_DIRTY_LOG will not automatically
clear and write-protect all pages that are returned as dirty.
Rather, userspace will have to do this operation separately using
KVM_CLEAR_DIRTY_LOG.
At the cost of a slightly more complicated operation, this provides better
scalability and responsiveness for two reasons. First,
KVM_CLEAR_DIRTY_LOG ioctl can operate on a 64-page granularity rather
than requiring to sync a full memslot; this ensures that KVM does not
take spinlocks for an extended period of time. Second, in some cases a
large amount of time can pass between a call to KVM_GET_DIRTY_LOG and
userspace actually using the data in the page. Pages can be modified
during this time, which is inefficient for both the guest and userspace:
the guest will incur a higher penalty due to write protection faults,
while userspace can see false reports of dirty pages. Manual reprotection
helps reduce this time, improving guest performance and reducing the
number of dirty log false positives.
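A hedged sketch of the resulting two-step flow is shown below; vm_fd, slot
and bitmap are placeholders supplied by the caller, and first_page/num_pages
are assumed to respect the 64-page granularity mentioned above:

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Sketch only: fetch the dirty bitmap for one memslot, let userspace
   * process it, then write-protect only the range that was processed. */
  static int sync_and_clear_dirty_log(int vm_fd, __u32 slot, void *bitmap,
                                      __u64 first_page, __u32 num_pages)
  {
      struct kvm_dirty_log log = {
          .slot = slot,
          .dirty_bitmap = bitmap,
      };
      struct kvm_clear_dirty_log clear = {
          .slot = slot,
          .first_page = first_page,
          .num_pages = num_pages,
          .dirty_bitmap = bitmap,
      };

      if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0)
          return -1;

      /* ... consume the pages marked dirty in bitmap here ... */

      return ioctl(vm_fd, KVM_CLEAR_DIRTY_LOG, &clear);
  }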
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 was previously available under the name
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT, but the implementation had bugs that made
it hard or impossible to use correctly. The availability of
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 signals that those bugs are fixed.
Userspace should not try to use KVM_CAP_MANUAL_DIRTY_LOG_PROTECT.
8. Other capabilities.
----------------------
This section lists capabilities that give information about other
features of the KVM implementation.
8.1 KVM_CAP_PPC_HWRNG
Architectures: ppc
This capability, if KVM_CHECK_EXTENSION indicates that it is
available, means that the kernel has an implementation of the
H_RANDOM hypercall backed by a hardware random-number generator.
If present, the kernel H_RANDOM handler can be enabled for guest use
with the KVM_CAP_PPC_ENABLE_HCALL capability.
8.2 KVM_CAP_HYPERV_SYNIC
Architectures: x86
This capability, if KVM_CHECK_EXTENSION indicates that it is
available, means that the kernel has an implementation of the
Hyper-V Synthetic interrupt controller (SynIC). Hyper-V SynIC is
used to support Windows Hyper-V based guest paravirt drivers (VMBus).
In order to use SynIC, it has to be activated by setting this
capability via KVM_ENABLE_CAP ioctl on the vcpu fd. Note that this
will disable the use of APIC hardware virtualization even if supported
by the CPU, as it's incompatible with SynIC auto-EOI behavior.
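A minimal sketch of that activation, assuming vcpu_fd is a placeholder for
an already-created vCPU file descriptor:

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Sketch only: activate SynIC on one vcpu. */
  static int enable_hyperv_synic(int vcpu_fd)
  {
      struct kvm_enable_cap cap = { .cap = KVM_CAP_HYPERV_SYNIC };

      return ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);
  }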
8.3 KVM_CAP_PPC_RADIX_MMU
Architectures: ppc
This capability, if KVM_CHECK_EXTENSION indicates that it is
available, means that the kernel can support guests using the
radix MMU defined in Power ISA V3.00 (as implemented in the POWER9
processor).
8.4 KVM_CAP_PPC_HASH_MMU_V3
Architectures: ppc
This capability, if KVM_CHECK_EXTENSION indicates that it is
available, means that the kernel can support guests using the
hashed page table MMU defined in Power ISA V3.00 (as implemented in
the POWER9 processor), including in-memory segment tables.
8.5 KVM_CAP_MIPS_VZ
Architectures: mips
This capability, if KVM_CHECK_EXTENSION on the main kvm handle indicates that
it is available, means that full hardware assisted virtualization capabilities
of the hardware are available for use through KVM. An appropriate
KVM_VM_MIPS_* type must be passed to KVM_CREATE_VM to create a VM which
utilises it.
If KVM_CHECK_EXTENSION on a kvm VM handle indicates that this capability is
available, it means that the VM is using full hardware assisted virtualization
capabilities of the hardware. This is useful to check after creating a VM with
KVM_VM_MIPS_DEFAULT.
The value returned by KVM_CHECK_EXTENSION should be compared against known
values (see below). All other values are reserved. This is to allow for the
possibility of other hardware assisted virtualization implementations which
may be incompatible with the MIPS VZ ASE.
0: The trap & emulate implementation is in use to run guest code in user
mode. Guest virtual memory segments are rearranged to fit the guest in the
user mode address space.
1: The MIPS VZ ASE is in use, providing full hardware assisted
virtualization, including standard guest virtual memory segments.
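As an illustrative sketch (kvm_fd is a placeholder for an open /dev/kvm
descriptor), a VM created with the KVM_VM_MIPS_DEFAULT type mentioned above
could be checked like this:

  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Sketch only: create a default-type MIPS VM and report which
   * implementation it is actually using. */
  static int create_default_mips_vm(int kvm_fd)
  {
      int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, KVM_VM_MIPS_DEFAULT);

      if (vm_fd >= 0) {
          switch (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_MIPS_VZ)) {
          case 0:
              printf("trap & emulate, guest code runs in user mode\n");
              break;
          case 1:
              printf("MIPS VZ ASE, full hardware assisted virtualization\n");
              break;
          default:
              printf("reserved value, unknown implementation\n");
              break;
          }
      }
      return vm_fd;
  }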
8.6 KVM_CAP_MIPS_TE
Architectures: mips
This capability, if KVM_CHECK_EXTENSION on the main kvm handle indicates that
it is available, means that the trap & emulate implementation is available to
run guest code in user mode, even if KVM_CAP_MIPS_VZ indicates that hardware
assisted virtualisation is also available. KVM_VM_MIPS_TE (0) must be passed
to KVM_CREATE_VM to create a VM which utilises it.
If KVM_CHECK_EXTENSION on a kvm VM handle indicates that this capability is
available, it means that the VM is using trap & emulate.
8.7 KVM_CAP_MIPS_64BIT
Architectures: mips
This capability indicates the supported architecture type of the guest, i.e. the
supported register and address width.
The values returned when this capability is checked by KVM_CHECK_EXTENSION on a
kvm VM handle correspond roughly to the CP0_Config.AT register field, and should
be checked specifically against known values (see below). All other values are
reserved.
0: MIPS32 or microMIPS32.
Both registers and addresses are 32-bits wide.
It will only be possible to run 32-bit guest code.
1: MIPS64 or microMIPS64 with access only to 32-bit compatibility segments.
Registers are 64-bits wide, but addresses are 32-bits wide.
64-bit guest code may run but cannot access MIPS64 memory segments.
It will also be possible to run 32-bit guest code.
2: MIPS64 or microMIPS64 with access to all address segments.
Both registers and addresses are 64-bits wide.
It will be possible to run 64-bit or 32-bit guest code.
8.9 KVM_CAP_ARM_USER_IRQ
Architectures: arm, arm64
This capability, if KVM_CHECK_EXTENSION indicates that it is available, means
that if userspace creates a VM without an in-kernel interrupt controller, it
will be notified of changes to the output level of in-kernel emulated devices
that can generate virtual interrupts presented to the VM.
For such VMs, on every return to userspace, the kernel
updates the vcpu's run->s.regs.device_irq_level field to represent the actual
output level of the device.
Whenever kvm detects a change in the device output level, kvm guarantees at
least one return to userspace before running the VM. This exit could either
be a KVM_EXIT_INTR or any other exit event, like KVM_EXIT_MMIO. This way,
userspace can always sample the device output level and re-compute the state of
the userspace interrupt controller. Userspace should always check the state
of run->s.regs.device_irq_level on every kvm exit.
The value in run->s.regs.device_irq_level can represent both level and edge
triggered interrupt signals, depending on the device. Edge triggered interrupt
signals will exit to userspace with the bit in run->s.regs.device_irq_level
set exactly once per edge signal.
The field run->s.regs.device_irq_level is available independent of
run->kvm_valid_regs or run->kvm_dirty_regs bits.
If KVM_CAP_ARM_USER_IRQ is supported, the KVM_CHECK_EXTENSION ioctl returns a
number larger than 0 indicating the version of this capability is implemented
and thereby which bits in run->s.regs.device_irq_level can signal values.
Currently the following bits are defined for the device_irq_level bitmap:
KVM_CAP_ARM_USER_IRQ >= 1:
KVM_ARM_DEV_EL1_VTIMER - EL1 virtual timer
KVM_ARM_DEV_EL1_PTIMER - EL1 physical timer
KVM_ARM_DEV_PMU - ARM PMU overflow interrupt signal
Future versions of kvm may implement additional events. These will get
indicated by returning a higher number from KVM_CHECK_EXTENSION and will be
listed above.
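A hedged sketch of sampling these bits follows; run is assumed to point at
the vCPU's mmap'ed kvm_run structure, vcpu_fd is its file descriptor, and
the printf calls stand in for whatever userspace interrupt controller model
is in use:

  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Sketch only: after every return from KVM_RUN, sample the output level
   * of the in-kernel emulated devices. */
  static void sample_device_irq_level(int vcpu_fd, struct kvm_run *run)
  {
      __u64 level;

      ioctl(vcpu_fd, KVM_RUN, 0);
      level = run->s.regs.device_irq_level;

      if (level & KVM_ARM_DEV_EL1_VTIMER)
          printf("EL1 virtual timer line asserted\n");
      if (level & KVM_ARM_DEV_EL1_PTIMER)
          printf("EL1 physical timer line asserted\n");
      if (level & KVM_ARM_DEV_PMU)
          printf("PMU overflow line asserted\n");
  }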
8.10 KVM_CAP_PPC_SMT_POSSIBLE
Architectures: ppc
Querying this capability returns a bitmap indicating the possible
virtual SMT modes that can be set using KVM_CAP_PPC_SMT. If bit N
(counting from the right) is set, then a virtual SMT mode of 2^N is
available.
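For example, a short sketch that decodes the bitmap (vm_fd is a placeholder
VM file descriptor):

  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Sketch only: list the virtual SMT modes this host can provide. */
  static void print_possible_smt_modes(int vm_fd)
  {
      int bitmap = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_SMT_POSSIBLE);
      int n;

      if (bitmap <= 0)
          return;
      for (n = 0; n < 31; n++)
          if (bitmap & (1 << n))
              printf("virtual SMT mode %d can be set via KVM_CAP_PPC_SMT\n",
                     1 << n);
  }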
8.11 KVM_CAP_HYPERV_SYNIC2
Architectures: x86
This capability enables a newer version of Hyper-V Synthetic interrupt
controller (SynIC). The only difference with KVM_CAP_HYPERV_SYNIC is that KVM
doesn't clear SynIC message and event flags pages when they are enabled by
writing to the respective MSRs.
8.12 KVM_CAP_HYPERV_VP_INDEX
Architectures: x86
This capability indicates that userspace can load HV_X64_MSR_VP_INDEX msr. Its
value is used to denote the target vcpu for a SynIC interrupt. For
compatibility, KVM initializes this msr to KVM's internal vcpu index. When this
capability is absent, userspace can still query this msr's value.
8.13 KVM_CAP_S390_AIS_MIGRATION
Architectures: s390
Parameters: none
This capability indicates whether the flic device will be able to get/set the
AIS states for migration via the KVM_DEV_FLIC_AISM_ALL attribute and allows
userspace to discover this without having to create a flic device.
8.14 KVM_CAP_S390_PSW
Architectures: s390
This capability indicates that the PSW is exposed via the kvm_run structure.
8.15 KVM_CAP_S390_GMAP
Architectures: s390
This capability indicates that the user space memory used as guest mapping can
be anywhere in the user memory address space, as long as the memory slots are
aligned and sized to a segment (1MB) boundary.
8.16 KVM_CAP_S390_COW
Architectures: s390
This capability indicates that the user space memory used as guest mapping can
use copy-on-write semantics as well as dirty pages tracking via read-only page
tables.
8.17 KVM_CAP_S390_BPB
Architectures: s390
This capability indicates that kvm will implement the interfaces to handle
reset, migration and nested KVM for branch prediction blocking. The stfle
facility 82 should not be provided to the guest without this capability.
8.18 KVM_CAP_HYPERV_TLBFLUSH
Architectures: x86
This capability indicates that KVM supports paravirtualized Hyper-V TLB Flush
hypercalls:
HvFlushVirtualAddressSpace, HvFlushVirtualAddressSpaceEx,
HvFlushVirtualAddressList, HvFlushVirtualAddressListEx.
8.19 KVM_CAP_ARM_INJECT_SERROR_ESR
Architectures: arm, arm64
This capability indicates that userspace can specify (via the
KVM_SET_VCPU_EVENTS ioctl) the syndrome value reported to the guest when it
takes a virtual SError interrupt exception.
If KVM advertises this capability, userspace can only specify the ISS field for
the ESR syndrome. Other parts of the ESR, such as the EC, are generated by the
CPU when the exception is taken. If this virtual SError is taken to EL1 using
AArch64, this value will be reported in the ISS field of ESR_ELx.
See KVM_CAP_VCPU_EVENTS for more details.
8.20 KVM_CAP_HYPERV_SEND_IPI
Architectures: x86
This capability indicates that KVM supports paravirtualized Hyper-V IPI send
hypercalls:
HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx.
8.21 KVM_CAP_HYPERV_DIRECT_TLBFLUSH
Architectures: x86
This capability indicates that KVM running on top of the Hyper-V hypervisor
enables Direct TLB flush for its guests, meaning that TLB flush
hypercalls are handled by the Level 0 hypervisor (Hyper-V), bypassing KVM.
Due to the different ABI for hypercall parameters between Hyper-V and
KVM, enabling this capability effectively disables all hypercall
handling by KVM (as some KVM hypercalls may be mistakenly treated as TLB
flush hypercalls by Hyper-V), so userspace should disable KVM identification
in CPUID and only expose Hyper-V identification. In this case, the guest
thinks it is running on Hyper-V and uses only Hyper-V hypercalls.