KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR
Sets the guest physical address of the vcpu_runstate_info for a given
vCPU. This is how a Xen guest tracks CPU state such as steal time.
KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT
Sets the runstate (RUNSTATE_running/_runnable/_blocked/_offline) of
the given vCPU from the .u.runstate.state member of the structure.
KVM automatically accounts running and runnable time but blocked
and offline states are only entered explicitly.
KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_DATA
Sets all fields of the vCPU runstate data from the .u.runstate member
of the structure, including the current runstate. The state_entry_time
must equal the sum of the other four times.
KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST
This *adds* the contents of the .u.runstate members of the structure
to the corresponding members of the given vCPU's runstate data, thus
permitting atomic adjustments to the runstate times. The adjustment
to the state_entry_time must equal the sum of the adjustments to the
other four times. The state field must be set to -1, or to a valid
runstate value (RUNSTATE_running, RUNSTATE_runnable, RUNSTATE_blocked
or RUNSTATE_offline) to set the current accounted state as of the
adjusted state_entry_time.
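The constraint on ``state_entry_time`` can be checked in userspace before
issuing the ioctl. The following is a minimal sketch, assuming a local mirror
of the runstate fields; ``struct runstate_sample`` and
``runstate_is_consistent()`` are hypothetical names used only for
illustration and are not part of the KVM ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the runstate fields, for illustration only. */
struct runstate_sample {
	uint64_t state;
	uint64_t state_entry_time;
	uint64_t time_running;
	uint64_t time_runnable;
	uint64_t time_blocked;
	uint64_t time_offline;
};

/* True if state_entry_time equals the sum of the four per-state times. */
static bool runstate_is_consistent(const struct runstate_sample *rs)
{
	uint64_t sum;

	assert(rs != NULL);
	sum = rs->time_running + rs->time_runnable +
	      rs->time_blocked + rs->time_offline;
	return rs->state_entry_time == sum;
}
```

The same arithmetic applies to the deltas passed with
KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST: the ``state_entry_time`` delta has to
equal the sum of the four time deltas.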
4.129 KVM_XEN_VCPU_GET_ATTR
---------------------------
:Capability: KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO
:Architectures: x86
:Type: vcpu ioctl
:Parameters: struct kvm_xen_vcpu_attr
:Returns: 0 on success, < 0 on error
Allows Xen vCPU attributes to be read. For the structure and types,
see KVM_XEN_VCPU_SET_ATTR above.
The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used
with the KVM_XEN_VCPU_GET_ATTR ioctl.
4.130 KVM_ARM_MTE_COPY_TAGS
---------------------------
:Capability: KVM_CAP_ARM_MTE
:Architectures: arm64
:Type: vm ioctl
:Parameters: struct kvm_arm_copy_mte_tags
:Returns: number of bytes copied, < 0 on error (-EINVAL for incorrect
arguments, -EFAULT if memory cannot be accessed).
::
struct kvm_arm_copy_mte_tags {
__u64 guest_ipa;
__u64 length;
void __user *addr;
__u64 flags;
__u64 reserved[2];
};
Copies Memory Tagging Extension (MTE) tags to/from guest tag memory. The
``guest_ipa`` and ``length`` fields must be ``PAGE_SIZE`` aligned. The ``addr``
field must point to a buffer which the tags will be copied to or from.
``flags`` specifies the direction of copy, either ``KVM_ARM_TAGS_TO_GUEST`` or
``KVM_ARM_TAGS_FROM_GUEST``.
The size of the buffer to store the tags is ``(length / 16)`` bytes
(granules in MTE are 16 bytes long). Each byte contains a single tag
value. This matches the format of ``PTRACE_PEEKMTETAGS`` and
``PTRACE_POKEMTETAGS``.
If an error occurs before any data is copied then a negative error code is
returned. If some tags have been copied before an error occurs then the number
of bytes successfully copied is returned. If the call completes successfully
then ``length`` is returned.
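The buffer sizing and return value rules above can be sketched as two small
helpers. This is an illustration only; ``MTE_GRANULE_SIZE``,
``mte_tag_buffer_size()`` and ``mte_copy_outcome()`` are hypothetical names,
not kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

#define MTE_GRANULE_SIZE 16u	/* bytes of guest memory covered by one tag */

/* Bytes needed in the tag buffer for `length` bytes of guest memory. */
static uint64_t mte_tag_buffer_size(uint64_t length)
{
	return length / MTE_GRANULE_SIZE;
}

/*
 * Classify the ioctl return value:
 *   1  the whole range was copied (ret == length)
 *   0  some tags were copied before an error occurred
 *  -1  nothing was copied (ret is a negative error code)
 */
static int mte_copy_outcome(int64_t ret, uint64_t length)
{
	if (ret < 0)
		return -1;
	return (uint64_t)ret == length ? 1 : 0;
}
```

For a single 4 KiB page this gives a 256-byte tag buffer.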
4.131 KVM_GET_SREGS2
------------------
:Capability: KVM_CAP_SREGS2
:Architectures: x86
:Type: vcpu ioctl
:Parameters: struct kvm_sregs2 (out)
:Returns: 0 on success, -1 on error
Reads special registers from the vcpu.
This ioctl (when supported) replaces KVM_GET_SREGS.
::
struct kvm_sregs2 {
/* out (KVM_GET_SREGS2) / in (KVM_SET_SREGS2) */
struct kvm_segment cs, ds, es, fs, gs, ss;
struct kvm_segment tr, ldt;
struct kvm_dtable gdt, idt;
__u64 cr0, cr2, cr3, cr4, cr8;
__u64 efer;
__u64 apic_base;
__u64 flags;
__u64 pdptrs[4];
};
flags values for ``kvm_sregs2``:
``KVM_SREGS2_FLAGS_PDPTRS_VALID``
Indicates that the struct contains valid PDPTR values.
4.132 KVM_SET_SREGS2
------------------
:Capability: KVM_CAP_SREGS2
:Architectures: x86
:Type: vcpu ioctl
:Parameters: struct kvm_sregs2 (in)
:Returns: 0 on success, -1 on error
Writes special registers into the vcpu.
See KVM_GET_SREGS2 for the data structures.
This ioctl (when supported) replaces KVM_SET_SREGS.
4.133 KVM_GET_STATS_FD
----------------------
:Capability: KVM_CAP_STATS_BINARY_FD
:Architectures: all
:Type: vm ioctl, vcpu ioctl
:Parameters: none
:Returns: statistics file descriptor on success, < 0 on error
Errors:
====== ======================================================
ENOMEM if the fd could not be created due to lack of memory
EMFILE if the number of opened files exceeds the limit
====== ======================================================
The returned file descriptor can be used to read VM/vCPU statistics data in
binary format. The data in the file descriptor consists of four blocks
organized as follows::
+-------------+
| Header |
+-------------+
| id string |
+-------------+
| Descriptors |
+-------------+
| Stats Data |
+-------------+
Apart from the header starting at offset 0, please be aware that it is
not guaranteed that the four blocks are adjacent or in the above order;
the offsets of the id, descriptors and data blocks are found in the
header. However, all four blocks are aligned to 64 bit offsets in the
file and they do not overlap.
All blocks except the data block are immutable. Userspace needs to read them
only once after retrieving the file descriptor, and can then use ``pread`` or
``lseek`` to read the statistics repeatedly.
All data is in system endianness.
The format of the header is as follows::
struct kvm_stats_header {
__u32 flags;
__u32 name_size;
__u32 num_desc;
__u32 id_offset;
__u32 desc_offset;
__u32 data_offset;
};
The ``flags`` field is not used at the moment. It is always read as 0.
The ``name_size`` field is the size (in bytes) of the statistics name string
(including trailing '\0') which is contained in the "id string" block and
appended at the end of every descriptor.
The ``num_desc`` field is the number of descriptors that are included in the
descriptor block. (The actual number of values in the data block may be
larger, since each descriptor may comprise more than one value).
The ``id_offset`` field is the offset of the id string from the start of the
file indicated by the file descriptor. It is a multiple of 8.
The ``desc_offset`` field is the offset of the Descriptors block from the start
of the file indicated by the file descriptor. It is a multiple of 8.
The ``data_offset`` field is the offset of the Stats Data block from the start
of the file indicated by the file descriptor. It is a multiple of 8.
The id string block contains a string which identifies the file descriptor on
which KVM_GET_STATS_FD was invoked. The size of the block, including the
trailing ``'\0'``, is indicated by the ``name_size`` field in the header.
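As a rough sketch of how userspace might validate the header, the helper
below parses it from a byte buffer (typically filled by a ``pread`` at
offset 0 of the statistics file descriptor) and checks the documented
alignment of the three offsets. ``struct stats_header`` is a local copy of
the layout above, and ``parse_stats_header()`` is a hypothetical name, not a
kernel or library function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the header layout shown above, using fixed-width types. */
struct stats_header {
	uint32_t flags;
	uint32_t name_size;
	uint32_t num_desc;
	uint32_t id_offset;
	uint32_t desc_offset;
	uint32_t data_offset;
};

/*
 * Copy the header out of `buf` and check that each block offset is a
 * multiple of 8 and does not point back into the header itself.
 * Returns 0 on success, -1 otherwise.
 */
static int parse_stats_header(const void *buf, size_t len,
			      struct stats_header *hdr)
{
	uint32_t offs[3];
	size_t i;

	if (len < sizeof(*hdr))
		return -1;
	memcpy(hdr, buf, sizeof(*hdr));

	offs[0] = hdr->id_offset;
	offs[1] = hdr->desc_offset;
	offs[2] = hdr->data_offset;
	for (i = 0; i < 3; i++) {
		if (offs[i] % 8 != 0 || offs[i] < sizeof(*hdr))
			return -1;
	}
	return 0;
}
```

Because the block order is not guaranteed, the sketch deliberately makes no
assumption about which of the three offsets is smallest.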
The descriptors block only needs to be read once for the lifetime of the
file descriptor. It contains a sequence of ``struct kvm_stats_desc``, each
followed by a string of size ``name_size``::
#define KVM_STATS_TYPE_SHIFT 0
#define KVM_STATS_TYPE_MASK (0xF << KVM_STATS_TYPE_SHIFT)
#define KVM_STATS_TYPE_CUMULATIVE (0x0 << KVM_STATS_TYPE_SHIFT)
#define KVM_STATS_TYPE_INSTANT (0x1 << KVM_STATS_TYPE_SHIFT)
#define KVM_STATS_TYPE_PEAK (0x2 << KVM_STATS_TYPE_SHIFT)
#define KVM_STATS_UNIT_SHIFT 4
#define KVM_STATS_UNIT_MASK (0xF << KVM_STATS_UNIT_SHIFT)
#define KVM_STATS_UNIT_NONE (0x0 << KVM_STATS_UNIT_SHIFT)
#define KVM_STATS_UNIT_BYTES (0x1 << KVM_STATS_UNIT_SHIFT)
#define KVM_STATS_UNIT_SECONDS (0x2 << KVM_STATS_UNIT_SHIFT)
#define KVM_STATS_UNIT_CYCLES (0x3 << KVM_STATS_UNIT_SHIFT)
#define KVM_STATS_BASE_SHIFT 8
#define KVM_STATS_BASE_MASK (0xF << KVM_STATS_BASE_SHIFT)
#define KVM_STATS_BASE_POW10 (0x0 << KVM_STATS_BASE_SHIFT)
#define KVM_STATS_BASE_POW2 (0x1 << KVM_STATS_BASE_SHIFT)
struct kvm_stats_desc {
__u32 flags;
__s16 exponent;
__u16 size;
__u32 offset;
__u32 unused;
char name[];
};
The ``flags`` field contains the type and unit of the statistics data described
by this descriptor. Its endianness is CPU native.
The following flags are supported:
Bits 0-3 of ``flags`` encode the type:
* ``KVM_STATS_TYPE_CUMULATIVE``
The statistics data is cumulative. The value of data can only be increased.
Most of the counters used in KVM are of this type.
The corresponding ``size`` field for this type is always 1.
All cumulative statistics data are read/write.
* ``KVM_STATS_TYPE_INSTANT``
The statistics data is instantaneous. Its value can be increased or
decreased. This type is usually used as a measurement of some resources,
like the number of dirty pages, the number of large pages, etc.
All instant statistics are read only.
The corresponding ``size`` field for this type is always 1.
* ``KVM_STATS_TYPE_PEAK``
The statistics data is peak. The value of data can only be increased, and
represents a peak value for a measurement, for example the maximum number
of items in a hash table bucket, the longest time waited and so on.
The corresponding ``size`` field for this type is always 1.
Bits 4-7 of ``flags`` encode the unit:
* ``KVM_STATS_UNIT_NONE``
There is no unit for the value of statistics data. This usually means that
the value is a simple counter of an event.
* ``KVM_STATS_UNIT_BYTES``
It indicates that the statistics data is used to measure memory size, in the
unit of Byte, KiByte, MiByte, GiByte, etc. The unit of the data is
determined by the ``exponent`` field in the descriptor.
* ``KVM_STATS_UNIT_SECONDS``
It indicates that the statistics data is used to measure time or latency.
* ``KVM_STATS_UNIT_CYCLES``
It indicates that the statistics data is used to measure CPU clock cycles.
Bits 8-11 of ``flags``, together with ``exponent``, encode the scale of the
unit:
* ``KVM_STATS_BASE_POW10``
The scale is based on power of 10. It is used for measurement of time and
CPU clock cycles. For example, an exponent of -9 can be used with
``KVM_STATS_UNIT_SECONDS`` to express that the unit is nanoseconds.
* ``KVM_STATS_BASE_POW2``
The scale is based on power of 2. It is used for measurement of memory size.
For example, an exponent of 20 can be used with ``KVM_STATS_UNIT_BYTES`` to
express that the unit is MiB.
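The three bit fields can be decoded with the masks defined earlier. The
sketch below copies the relevant definitions from the text so that it is
self-contained; ``stats_type()``, ``stats_unit()`` and ``stats_scale()`` are
hypothetical helper names. ``stats_scale()`` returns the multiplier
``base^exponent`` that converts a raw value into the base unit, and returns
0 for a base encoding that is not described above.

```c
#include <assert.h>
#include <stdint.h>

/* Flag encodings copied from the definitions above. */
#define KVM_STATS_TYPE_SHIFT	0
#define KVM_STATS_TYPE_MASK	(0xF << KVM_STATS_TYPE_SHIFT)
#define KVM_STATS_TYPE_PEAK	(0x2 << KVM_STATS_TYPE_SHIFT)
#define KVM_STATS_UNIT_SHIFT	4
#define KVM_STATS_UNIT_MASK	(0xF << KVM_STATS_UNIT_SHIFT)
#define KVM_STATS_UNIT_BYTES	(0x1 << KVM_STATS_UNIT_SHIFT)
#define KVM_STATS_UNIT_SECONDS	(0x2 << KVM_STATS_UNIT_SHIFT)
#define KVM_STATS_BASE_SHIFT	8
#define KVM_STATS_BASE_MASK	(0xF << KVM_STATS_BASE_SHIFT)
#define KVM_STATS_BASE_POW10	(0x0 << KVM_STATS_BASE_SHIFT)
#define KVM_STATS_BASE_POW2	(0x1 << KVM_STATS_BASE_SHIFT)

static uint32_t stats_type(uint32_t flags)
{
	return flags & KVM_STATS_TYPE_MASK;
}

static uint32_t stats_unit(uint32_t flags)
{
	return flags & KVM_STATS_UNIT_MASK;
}

/* Multiplier base^exponent for the descriptor; 0.0 if the base is unknown. */
static double stats_scale(uint32_t flags, int16_t exponent)
{
	uint32_t base_bits = flags & KVM_STATS_BASE_MASK;
	int n = exponent < 0 ? -exponent : exponent;
	double base, scale = 1.0;
	int i;

	if (base_bits == KVM_STATS_BASE_POW10)
		base = 10.0;
	else if (base_bits == KVM_STATS_BASE_POW2)
		base = 2.0;
	else
		return 0.0;

	for (i = 0; i < n; i++)
		scale = exponent < 0 ? scale / base : scale * base;
	return scale;
}
```

With ``KVM_STATS_UNIT_SECONDS``, ``KVM_STATS_BASE_POW10`` and an exponent of
-9, a raw value multiplied by the returned scale is expressed in seconds.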
The ``size`` field is the number of values of this statistics data. Its
value is 1 for most simple statistics, meaning that the data consists of a
single unsigned 64-bit value.
The ``offset`` field is the offset from the start of Data Block to the start of
the corresponding statistics data.
The ``unused`` field is reserved for future support for other types of
statistics data, like log/linear histogram. Its value is always 0 for the types
defined above.
The ``name`` field is the name string of the statistics data. The name string
starts at the end of ``struct kvm_stats_desc``. The maximum length including
the trailing ``'\0'``, is indicated by ``name_size`` in the header.
The Stats Data block contains an array of 64-bit values in the same order
as the descriptors in Descriptors block.
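Putting the descriptor and data layouts together, a lookup by name could be
sketched as follows. The code assumes that the descriptors block and the
data block have already been read into memory; ``struct stats_desc`` is a
local copy of the fixed part of ``struct kvm_stats_desc``, and
``stats_lookup()`` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the descriptor layout shown above (fixed part only). */
struct stats_desc {
	uint32_t flags;
	int16_t exponent;
	uint16_t size;
	uint32_t offset;
	uint32_t unused;
};

/*
 * Search `num_desc` descriptors stored at `descs`, each followed by a
 * name of `name_size` bytes. On a match, copy the first 64-bit value of
 * that statistic from the data block into *value and return 0.
 * Returns -1 if no descriptor carries `name`.
 */
static int stats_lookup(const uint8_t *descs, uint32_t num_desc,
			uint32_t name_size, const uint8_t *data,
			const char *name, uint64_t *value)
{
	size_t stride = sizeof(struct stats_desc) + name_size;
	uint32_t i;

	for (i = 0; i < num_desc; i++) {
		const uint8_t *p = descs + (size_t)i * stride;
		struct stats_desc d;

		memcpy(&d, p, sizeof(d));
		if (strncmp((const char *)p + sizeof(d), name, name_size) == 0) {
			/* d.offset is relative to the start of the data block. */
			memcpy(value, data + d.offset, sizeof(*value));
			return 0;
		}
	}
	return -1;
}
```

Only the data block needs to be read again on later polls; the descriptor
buffer can be kept for the lifetime of the file descriptor.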
5. The kvm_run structure
========================
Application code obtains a pointer to the kvm_run structure by
mmap()ing a vcpu fd. From that point, application code can control
execution by changing fields in kvm_run prior to calling the KVM_RUN
ioctl, and obtain information about the reason KVM_RUN returned by
looking up structure members.
::
struct kvm_run {
/* in */
__u8 request_interrupt_window;
Request that KVM_RUN return when it becomes possible to inject external
interrupts into the guest. Useful in conjunction with KVM_INTERRUPT.
__u8 immediate_exit;
This field is polled once when KVM_RUN starts; if non-zero, KVM_RUN
exits immediately, returning -EINTR. In the common scenario where a
signal is used to "kick" a VCPU out of KVM_RUN, this field can be used
to avoid usage of KVM_SET_SIGNAL_MASK, which has worse scalability.
Rather than blocking the signal outside KVM_RUN, userspace can set up
a signal handler that sets run->immediate_exit to a non-zero value.
This field is ignored if KVM_CAP_IMMEDIATE_EXIT is not available.
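The signal handler pattern described above might look roughly like this.
The sketch uses a hypothetical stand-in structure (``struct run_in_fields``)
instead of the real mmap()ed ``struct kvm_run``, and a single global pointer
where a real VMM would more likely keep one pointer per vCPU thread;
``kick_handler()`` and ``install_kick_handler()`` are illustrative names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the first bytes of the mmap()ed kvm_run. */
struct run_in_fields {
	uint8_t request_interrupt_window;
	uint8_t immediate_exit;
	uint8_t padding1[6];
};

/* Assumption: one vCPU per process; set before the handler is installed. */
static struct run_in_fields *kick_target;

/* Signal handler: ask the next KVM_RUN to return immediately. */
static void kick_handler(int sig)
{
	(void)sig;
	if (kick_target)
		*(volatile uint8_t *)&kick_target->immediate_exit = 1;
}

/* Install kick_handler for `signum`; returns 0 on success, -1 on error. */
static int install_kick_handler(int signum, struct run_in_fields *run)
{
	struct sigaction sa;

	kick_target = run;
	sa.sa_handler = kick_handler;
	sa.sa_flags = 0;
	sigemptyset(&sa.sa_mask);
	return sigaction(signum, &sa, NULL);
}
```

With this in place the signal does not have to be blocked outside KVM_RUN:
if it arrives just before the ioctl, the non-zero field still causes an
immediate return.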
__u8 padding1[6];
/* out */
__u32 exit_reason;
When KVM_RUN has returned successfully (return value 0), this informs
application code why KVM_RUN has returned. Allowable values for this
field are detailed below.
__u8 ready_for_interrupt_injection;
If request_interrupt_window has been specified, this field indicates
an interrupt can be injected now with KVM_INTERRUPT.
__u8 if_flag;
The value of the current interrupt flag. Only valid if in-kernel
local APIC is not used.
__u16 flags;
More architecture-specific flags detailing state of the VCPU that may
affect the device's behavior. Currently defined flags::
/* x86, set if the VCPU is in system management mode */
#define KVM_RUN_X86_SMM (1 << 0)
/* x86, set if bus lock detected in VM */
#define KVM_RUN_BUS_LOCK (1 << 1)
/* in (pre_kvm_run), out (post_kvm_run) */
__u64 cr8;
The value of the cr8 register. Only valid if in-kernel local APIC is
not used. Both input and output.
__u64 apic_base;
The value of the APIC BASE msr. Only valid if in-kernel local
APIC is not used. Both input and output.
union {
/* KVM_EXIT_UNKNOWN */
struct {
__u64 hardware_exit_reason;
} hw;
If exit_reason is KVM_EXIT_UNKNOWN, the vcpu has exited due to unknown
reasons. Further architecture-specific information is available in
hardware_exit_reason.
/* KVM_EXIT_FAIL_ENTRY */
struct {
__u64 hardware_entry_failure_reason;
__u32 cpu; /* if KVM_LAST_CPU */
} fail_entry;
If exit_reason is KVM_EXIT_FAIL_ENTRY, the vcpu could not be run due
to unknown reasons. Further architecture-specific information is
available in hardware_entry_failure_reason.
/* KVM_EXIT_EXCEPTION */
struct {
__u32 exception;
__u32 error_code;
} ex;
Unused.
/* KVM_EXIT_IO */
struct {
#define KVM_EXIT_IO_IN 0
#define KVM_EXIT_IO_OUT 1
__u8 direction;
__u8 size; /* bytes */
__u16 port;
__u32 count;
__u64 data_offset; /* relative to kvm_run start */
} io;
If exit_reason is KVM_EXIT_IO, then the vcpu has
executed a port I/O instruction which could not be satisfied by kvm.
data_offset describes where the data is located (KVM_EXIT_IO_OUT) or
where kvm expects application code to place the data for the next
KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a packed array.
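Locating the packed data from ``data_offset`` can be sketched as below. The
helper takes the base address of the kvm_run mapping and the ``size``,
``count`` and ``data_offset`` values from the ``io`` member;
``io_copy_out_data()`` is a hypothetical name, and the caller-supplied
output buffer is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy the packed port I/O payload (count items of `size` bytes each)
 * out of the kvm_run mapping that starts at `run_base`.
 * Returns the number of bytes copied, or 0 if `out` is too small.
 */
static size_t io_copy_out_data(const void *run_base, uint64_t data_offset,
			       uint8_t size, uint32_t count,
			       void *out, size_t out_len)
{
	size_t total = (size_t)size * count;

	if (total > out_len)
		return 0;
	memcpy(out, (const uint8_t *)run_base + data_offset, total);
	return total;
}
```

For KVM_EXIT_IO_IN the copy goes the other way: userspace writes
``size * count`` bytes at the same location before calling KVM_RUN again.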
/* KVM_EXIT_DEBUG */
struct {
struct kvm_debug_exit_arch arch;
} debug;
If the exit_reason is KVM_EXIT_DEBUG, then a vcpu is processing a debug event
for which architecture specific information is returned.
/* KVM_EXIT_MMIO */
struct {
__u64 phys_addr;
__u8 data[8];
__u32 len;
__u8 is_write;
} mmio;
If exit_reason is KVM_EXIT_MMIO, then the vcpu has
executed a memory-mapped I/O instruction which could not be satisfied
by kvm. The 'data' member contains the written data if 'is_write' is
true, and should be filled by application code otherwise.
The 'data' member contains, in its first 'len' bytes, the value as it would
appear if the VCPU performed a load or store of the appropriate width directly
to the byte array.
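Because the bytes are laid out as a native load or store would see them,
userspace can recover the value by copying the first ``len`` bytes into an
integer of the matching width. A minimal sketch, with ``mmio_get_value()``
as a hypothetical helper name that only handles the widths 1, 2, 4 and 8:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Interpret the first `len` bytes of the mmio data array as a host-endian
 * integer of that width. Returns 0 on success, -1 for other lengths.
 */
static int mmio_get_value(const uint8_t data[8], uint32_t len, uint64_t *out)
{
	uint8_t v8;
	uint16_t v16;
	uint32_t v32;
	uint64_t v64;

	switch (len) {
	case 1:
		memcpy(&v8, data, sizeof(v8));
		*out = v8;
		return 0;
	case 2:
		memcpy(&v16, data, sizeof(v16));
		*out = v16;
		return 0;
	case 4:
		memcpy(&v32, data, sizeof(v32));
		*out = v32;
		return 0;
	case 8:
		memcpy(&v64, data, sizeof(v64));
		*out = v64;
		return 0;
	default:
		return -1;
	}
}
```

Using ``memcpy`` rather than a pointer cast avoids alignment and aliasing
problems and keeps the result independent of host endianness.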
For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_XEN,
KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding
operations are complete (and guest state is consistent) only after userspace
has re-entered the kernel with KVM_RUN. The kernel side will first finish
incomplete operations and then check for pending signals.
The pending state of the operation is not preserved in state which is
visible to userspace, thus userspace should ensure that the operation is
completed before performing a live migration. Userspace can re-enter the
guest with an unmasked signal pending or with the immediate_exit field set
to complete pending operations without allowing any further instructions
to be executed.
/* KVM_EXIT_HYPERCALL */
struct {
__u64 nr;
__u64 args[6];
__u64 ret;
__u32 longmode;
__u32 pad;
} hypercall;
Unused. This was once used for 'hypercall to userspace'. To implement
such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390).
.. note:: KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO.
::
/* KVM_EXIT_TPR_ACCESS */
struct {
__u64 rip;
__u32 is_write;
__u32 pad;
} tpr_access;
To be documented (KVM_TPR_ACCESS_REPORTING).
/* KVM_EXIT_S390_SIEIC */
struct {
__u8 icptcode;
__u64 mask; /* psw upper half */
__u64 addr; /* psw lower half */
__u16 ipa;
__u32 ipb;
} s390_sieic;
s390 specific.
#define KVM_S390_RESET_POR 1
#define KVM_S390_RESET_CLEAR 2
#define KVM_S390_RESET_SUBSYSTEM 4
#define KVM_S390_RESET_CPU_INIT 8
#define KVM_S390_RESET_IPL 16
__u64 s390_reset_flags;
s390 specific.
/* KVM_EXIT_S390_UCONTROL */
struct {
__u64 trans_exc_code;
__u32 pgm_code;
} s390_ucontrol;
s390 specific. A page fault has occurred for a user controlled virtual
machine (KVM_VM_S390_UCONTROL) on its host page table that cannot be
resolved by the kernel.
The program code and the translation exception code that were placed
in the cpu's lowcore are presented here as defined by the z/Architecture
Principles of Operation, in the chapter on Dynamic Address Translation
(DAT).
/* KVM_EXIT_DCR */
struct {
__u32 dcrn;
__u32 data;
__u8 is_write;
} dcr;
/* KVM_EXIT_OSI */
struct {
__u64 gprs[32];
} osi;
MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
hypercalls and exit with this exit struct that contains all the guest gprs.
If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
Userspace can now handle the hypercall and when it's done modify the gprs as
necessary. Upon guest entry all guest GPRs will then be replaced by the values
in this struct.
/* KVM_EXIT_PAPR_HCALL */
struct {
__u64 nr;
__u64 ret;
__u64 args[9];
} papr_hcall;
This is used on 64-bit PowerPC when emulating a pSeries partition,
e.g. with the 'pseries' machine type in qemu. It occurs when the
guest does a hypercall using the 'sc 1' instruction. The 'nr' field
contains the hypercall number (from the guest R3), and 'args' contains
the arguments (from the guest R4 - R12). Userspace should put the
return code in 'ret' and any extra returned values in args[].
The possible hypercalls are defined in the Power Architecture Platform
Requirements (PAPR) document available from www.power.org (free
developer registration required to access it).
/* KVM_EXIT_S390_TSCH */
struct {
__u16 subchannel_id;
__u16 subchannel_nr;
__u32 io_int_parm;
__u32 io_int_word;
__u32 ipb;
__u8 dequeued;
} s390_tsch;
s390 specific. This exit occurs when KVM_CAP_S390_CSS_SUPPORT has been enabled
and TEST SUBCHANNEL was intercepted. If dequeued is set, a pending I/O
interrupt for the target subchannel has been dequeued and subchannel_id,
subchannel_nr, io_int_parm and io_int_word contain the parameters for that
interrupt. ipb is needed for instruction parameter decoding.
/* KVM_EXIT_EPR */
struct {
__u32 epr;
} epr;
On FSL BookE PowerPC chips, the interrupt controller has a fast
interrupt acknowledge path to the core. When the core successfully
delivers an interrupt, it automatically populates the EPR register with
the interrupt vector number and acknowledges the interrupt inside
the interrupt controller.
In case the interrupt controller lives in user space, we need to do
the interrupt acknowledge cycle through it to fetch the next to be
delivered interrupt vector using this exit.
It gets triggered whenever KVM_CAP_PPC_EPR is enabled and an
external interrupt has just been delivered into the guest. User space
should put the acknowledged interrupt vector into the 'epr' field.
/* KVM_EXIT_SYSTEM_EVENT */
struct {
#define KVM_SYSTEM_EVENT_SHUTDOWN 1
#define KVM_SYSTEM_EVENT_RESET 2
#define KVM_SYSTEM_EVENT_CRASH 3
__u32 type;
__u64 flags;
} system_event;
If exit_reason is KVM_EXIT_SYSTEM_EVENT then the vcpu has triggered
a system-level event using some architecture specific mechanism (hypercall
or some special instruction). In case of ARM/ARM64, this is triggered using
HVC instruction based PSCI call from the vcpu. The 'type' field describes
the system-level event type. The 'flags' field describes architecture
specific flags for the system-level event.
Valid values for 'type' are:
- KVM_SYSTEM_EVENT_SHUTDOWN -- the guest has requested a shutdown of the
VM. Userspace is not obliged to honour this, and if it does honour
this does not need to destroy the VM synchronously (ie it may call
KVM_RUN again before shutdown finally occurs).
- KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM.
As with SHUTDOWN, userspace can choose to ignore the request, or
to schedule the reset to occur in the future and may call KVM_RUN again.
- KVM_SYSTEM_EVENT_CRASH -- the guest has crashed and has requested
crash condition maintenance. Userspace can choose to ignore the request,
or to gather a VM memory core dump and/or reset/shut down the VM.
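One possible way for a VMM to dispatch on the event type is sketched below.
The event numbers are copied from the structure above; ``enum vmm_action``
and ``handle_system_event()`` are hypothetical, and the chosen policy is
only an example, since userspace is free to ignore any of these requests.

```c
#include <assert.h>
#include <stdint.h>

/* Event types copied from the definitions above. */
#define KVM_SYSTEM_EVENT_SHUTDOWN	1
#define KVM_SYSTEM_EVENT_RESET		2
#define KVM_SYSTEM_EVENT_CRASH		3

/* Hypothetical actions a VMM main loop might take. */
enum vmm_action {
	VMM_CONTINUE,		/* ignore the event and call KVM_RUN again */
	VMM_SHUTDOWN,		/* schedule teardown of the VM */
	VMM_RESET,		/* schedule a reset of the VM */
	VMM_DUMP_THEN_SHUTDOWN,	/* collect a core dump, then tear down */
};

/* Example policy: honour every known request, ignore unknown types. */
static enum vmm_action handle_system_event(uint32_t type)
{
	switch (type) {
	case KVM_SYSTEM_EVENT_SHUTDOWN:
		return VMM_SHUTDOWN;
	case KVM_SYSTEM_EVENT_RESET:
		return VMM_RESET;
	case KVM_SYSTEM_EVENT_CRASH:
		return VMM_DUMP_THEN_SHUTDOWN;
	default:
		return VMM_CONTINUE;
	}
}
```

Since the shutdown or reset does not have to happen synchronously, the
returned action can be queued and KVM_RUN called again in the meantime.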
/* KVM_EXIT_IOAPIC_EOI */
struct {
__u8 vector;
} eoi;
Indicates that the VCPU's in-kernel local APIC received an EOI for a
level-triggered IOAPIC interrupt. This exit only triggers when the
IOAPIC is implemented in userspace (i.e. KVM_CAP_SPLIT_IRQCHIP is enabled);
the userspace IOAPIC should process the EOI and retrigger the interrupt if
it is still asserted. Vector is the LAPIC interrupt vector for which the
EOI was received.
#define KVM_EXIT_HYPERV_SYNIC 1
#define KVM_EXIT_HYPERV_HCALL 2
#define KVM_EXIT_HYPERV_SYNDBG 3
__u32 pad1;
__u32 pad2;
__u64 control;
__u64 evt_page;
__u64 msg_page;
} synic;
struct {
__u64 input;
__u64 result;
__u64 params[2];
} hcall;
struct {
__u32 msr;
__u32 pad2;
__u64 control;
__u64 status;
__u64 send_page;
__u64 recv_page;
__u64 pending_page;
} syndbg;
} u;
};
/* KVM_EXIT_HYPERV */
struct kvm_hyperv_exit hyperv;
Indicates that the VCPU exits into userspace to process some tasks
related to Hyper-V emulation.
- KVM_EXIT_HYPERV_SYNIC -- synchronously notify user-space about
Hyper-V SynIC state change. Notification is used to remap SynIC
event/message pages and to enable/disable SynIC messages/events processing
in userspace.
- KVM_EXIT_HYPERV_SYNDBG -- synchronously notify user-space about
Hyper-V Synthetic debugger state change. Notification is used to either update
the pending_page location or to send a control command (send the buffer located
in send_page or recv a buffer to recv_page).
/* KVM_EXIT_ARM_NISV */
struct {
__u64 esr_iss;
__u64 fault_ipa;
} arm_nisv;
Used on arm and arm64 systems. If a guest accesses memory not in a memslot,
KVM will typically return to userspace and ask it to do MMIO emulation on its
behalf. However, for certain classes of instructions, no instruction decode
(direction, length of memory access) is provided, and fetching and decoding
the instruction from the VM is overly complicated to live in the kernel.
Historically, when this situation occurred, KVM would print a warning and kill
the VM. KVM assumed that if the guest accessed non-memslot memory, it was
trying to do I/O, which just couldn't be emulated, and the warning message was
phrased accordingly. However, what happened more often was that a guest bug
caused access outside the guest memory areas which should lead to a more
meaningful warning message and an external abort in the guest, if the access
did not fall within an I/O window.
Userspace implementations can query for KVM_CAP_ARM_NISV_TO_USER, and enable
this capability at VM creation. Once this is done, these types of errors will
instead return to userspace with KVM_EXIT_ARM_NISV, with the valid bits from
the HSR (arm) and ESR_EL2 (arm64) in the esr_iss field, and the faulting IPA
in the fault_ipa field. Userspace can either fix up the access if it's
actually an I/O access by decoding the instruction from guest memory (if it's
very brave) and continue executing the guest, or it can decide to suspend,
dump, or restart the guest.
Note that KVM does not skip the faulting instruction as it does for
KVM_EXIT_MMIO, but userspace has to emulate any change to the processing state
if it decides to decode and emulate the instruction.
::
/* KVM_EXIT_X86_RDMSR / KVM_EXIT_X86_WRMSR */
struct {
__u8 error; /* user -> kernel */
__u8 pad[7];
__u32 reason; /* kernel -> user */
__u32 index; /* kernel -> user */
__u64 data; /* kernel <-> user */
} msr;
Used on x86 systems. When the VM capability KVM_CAP_X86_USER_SPACE_MSR is
enabled, MSR accesses to registers that would invoke a #GP by KVM kernel code
will instead trigger a KVM_EXIT_X86_RDMSR exit for reads and KVM_EXIT_X86_WRMSR
exit for writes.
The "reason" field specifies why the MSR trap occurred. User space will only
receive MSR exit traps when a particular reason was requested through
ENABLE_CAP. Currently valid exit reasons are:
KVM_MSR_EXIT_REASON_UNKNOWN - access to MSR that is unknown to KVM
KVM_MSR_EXIT_REASON_INVAL - access to invalid MSRs or reserved bits
KVM_MSR_EXIT_REASON_FILTER - access blocked by KVM_X86_SET_MSR_FILTER
For KVM_EXIT_X86_RDMSR, the "index" field tells user space which MSR the guest
wants to read. To respond to this request with a successful read, user space
writes the respective data into the "data" field and must continue guest
execution to ensure the read data is transferred into guest register state.
If the RDMSR request was unsuccessful, user space indicates that with a "1" in
the "error" field. This will inject a #GP into the guest when the VCPU is
executed again.
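The read path can be sketched as follows. ``struct msr_exit`` is a local
copy of the layout above; ``EXAMPLE_MSR_INDEX`` is a made-up MSR number and
``handle_rdmsr_exit()`` a hypothetical helper, both used purely for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the msr exit layout shown above. */
struct msr_exit {
	uint8_t error;		/* user -> kernel */
	uint8_t pad[7];
	uint32_t reason;	/* kernel -> user */
	uint32_t index;		/* kernel -> user */
	uint64_t data;		/* kernel <-> user */
};

#define EXAMPLE_MSR_INDEX 0x12345678u	/* hypothetical MSR number */

/*
 * Emulate a guest RDMSR: supply `value` for the one MSR this sketch
 * knows about, otherwise request a #GP by setting the error field.
 */
static void handle_rdmsr_exit(struct msr_exit *msr, uint64_t value)
{
	if (msr->index == EXAMPLE_MSR_INDEX) {
		msr->data = value;
		msr->error = 0;
	} else {
		msr->error = 1;
	}
}
```

In both cases the result only reaches the guest once KVM_RUN is called
again on the vCPU.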
For KVM_EXIT_X86_WRMSR, the "index" field tells user space which MSR the guest
wants to write. Once finished processing the event, user space must continue
vCPU execution. If the MSR write was unsuccessful, user space also sets the
"error" field to "1".
::
struct kvm_xen_exit {
#define KVM_EXIT_XEN_HCALL 1
__u32 type;
union {
struct {
__u32 longmode;
__u32 cpl;
__u64 input;
__u64 result;
__u64 params[6];
} hcall;
} u;
};
/* KVM_EXIT_XEN */
struct kvm_xen_exit xen;
Indicates that the VCPU exits into userspace to process some tasks
related to Xen emulation.
Valid values for 'type' are:
- KVM_EXIT_XEN_HCALL -- synchronously notify user-space about Xen hypercall.
Userspace is expected to place the hypercall result into the appropriate
field before invoking KVM_RUN again.
/* Fix the size of the union. */
char padding[256];
};
/*
* shared registers between kvm and userspace.
* kvm_valid_regs specifies the register classes set by the host
* kvm_dirty_regs specifies the register classes dirtied by userspace
* struct kvm_sync_regs is architecture specific, as well as the
* bits for kvm_valid_regs and kvm_dirty_regs
*/
__u64 kvm_valid_regs;
__u64 kvm_dirty_regs;
union {
struct kvm_sync_regs regs;
char padding[SYNC_REGS_SIZE_BYTES];
} s;
If KVM_CAP_SYNC_REGS is defined, these fields allow userspace to access
certain guest registers without having to call SET/GET_*REGS. Thus we can
avoid some system call overhead if userspace has to handle the exit.
Userspace can query the validity of the structure by checking
kvm_valid_regs for specific bits. These bits are architecture specific
and usually define the validity of a group of registers. (e.g. one bit
for general purpose registers)
Please note that the kernel is allowed to use the kvm_run structure as the
primary storage for certain register types. Therefore, the kernel may use the
values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
6. Capabilities that can be enabled on vCPUs
============================================
There are certain capabilities that change the behavior of the virtual CPU or
the virtual machine when enabled. To enable them, please see section 4.37.
Below you can find a list of capabilities and what their effect on the vCPU or
the virtual machine is when enabling them.
The following information is provided along with the description:
Architectures:
which instruction set architectures support this capability.
x86 includes both i386 and x86_64.
Target:
whether this is a per-vcpu or per-vm capability.
Parameters:
what parameters are accepted by the capability.
Returns:
the return value. General error numbers (EBADF, ENOMEM, EINVAL)
are not detailed, but errors with specific meanings are.
6.1 KVM_CAP_PPC_OSI
-------------------
:Architectures: ppc
:Target: vcpu
:Parameters: none
:Returns: 0 on success; -1 on error
This capability enables interception of OSI hypercalls that otherwise would
be treated as normal system calls to be injected into the guest. OSI hypercalls
were invented by Mac-on-Linux to have a standardized communication mechanism
between the guest and the host.
When this capability is enabled, KVM_EXIT_OSI can occur.
6.2 KVM_CAP_PPC_PAPR
--------------------
:Architectures: ppc
:Target: vcpu
:Parameters: none
:Returns: 0 on success; -1 on error
This capability enables interception of PAPR hypercalls. PAPR hypercalls are
done using the hypercall instruction "sc 1".
It also sets the guest privilege level to "supervisor" mode. Usually the guest
runs in "hypervisor" privilege mode with a few missing features.
In addition to the above, it changes the semantics of SDR1. In this mode, the
HTAB address part of SDR1 contains an HVA instead of a GPA, as PAPR keeps the
HTAB invisible to the guest.
When this capability is enabled, KVM_EXIT_PAPR_HCALL can occur.
------------------
:Architectures: ppc
:Target: vcpu
:Parameters: args[0] is the address of a struct kvm_config_tlb
:Returns: 0 on success; -1 on error
struct kvm_config_tlb {
__u64 params;
__u64 array;
__u32 mmu_type;
__u32 array_len;
};
Configures the virtual CPU's TLB array, establishing a shared memory area
between userspace and KVM. The "params" and "array" fields are userspace
addresses of mmu-type-specific data structures. The "array_len" field is a
safety mechanism, and should be set to the size in bytes of the memory that
userspace has reserved for the array. It must be at least the size dictated
by "mmu_type" and "params".
While KVM_RUN is active, the shared region is under control of KVM. Its
contents are undefined, and any modification by userspace results in
boundedly undefined behavior.
On return from KVM_RUN, the shared region will reflect the current state of
the guest's TLB. If userspace makes any changes, it must call KVM_DIRTY_TLB
to tell KVM which entries have been changed, prior to calling KVM_RUN again
on this vcpu.
For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV:
- The "params" field is of type "struct kvm_book3e_206_tlb_params".
- The "array" field points to an array of type "struct
kvm_book3e_206_tlb_entry".
- The array consists of all entries in the first TLB, followed by all
entries in the second TLB.
- Within a TLB, entries are ordered first by increasing set number. Within a
set, entries are ordered by way (increasing ESEL).
- The hash for determining set number in TLB0 is: (MAS2 >> 12) & (num_sets - 1)
where "num_sets" is the tlb_sizes[] value divided by the tlb_ways[] value.
- The tsize field of mas1 shall be set to 4K on TLB0, even though the
hardware ignores this value for TLB0.
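The set hash and the array ordering for TLB0 can be written out as below.
The sketch assumes that ``num_sets`` is a power of two, which the mask in
the hash implies, and that TLB0 is the first TLB in the shared array so its
entries start at index 0. ``tlb0_set()`` and ``tlb0_array_index()`` are
hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Set number for a TLB0 entry: (MAS2 >> 12) & (num_sets - 1). */
static uint32_t tlb0_set(uint64_t mas2, uint32_t tlb_size, uint32_t tlb_ways)
{
	uint32_t num_sets;

	assert(tlb_ways != 0);
	num_sets = tlb_size / tlb_ways;
	return (uint32_t)((mas2 >> 12) & (num_sets - 1));
}

/*
 * Index into the shared array for a TLB0 entry: entries are ordered by
 * increasing set number, and by way within a set.
 */
static uint32_t tlb0_array_index(uint32_t set, uint32_t way, uint32_t tlb_ways)
{
	return set * tlb_ways + way;
}
```

For a 512-entry, 4-way TLB0 there are 128 sets, so a MAS2 value of
0x12345000 selects set 0x45, and way 2 of that set is array element 278.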
6.4 KVM_CAP_S390_CSS_SUPPORT
----------------------------
:Architectures: s390
:Target: vcpu
:Parameters: none
:Returns: 0 on success; -1 on error
This capability enables support for handling of channel I/O instructions.
TEST PENDING INTERRUPTION and the interrupt portion of TEST SUBCHANNEL are
handled in-kernel, while the other I/O instructions are passed to userspace.
When this capability is enabled, KVM_EXIT_S390_TSCH will occur on TEST
SUBCHANNEL intercepts.
Note that even though this capability is enabled per-vcpu, the complete
virtual machine is affected.
:Architectures: ppc
:Target: vcpu
:Parameters: args[0] defines whether the proxy facility is active
:Returns: 0 on success; -1 on error