Newer
Older
The Definitive KVM (Kernel-based Virtual Machine) API Documentation
===================================================================
1. General description
The kvm API is a set of ioctls that are issued to control various aspects
of a virtual machine. The ioctls belong to three classes:
- System ioctls: These query and set global attributes which affect the
whole kvm subsystem. In addition a system ioctl is used to create
virtual machines.
- VM ioctls: These query and set attributes that affect an entire virtual
machine, for example memory layout. In addition a VM ioctl is used to
create virtual cpus (vcpus) and devices.
VM ioctls must be issued from the same process (address space) that was
used to create the VM.
- vcpu ioctls: These query and set attributes that control the operation
of a single virtual cpu.
vcpu ioctls should be issued from the same thread that was used to create
the vcpu, except for asynchronous vcpu ioctl that are marked as such in
the documentation. Otherwise, the first ioctl after switching threads
could see a performance impact.
- device ioctls: These query and set attributes that control the operation
of a single device.
device ioctls must be issued from the same process (address space) that
was used to create the VM.
The kvm API is centered around file descriptors. An initial
open("/dev/kvm") obtains a handle to the kvm subsystem; this handle
can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this
handle will create a VM file descriptor which can be used to issue VM
ioctls. A KVM_CREATE_VCPU or KVM_CREATE_DEVICE ioctl on a VM fd will
create a virtual cpu or device and return a file descriptor pointing to
the new resource. Finally, ioctls on a vcpu or device fd can be used
to control the vcpu or device. For vcpus, this includes the important
task of actually running guest code.
In general file descriptors can be migrated among processes by means
of fork() and the SCM_RIGHTS facility of unix domain socket. These
kinds of tricks are explicitly not supported by kvm. While they will
not cause harm to the host, their actual behavior is not guaranteed by
the API. See "General description" for details on the ioctl usage
model that is supported by KVM.
It is important to note that althought VM ioctls may only be issued from
the process that created the VM, a VM's lifecycle is associated with its
file descriptor, not its creator (process). In other words, the VM and
its resources, *including the associated address space*, are not freed
until the last reference to the VM's file descriptor has been released.
For example, if fork() is issued after ioctl(KVM_CREATE_VM), the VM will
not be freed until both the parent (original) process and its child have
put their references to the VM's file descriptor.
Because a VM's resources are not freed until the last reference to its
file descriptor is released, creating additional references to a VM via
via fork(), dup(), etc... without careful consideration is strongly
discouraged and may have unwanted side effects, e.g. memory allocated
by and on behalf of the VM's process may not be freed/unaccounted when
the VM is shut down.
As of Linux 2.6.22, the KVM ABI has been stabilized: no backward
incompatible change are allowed. However, there is an extension
facility that allows backward-compatible extensions to the API to be
queried and used.
The extension mechanism is not based on the Linux version number.
Instead, kvm defines extension identifiers and a facility to query
whether a particular extension identifier is available. If it is, a
set of ioctls is available for application use.
This section describes ioctls that can be used to control kvm guests.
For each ioctl, the following information is provided along with a
description:
Capability: which KVM extension provides this ioctl. Can be 'basic',
which means that is will be provided by any kernel that supports
API version 12 (see section 4.1), a KVM_CAP_xyz constant, which
means availability needs to be checked with KVM_CHECK_EXTENSION
(see section 4.4), or 'none' which means that while not all kernels
support this ioctl, there's no capability bit to check its
availability: for kernels that don't support the ioctl,
the ioctl returns -ENOTTY.
Architectures: which instruction set architectures provide this ioctl.
x86 includes both i386 and x86_64.
Type: system, vm, or vcpu.
Parameters: what parameters are accepted by the ioctl.
Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL)
are not detailed, but errors with specific meanings are.
4.1 KVM_GET_API_VERSION
Capability: basic
Architectures: all
Type: system ioctl
Parameters: none
Returns: the constant KVM_API_VERSION (=12)
This identifies the API version as the stable kvm API. It is not
expected that this number will change. However, Linux 2.6.20 and
2.6.21 report earlier versions; these are not documented and not
supported. Applications should refuse to run if KVM_GET_API_VERSION
returns a value other than 12. If this check passes, all ioctls
described as 'basic' will be available.
4.2 KVM_CREATE_VM
Capability: basic
Architectures: all
Type: system ioctl
Parameters: machine type identifier (KVM_VM_*)
Returns: a VM fd that can be used to control the new virtual machine.
The new VM has no virtual cpus and no memory.
You probably want to use 0 as machine type.
In order to create user controlled virtual machines on S390, check
KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as
privileged user (CAP_SYS_ADMIN).
To use hardware assisted virtualization on MIPS (VZ ASE) rather than
the default trap & emulate implementation (which changes the virtual
memory layout to fit in user mode), check KVM_CAP_MIPS_VZ and use the
flag KVM_VM_MIPS_VZ.
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
On arm64, the physical address size for a VM (IPA Size limit) is limited
to 40bits by default. The limit can be configured if the host supports the
extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use
KVM_VM_TYPE_ARM_IPA_SIZE(IPA_Bits) to set the size in the machine type
identifier, where IPA_Bits is the maximum width of any physical
address used by the VM. The IPA_Bits is encoded in bits[7-0] of the
machine type identifier.
e.g, to configure a guest to use 48bit physical address size :
vm_fd = ioctl(dev_fd, KVM_CREATE_VM, KVM_VM_TYPE_ARM_IPA_SIZE(48));
The requested size (IPA_Bits) must be :
0 - Implies default size, 40bits (for backward compatibility)
or
N - Implies N bits, where N is a positive integer such that,
32 <= N <= Host_IPA_Limit
Host_IPA_Limit is the maximum possible value for IPA_Bits on the host and
is dependent on the CPU capability and the kernel configuration. The limit can
be retrieved using KVM_CAP_ARM_VM_IPA_SIZE of the KVM_CHECK_EXTENSION
ioctl() at run-time.
Please note that configuring the IPA size does not affect the capability
exposed by the guest CPUs in ID_AA64MMFR0_EL1[PARange]. It only affects
size of the address translated by the stage2 level (guest physical to
host physical address translations).
4.3 KVM_GET_MSR_INDEX_LIST, KVM_GET_MSR_FEATURE_INDEX_LIST
Capability: basic, KVM_CAP_GET_MSR_FEATURES for KVM_GET_MSR_FEATURE_INDEX_LIST
Type: system ioctl
Parameters: struct kvm_msr_list (in/out)
Returns: 0 on success; -1 on error
Errors:
EFAULT: the msr index list cannot be read from or written to
E2BIG: the msr index list is to be to fit in the array specified by
the user.
struct kvm_msr_list {
__u32 nmsrs; /* number of msrs in entries */
__u32 indices[0];
};
The user fills in the size of the indices array in nmsrs, and in return
kvm adjusts nmsrs to reflect the actual number of msrs and fills in the
indices array with their numbers.
Loading
Loading full blame...