api.txt

Specifying exception.has_esr on a system that does not support it will return
-EINVAL. Setting anything other than the lower 24bits of exception.serror_esr
will return -EINVAL.

struct kvm_vcpu_events {
	struct {
		__u8 serror_pending;
		__u8 serror_has_esr;
		/* Align it to 8 bytes */
		__u8 pad[6];
		__u64 serror_esr;
	} exception;
	__u32 reserved[12];
};

4.32 KVM_SET_VCPU_EVENTS

Capability: KVM_CAP_VCPU_EVENTS
Extended by: KVM_CAP_INTR_SHADOW
Architectures: x86, arm, arm64
Type: vcpu ioctl
Parameters: struct kvm_vcpu_event (in)
Returns: 0 on success, -1 on error

X86:

Set pending exceptions, interrupts, and NMIs as well as related states of the
vcpu.

See KVM_GET_VCPU_EVENTS for the data structure.

Fields that may be modified asynchronously by running VCPUs can be excluded
from the update. These fields are nmi.pending, sipi_vector, smi.smm,
smi.pending. Keep the corresponding bits in the flags field cleared to
suppress overwriting the current in-kernel state. The bits are:

KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel
KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector
KVM_VCPUEVENT_VALID_SMM         - transfer the smi sub-struct.

If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in
the flags field to signal that interrupt.shadow contains a valid state and
shall be written into the VCPU.

KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.

If KVM_CAP_EXCEPTION_PAYLOAD is enabled, KVM_VCPUEVENT_VALID_PAYLOAD
can be set in the flags field to signal that the
exception_has_payload, exception_payload, and exception.pending fields
contain a valid state and shall be written into the VCPU.

ARM/ARM64:

Set the pending SError exception state for this VCPU. It is not possible to
'cancel' an Serror that has been made pending.

See KVM_GET_VCPU_EVENTS for the data structure.


4.33 KVM_GET_DEBUGREGS

Capability: KVM_CAP_DEBUGREGS
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_debugregs (out)
Returns: 0 on success, -1 on error

Reads debug registers from the vcpu.

struct kvm_debugregs {
	__u64 db[4];
	__u64 dr6;
	__u64 dr7;
	__u64 flags;
	__u64 reserved[9];
};


4.34 KVM_SET_DEBUGREGS

Capability: KVM_CAP_DEBUGREGS
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_debugregs (in)
Returns: 0 on success, -1 on error

Writes debug registers into the vcpu.

See KVM_GET_DEBUGREGS for the data structure. The flags field is unused
yet and must be cleared on entry.


4.35 KVM_SET_USER_MEMORY_REGION

Capability: KVM_CAP_USER_MEMORY
Architectures: all
Type: vm ioctl
Parameters: struct kvm_userspace_memory_region (in)
Returns: 0 on success, -1 on error

struct kvm_userspace_memory_region {
	__u32 slot;
	__u32 flags;
	__u64 guest_phys_addr;
	__u64 memory_size; /* bytes */
	__u64 userspace_addr; /* start of the userspace allocated memory */
};

/* for kvm_memory_region::flags */
#define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
#define KVM_MEM_READONLY	(1UL << 1)

This ioctl allows the user to create, modify or delete a guest physical
memory slot.  Bits 0-15 of "slot" specify the slot id and this value
should be less than the maximum number of user memory slots supported per
VM.  The maximum allowed slots can be queried using KVM_CAP_NR_MEMSLOTS.
Slots may not overlap in guest physical address space.

If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of "slot"
specifies the address space which is being modified.  They must be
less than the value that KVM_CHECK_EXTENSION returns for the
KVM_CAP_MULTI_ADDRESS_SPACE capability.  Slots in separate address spaces
are unrelated; the restriction on overlapping slots only applies within
each address space.

Deleting a slot is done by passing zero for memory_size.  When changing
an existing slot, it may be moved in the guest physical memory space,
or its flags may be modified, but it may not be resized.

Memory for the region is taken starting at the address denoted by the
field userspace_addr, which must point at user addressable memory for
the entire memory slot size.  Any object may back this memory, including
anonymous memory, ordinary files, and hugetlbfs.

It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
be identical.  This allows large pages in the guest to be backed by large
pages in the host.

The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
KVM_MEM_READONLY.  The former can be set to instruct KVM to keep track of
writes to memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to
use it.  The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
to make a new slot read-only.  In this case, writes to this memory will be
posted to userspace as KVM_EXIT_MMIO exits.

When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
the memory region are automatically reflected into the guest.  For example, an
mmap() that affects the region will be made visible immediately.  Another
example is madvise(MADV_DROP).

It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl.
The KVM_SET_MEMORY_REGION does not allow fine grained control over memory
allocation and is deprecated.


4.36 KVM_SET_TSS_ADDR

Capability: KVM_CAP_SET_TSS_ADDR
Architectures: x86
Type: vm ioctl
Parameters: unsigned long tss_address (in)
Returns: 0 on success, -1 on error

This ioctl defines the physical address of a three-page region in the guest
physical address space.  The region must be within the first 4GB of the
guest physical address space and must not conflict with any memory slot
or any mmio address.  The guest may malfunction if it accesses this memory
region.

This ioctl is required on Intel-based hosts.  This is needed on Intel hardware
because of a quirk in the virtualization implementation (see the internals
documentation when it pops into existence).


4.37 KVM_ENABLE_CAP

Capability: KVM_CAP_ENABLE_CAP
Architectures: mips, ppc, s390
Type: vcpu ioctl
Parameters: struct kvm_enable_cap (in)
Returns: 0 on success; -1 on error

Capability: KVM_CAP_ENABLE_CAP_VM
Architectures: all
Type: vcpu ioctl
Parameters: struct kvm_enable_cap (in)
Returns: 0 on success; -1 on error

+Not all extensions are enabled by default. Using this ioctl the application
can enable an extension, making it available to the guest.

On systems that do not support this ioctl, it always fails. On systems that
do support it, it only works for extensions that are supported for enablement.

To check if a capability can be enabled, the KVM_CHECK_EXTENSION ioctl should
be used.

struct kvm_enable_cap {
       /* in */
       __u32 cap;

The capability that is supposed to get enabled.

       __u32 flags;

A bitfield indicating future enhancements. Has to be 0 for now.

       __u64 args[4];

Arguments for enabling a feature. If a feature needs initial values to
function properly, this is the place to put them.

       __u8  pad[64];
};

The vcpu ioctl should be used for vcpu-specific capabilities, the vm ioctl
for vm-wide capabilities.

4.38 KVM_GET_MP_STATE

Capability: KVM_CAP_MP_STATE
Architectures: x86, s390, arm, arm64
Type: vcpu ioctl
Parameters: struct kvm_mp_state (out)
Returns: 0 on success; -1 on error

struct kvm_mp_state {
	__u32 mp_state;
};

Returns the vcpu's current "multiprocessing state" (though also valid on
uniprocessor guests).

Possible values are:

 - KVM_MP_STATE_RUNNABLE:        the vcpu is currently running [x86,arm/arm64]
 - KVM_MP_STATE_UNINITIALIZED:   the vcpu is an application processor (AP)
                                 which has not yet received an INIT signal [x86]
 - KVM_MP_STATE_INIT_RECEIVED:   the vcpu has received an INIT signal, and is
                                 now ready for a SIPI [x86]
 - KVM_MP_STATE_HALTED:          the vcpu has executed a HLT instruction and
                                 is waiting for an interrupt [x86]
 - KVM_MP_STATE_SIPI_RECEIVED:   the vcpu has just received a SIPI (vector
                                 accessible via KVM_GET_VCPU_EVENTS) [x86]
 - KVM_MP_STATE_STOPPED:         the vcpu is stopped [s390,arm/arm64]
 - KVM_MP_STATE_CHECK_STOP:      the vcpu is in a special error state [s390]
 - KVM_MP_STATE_OPERATING:       the vcpu is operating (running or halted)
                                 [s390]
 - KVM_MP_STATE_LOAD:            the vcpu is in a special load/startup state
                                 [s390]

On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
in-kernel irqchip, the multiprocessing state must be maintained by userspace on
these architectures.

For arm/arm64:

The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.

4.39 KVM_SET_MP_STATE

Capability: KVM_CAP_MP_STATE
Architectures: x86, s390, arm, arm64
Type: vcpu ioctl
Parameters: struct kvm_mp_state (in)
Returns: 0 on success; -1 on error

Sets the vcpu's current "multiprocessing state"; see KVM_GET_MP_STATE for
arguments.

On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
in-kernel irqchip, the multiprocessing state must be maintained by userspace on
these architectures.

For arm/arm64:

The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not.

4.40 KVM_SET_IDENTITY_MAP_ADDR

Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR
Architectures: x86
Type: vm ioctl
Parameters: unsigned long identity (in)
Returns: 0 on success, -1 on error

This ioctl defines the physical address of a one-page region in the guest
physical address space.  The region must be within the first 4GB of the
guest physical address space and must not conflict with any memory slot
or any mmio address.  The guest may malfunction if it accesses this memory
region.

Setting the address to 0 will result in resetting the address to its default
(0xfffbc000).

This ioctl is required on Intel-based hosts.  This is needed on Intel hardware
because of a quirk in the virtualization implementation (see the internals
documentation when it pops into existence).

Fails if any VCPU has already been created.

4.41 KVM_SET_BOOT_CPU_ID

Capability: KVM_CAP_SET_BOOT_CPU_ID
Architectures: x86
Type: vm ioctl
Parameters: unsigned long vcpu_id
Returns: 0 on success, -1 on error

Define which vcpu is the Bootstrap Processor (BSP).  Values are the same
as the vcpu id in KVM_CREATE_VCPU.  If this ioctl is not called, the default
is vcpu 0.


4.42 KVM_GET_XSAVE

Capability: KVM_CAP_XSAVE
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xsave (out)
Returns: 0 on success, -1 on error

struct kvm_xsave {
	__u32 region[1024];
};

This ioctl would copy current vcpu's xsave struct to the userspace.


4.43 KVM_SET_XSAVE

Capability: KVM_CAP_XSAVE
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xsave (in)
Returns: 0 on success, -1 on error

struct kvm_xsave {
	__u32 region[1024];
};

This ioctl would copy userspace's xsave struct to the kernel.


4.44 KVM_GET_XCRS

Capability: KVM_CAP_XCRS
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xcrs (out)
Returns: 0 on success, -1 on error

struct kvm_xcr {
	__u32 xcr;
	__u32 reserved;
	__u64 value;
};

struct kvm_xcrs {
	__u32 nr_xcrs;
	__u32 flags;
	struct kvm_xcr xcrs[KVM_MAX_XCRS];
	__u64 padding[16];
};

This ioctl would copy current vcpu's xcrs to the userspace.


4.45 KVM_SET_XCRS

Capability: KVM_CAP_XCRS
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xcrs (in)
Returns: 0 on success, -1 on error

struct kvm_xcr {
	__u32 xcr;
	__u32 reserved;
	__u64 value;
};

struct kvm_xcrs {
	__u32 nr_xcrs;
	__u32 flags;
	struct kvm_xcr xcrs[KVM_MAX_XCRS];
	__u64 padding[16];
};

This ioctl would set vcpu's xcr to the value userspace specified.


4.46 KVM_GET_SUPPORTED_CPUID

Capability: KVM_CAP_EXT_CPUID
Architectures: x86
Type: system ioctl
Parameters: struct kvm_cpuid2 (in/out)
Returns: 0 on success, -1 on error

struct kvm_cpuid2 {
	__u32 nent;
	__u32 padding;
	struct kvm_cpuid_entry2 entries[0];
};

#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX		BIT(0)
#define KVM_CPUID_FLAG_STATEFUL_FUNC		BIT(1)
#define KVM_CPUID_FLAG_STATE_READ_NEXT		BIT(2)

struct kvm_cpuid_entry2 {
	__u32 function;
	__u32 index;
	__u32 flags;
	__u32 eax;
	__u32 ebx;
	__u32 ecx;
	__u32 edx;
	__u32 padding[3];
};

This ioctl returns x86 cpuid features which are supported by both the
hardware and kvm in its default configuration.  Userspace can use the
information returned by this ioctl to construct cpuid information (for
KVM_SET_CPUID2) that is consistent with hardware, kernel, and
userspace capabilities, and with user requirements (for example, the
user may wish to constrain cpuid to emulate older hardware, or for
feature consistency across a cluster).

Note that certain capabilities, such as KVM_CAP_X86_DISABLE_EXITS, may
expose cpuid features (e.g. MONITOR) which are not supported by kvm in
its default configuration. If userspace enables such capabilities, it
is responsible for modifying the results of this ioctl appropriately.

Userspace invokes KVM_GET_SUPPORTED_CPUID by passing a kvm_cpuid2 structure
with the 'nent' field indicating the number of entries in the variable-size
array 'entries'.  If the number of entries is too low to describe the cpu
capabilities, an error (E2BIG) is returned.  If the number is too high,
the 'nent' field is adjusted and an error (ENOMEM) is returned.  If the
number is just right, the 'nent' field is adjusted to the number of valid
entries in the 'entries' array, which is then filled.

The entries returned are the host cpuid as returned by the cpuid instruction,
with unknown or unsupported features masked out.  Some features (for example,
x2apic), may not be present in the host cpu, but are exposed by kvm if it can
emulate them efficiently. The fields in each entry are defined as follows:

  function: the eax value used to obtain the entry
  index: the ecx value used to obtain the entry (for entries that are
         affected by ecx)
  flags: an OR of zero or more of the following:
        KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
           if the index field is valid
        KVM_CPUID_FLAG_STATEFUL_FUNC:
           if cpuid for this function returns different values for successive
           invocations; there will be several entries with the same function,
           all with this flag set
        KVM_CPUID_FLAG_STATE_READ_NEXT:
           for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
           the first entry to be read by a cpu
   eax, ebx, ecx, edx: the values returned by the cpuid instruction for
         this function/index combination

The TSC deadline timer feature (CPUID leaf 1, ecx[24]) is always returned
as false, since the feature depends on KVM_CREATE_IRQCHIP for local APIC
support.  Instead it is reported via

  ioctl(KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER)

if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
feature in userspace, then you can enable the feature for KVM_SET_CPUID2.


4.47 KVM_PPC_GET_PVINFO

Capability: KVM_CAP_PPC_GET_PVINFO
Architectures: ppc
Type: vm ioctl
Parameters: struct kvm_ppc_pvinfo (out)
Returns: 0 on success, !0 on error

struct kvm_ppc_pvinfo {
	__u32 flags;
	__u32 hcall[4];
	__u8  pad[108];
};

This ioctl fetches PV specific information that need to be passed to the guest
using the device tree or other means from vm context.

The hcall array defines 4 instructions that make up a hypercall.

If any additional field gets added to this structure later on, a bit for that
additional piece of information will be set in the flags bitmap.

The flags bitmap is defined as:

   /* the host supports the ePAPR idle hcall
   #define KVM_PPC_PVINFO_FLAGS_EV_IDLE   (1<<0)

4.52 KVM_SET_GSI_ROUTING

Capability: KVM_CAP_IRQ_ROUTING
Architectures: x86 s390 arm arm64
Type: vm ioctl
Parameters: struct kvm_irq_routing (in)
Returns: 0 on success, -1 on error

Sets the GSI routing table entries, overwriting any previously set entries.

On arm/arm64, GSI routing has the following limitation:
- GSI routing does not apply to KVM_IRQ_LINE but only to KVM_IRQFD.

struct kvm_irq_routing {
	__u32 nr;
	__u32 flags;
	struct kvm_irq_routing_entry entries[0];
};

No flags are specified so far, the corresponding field must be set to zero.

struct kvm_irq_routing_entry {
	__u32 gsi;
	__u32 type;
	__u32 flags;
	__u32 pad;
	union {
		struct kvm_irq_routing_irqchip irqchip;
		struct kvm_irq_routing_msi msi;
		struct kvm_irq_routing_s390_adapter adapter;
		struct kvm_irq_routing_hv_sint hv_sint;
		__u32 pad[8];
	} u;
};

/* gsi routing entry types */
#define KVM_IRQ_ROUTING_IRQCHIP 1
#define KVM_IRQ_ROUTING_MSI 2
#define KVM_IRQ_ROUTING_S390_ADAPTER 3
#define KVM_IRQ_ROUTING_HV_SINT 4

flags:
- KVM_MSI_VALID_DEVID: used along with KVM_IRQ_ROUTING_MSI routing entry
  type, specifies that the devid field contains a valid value.  The per-VM
  KVM_CAP_MSI_DEVID capability advertises the requirement to provide
  the device ID.  If this capability is not available, userspace should
  never set the KVM_MSI_VALID_DEVID flag as the ioctl might fail.
- zero otherwise

struct kvm_irq_routing_irqchip {
	__u32 irqchip;
	__u32 pin;
};

struct kvm_irq_routing_msi {
	__u32 address_lo;
	__u32 address_hi;
	__u32 data;
	union {
		__u32 pad;
		__u32 devid;
	};
};

If KVM_MSI_VALID_DEVID is set, devid contains a unique device identifier
for the device that wrote the MSI message.  For PCI, this is usually a
BFD identifier in the lower 16 bits.

On x86, address_hi is ignored unless the KVM_X2APIC_API_USE_32BIT_IDS
feature of KVM_CAP_X2APIC_API capability is enabled.  If it is enabled,
address_hi bits 31-8 provide bits 31-8 of the destination id.  Bits 7-0 of
address_hi must be zero.

struct kvm_irq_routing_s390_adapter {
	__u64 ind_addr;
	__u64 summary_addr;
	__u64 ind_offset;
	__u32 summary_offset;
	__u32 adapter_id;
};

struct kvm_irq_routing_hv_sint {
	__u32 vcpu;
	__u32 sint;
};


4.55 KVM_SET_TSC_KHZ

Capability: KVM_CAP_TSC_CONTROL
Architectures: x86
Type: vcpu ioctl
Parameters: virtual tsc_khz
Returns: 0 on success, -1 on error

Specifies the tsc frequency for the virtual machine. The unit of the
frequency is KHz.


4.56 KVM_GET_TSC_KHZ

Capability: KVM_CAP_GET_TSC_KHZ
Architectures: x86
Type: vcpu ioctl
Parameters: none
Returns: virtual tsc-khz on success, negative value on error

Returns the tsc frequency of the guest. The unit of the return value is
KHz. If the host has unstable tsc this ioctl returns -EIO instead as an
error.


4.57 KVM_GET_LAPIC

Capability: KVM_CAP_IRQCHIP
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_lapic_state (out)
Returns: 0 on success, -1 on error

#define KVM_APIC_REG_SIZE 0x400
struct kvm_lapic_state {
	char regs[KVM_APIC_REG_SIZE];
};

Reads the Local APIC registers and copies them into the input argument.  The
data format and layout are the same as documented in the architecture manual.

If KVM_X2APIC_API_USE_32BIT_IDS feature of KVM_CAP_X2APIC_API is
enabled, then the format of APIC_ID register depends on the APIC mode
(reported by MSR_IA32_APICBASE) of its VCPU.  x2APIC stores APIC ID in
the APIC_ID register (bytes 32-35).  xAPIC only allows an 8-bit APIC ID
which is stored in bits 31-24 of the APIC register, or equivalently in
byte 35 of struct kvm_lapic_state's regs field.  KVM_GET_LAPIC must then
be called after MSR_IA32_APICBASE has been set with KVM_SET_MSR.

If KVM_X2APIC_API_USE_32BIT_IDS feature is disabled, struct kvm_lapic_state
always uses xAPIC format.


4.58 KVM_SET_LAPIC

Capability: KVM_CAP_IRQCHIP
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_lapic_state (in)
Returns: 0 on success, -1 on error

#define KVM_APIC_REG_SIZE 0x400
struct kvm_lapic_state {
	char regs[KVM_APIC_REG_SIZE];
};

Copies the input argument into the Local APIC registers.  The data format
and layout are the same as documented in the architecture manual.

The format of the APIC ID register (bytes 32-35 of struct kvm_lapic_state's
regs field) depends on the state of the KVM_CAP_X2APIC_API capability.
See the note in KVM_GET_LAPIC.


4.59 KVM_IOEVENTFD

Capability: KVM_CAP_IOEVENTFD
Architectures: all
Type: vm ioctl
Parameters: struct kvm_ioeventfd (in)
Returns: 0 on success, !0 on error

This ioctl attaches or detaches an ioeventfd to a legal pio/mmio address
within the guest.  A guest write in the registered address will signal the
provided event instead of triggering an exit.

struct kvm_ioeventfd {
	__u64 datamatch;
	__u64 addr;        /* legal pio/mmio address */
	__u32 len;         /* 0, 1, 2, 4, or 8 bytes    */
	__s32 fd;
	__u32 flags;
	__u8  pad[36];
};

For the special case of virtio-ccw devices on s390, the ioevent is matched
to a subchannel/virtqueue tuple instead.

The following flags are defined:

#define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch)
#define KVM_IOEVENTFD_FLAG_PIO       (1 << kvm_ioeventfd_flag_nr_pio)
#define KVM_IOEVENTFD_FLAG_DEASSIGN  (1 << kvm_ioeventfd_flag_nr_deassign)
#define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
	(1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify)

If datamatch flag is set, the event will be signaled only if the written value
to the registered address is equal to datamatch in struct kvm_ioeventfd.

For virtio-ccw devices, addr contains the subchannel id and datamatch the
virtqueue index.

With KVM_CAP_IOEVENTFD_ANY_LENGTH, a zero length ioeventfd is allowed, and
the kernel will ignore the length of guest write and may get a faster vmexit.
The speedup may only apply to specific architectures, but the ioeventfd will
work anyway.

4.60 KVM_DIRTY_TLB

Capability: KVM_CAP_SW_TLB
Architectures: ppc
Type: vcpu ioctl
Parameters: struct kvm_dirty_tlb (in)
Returns: 0 on success, -1 on error

struct kvm_dirty_tlb {
	__u64 bitmap;
	__u32 num_dirty;
};

This must be called whenever userspace has changed an entry in the shared
TLB, prior to calling KVM_RUN on the associated vcpu.

The "bitmap" field is the userspace address of an array.  This array
consists of a number of bits, equal to the total number of TLB entries as
determined by the last successful call to KVM_CONFIG_TLB, rounded up to the
nearest multiple of 64.

Each bit corresponds to one TLB entry, ordered the same as in the shared TLB
array.

The array is little-endian: the bit 0 is the least significant bit of the
first byte, bit 8 is the least significant bit of the second byte, etc.
This avoids any complications with differing word sizes.

The "num_dirty" field is a performance hint for KVM to determine whether it
should skip processing the bitmap and just invalidate everything.  It must
be set to the number of set bits in the bitmap.


4.62 KVM_CREATE_SPAPR_TCE

Capability: KVM_CAP_SPAPR_TCE
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_create_spapr_tce (in)
Returns: file descriptor for manipulating the created TCE table

This creates a virtual TCE (translation control entry) table, which
is an IOMMU for PAPR-style virtual I/O.  It is used to translate
logical addresses used in virtual I/O into guest physical addresses,
and provides a scatter/gather capability for PAPR virtual I/O.

/* for KVM_CAP_SPAPR_TCE */
struct kvm_create_spapr_tce {
	__u64 liobn;
	__u32 window_size;
};

The liobn field gives the logical IO bus number for which to create a
TCE table.  The window_size field specifies the size of the DMA window
which this TCE table will translate - the table will contain one 64
bit TCE entry for every 4kiB of the DMA window.

When the guest issues an H_PUT_TCE hcall on a liobn for which a TCE
table has been created using this ioctl(), the kernel will handle it
in real mode, updating the TCE table.  H_PUT_TCE calls for other
liobns will cause a vm exit and must be handled by userspace.

The return value is a file descriptor which can be passed to mmap(2)
to map the created TCE table into userspace.  This lets userspace read
the entries written by kernel-handled H_PUT_TCE calls, and also lets
userspace update the TCE table directly which is useful in some
circumstances.


4.63 KVM_ALLOCATE_RMA

Capability: KVM_CAP_PPC_RMA
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_allocate_rma (out)
Returns: file descriptor for mapping the allocated RMA

This allocates a Real Mode Area (RMA) from the pool allocated at boot
time by the kernel.  An RMA is a physically-contiguous, aligned region
of memory used on older POWER processors to provide the memory which
will be accessed by real-mode (MMU off) accesses in a KVM guest.
POWER processors support a set of sizes for the RMA that usually
includes 64MB, 128MB, 256MB and some larger powers of two.

/* for KVM_ALLOCATE_RMA */
struct kvm_allocate_rma {
	__u64 rma_size;
};

The return value is a file descriptor which can be passed to mmap(2)
to map the allocated RMA into userspace.  The mapped area can then be
passed to the KVM_SET_USER_MEMORY_REGION ioctl to establish it as the
RMA for a virtual machine.  The size of the RMA in bytes (which is
fixed at host kernel boot time) is returned in the rma_size field of
the argument structure.

The KVM_CAP_PPC_RMA capability is 1 or 2 if the KVM_ALLOCATE_RMA ioctl
is supported; 2 if the processor requires all virtual machines to have
an RMA, or 1 if the processor can use an RMA but doesn't require it,
because it supports the Virtual RMA (VRMA) facility.


4.64 KVM_NMI

Capability: KVM_CAP_USER_NMI
Architectures: x86
Type: vcpu ioctl
Parameters: none
Returns: 0 on success, -1 on error

Queues an NMI on the thread's vcpu.  Note this is well defined only
when KVM_CREATE_IRQCHIP has not been called, since this is an interface
between the virtual cpu core and virtual local APIC.  After KVM_CREATE_IRQCHIP
has been called, this interface is completely emulated within the kernel.

To use this to emulate the LINT1 input with KVM_CREATE_IRQCHIP, use the
following algorithm:

  - pause the vcpu
  - read the local APIC's state (KVM_GET_LAPIC)
  - check whether changing LINT1 will queue an NMI (see the LVT entry for LINT1)
  - if so, issue KVM_NMI
  - resume the vcpu

Some guests configure the LINT1 NMI input to cause a panic, aiding in
debugging.


4.65 KVM_S390_UCAS_MAP

Capability: KVM_CAP_S390_UCONTROL
Architectures: s390
Type: vcpu ioctl
Parameters: struct kvm_s390_ucas_mapping (in)
Returns: 0 in case of success

The parameter is defined like this:
	struct kvm_s390_ucas_mapping {
		__u64 user_addr;
		__u64 vcpu_addr;
		__u64 length;
	};

This ioctl maps the memory at "user_addr" with the length "length" to
the vcpu's address space starting at "vcpu_addr". All parameters need to
be aligned by 1 megabyte.


4.66 KVM_S390_UCAS_UNMAP

Capability: KVM_CAP_S390_UCONTROL
Architectures: s390
Type: vcpu ioctl
Parameters: struct kvm_s390_ucas_mapping (in)
Returns: 0 in case of success

The parameter is defined like this:
	struct kvm_s390_ucas_mapping {
		__u64 user_addr;
		__u64 vcpu_addr;
		__u64 length;
	};

This ioctl unmaps the memory in the vcpu's address space starting at
"vcpu_addr" with the length "length". The field "user_addr" is ignored.
All parameters need to be aligned by 1 megabyte.


4.67 KVM_S390_VCPU_FAULT

Capability: KVM_CAP_S390_UCONTROL
Architectures: s390
Type: vcpu ioctl
Parameters: vcpu absolute address (in)
Returns: 0 in case of success

This call creates a page table entry on the virtual cpu's address space
(for user controlled virtual machines) or the virtual machine's address
space (for regular virtual machines). This only works for minor faults,
thus it's recommended to access subject memory page via the user page
table upfront. This is useful to handle validity intercepts for user
controlled virtual machines to fault in the virtual cpu's lowcore pages
prior to calling the KVM_RUN ioctl.


4.68 KVM_SET_ONE_REG

Capability: KVM_CAP_ONE_REG
Architectures: all
Type: vcpu ioctl
Parameters: struct kvm_one_reg (in)
Returns: 0 on success, negative value on failure
Errors:
  ENOENT:   no such register
  EINVAL:   invalid register ID, or no such register
  EPERM:    (arm64) register access not allowed before vcpu finalization
(These error codes are indicative only: do not rely on a specific error
code being returned in a specific situation.)

struct kvm_one_reg {
       __u64 id;
       __u64 addr;
};

Using this ioctl, a single vcpu register can be set to a specific value
defined by user space with the passed in struct kvm_one_reg, where id
refers to the register identifier as described below and addr is a pointer
to a variable with the respective size. There can be architecture agnostic
and architecture specific registers. Each have their own range of operation
and their own constants and width. To keep track of the implemented
registers, find a list below:

  Arch  |           Register            | Width (bits)
        |                               |
  PPC   | KVM_REG_PPC_HIOR              | 64
  PPC   | KVM_REG_PPC_IAC1              | 64
  PPC   | KVM_REG_PPC_IAC2              | 64
  PPC   | KVM_REG_PPC_IAC3              | 64
  PPC   | KVM_REG_PPC_IAC4              | 64
  PPC   | KVM_REG_PPC_DAC1              | 64
  PPC   | KVM_REG_PPC_DAC2              | 64
  PPC   | KVM_REG_PPC_DABR              | 64
  PPC   | KVM_REG_PPC_DSCR              | 64
  PPC   | KVM_REG_PPC_PURR              | 64
  PPC   | KVM_REG_PPC_SPURR             | 64
  PPC   | KVM_REG_PPC_DAR               | 64
  PPC   | KVM_REG_PPC_DSISR             | 32
  PPC   | KVM_REG_PPC_AMR               | 64
  PPC   | KVM_REG_PPC_UAMOR             | 64
  PPC   | KVM_REG_PPC_MMCR0             | 64
  PPC   | KVM_REG_PPC_MMCR1             | 64
  PPC   | KVM_REG_PPC_MMCRA             | 64
  PPC   | KVM_REG_PPC_MMCR2             | 64
  PPC   | KVM_REG_PPC_MMCRS             | 64
  PPC   | KVM_REG_PPC_SIAR              | 64
  PPC   | KVM_REG_PPC_SDAR              | 64
  PPC   | KVM_REG_PPC_SIER              | 64
  PPC   | KVM_REG_PPC_PMC1              | 32
  PPC   | KVM_REG_PPC_PMC2              | 32
  PPC   | KVM_REG_PPC_PMC3              | 32
  PPC   | KVM_REG_PPC_PMC4              | 32
  PPC   | KVM_REG_PPC_PMC5              | 32
  PPC   | KVM_REG_PPC_PMC6              | 32
  PPC   | KVM_REG_PPC_PMC7              | 32
  PPC   | KVM_REG_PPC_PMC8              | 32
  PPC   | KVM_REG_PPC_FPR0              | 64
          ...
  PPC   | KVM_REG_PPC_FPR31             | 64
  PPC   | KVM_REG_PPC_VR0               | 128
          ...
  PPC   | KVM_REG_PPC_VR31              | 128
  PPC   | KVM_REG_PPC_VSR0              | 128
          ...
  PPC   | KVM_REG_PPC_VSR31             | 128
  PPC   | KVM_REG_PPC_FPSCR             | 64
  PPC   | KVM_REG_PPC_VSCR              | 32
  PPC   | KVM_REG_PPC_VPA_ADDR          | 64
  PPC   | KVM_REG_PPC_VPA_SLB           | 128
  PPC   | KVM_REG_PPC_VPA_DTL           | 128
  PPC   | KVM_REG_PPC_EPCR              | 32
  PPC   | KVM_REG_PPC_EPR               | 32
  PPC   | KVM_REG_PPC_TCR               | 32
  PPC   | KVM_REG_PPC_TSR               | 32
  PPC   | KVM_REG_PPC_OR_TSR            | 32
  PPC   | KVM_REG_PPC_CLEAR_TSR         | 32
  PPC   | KVM_REG_PPC_MAS0              | 32
  PPC   | KVM_REG_PPC_MAS1              | 32
  PPC   | KVM_REG_PPC_MAS2              | 64
  PPC   | KVM_REG_PPC_MAS7_3            | 64
  PPC   | KVM_REG_PPC_MAS4              | 32
  PPC   | KVM_REG_PPC_MAS6              | 32
  PPC   | KVM_REG_PPC_MMUCFG            | 32
  PPC   | KVM_REG_PPC_TLB0CFG           | 32
  PPC   | KVM_REG_PPC_TLB1CFG           | 32
  PPC   | KVM_REG_PPC_TLB2CFG           | 32
  PPC   | KVM_REG_PPC_TLB3CFG           | 32
  PPC   | KVM_REG_PPC_TLB0PS            | 32
  PPC   | KVM_REG_PPC_TLB1PS            | 32
  PPC   | KVM_REG_PPC_TLB2PS            | 32
  PPC   | KVM_REG_PPC_TLB3PS            | 32
  PPC   | KVM_REG_PPC_EPTCFG            | 32
  PPC   | KVM_REG_PPC_ICP_STATE         | 64
  PPC   | KVM_REG_PPC_VP_STATE          | 128
  PPC   | KVM_REG_PPC_TB_OFFSET         | 64
  PPC   | KVM_REG_PPC_SPMC1             | 32
  PPC   | KVM_REG_PPC_SPMC2             | 32
  PPC   | KVM_REG_PPC_IAMR              | 64
  PPC   | KVM_REG_PPC_TFHAR             | 64
  PPC   | KVM_REG_PPC_TFIAR             | 64
  PPC   | KVM_REG_PPC_TEXASR            | 64
  PPC   | KVM_REG_PPC_FSCR              | 64
  PPC   | KVM_REG_PPC_PSPB              | 32
  PPC   | KVM_REG_PPC_EBBHR             | 64
  PPC   | KVM_REG_PPC_EBBRR             | 64