Newer
Older
Note: If any architecturally invalid key value is found in the given data then
the ioctl will return -EINVAL.
3005
3006
3007
3008
3009
3010
3011
3012
3013
3014
3015
3016
3017
3018
3019
3020
3021
3022
3023
3024
3025
3026
3027
3028
3029
3030
3031
3032
3033
3034
3035
3036
3037
3038
3039
3040
3041
3042
3043
3044
3045
3046
3047
3048
3049
3050
3051
3052
3053
3054
3055
3056
4.92 KVM_S390_IRQ
Capability: KVM_CAP_S390_INJECT_IRQ
Architectures: s390
Type: vcpu ioctl
Parameters: struct kvm_s390_irq (in)
Returns: 0 on success, -1 on error
Errors:
EINVAL: interrupt type is invalid
type is KVM_S390_SIGP_STOP and flag parameter is invalid value
type is KVM_S390_INT_EXTERNAL_CALL and code is bigger
than the maximum of VCPUs
EBUSY: type is KVM_S390_SIGP_SET_PREFIX and vcpu is not stopped
type is KVM_S390_SIGP_STOP and a stop irq is already pending
type is KVM_S390_INT_EXTERNAL_CALL and an external call interrupt
is already pending
Allows to inject an interrupt to the guest.
Using struct kvm_s390_irq as a parameter allows
to inject additional payload which is not
possible via KVM_S390_INTERRUPT.
Interrupt parameters are passed via kvm_s390_irq:
struct kvm_s390_irq {
__u64 type;
union {
struct kvm_s390_io_info io;
struct kvm_s390_ext_info ext;
struct kvm_s390_pgm_info pgm;
struct kvm_s390_emerg_info emerg;
struct kvm_s390_extcall_info extcall;
struct kvm_s390_prefix_info prefix;
struct kvm_s390_stop_info stop;
struct kvm_s390_mchk_info mchk;
char reserved[64];
} u;
};
type can be one of the following:
KVM_S390_SIGP_STOP - sigp stop; parameter in .stop
KVM_S390_PROGRAM_INT - program check; parameters in .pgm
KVM_S390_SIGP_SET_PREFIX - sigp set prefix; parameters in .prefix
KVM_S390_RESTART - restart; no parameters
KVM_S390_INT_CLOCK_COMP - clock comparator interrupt; no parameters
KVM_S390_INT_CPU_TIMER - CPU timer interrupt; no parameters
KVM_S390_INT_EMERGENCY - sigp emergency; parameters in .emerg
KVM_S390_INT_EXTERNAL_CALL - sigp external call; parameters in .extcall
KVM_S390_MCHK - machine check interrupt; parameters in .mchk
This is an asynchronous vcpu ioctl and can be invoked from any thread.
4.94 KVM_S390_GET_IRQ_STATE
Capability: KVM_CAP_S390_IRQ_STATE
Architectures: s390
Type: vcpu ioctl
Parameters: struct kvm_s390_irq_state (out)
Returns: >= number of bytes copied into buffer,
-EINVAL if buffer size is 0,
-ENOBUFS if buffer size is too small to fit all pending interrupts,
-EFAULT if the buffer address was invalid
This ioctl allows userspace to retrieve the complete state of all currently
pending interrupts in a single buffer. Use cases include migration
and introspection. The parameter structure contains the address of a
userspace buffer and its length:
struct kvm_s390_irq_state {
__u64 buf;
__u32 flags; /* will stay unused for compatibility reasons */
__u32 reserved[4]; /* will stay unused for compatibility reasons */
};
Userspace passes in the above struct and for each pending interrupt a
struct kvm_s390_irq is copied to the provided buffer.
The structure contains a flags and a reserved field for future extensions. As
the kernel never checked for flags == 0 and QEMU never pre-zeroed flags and
reserved, these fields can not be used in the future without breaking
compatibility.
3090
3091
3092
3093
3094
3095
3096
3097
3098
3099
3100
3101
3102
3103
3104
3105
3106
3107
3108
3109
3110
3111
3112
If -ENOBUFS is returned the buffer provided was too small and userspace
may retry with a bigger buffer.
4.95 KVM_S390_SET_IRQ_STATE
Capability: KVM_CAP_S390_IRQ_STATE
Architectures: s390
Type: vcpu ioctl
Parameters: struct kvm_s390_irq_state (in)
Returns: 0 on success,
-EFAULT if the buffer address was invalid,
-EINVAL for an invalid buffer length (see below),
-EBUSY if there were already interrupts pending,
errors occurring when actually injecting the
interrupt. See KVM_S390_IRQ.
This ioctl allows userspace to set the complete state of all cpu-local
interrupts currently pending for the vcpu. It is intended for restoring
interrupt state after a migration. The input parameter is a userspace buffer
containing a struct kvm_s390_irq_state:
struct kvm_s390_irq_state {
__u64 buf;
__u32 flags; /* will stay unused for compatibility reasons */
__u32 reserved[4]; /* will stay unused for compatibility reasons */
The restrictions for flags and reserved apply as well.
(see KVM_S390_GET_IRQ_STATE)
The userspace memory referenced by buf contains a struct kvm_s390_irq
for each interrupt to be injected into the guest.
If one of the interrupts could not be injected for some reason the
ioctl aborts.
len must be a multiple of sizeof(struct kvm_s390_irq). It must be > 0
and it must not exceed (max_vcpus + 32) * sizeof(struct kvm_s390_irq),
which is the maximum number of possibly pending cpu-local interrupts.
Capability: KVM_CAP_X86_SMM
Architectures: x86
Type: vcpu ioctl
Parameters: none
Returns: 0 on success, -1 on error
Queues an SMI on the thread's vcpu.
3140
3141
3142
3143
3144
3145
3146
3147
3148
3149
3150
3151
3152
3153
3154
3155
3156
3157
3158
3159
3160
3161
3162
3163
3164
4.97 KVM_CAP_PPC_MULTITCE
Capability: KVM_CAP_PPC_MULTITCE
Architectures: ppc
Type: vm
This capability means the kernel is capable of handling hypercalls
H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those into the user
space. This significantly accelerates DMA operations for PPC KVM guests.
User space should expect that its handlers for these hypercalls
are not going to be called if user space previously registered LIOBN
in KVM (via KVM_CREATE_SPAPR_TCE or similar calls).
In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest,
user space might have to advertise it for the guest. For example,
IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is
present in the "ibm,hypertas-functions" device-tree property.
The hypercalls mentioned above may or may not be processed successfully
in the kernel based fast path. If they can not be handled by the kernel,
they will get passed on to user space. So user space still has to have
an implementation for these despite the in kernel acceleration.
This capability is always enabled.
3165
3166
3167
3168
3169
3170
3171
3172
3173
3174
3175
3176
3177
3178
3179
3180
3181
3182
3183
3184
3185
3186
3187
3188
3189
3190
3191
3192
3193
3194
3195
3196
4.98 KVM_CREATE_SPAPR_TCE_64
Capability: KVM_CAP_SPAPR_TCE_64
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_create_spapr_tce_64 (in)
Returns: file descriptor for manipulating the created TCE table
This is an extension for KVM_CAP_SPAPR_TCE which only supports 32bit
windows, described in 4.62 KVM_CREATE_SPAPR_TCE
This capability uses extended struct in ioctl interface:
/* for KVM_CAP_SPAPR_TCE_64 */
struct kvm_create_spapr_tce_64 {
__u64 liobn;
__u32 page_shift;
__u32 flags;
__u64 offset; /* in pages */
__u64 size; /* in pages */
};
The aim of extension is to support an additional bigger DMA window with
a variable page size.
KVM_CREATE_SPAPR_TCE_64 receives a 64bit window size, an IOMMU page shift and
a bus offset of the corresponding DMA window, @size and @offset are numbers
of IOMMU pages.
@flags are not used at the moment.
The rest of functionality is identical to KVM_CREATE_SPAPR_TCE.
4.99 KVM_REINJECT_CONTROL
3198
3199
3200
3201
3202
3203
3204
3205
3206
3207
3208
3209
3210
3211
3212
3213
3214
3215
3216
3217
3218
3219
3220
Capability: KVM_CAP_REINJECT_CONTROL
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_reinject_control (in)
Returns: 0 on success,
-EFAULT if struct kvm_reinject_control cannot be read,
-ENXIO if KVM_CREATE_PIT or KVM_CREATE_PIT2 didn't succeed earlier.
i8254 (PIT) has two modes, reinject and !reinject. The default is reinject,
where KVM queues elapsed i8254 ticks and monitors completion of interrupt from
vector(s) that i8254 injects. Reinject mode dequeues a tick and injects its
interrupt whenever there isn't a pending interrupt from i8254.
!reinject mode injects an interrupt as soon as a tick arrives.
struct kvm_reinject_control {
__u8 pit_reinject;
__u8 reserved[31];
};
pit_reinject = 0 (!reinject mode) is recommended, unless running an old
operating system that uses the PIT for timing (e.g. Linux 2.4.x).
4.100 KVM_PPC_CONFIGURE_V3_MMU
3222
3223
3224
3225
3226
3227
3228
3229
3230
3231
3232
3233
3234
3235
3236
3237
3238
3239
3240
3241
3242
3243
3244
3245
3246
3247
3248
3249
3250
3251
Capability: KVM_CAP_PPC_RADIX_MMU or KVM_CAP_PPC_HASH_MMU_V3
Architectures: ppc
Type: vm ioctl
Parameters: struct kvm_ppc_mmuv3_cfg (in)
Returns: 0 on success,
-EFAULT if struct kvm_ppc_mmuv3_cfg cannot be read,
-EINVAL if the configuration is invalid
This ioctl controls whether the guest will use radix or HPT (hashed
page table) translation, and sets the pointer to the process table for
the guest.
struct kvm_ppc_mmuv3_cfg {
__u64 flags;
__u64 process_table;
};
There are two bits that can be set in flags; KVM_PPC_MMUV3_RADIX and
KVM_PPC_MMUV3_GTSE. KVM_PPC_MMUV3_RADIX, if set, configures the guest
to use radix tree translation, and if clear, to use HPT translation.
KVM_PPC_MMUV3_GTSE, if set and if KVM permits it, configures the guest
to be able to use the global TLB and SLB invalidation instructions;
if clear, the guest may not use these instructions.
The process_table field specifies the address and size of the guest
process table, which is in the guest's space. This field is formatted
as the second doubleword of the partition table entry, as defined in
the Power ISA V3.00, Book III section 5.7.6.1.
4.101 KVM_PPC_GET_RMMU_INFO
3253
3254
3255
3256
3257
3258
3259
3260
3261
3262
3263
3264
3265
3266
3267
3268
3269
3270
3271
3272
3273
3274
3275
3276
3277
3278
3279
3280
3281
3282
3283
3284
3285
Capability: KVM_CAP_PPC_RADIX_MMU
Architectures: ppc
Type: vm ioctl
Parameters: struct kvm_ppc_rmmu_info (out)
Returns: 0 on success,
-EFAULT if struct kvm_ppc_rmmu_info cannot be written,
-EINVAL if no useful information can be returned
This ioctl returns a structure containing two things: (a) a list
containing supported radix tree geometries, and (b) a list that maps
page sizes to put in the "AP" (actual page size) field for the tlbie
(TLB invalidate entry) instruction.
struct kvm_ppc_rmmu_info {
struct kvm_ppc_radix_geom {
__u8 page_shift;
__u8 level_bits[4];
__u8 pad[3];
} geometries[8];
__u32 ap_encodings[8];
};
The geometries[] field gives up to 8 supported geometries for the
radix page table, in terms of the log base 2 of the smallest page
size, and the number of bits indexed at each level of the tree, from
the PTE level up to the PGD level in that order. Any unused entries
will have 0 in the page_shift field.
The ap_encodings gives the supported page sizes and their AP field
encodings, encoded with the AP value in the top 3 bits and the log
base 2 of the page size in the bottom 6 bits.
3286
3287
3288
3289
3290
3291
3292
3293
3294
3295
3296
3297
3298
3299
3300
3301
3302
3303
3304
3305
3306
3307
3308
3309
3310
3311
3312
3313
3314
3315
3316
3317
3318
3319
3320
3321
3322
3323
3324
3325
3326
3327
3328
3329
3330
3331
3332
3333
3334
3335
3336
3337
3338
3339
3340
3341
3342
3343
3344
3345
3346
3347
3348
3349
3350
3351
3352
3353
3354
3355
3356
3357
3358
3359
3360
3361
3362
3363
3364
3365
3366
3367
3368
3369
3370
3371
3372
3373
3374
3375
3376
3377
3378
3379
3380
4.102 KVM_PPC_RESIZE_HPT_PREPARE
Capability: KVM_CAP_SPAPR_RESIZE_HPT
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_ppc_resize_hpt (in)
Returns: 0 on successful completion,
>0 if a new HPT is being prepared, the value is an estimated
number of milliseconds until preparation is complete
-EFAULT if struct kvm_reinject_control cannot be read,
-EINVAL if the supplied shift or flags are invalid
-ENOMEM if unable to allocate the new HPT
-ENOSPC if there was a hash collision when moving existing
HPT entries to the new HPT
-EIO on other error conditions
Used to implement the PAPR extension for runtime resizing of a guest's
Hashed Page Table (HPT). Specifically this starts, stops or monitors
the preparation of a new potential HPT for the guest, essentially
implementing the H_RESIZE_HPT_PREPARE hypercall.
If called with shift > 0 when there is no pending HPT for the guest,
this begins preparation of a new pending HPT of size 2^(shift) bytes.
It then returns a positive integer with the estimated number of
milliseconds until preparation is complete.
If called when there is a pending HPT whose size does not match that
requested in the parameters, discards the existing pending HPT and
creates a new one as above.
If called when there is a pending HPT of the size requested, will:
* If preparation of the pending HPT is already complete, return 0
* If preparation of the pending HPT has failed, return an error
code, then discard the pending HPT.
* If preparation of the pending HPT is still in progress, return an
estimated number of milliseconds until preparation is complete.
If called with shift == 0, discards any currently pending HPT and
returns 0 (i.e. cancels any in-progress preparation).
flags is reserved for future expansion, currently setting any bits in
flags will result in an -EINVAL.
Normally this will be called repeatedly with the same parameters until
it returns <= 0. The first call will initiate preparation, subsequent
ones will monitor preparation until it completes or fails.
struct kvm_ppc_resize_hpt {
__u64 flags;
__u32 shift;
__u32 pad;
};
4.103 KVM_PPC_RESIZE_HPT_COMMIT
Capability: KVM_CAP_SPAPR_RESIZE_HPT
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_ppc_resize_hpt (in)
Returns: 0 on successful completion,
-EFAULT if struct kvm_reinject_control cannot be read,
-EINVAL if the supplied shift or flags are invalid
-ENXIO is there is no pending HPT, or the pending HPT doesn't
have the requested size
-EBUSY if the pending HPT is not fully prepared
-ENOSPC if there was a hash collision when moving existing
HPT entries to the new HPT
-EIO on other error conditions
Used to implement the PAPR extension for runtime resizing of a guest's
Hashed Page Table (HPT). Specifically this requests that the guest be
transferred to working with the new HPT, essentially implementing the
H_RESIZE_HPT_COMMIT hypercall.
This should only be called after KVM_PPC_RESIZE_HPT_PREPARE has
returned 0 with the same parameters. In other cases
KVM_PPC_RESIZE_HPT_COMMIT will return an error (usually -ENXIO or
-EBUSY, though others may be possible if the preparation was started,
but failed).
This will have undefined effects on the guest if it has not already
placed itself in a quiescent state where no vcpu will make MMU enabled
memory accesses.
On succsful completion, the pending HPT will become the guest's active
HPT and the previous HPT will be discarded.
On failure, the guest will still be operating on its previous HPT.
struct kvm_ppc_resize_hpt {
__u64 flags;
__u32 shift;
__u32 pad;
};
3381
3382
3383
3384
3385
3386
3387
3388
3389
3390
3391
3392
3393
3394
3395
3396
3397
3398
3399
3400
3401
3402
3403
3404
3405
3406
3407
3408
3409
3410
3411
3412
3413
3414
3415
3416
3417
3418
3419
3420
3421
3422
3423
3424
3425
3426
3427
3428
3429
3430
3431
3432
3433
3434
3435
3436
3437
3438
3439
3440
3441
3442
3443
4.104 KVM_X86_GET_MCE_CAP_SUPPORTED
Capability: KVM_CAP_MCE
Architectures: x86
Type: system ioctl
Parameters: u64 mce_cap (out)
Returns: 0 on success, -1 on error
Returns supported MCE capabilities. The u64 mce_cap parameter
has the same format as the MSR_IA32_MCG_CAP register. Supported
capabilities will have the corresponding bits set.
4.105 KVM_X86_SETUP_MCE
Capability: KVM_CAP_MCE
Architectures: x86
Type: vcpu ioctl
Parameters: u64 mcg_cap (in)
Returns: 0 on success,
-EFAULT if u64 mcg_cap cannot be read,
-EINVAL if the requested number of banks is invalid,
-EINVAL if requested MCE capability is not supported.
Initializes MCE support for use. The u64 mcg_cap parameter
has the same format as the MSR_IA32_MCG_CAP register and
specifies which capabilities should be enabled. The maximum
supported number of error-reporting banks can be retrieved when
checking for KVM_CAP_MCE. The supported capabilities can be
retrieved with KVM_X86_GET_MCE_CAP_SUPPORTED.
4.106 KVM_X86_SET_MCE
Capability: KVM_CAP_MCE
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_x86_mce (in)
Returns: 0 on success,
-EFAULT if struct kvm_x86_mce cannot be read,
-EINVAL if the bank number is invalid,
-EINVAL if VAL bit is not set in status field.
Inject a machine check error (MCE) into the guest. The input
parameter is:
struct kvm_x86_mce {
__u64 status;
__u64 addr;
__u64 misc;
__u64 mcg_status;
__u8 bank;
__u8 pad1[7];
__u64 pad2[3];
};
If the MCE being reported is an uncorrected error, KVM will
inject it as an MCE exception into the guest. If the guest
MCG_STATUS register reports that an MCE is in progress, KVM
causes an KVM_EXIT_SHUTDOWN vmexit.
Otherwise, if the MCE is a corrected error, KVM will just
store it in the corresponding bank (provided this bank is
not holding a previously reported uncorrected error).
3444
3445
3446
3447
3448
3449
3450
3451
3452
3453
3454
3455
3456
3457
3458
3459
3460
3461
3462
3463
3464
3465
3466
3467
3468
3469
3470
3471
3472
3473
3474
3475
3476
3477
3478
3479
3480
3481
3482
3483
3484
3485
3486
3487
3488
3489
3490
3491
3492
3493
3494
3495
3496
3497
3498
3499
3500
3501
3502
3503
3504
3505
3506
3507
3508
3509
3510
3511
3512
3513
3514
3515
3516
3517
3518
3519
3520
3521
3522
3523
3524
3525
3526
3527
3528
3529
3530
3531
3532
3533
3534
3535
3536
3537
3538
3539
3540
3541
3542
3543
3544
3545
3546
3547
3548
3549
3550
3551
3552
3553
3554
3555
3556
3557
3558
3559
3560
3561
3562
3563
3564
3565
3566
3567
3568
3569
3570
3571
3572
3573
3574
3575
3576
3577
3578
4.107 KVM_S390_GET_CMMA_BITS
Capability: KVM_CAP_S390_CMMA_MIGRATION
Architectures: s390
Type: vm ioctl
Parameters: struct kvm_s390_cmma_log (in, out)
Returns: 0 on success, a negative value on error
This ioctl is used to get the values of the CMMA bits on the s390
architecture. It is meant to be used in two scenarios:
- During live migration to save the CMMA values. Live migration needs
to be enabled via the KVM_REQ_START_MIGRATION VM property.
- To non-destructively peek at the CMMA values, with the flag
KVM_S390_CMMA_PEEK set.
The ioctl takes parameters via the kvm_s390_cmma_log struct. The desired
values are written to a buffer whose location is indicated via the "values"
member in the kvm_s390_cmma_log struct. The values in the input struct are
also updated as needed.
Each CMMA value takes up one byte.
struct kvm_s390_cmma_log {
__u64 start_gfn;
__u32 count;
__u32 flags;
union {
__u64 remaining;
__u64 mask;
};
__u64 values;
};
start_gfn is the number of the first guest frame whose CMMA values are
to be retrieved,
count is the length of the buffer in bytes,
values points to the buffer where the result will be written to.
If count is greater than KVM_S390_SKEYS_MAX, then it is considered to be
KVM_S390_SKEYS_MAX. KVM_S390_SKEYS_MAX is re-used for consistency with
other ioctls.
The result is written in the buffer pointed to by the field values, and
the values of the input parameter are updated as follows.
Depending on the flags, different actions are performed. The only
supported flag so far is KVM_S390_CMMA_PEEK.
The default behaviour if KVM_S390_CMMA_PEEK is not set is:
start_gfn will indicate the first page frame whose CMMA bits were dirty.
It is not necessarily the same as the one passed as input, as clean pages
are skipped.
count will indicate the number of bytes actually written in the buffer.
It can (and very often will) be smaller than the input value, since the
buffer is only filled until 16 bytes of clean values are found (which
are then not copied in the buffer). Since a CMMA migration block needs
the base address and the length, for a total of 16 bytes, we will send
back some clean data if there is some dirty data afterwards, as long as
the size of the clean data does not exceed the size of the header. This
allows to minimize the amount of data to be saved or transferred over
the network at the expense of more roundtrips to userspace. The next
invocation of the ioctl will skip over all the clean values, saving
potentially more than just the 16 bytes we found.
If KVM_S390_CMMA_PEEK is set:
the existing storage attributes are read even when not in migration
mode, and no other action is performed;
the output start_gfn will be equal to the input start_gfn,
the output count will be equal to the input count, except if the end of
memory has been reached.
In both cases:
the field "remaining" will indicate the total number of dirty CMMA values
still remaining, or 0 if KVM_S390_CMMA_PEEK is set and migration mode is
not enabled.
mask is unused.
values points to the userspace buffer where the result will be stored.
This ioctl can fail with -ENOMEM if not enough memory can be allocated to
complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if
KVM_S390_CMMA_PEEK is not set but migration mode was not enabled, with
-EFAULT if the userspace address is invalid or if no page table is
present for the addresses (e.g. when using hugepages).
4.108 KVM_S390_SET_CMMA_BITS
Capability: KVM_CAP_S390_CMMA_MIGRATION
Architectures: s390
Type: vm ioctl
Parameters: struct kvm_s390_cmma_log (in)
Returns: 0 on success, a negative value on error
This ioctl is used to set the values of the CMMA bits on the s390
architecture. It is meant to be used during live migration to restore
the CMMA values, but there are no restrictions on its use.
The ioctl takes parameters via the kvm_s390_cmma_values struct.
Each CMMA value takes up one byte.
struct kvm_s390_cmma_log {
__u64 start_gfn;
__u32 count;
__u32 flags;
union {
__u64 remaining;
__u64 mask;
};
__u64 values;
};
start_gfn indicates the starting guest frame number,
count indicates how many values are to be considered in the buffer,
flags is not used and must be 0.
mask indicates which PGSTE bits are to be considered.
remaining is not used.
values points to the buffer in userspace where to store the values.
This ioctl can fail with -ENOMEM if not enough memory can be allocated to
complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if
the count field is too large (e.g. more than KVM_S390_CMMA_SIZE_MAX) or
if the flags field was not 0, with -EFAULT if the userspace address is
invalid, if invalid pages are written to (e.g. after the end of memory)
or if no page table is present for the addresses (e.g. when using
hugepages).
Radim Krčmář
committed
4.109 KVM_PPC_GET_CPU_CHAR
Paul Mackerras
committed
3580
3581
3582
3583
3584
3585
3586
3587
3588
3589
3590
3591
3592
3593
3594
3595
3596
3597
3598
3599
3600
3601
3602
3603
3604
3605
3606
3607
3608
3609
3610
3611
3612
3613
3614
3615
3616
3617
3618
3619
3620
3621
3622
3623
3624
Capability: KVM_CAP_PPC_GET_CPU_CHAR
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_ppc_cpu_char (out)
Returns: 0 on successful completion
-EFAULT if struct kvm_ppc_cpu_char cannot be written
This ioctl gives userspace information about certain characteristics
of the CPU relating to speculative execution of instructions and
possible information leakage resulting from speculative execution (see
CVE-2017-5715, CVE-2017-5753 and CVE-2017-5754). The information is
returned in struct kvm_ppc_cpu_char, which looks like this:
struct kvm_ppc_cpu_char {
__u64 character; /* characteristics of the CPU */
__u64 behaviour; /* recommended software behaviour */
__u64 character_mask; /* valid bits in character */
__u64 behaviour_mask; /* valid bits in behaviour */
};
For extensibility, the character_mask and behaviour_mask fields
indicate which bits of character and behaviour have been filled in by
the kernel. If the set of defined bits is extended in future then
userspace will be able to tell whether it is running on a kernel that
knows about the new bits.
The character field describes attributes of the CPU which can help
with preventing inadvertent information disclosure - specifically,
whether there is an instruction to flash-invalidate the L1 data cache
(ori 30,30,0 or mtspr SPRN_TRIG2,rN), whether the L1 data cache is set
to a mode where entries can only be used by the thread that created
them, whether the bcctr[l] instruction prevents speculation, and
whether a speculation barrier instruction (ori 31,31,0) is provided.
The behaviour field describes actions that software should take to
prevent inadvertent information disclosure, and thus describes which
vulnerabilities the hardware is subject to; specifically whether the
L1 data cache should be flushed when returning to user mode from the
kernel, and whether a speculation barrier should be placed between an
array bounds check and the array access.
These fields use the same bit definitions as the new
H_GET_CPU_CHARACTERISTICS hypercall.
Radim Krčmář
committed
4.110 KVM_MEMORY_ENCRYPT_OP
Capability: basic
Architectures: x86
Type: system
Parameters: an opaque platform specific structure (in/out)
Returns: 0 on success; -1 on error
If the platform supports creating encrypted VMs then this ioctl can be used
for issuing platform-specific memory encryption commands to manage those
encrypted VMs.
Currently, this ioctl is used for issuing Secure Encrypted Virtualization
(SEV) commands on AMD Processors. The SEV commands are defined in
Documentation/virtual/kvm/amd-memory-encryption.rst.
Radim Krčmář
committed
4.111 KVM_MEMORY_ENCRYPT_REG_REGION
3642
3643
3644
3645
3646
3647
3648
3649
3650
3651
3652
3653
3654
3655
3656
3657
3658
3659
3660
3661
3662
3663
Capability: basic
Architectures: x86
Type: system
Parameters: struct kvm_enc_region (in)
Returns: 0 on success; -1 on error
This ioctl can be used to register a guest memory region which may
contain encrypted data (e.g. guest RAM, SMRAM etc).
It is used in the SEV-enabled guest. When encryption is enabled, a guest
memory region may contain encrypted data. The SEV memory encryption
engine uses a tweak such that two identical plaintext pages, each at
different locations will have differing ciphertexts. So swapping or
moving ciphertext of those pages will not result in plaintext being
swapped. So relocating (or migrating) physical backing pages for the SEV
guest will require some additional steps.
Note: The current SEV key management spec does not provide commands to
swap or migrate (move) ciphertext pages. Hence, for now we pin the guest
memory region registered with the ioctl.
Radim Krčmář
committed
4.112 KVM_MEMORY_ENCRYPT_UNREG_REGION
Capability: basic
Architectures: x86
Type: system
Parameters: struct kvm_enc_region (in)
Returns: 0 on success; -1 on error
This ioctl can be used to unregister the guest memory region registered
with KVM_MEMORY_ENCRYPT_REG_REGION ioctl above.
3675
3676
3677
3678
3679
3680
3681
3682
3683
3684
3685
3686
3687
3688
3689
3690
3691
3692
3693
3694
3695
3696
3697
3698
3699
3700
3701
3702
3703
3704
3705
3706
4.113 KVM_HYPERV_EVENTFD
Capability: KVM_CAP_HYPERV_EVENTFD
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_hyperv_eventfd (in)
This ioctl (un)registers an eventfd to receive notifications from the guest on
the specified Hyper-V connection id through the SIGNAL_EVENT hypercall, without
causing a user exit. SIGNAL_EVENT hypercall with non-zero event flag number
(bits 24-31) still triggers a KVM_EXIT_HYPERV_HCALL user exit.
struct kvm_hyperv_eventfd {
__u32 conn_id;
__s32 fd;
__u32 flags;
__u32 padding[3];
};
The conn_id field should fit within 24 bits:
#define KVM_HYPERV_CONN_ID_MASK 0x00ffffff
The acceptable values for the flags field are:
#define KVM_HYPERV_EVENTFD_DEASSIGN (1 << 0)
Returns: 0 on success,
-EINVAL if conn_id or flags is outside the allowed range
-ENOENT on deassign if the conn_id isn't registered
-EEXIST on assign if the conn_id is already registered
3707
3708
3709
3710
3711
3712
3713
3714
3715
3716
3717
3718
3719
3720
3721
3722
3723
3724
3725
3726
3727
3728
3729
3730
3731
3732
3733
3734
3735
3736
3737
3738
3739
3740
3741
3742
3743
3744
3745
3746
3747
3748
3749
3750
3751
3752
3753
3754
3755
3756
3757
3758
3759
3760
3761
3762
4.114 KVM_GET_NESTED_STATE
Capability: KVM_CAP_NESTED_STATE
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_nested_state (in/out)
Returns: 0 on success, -1 on error
Errors:
E2BIG: the total state size (including the fixed-size part of struct
kvm_nested_state) exceeds the value of 'size' specified by
the user; the size required will be written into size.
struct kvm_nested_state {
__u16 flags;
__u16 format;
__u32 size;
union {
struct kvm_vmx_nested_state vmx;
struct kvm_svm_nested_state svm;
__u8 pad[120];
};
__u8 data[0];
};
#define KVM_STATE_NESTED_GUEST_MODE 0x00000001
#define KVM_STATE_NESTED_RUN_PENDING 0x00000002
#define KVM_STATE_NESTED_SMM_GUEST_MODE 0x00000001
#define KVM_STATE_NESTED_SMM_VMXON 0x00000002
struct kvm_vmx_nested_state {
__u64 vmxon_pa;
__u64 vmcs_pa;
struct {
__u16 flags;
} smm;
};
This ioctl copies the vcpu's nested virtualization state from the kernel to
userspace.
The maximum size of the state, including the fixed-size part of struct
kvm_nested_state, can be retrieved by passing KVM_CAP_NESTED_STATE to
the KVM_CHECK_EXTENSION ioctl().
4.115 KVM_SET_NESTED_STATE
Capability: KVM_CAP_NESTED_STATE
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_nested_state (in)
Returns: 0 on success, -1 on error
This copies the vcpu's kvm_nested_state struct from userspace to the kernel. For
the definition of struct kvm_nested_state, see KVM_GET_NESTED_STATE.
Radim Krčmář
committed
4.116 KVM_(UN)REGISTER_COALESCED_MMIO
Capability: KVM_CAP_COALESCED_MMIO (for coalesced mmio)
KVM_CAP_COALESCED_PIO (for coalesced pio)
Architectures: all
Type: vm ioctl
Parameters: struct kvm_coalesced_mmio_zone
Returns: 0 on success, < 0 on error
Coalesced I/O is a performance optimization that defers hardware
register write emulation so that userspace exits are avoided. It is
typically used to reduce the overhead of emulating frequently accessed
hardware registers.
When a hardware register is configured for coalesced I/O, write accesses
do not exit to userspace and their value is recorded in a ring buffer
that is shared between kernel and userspace.
Coalesced I/O is used if one or more write accesses to a hardware
register can be deferred until a read or a write to another hardware
register on the same device. This last access will cause a vmexit and
userspace will process accesses from the ring buffer before emulating
it. That will avoid exiting to userspace on repeated writes.
Coalesced pio is based on coalesced mmio. There is little difference
between coalesced mmio and pio except that coalesced pio records accesses
to I/O ports.
3792
3793
3794
3795
3796
3797
3798
3799
3800
3801
3802
3803
3804
3805
3806
3807
3808
3809
3810
3811
3812
3813
3814
4.117 KVM_CLEAR_DIRTY_LOG (vm ioctl)
Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_dirty_log (in)
Returns: 0 on success, -1 on error
/* for KVM_CLEAR_DIRTY_LOG */
struct kvm_clear_dirty_log {
__u32 slot;
__u32 num_pages;
__u64 first_page;
union {
void __user *dirty_bitmap; /* one bit per page */
__u64 padding;
};
};
The ioctl clears the dirty status of pages in a memory slot, according to
the bitmap that is passed in struct kvm_clear_dirty_log's dirty_bitmap
field. Bit 0 of the bitmap corresponds to page "first_page" in the
memory slot, and num_pages is the size in bits of the input bitmap.
first_page must be a multiple of 64; num_pages must also be a multiple of
64 unless first_page + num_pages is the size of the memory slot. For each
bit that is set in the input bitmap, the corresponding page is marked "clean"
in KVM's dirty bitmap, and dirty tracking is re-enabled for that page
(for example via write-protection, or by clearing the dirty bit in
a page table entry).
If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 specifies
the address space for which you want to return the dirty bitmap.
They must be less than the value that KVM_CHECK_EXTENSION returns for
the KVM_CAP_MULTI_ADDRESS_SPACE capability.
This ioctl is mostly useful when KVM_CAP_MANUAL_DIRTY_LOG_PROTECT
is enabled; for more information, see the description of the capability.
However, it can always be used as long as KVM_CHECK_EXTENSION confirms
that KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is present.
3832
3833
3834
3835
3836
3837
3838
3839
3840
3841
3842
3843
3844
3845
3846
3847
3848
3849
3850
3851
3852
3853
3854
3855
3856
3857
3858
3859
3860
3861
3862
3863
3864
3865
3866
3867
3868
3869
3870
3871
3872
3873
3874
3875
3876
3877
3878
3879
3880
3881
3882
3883
3884
3885
3886
3887
4.118 KVM_GET_SUPPORTED_HV_CPUID
Capability: KVM_CAP_HYPERV_CPUID
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_cpuid2 (in/out)
Returns: 0 on success, -1 on error
struct kvm_cpuid2 {
__u32 nent;
__u32 padding;
struct kvm_cpuid_entry2 entries[0];
};
struct kvm_cpuid_entry2 {
__u32 function;
__u32 index;
__u32 flags;
__u32 eax;
__u32 ebx;
__u32 ecx;
__u32 edx;
__u32 padding[3];
};
This ioctl returns x86 cpuid features leaves related to Hyper-V emulation in
KVM. Userspace can use the information returned by this ioctl to construct
cpuid information presented to guests consuming Hyper-V enlightenments (e.g.
Windows or Hyper-V guests).
CPUID feature leaves returned by this ioctl are defined by Hyper-V Top Level
Functional Specification (TLFS). These leaves can't be obtained with
KVM_GET_SUPPORTED_CPUID ioctl because some of them intersect with KVM feature
leaves (0x40000000, 0x40000001).
Currently, the following list of CPUID leaves are returned:
HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS
HYPERV_CPUID_INTERFACE
HYPERV_CPUID_VERSION
HYPERV_CPUID_FEATURES
HYPERV_CPUID_ENLIGHTMENT_INFO
HYPERV_CPUID_IMPLEMENT_LIMITS
HYPERV_CPUID_NESTED_FEATURES
HYPERV_CPUID_NESTED_FEATURES leaf is only exposed when Enlightened VMCS was
enabled on the corresponding vCPU (KVM_CAP_HYPERV_ENLIGHTENED_VMCS).
Userspace invokes KVM_GET_SUPPORTED_CPUID by passing a kvm_cpuid2 structure
with the 'nent' field indicating the number of entries in the variable-size
array 'entries'. If the number of entries is too low to describe all Hyper-V
feature leaves, an error (E2BIG) is returned. If the number is more or equal
to the number of Hyper-V feature leaves, the 'nent' field is adjusted to the
number of valid entries in the 'entries' array, which is then filled.
'index' and 'flags' fields in 'struct kvm_cpuid_entry2' are currently reserved,
userspace should not expect to get any particular value there.
Application code obtains a pointer to the kvm_run structure by
mmap()ing a vcpu fd. From that point, application code can control
execution by changing fields in kvm_run prior to calling the KVM_RUN
ioctl, and obtain information about the reason KVM_RUN returned by
looking up structure members.
struct kvm_run {
/* in */
__u8 request_interrupt_window;
Request that KVM_RUN return when it becomes possible to inject external
interrupts into the guest. Useful in conjunction with KVM_INTERRUPT.
__u8 immediate_exit;
This field is polled once when KVM_RUN starts; if non-zero, KVM_RUN
exits immediately, returning -EINTR. In the common scenario where a
signal is used to "kick" a VCPU out of KVM_RUN, this field can be used
to avoid usage of KVM_SET_SIGNAL_MASK, which has worse scalability.
Rather than blocking the signal outside KVM_RUN, userspace can set up
a signal handler that sets run->immediate_exit to a non-zero value.
This field is ignored if KVM_CAP_IMMEDIATE_EXIT is not available.
__u8 padding1[6];
/* out */
__u32 exit_reason;
When KVM_RUN has returned successfully (return value 0), this informs
application code why KVM_RUN has returned. Allowable values for this
field are detailed below.
__u8 ready_for_interrupt_injection;
If request_interrupt_window has been specified, this field indicates
an interrupt can be injected now with KVM_INTERRUPT.
__u8 if_flag;
The value of the current interrupt flag. Only valid if in-kernel
local APIC is not used.
__u16 flags;
More architecture-specific flags detailing state of the VCPU that may
affect the device's behavior. The only currently defined flag is
KVM_RUN_X86_SMM, which is valid on x86 machines and is set if the
VCPU is in system management mode.
3941
3942
3943
3944
3945
3946
3947
3948
3949
3950
3951
3952
3953
3954
3955
3956
3957
3958
3959
3960
3961
3962
3963
3964
3965
3966
3967
3968
3969
3970
3971
3972
3973
3974
3975
3976
3977
3978
3979
3980
3981
3982
3983
3984
3985
3986
3987
3988
3989
3990
/* in (pre_kvm_run), out (post_kvm_run) */
__u64 cr8;
The value of the cr8 register. Only valid if in-kernel local APIC is
not used. Both input and output.
__u64 apic_base;
The value of the APIC BASE msr. Only valid if in-kernel local
APIC is not used. Both input and output.
union {
/* KVM_EXIT_UNKNOWN */
struct {
__u64 hardware_exit_reason;
} hw;
If exit_reason is KVM_EXIT_UNKNOWN, the vcpu has exited due to unknown
reasons. Further architecture-specific information is available in
hardware_exit_reason.
/* KVM_EXIT_FAIL_ENTRY */
struct {
__u64 hardware_entry_failure_reason;
} fail_entry;
If exit_reason is KVM_EXIT_FAIL_ENTRY, the vcpu could not be run due
to unknown reasons. Further architecture-specific information is
available in hardware_entry_failure_reason.
/* KVM_EXIT_EXCEPTION */
struct {
__u32 exception;
__u32 error_code;
} ex;
Unused.
/* KVM_EXIT_IO */
struct {
#define KVM_EXIT_IO_IN 0
#define KVM_EXIT_IO_OUT 1
__u8 direction;
__u8 size; /* bytes */
__u16 port;
__u32 count;
__u64 data_offset; /* relative to kvm_run start */
} io;
If exit_reason is KVM_EXIT_IO, then the vcpu has
executed a port I/O instruction which could not be satisfied by kvm.
data_offset describes where the data is located (KVM_EXIT_IO_OUT) or
where kvm expects application code to place the data for the next
KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a packed array.
struct {
struct kvm_debug_exit_arch arch;
} debug;