- Jan 20, 2022
-
-
Wei Wang authored
Use the api number 134 for KVM_GET_XSAVE2, instead of 42, which has been used by KVM_GET_XSAVE. Also, fix the WARNINGs of the underlines being too short. Reported-by:
Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by:
Wei Wang <wei.w.wang@intel.com> Tested-by:
Stephen Rothwell <sfr@canb.auug.org.au> Message-Id: <20220120045003.315177-1-wei.w.wang@intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Jan 14, 2022
-
-
Guang Zeng authored
With KVM_CAP_XSAVE, userspace uses a hardcoded 4KB buffer to get/set xstate data from/to KVM. This doesn't work when dynamic xfeatures (e.g. AMX) are exposed to the guest as they require a larger buffer size. Introduce a new capability (KVM_CAP_XSAVE2). Userspace VMM gets the required xstate buffer size via KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2). KVM_SET_XSAVE is extended to work with both legacy and new capabilities by doing properly-sized memdup_user() based on the guest fpu container. KVM_GET_XSAVE is kept for backward-compatible reason. Instead, KVM_GET_XSAVE2 is introduced under KVM_CAP_XSAVE2 as the preferred interface for getting xstate buffer (4KB or larger size) from KVM (Link: https://lkml.org/lkml/2021/12/15/510 ) Also, update the api doc with the new KVM_GET_XSAVE2 ioctl. Signed-off-by:
Guang Zeng <guang.zeng@intel.com> Signed-off-by:
Wei Wang <wei.w.wang@intel.com> Signed-off-by:
Jing Liu <jing2.liu@intel.com> Signed-off-by:
Kevin Tian <kevin.tian@intel.com> Signed-off-by:
Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-19-yang.zhong@intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Jan 07, 2022
-
-
Jing Liu authored
KVM_GET_SUPPORTED_CPUID should not include any dynamic xstates in CPUID[0xD] if they have not been requested with prctl. Otherwise a process which directly passes KVM_GET_SUPPORTED_CPUID to KVM_SET_CPUID2 would now fail even if it doesn't intend to use a dynamically enabled feature. Userspace must know that prctl is required and allocate >4K xstate buffer before setting any dynamic bit. Suggested-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Jing Liu <jing2.liu@intel.com> Signed-off-by:
Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-5-yang.zhong@intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
David Woodhouse authored
This adds basic support for delivering 2 level event channels to a guest. Initially, it only supports delivery via the IRQ routing table, triggered by an eventfd. In order to do so, it has a kvm_xen_set_evtchn_fast() function which will use the pre-mapped shared_info page if it already exists and is still valid, while the slow path through the irqfd_inject workqueue will remap the shared_info page if necessary. It sets the bits in the shared_info page but not the vcpu_info; that is deferred to __kvm_xen_has_interrupt() which raises the vector to the appropriate vCPU. Add a 'verbose' mode to xen_shinfo_test while adding test cases for this. Signed-off-by:
David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20211210163625.2886-5-dwmw2@infradead.org> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
David Woodhouse authored
Use the newly reinstated gfn_to_pfn_cache to maintain a kernel mapping of the Xen shared_info page so that it can be accessed in atomic context. Note that we do not participate in dirty tracking for the shared info page and we do not explicitly mark it dirty every single tim we deliver an event channel interrupts. We wouldn't want to do that even if we *did* have a valid vCPU context with which to do so. Signed-off-by:
David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20211210163625.2886-4-dwmw2@infradead.org> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Dec 07, 2021
-
-
Janis Schoetterl-Glausch authored
They are defined in include/uapi/linux/kvm.h as KVM_S390_GET_SKEYS_NONE and KVM_S390_SKEYS_MAX, but the api documetation talks of KVM_S390_GET_KEYS_NONE and KVM_S390_SKEYS_ALLOC_MAX respectively. Signed-off-by:
Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by:
Janosch Frank <frankja@linux.ibm.com> Message-Id: <20211118102522.569660-1-scgl@linux.ibm.com> Signed-off-by:
Janosch Frank <frankja@linux.ibm.com>
-
- Nov 11, 2021
-
-
Peter Gonda authored
For SEV to work with intra host migration, contents of the SEV info struct such as the ASID (used to index the encryption key in the AMD SP) and the list of memory regions need to be transferred to the target VM. This change adds a commands for a target VMM to get a source SEV VM's sev info. Signed-off-by:
Peter Gonda <pgonda@google.com> Suggested-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Marc Orr <marcorr@google.com> Cc: Marc Orr <marcorr@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Message-Id: <20211021174303.385706-3-pgonda@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Oct 18, 2021
-
-
Oliver Upton authored
Handling the migration of TSCs correctly is difficult, in part because Linux does not provide userspace with the ability to retrieve a (TSC, realtime) clock pair for a single instant in time. In lieu of a more convenient facility, KVM can report similar information in the kvm_clock structure. Provide userspace with a host TSC & realtime pair iff the realtime clock is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid realtime value, advance the KVM clock by the amount of elapsed time. Do not step the KVM clock backwards, though, as it is a monotonic oscillator. Suggested-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Oliver Upton <oupton@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210916181538.968978-5-oupton@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Oct 04, 2021
-
-
Anup Patel authored
Document RISC-V specific parts of the KVM API, such as: - The interrupt numbers passed to the KVM_INTERRUPT ioctl. - The states supported by the KVM_{GET,SET}_MP_STATE ioctls. - The registers supported by the KVM_{GET,SET}_ONE_REG interface and the encoding of those register ids. - The exit reason KVM_EXIT_RISCV_SBI for SBI calls forwarded to userspace tool. CC: Jonathan Corbet <corbet@lwn.net> CC: linux-doc@vger.kernel.org Signed-off-by:
Anup Patel <anup.patel@wdc.com> Acked-by:
Palmer Dabbelt <palmerdabbelt@google.com>
-
- Aug 20, 2021
-
-
Maxim Levitsky authored
KVM_GUESTDBG_BLOCKIRQ will allow KVM to block all interrupts while running. This change is mostly intended for more robust single stepping of the guest and it has the following benefits when enabled: * Resuming from a breakpoint is much more reliable. When resuming execution from a breakpoint, with interrupts enabled, more often than not, KVM would inject an interrupt and make the CPU jump immediately to the interrupt handler and eventually return to the breakpoint, to trigger it again. From the user point of view it looks like the CPU never executed a single instruction and in some cases that can even prevent forward progress, for example, when the breakpoint is placed by an automated script (e.g lx-symbols), which does something in response to the breakpoint and then continues the guest automatically. If the script execution takes enough time for another interrupt to arrive, the guest will be stuck on the same breakpoint RIP forever. * Normal single stepping is much more predictable, since it won't land the debugger into an interrupt handler. * RFLAGS.TF has less chance to be leaked to the guest: We set that flag behind the guest's back to do single stepping but if single step lands us into an interrupt/exception handler it will be leaked to the guest in the form of being pushed to the stack. This doesn't completely eliminate this problem as exceptions can still happen, but at least this reduces the chances of this happening. Signed-off-by:
Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210811122927.900604-6-mlevitsk@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Jing Zhang authored
Add documentations for linear and logarithmic histogram statistics. Signed-off-by:
Jing Zhang <jingzhangos@google.com> Message-Id: <20210802165633.1866976-3-jingzhangos@google.com> [Small changes to the phrasing. - Paolo] Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Jul 26, 2021
-
-
Mauro Carvalho Chehab authored
The conversion tools used during DocBook/LaTeX/html/Markdown->ReST conversion and some cut-and-pasted text contain some characters that aren't easily reachable on standard keyboards and/or could cause troubles when parsed by the documentation build system. Replace the occurences of the following characters: - U+00a0 (' '): NO-BREAK SPACE as it can cause lines being truncated on PDF output Signed-off-by:
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Message-Id: <ff70cb42d63f3a1da66af1b21b8d038418ed5189.1626947264.git.mchehab+huawei@kernel.org> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
'KVM_CAP_ENFORCE_PV_CPUID' doesn't match the define in include/uapi/linux/kvm.h. Signed-off-by:
Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210722092628.236474-1-vkuznets@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Jul 25, 2021
-
-
Mauro Carvalho Chehab authored
The conversion tools used during DocBook/LaTeX/html/Markdown->ReST conversion and some cut-and-pasted text contain some characters that aren't easily reachable on standard keyboards and/or could cause troubles when parsed by the documentation build system. Replace the occurences of the following characters: - U+00a0 (' '): NO-BREAK SPACE as it can cause lines being truncated on PDF output Signed-off-by:
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/ff70cb42d63f3a1da66af1b21b8d038418ed5189.1626947264.git.mchehab+huawei@kernel.org Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
Ioana Ciornei authored
Add a '::' so that a code block is interpreted properly and also add a blank line before the start of a list. Fixes: fdc09ddd ("KVM: stats: Add documentation for binary statistics interface") Signed-off-by:
Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by:
Jing Zhang <jingzhangos@google.com> Link: https://lore.kernel.org/r/20210722100356.635078-4-ciorneiioana@gmail.com Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
Ioana Ciornei authored
Fix some small build warnings. The title underline was too short in some cases and a code block was not indented. Documentation/virt/kvm/api.rst:7216: WARNING: Title underline too short. Fixes: 6dba9403 ("KVM: x86: Introduce KVM_GET_SREGS2 / KVM_SET_SREGS2") Signed-off-by:
Ioana Ciornei <ioana.ciornei@nxp.com> Link: https://lore.kernel.org/r/20210722100356.635078-3-ciorneiioana@gmail.com Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
- Jun 24, 2021
-
-
Aaron Lewis authored
Add a fallback mechanism to the in-kernel instruction emulator that allows userspace the opportunity to process an instruction the emulator was unable to. When the in-kernel instruction emulator fails to process an instruction it will either inject a #UD into the guest or exit to userspace with exit reason KVM_INTERNAL_ERROR. This is because it does not know how to proceed in an appropriate manner. This feature lets userspace get involved to see if it can figure out a better path forward. Signed-off-by:
Aaron Lewis <aaronlewis@google.com> Reviewed-by:
David Edmondson <david.edmondson@oracle.com> Message-Id: <20210510144834.658457-2-aaronlewis@google.com> Reviewed-by:
Jim Mattson <jmattson@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Warn userspace that KVM_SET_CPUID{,2} after KVM_RUN "may" cause guest instability. Initialize last_vmentry_cpu to -1 and use it to detect if the vCPU has been run at least once when its CPUID model is changed. KVM does not correctly handle changes to paging related settings in the guest's vCPU model after KVM_RUN, e.g. MAXPHYADDR, GBPAGES, etc... KVM could theoretically zap all shadow pages, but actually making that happen is a mess due to lock inversion (vcpu->mutex is held). And even then, updating paging settings on the fly would only work if all vCPUs are stopped, updated in concert with identical settings, then restarted. To support running vCPUs with different vCPU models (that affect paging), KVM would need to track all relevant information in kvm_mmu_page_role. Note, that's the _page_ role, not the full mmu_role. Updating mmu_role isn't sufficient as a vCPU can reuse a shadow page translation that was created by a vCPU with different settings and thus completely skip the reserved bit checks (that are tied to CPUID). Tracking CPUID state in kvm_mmu_page_role is _extremely_ undesirable as it would require doubling gfn_track from a u16 to a u32, i.e. would increase KVM's memory footprint by 2 bytes for every 4kb of guest memory. E.g. MAXPHYADDR (6 bits), GBPAGES, AMD vs. INTEL = 1 bit, and SEV C-BIT would all need to be tracked. In practice, there is no remotely sane use case for changing any paging related CPUID entries on the fly, so just sweep it under the rug (after yelling at userspace). Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-8-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Jing Zhang authored
This new API provides a file descriptor for every VM and VCPU to read KVM statistics data in binary format. It is meant to provide a lightweight, flexible, scalable and efficient lock-free solution for user space telemetry applications to pull the statistics data periodically for large scale systems. The pulling frequency could be as high as a few times per second. The statistics descriptors are defined by KVM in kernel and can be by userspace to discover VM/VCPU statistics during the one-time setup stage. The statistics data itself could be read out by userspace telemetry periodically without any extra parsing or setup effort. There are a few existed interface protocols and definitions, but no one can fulfil all the requirements this interface implemented as below: 1. During high frequency periodic stats reading, there should be no extra efforts except the stats data read itself. 2. Support stats annotation, like type (cumulative, instantaneous, peak, histogram, etc) and unit (counter, time, size, cycles, etc). 3. The stats data reading should be free of lock/synchronization. We don't care about the consistency between all the stats data. All stats data can not be read out at exactly the same time. We really care about the change or trend of the stats data. The lock-free solution is not just for efficiency and scalability, also for the stats data accuracy and usability. For example, in the situation that all the stats data readings are protected by a global lock, if one VCPU died somehow with that lock held, then all stats data reading would be blocked, then we have no way from stats data that which VCPU has died. 4. The stats data reading workload can be handed over to other unprivileged process. Reviewed-by:
David Matlack <dmatlack@google.com> Reviewed-by:
Ricardo Koller <ricarkol@google.com> Reviewed-by:
Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by:
Fuad Tabba <tabba@google.com> Signed-off-by:
Jing Zhang <jingzhangos@google.com> Message-Id: <20210618222709.1858088-6-jingzhangos@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Jun 22, 2021
-
-
Bharata B Rao authored
Now that we have H_RPT_INVALIDATE fully implemented, enable support for the same via KVM_CAP_PPC_RPT_INVALIDATE KVM capability Signed-off-by:
Bharata B Rao <bharata@linux.ibm.com> Reviewed-by:
David Gibson <david@gibson.dropbear.id.au> Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210621085003.904767-6-bharata@linux.ibm.com
-
Steven Price authored
A new capability (KVM_CAP_ARM_MTE) identifies that the kernel supports granting a guest access to the tags, and provides a mechanism for the VMM to enable it. A new ioctl (KVM_ARM_MTE_COPY_TAGS) provides a simple way for a VMM to access the tags of a guest without having to maintain a PROT_MTE mapping in userspace. The above capability gates access to the ioctl. Reviewed-by:
Catalin Marinas <catalin.marinas@arm.com> Signed-off-by:
Steven Price <steven.price@arm.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210621111716.37157-7-steven.price@arm.com
-
- Jun 17, 2021
-
-
Ashish Kalra authored
This hypercall is used by the SEV guest to notify a change in the page encryption status to the hypervisor. The hypercall should be invoked only when the encryption attribute is changed from encrypted -> decrypted and vice versa. By default all guest pages are considered encrypted. The hypercall exits to userspace to manage the guest shared regions and integrate with the userspace VMM's migration code. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by:
Steve Rutherford <srutherford@google.com> Signed-off-by:
Brijesh Singh <brijesh.singh@amd.com> Signed-off-by:
Ashish Kalra <ashish.kalra@amd.com> Co-developed-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Co-developed-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Message-Id: <90778988e1ee01926ff9cac447aacb745f954c8c.1623174621.git.ashish.kalra@amd.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Maxim Levitsky authored
This is a new version of KVM_GET_SREGS / KVM_SET_SREGS. It has the following changes: * Has flags for future extensions * Has vcpu's PDPTRs, allowing to save/restore them on migration. * Lacks obsolete interrupt bitmap (done now via KVM_SET_VCPU_EVENTS) New capability, KVM_CAP_SREGS2 is added to signal the userspace of this ioctl. Signed-off-by:
Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210607090203.133058-8-mlevitsk@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
Modeled after KVM_CAP_ENFORCE_PV_FEATURE_CPUID, the new capability allows for limiting Hyper-V features to those exposed to the guest in Hyper-V CPUIDs (0x40000003, 0x40000004, ...). Signed-off-by:
Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210521095204.2161214-3-vkuznets@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- May 20, 2021
-
-
Mauro Carvalho Chehab authored
The document which describes the SGX kernel architecture was added at commit 3fa97bf0 ("Documentation/x86: Document SGX kernel architecture") but the reference at virt/kvm/api.rst is pointing to some non-existing document. Signed-off-by:
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/138c24633c6e4edf862a2b4d77033c603fc10406.1621413933.git.mchehab+huawei@kernel.org Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
- May 07, 2021
-
-
Siddharth Chandrasekaran authored
The capability that exposes new ioctl KVM_X86_SET_MSR_FILTER to userspace is specified incorrectly as the ioctl itself (instead of KVM_CAP_X86_MSR_FILTER). This patch fixes it. Fixes: 1a155254 ("KVM: x86: Introduce MSR filtering") Reviewed-by:
Alexander Graf <graf@amazon.de> Signed-off-by:
Siddharth Chandrasekaran <sidcha@amazon.de> Message-Id: <20210503120059.9283-1-sidcha@amazon.de> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Apr 26, 2021
-
-
Paolo Bonzini authored
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Apr 21, 2021
-
-
Nathan Tempelman authored
Add a capability for userspace to mirror SEV encryption context from one vm to another. On our side, this is intended to support a Migration Helper vCPU, but it can also be used generically to support other in-guest workloads scheduled by the host. The intention is for the primary guest and the mirror to have nearly identical memslots. The primary benefits of this are that: 1) The VMs do not share KVM contexts (think APIC/MSRs/etc), so they can't accidentally clobber each other. 2) The VMs can have different memory-views, which is necessary for post-copy migration (the migration vCPUs on the target need to read and write to pages, when the primary guest would VMEXIT). This does not change the threat model for AMD SEV. Any memory involved is still owned by the primary guest and its initial state is still attested to through the normal SEV_LAUNCH_* flows. If userspace wanted to circumvent SEV, they could achieve the same effect by simply attaching a vCPU to the primary VM. This patch deliberately leaves userspace in charge of the memslots for the mirror, as it already has the power to mess with them in the primary guest. This patch does not support SEV-ES (much less SNP), as it does not handle handing off attested VMSAs to the mirror. For additional context, we need a Migration Helper because SEV PSP migration is far too slow for our live migration on its own. Using an in-guest migrator lets us speed this up significantly. Signed-off-by:
Nathan Tempelman <natet@google.com> Message-Id: <20210408223214.2582277-1-natet@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Apr 20, 2021
-
-
Sean Christopherson authored
Add a capability, KVM_CAP_SGX_ATTRIBUTE, that can be used by userspace to grant a VM access to a priveleged attribute, with args[0] holding a file handle to a valid SGX attribute file. The SGX subsystem restricts access to a subset of enclave attributes to provide additional security for an uncompromised kernel, e.g. to prevent malware from using the PROVISIONKEY to ensure its nodes are running inside a geniune SGX enclave and/or to obtain a stable fingerprint. To prevent userspace from circumventing such restrictions by running an enclave in a VM, KVM restricts guest access to privileged attributes by default. Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by:
Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by:
Kai Huang <kai.huang@intel.com> Message-Id: <0b099d65e933e068e3ea934b0523bab070cb8cea.1618196135.git.kai.huang@intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Emanuele Giuseppe Esposito authored
KVM_CAP_PPC_MULTITCE is a capability, not an ioctl. Therefore move it from section 4.97 to the new 8.31 (other capabilities). To fill the gap, move KVM_X86_SET_MSR_FILTER (was 4.126) to 4.97, and shifted Xen-related ioctl (were 4.127 - 4.130) by one place (4.126 - 4.129). Also fixed minor typo in KVM_GET_MSR_INDEX_LIST ioctl description (section 4.3). Signed-off-by:
Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210316170814.64286-1-eesposit@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Apr 17, 2021
-
-
Paolo Bonzini authored
This capability will allow the user to know which KVM_GUESTDBG_* bits are supported. Signed-off-by:
Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210401135451.1004564-3-mlevitsk@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Apr 09, 2021
-
-
Marc Zyngier authored
Although the KVM_ARM_VCPU_INIT documentation mention that the registers are reset to their "initial values", it doesn't describe what these values are. Describe this state explicitly. Reviewed-by:
Alexandru Elisei <alexandru.elisei@arm.com> Acked-by:
Will Deacon <will@kernel.org> Signed-off-by:
Marc Zyngier <maz@kernel.org>
-
- Apr 07, 2021
-
-
Alexandru Elisei authored
Commit 21b6f32f ("KVM: arm64: guest debug, define API headers") added the arm64 KVM_GUESTDBG_USE_HW flag for the KVM_SET_GUEST_DEBUG ioctl and commit 834bf887 ("KVM: arm64: enable KVM_CAP_SET_GUEST_DEBUG") documented and implemented the flag functionality. Since its introduction, at no point was the flag known by any name other than KVM_GUESTDBG_USE_HW for the arm64 architecture, so refer to it as such in the documentation. CC: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210407144857.199746-2-alexandru.elisei@arm.com
-
Jianyong Wu authored
Implement the hypervisor side of the KVM PTP interface. The service offers wall time and cycle count from host to guest. The caller must specify whether they want the host's view of either the virtual or physical counter. Signed-off-by:
Jianyong Wu <jianyong.wu@arm.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201209060932.212364-7-jianyong.wu@arm.com
-
- Mar 19, 2021
-
-
Emanuele Giuseppe Esposito authored
The ioctl KVM_SET_BOOT_CPU_ID fails when called after vcpu creation. Add this explanation in the documentation. Signed-off-by:
Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210319091650.11967-1-eesposit@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Mar 18, 2021
-
-
Sean Christopherson authored
Fix a plethora of issues with MSR filtering by installing the resulting filter as an atomic bundle instead of updating the live filter one range at a time. The KVM_X86_SET_MSR_FILTER ioctl() isn't truly atomic, as the hardware MSR bitmaps won't be updated until the next VM-Enter, but the relevant software struct is atomically updated, which is what KVM really needs. Similar to the approach used for modifying memslots, make arch.msr_filter a SRCU-protected pointer, do all the work configuring the new filter outside of kvm->lock, and then acquire kvm->lock only when the new filter has been vetted and created. That way vCPU readers either see the old filter or the new filter in their entirety, not some half-baked state. Yuan Yao pointed out a use-after-free in ksm_msr_allowed() due to a TOCTOU bug, but that's just the tip of the iceberg... - Nothing is __rcu annotated, making it nigh impossible to audit the code for correctness. - kvm_add_msr_filter() has an unpaired smp_wmb(). Violation of kernel coding style aside, the lack of a smb_rmb() anywhere casts all code into doubt. - kvm_clear_msr_filter() has a double free TOCTOU bug, as it grabs count before taking the lock. - kvm_clear_msr_filter() also has memory leak due to the same TOCTOU bug. The entire approach of updating the live filter is also flawed. While installing a new filter is inherently racy if vCPUs are running, fixing the above issues also makes it trivial to ensure certain behavior is deterministic, e.g. KVM can provide deterministic behavior for MSRs with identical settings in the old and new filters. An atomic update of the filter also prevents KVM from getting into a half-baked state, e.g. if installing a filter fails, the existing approach would leave the filter in a half-baked state, having already committed whatever bits of the filter were already processed. [*] https://lkml.kernel.org/r/20210312083157.25403-1-yaoyuan0329os@gmail.com Fixes: 1a155254 ("KVM: x86: Introduce MSR filtering") Cc: stable@vger.kernel.org Cc: Alexander Graf <graf@amazon.com> Reported-by:
Yuan Yao <yaoyuan0329os@gmail.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20210316184436.2544875-2-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Mar 12, 2021
-
-
Marc Zyngier authored
KVM/arm64 has forever used a 40bit default IPA space, partially due to its 32bit heritage (where the only choice is 40bit). However, there are implementations in the wild that have a *cough* much smaller *cough* IPA space, which leads to a misprogramming of VTCR_EL2, and a guest that is stuck on its first memory access if userspace dares to ask for the default IPA setting (which most VMMs do). Instead, blundly reject the creation of such VM, as we can't satisfy the requirements from userspace (with a one-off warning). Also clarify the boot warning, and document that the VM creation will fail when an unsupported IPA size is provided. Although this is an ABI change, it doesn't really change much for userspace: - the guest couldn't run before this change, but no error was returned. At least userspace knows what is happening. - a memory slot that was accepted because it did fit the default IPA space now doesn't even get a chance to be registered. The other thing that is left doing is to convince userspace to actually use the IPA space setting instead of relying on the antiquated default. Fixes: 233a7cb2 ("kvm: arm64: Allow tuning the physical address size for VM") Signed-off-by:
Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Reviewed-by:
Andrew Jones <drjones@redhat.com> Reviewed-by:
Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20210311100016.3830038-2-maz@kernel.org
-
- Mar 09, 2021
-
-
Jonathan Neuschäfer authored
Signed-off-by:
Jonathan Neuschäfer <j.neuschaefer@gmx.net> Link: https://lore.kernel.org/r/20210301214722.2310911-1-j.neuschaefer@gmx.net Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
- Mar 02, 2021
-
-
David Woodhouse authored
This is how Xen guests do steal time accounting. The hypervisor records the amount of time spent in each of running/runnable/blocked/offline states. In the Xen accounting, a vCPU is still in state RUNSTATE_running while in Xen for a hypercall or I/O trap, etc. Only if Xen explicitly schedules does the state become RUNSTATE_blocked. In KVM this means that even when the vCPU exits the kvm_run loop, the state remains RUNSTATE_running. The VMM can explicitly set the vCPU to RUNSTATE_blocked by using the KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT attribute, and can also use KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST to retrospectively add a given amount of time to the blocked state and subtract it from the running state. The state_entry_time corresponds to get_kvmclock_ns() at the time the vCPU entered the current state, and the total times of all four states should always add up to state_entry_time. Co-developed-by:
Joao Martins <joao.m.martins@oracle.com> Signed-off-by:
Joao Martins <joao.m.martins@oracle.com> Signed-off-by:
David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20210301125309.874953-2-dwmw2@infradead.org> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Kai Huang authored
It should be 7.23 instead of 7.22, which has already been taken by KVM_CAP_X86_BUS_LOCK_EXIT. Signed-off-by:
Kai Huang <kai.huang@intel.com> Message-Id: <20210226094832.380394-1-kai.huang@intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-