Skip to content
  1. Oct 16, 2018
  2. Oct 09, 2018
  3. Sep 19, 2018
    • Drew Schmitt's avatar
      KVM: x86: Control guest reads of MSR_PLATFORM_INFO · 6fbbde9a
      Drew Schmitt authored
      
      
      Add KVM_CAP_MSR_PLATFORM_INFO so that userspace can disable guest access
      to reads of MSR_PLATFORM_INFO.
      
      Disabling access to reads of this MSR gives userspace the control to "expose"
      this platform-dependent information to guests in a clear way. As it exists
      today, guests that read this MSR would get unpopulated information if userspace
      hadn't already set it (and prior to this patch series, only the CPUID faulting
      information could have been populated). This existing interface could be
      confusing if guests don't handle the potential for incorrect/incomplete
      information gracefully (e.g. zero reported for base frequency).
      
      Signed-off-by: default avatarDrew Schmitt <dasch@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      6fbbde9a
  4. Sep 12, 2018
  5. Aug 22, 2018
  6. Aug 06, 2018
    • Wanpeng Li's avatar
      KVM: X86: Implement "send IPI" hypercall · 4180bf1b
      Wanpeng Li authored
      Using hypercall to send IPIs by one vmexit instead of one by one for
      xAPIC/x2APIC physical mode and one vmexit per-cluster for x2APIC cluster
      mode. Intel guest can enter x2apic cluster mode when interrupt remmaping
      is enabled in qemu, however, latest AMD EPYC still just supports xapic
      mode which can get great improvement by Exit-less IPIs. This patchset
      lets a guest send multicast IPIs, with at most 128 destinations per
      hypercall in 64-bit mode and 64 vCPUs per hypercall in 32-bit mode.
      
      Hardware: Xeon Skylake 2.5GHz, 2 sockets, 40 cores, 80 threads, the VM
      is 80 vCPUs, IPI microbenchmark(https://lkml.org/lkml/2017/12/19/141
      
      ):
      
      x2apic cluster mode, vanilla
      
       Dry-run:                         0,            2392199 ns
       Self-IPI:                  6907514,           15027589 ns
       Normal IPI:              223910476,          251301666 ns
       Broadcast IPI:                   0,         9282161150 ns
       Broadcast lock:                  0,         8812934104 ns
      
      x2apic cluster mode, pv-ipi
      
       Dry-run:                         0,            2449341 ns
       Self-IPI:                  6720360,           15028732 ns
       Normal IPI:              228643307,          255708477 ns
       Broadcast IPI:                   0,         7572293590 ns  => 22% performance boost
       Broadcast lock:                  0,         8316124651 ns
      
      x2apic physical mode, vanilla
      
       Dry-run:                         0,            3135933 ns
       Self-IPI:                  8572670,           17901757 ns
       Normal IPI:              226444334,          255421709 ns
       Broadcast IPI:                   0,        19845070887 ns
       Broadcast lock:                  0,        19827383656 ns
      
      x2apic physical mode, pv-ipi
      
       Dry-run:                         0,            2446381 ns
       Self-IPI:                  6788217,           15021056 ns
       Normal IPI:              219454441,          249583458 ns
       Broadcast IPI:                   0,         7806540019 ns  => 154% performance boost
       Broadcast lock:                  0,         9143618799 ns
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      4180bf1b
    • Jim Mattson's avatar
      kvm: nVMX: Introduce KVM_CAP_NESTED_STATE · 8fcc4b59
      Jim Mattson authored
      
      
      For nested virtualization L0 KVM is managing a bit of state for L2 guests,
      this state can not be captured through the currently available IOCTLs. In
      fact the state captured through all of these IOCTLs is usually a mix of L1
      and L2 state. It is also dependent on whether the L2 guest was running at
      the moment when the process was interrupted to save its state.
      
      With this capability, there are two new vcpu ioctls: KVM_GET_NESTED_STATE
      and KVM_SET_NESTED_STATE. These can be used for saving and restoring a VM
      that is in VMX operation.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarJim Mattson <jmattson@google.com>
      [karahmed@ - rename structs and functions and make them ready for AMD and
                   address previous comments.
                 - handle nested.smm state.
                 - rebase & a bit of refactoring.
                 - Merge 7/8 and 8/8 into one patch. ]
      Signed-off-by: default avatarKarimAllah Ahmed <karahmed@amazon.de>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      8fcc4b59
  7. Jul 30, 2018
    • Janosch Frank's avatar
      KVM: s390: Add huge page enablement control · a4499382
      Janosch Frank authored
      
      
      General KVM huge page support on s390 has to be enabled via the
      kvm.hpage module parameter. Either nested or hpage can be enabled, as
      we currently do not support vSIE for huge backed guests. Once the vSIE
      support is added we will either drop the parameter or enable it as
      default.
      
      For a guest the feature has to be enabled through the new
      KVM_CAP_S390_HPAGE_1M capability and the hpage module
      parameter. Enabling it means that cmm can't be enabled for the vm and
      disables pfmf and storage key interpretation.
      
      This is due to the fact that in some cases, in upcoming patches, we
      have to split huge pages in the guest mapping to be able to set more
      granular memory protection on 4k pages. These split pages have fake
      page tables that are not visible to the Linux memory management which
      subsequently will not manage its PGSTEs, while the SIE will. Disabling
      these features lets us manage PGSTE data in a consistent matter and
      solve that problem.
      
      Signed-off-by: default avatarJanosch Frank <frankja@linux.ibm.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      a4499382
  8. Jul 21, 2018
  9. Jun 22, 2018
  10. Jun 01, 2018
  11. May 26, 2018
  12. May 25, 2018
  13. May 17, 2018
    • Michael S. Tsirkin's avatar
      kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME · 633711e8
      Michael S. Tsirkin authored
      
      
      KVM_HINTS_DEDICATED seems to be somewhat confusing:
      
      Guest doesn't really care whether it's the only task running on a host
      CPU as long as it's not preempted.
      
      And there are more reasons for Guest to be preempted than host CPU
      sharing, for example, with memory overcommit it can get preempted on a
      memory access, post copy migration can cause preemption, etc.
      
      Let's call it KVM_HINTS_REALTIME which seems to better
      match what guests expect.
      
      Also, the flag most be set on all vCPUs - current guests assume this.
      Note so in the documentation.
      
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      633711e8
  14. Apr 20, 2018
    • Marc Zyngier's avatar
      arm/arm64: KVM: Add PSCI version selection API · 85bd0ba1
      Marc Zyngier authored
      
      
      Although we've implemented PSCI 0.1, 0.2 and 1.0, we expose either 0.1
      or 1.0 to a guest, defaulting to the latest version of the PSCI
      implementation that is compatible with the requested version. This is
      no different from doing a firmware upgrade on KVM.
      
      But in order to give a chance to hypothetical badly implemented guests
      that would have a fit by discovering something other than PSCI 0.2,
      let's provide a new API that allows userspace to pick one particular
      version of the API.
      
      This is implemented as a new class of "firmware" registers, where
      we expose the PSCI version. This allows the PSCI version to be
      save/restored as part of a guest migration, and also set to
      any supported version if the guest requires it.
      
      Cc: stable@vger.kernel.org #4.16
      Reviewed-by: default avatarChristoffer Dall <cdall@kernel.org>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      85bd0ba1
  15. Mar 28, 2018
  16. Mar 16, 2018
  17. Mar 14, 2018
  18. Mar 06, 2018
  19. Mar 01, 2018
  20. Feb 24, 2018
  21. Jan 19, 2018
    • Paul Mackerras's avatar
      KVM: PPC: Book3S: Provide information about hardware/firmware CVE workarounds · 3214d01f
      Paul Mackerras authored
      
      
      This adds a new ioctl, KVM_PPC_GET_CPU_CHAR, that gives userspace
      information about the underlying machine's level of vulnerability
      to the recently announced vulnerabilities CVE-2017-5715,
      CVE-2017-5753 and CVE-2017-5754, and whether the machine provides
      instructions to assist software to work around the vulnerabilities.
      
      The ioctl returns two u64 words describing characteristics of the
      CPU and required software behaviour respectively, plus two mask
      words which indicate which bits have been filled in by the kernel,
      for extensibility.  The bit definitions are the same as for the
      new H_GET_CPU_CHARACTERISTICS hypercall.
      
      There is also a new capability, KVM_CAP_PPC_GET_CPU_CHAR, which
      indicates whether the new ioctl is available.
      
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      3214d01f
  22. Jan 16, 2018
    • Wanpeng Li's avatar
      KVM: X86: use paravirtualized TLB Shootdown · 858a43aa
      Wanpeng Li authored
      
      
      Remote TLB flush does a busy wait which is fine in bare-metal
      scenario. But with-in the guest, the vcpus might have been pre-empted or
      blocked. In this scenario, the initator vcpu would end up busy-waiting
      for a long amount of time; it also consumes CPU unnecessarily to wake
      up the target of the shootdown.
      
      This patch set adds support for KVM's new paravirtualized TLB flush;
      remote TLB flush does not wait for vcpus that are sleeping, instead
      KVM will flush the TLB as soon as the vCPU starts running again.
      
      The improvement is clearly visible when the host is overcommitted; in this
      case, the PV TLB flush (in addition to avoiding the wait on the main CPU)
      prevents preempted vCPUs from stealing precious execution time from the
      running ones.
      
      Testing on a Xeon Gold 6142 2.6GHz 2 sockets, 32 cores, 64 threads,
      so 64 pCPUs, and each VM is 64 vCPUs.
      
      ebizzy -M
                    vanilla    optimized     boost
      1VM            46799       48670         4%
      2VM            23962       42691        78%
      3VM            16152       37539       132%
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      858a43aa
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Enable migration of decrementer register · 5855564c
      Paul Mackerras authored
      
      
      This adds a register identifier for use with the one_reg interface
      to allow the decrementer expiry time to be read and written by
      userspace.  The decrementer expiry time is in guest timebase units
      and is equal to the sum of the decrementer and the guest timebase.
      (The expiry time is used rather than the decrementer value itself
      because the expiry time is not constantly changing, though the
      decrementer value is, while the guest vcpu is not running.)
      
      Without this, a guest vcpu migrated to a new host will see its
      decrementer set to some random value.  On POWER8 and earlier, the
      decrementer is 32 bits wide and counts down at 512MHz, so the
      guest vcpu will potentially see no decrementer interrupts for up
      to about 4 seconds, which will lead to a stall.  With POWER9, the
      decrementer is now 56 bits side, so the stall can be much longer
      (up to 2.23 years) and more noticeable.
      
      To help work around the problem in cases where userspace has not been
      updated to migrate the decrementer expiry time, we now set the
      default decrementer expiry at vcpu creation time to the current time
      rather than the maximum possible value.  This should mean an
      immediate decrementer interrupt when a migrated vcpu starts
      running.  In cases where the decrementer is 32 bits wide and more
      than 4 seconds elapse between the creation of the vcpu and when it
      first runs, the decrementer would have wrapped around to positive
      values and there may still be a stall - but this is no worse than
      the current situation.  In the large-decrementer case, we are sure
      to get an immediate decrementer interrupt (assuming the time from
      vcpu creation to first run is less than 2.23 years) and we thus
      avoid a very long stall.
      
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      5855564c
  23. Jan 02, 2018
  24. Dec 06, 2017
  25. Dec 04, 2017
    • Brijesh Singh's avatar
      KVM: Define SEV key management command id · dc48bae0
      Brijesh Singh authored
      Define Secure Encrypted Virtualization (SEV) key management command id
      and structure. The command definition is available in SEV KM spec
      0.14 (http://support.amd.com/TechDocs/55766_SEV-KM
      
       API_Specification.pdf)
      and Documentation/virtual/kvm/amd-memory-encryption.txt.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Improvements-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarBrijesh Singh <brijesh.singh@amd.com>
      Reviewed-by: default avatarBorislav Petkov <bp@suse.de>
      dc48bae0
    • Brijesh Singh's avatar
      KVM: Introduce KVM_MEMORY_ENCRYPT_{UN,}REG_REGION ioctl · 69eaedee
      Brijesh Singh authored
      
      
      If hardware supports memory encryption then KVM_MEMORY_ENCRYPT_REG_REGION
      and KVM_MEMORY_ENCRYPT_UNREG_REGION ioctl's can be used by userspace to
      register/unregister the guest memory regions which may contain the encrypted
      data (e.g guest RAM, PCI BAR, SMRAM etc).
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Improvements-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarBrijesh Singh <brijesh.singh@amd.com>
      Reviewed-by: default avatarBorislav Petkov <bp@suse.de>
      69eaedee
Loading