  1. Jan 20, 2022
  2. Jan 14, 2022
  3. Jan 07, 2022
  4. Dec 07, 2021
  5. Nov 11, 2021
    • KVM: SEV: Add support for SEV intra host migration · b5663931
      Peter Gonda authored
      
      For SEV to work with intra host migration, contents of the SEV info struct
      such as the ASID (used to index the encryption key in the AMD SP) and
      the list of memory regions need to be transferred to the target VM.
      This change adds a command for a target VMM to get a source SEV VM's SEV
      info.
      
      Signed-off-by: Peter Gonda <pgonda@google.com>
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Marc Orr <marcorr@google.com>
      Cc: Marc Orr <marcorr@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Message-Id: <20211021174303.385706-3-pgonda@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  6. Oct 18, 2021
  7. Oct 04, 2021
  8. Aug 20, 2021
    • KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ · 61e5f69e
      Maxim Levitsky authored
      
      KVM_GUESTDBG_BLOCKIRQ will allow KVM to block all interrupts
      while running.
      
      This change is mostly intended for more robust single stepping
      of the guest and it has the following benefits when enabled:
      
      * Resuming from a breakpoint is much more reliable.
        When resuming execution from a breakpoint, with interrupts enabled,
        more often than not, KVM would inject an interrupt and make the CPU
        jump immediately to the interrupt handler and eventually return to
        the breakpoint, to trigger it again.
      
        From the user's point of view it looks like the CPU never executed a
        single instruction, and in some cases that can even prevent forward
        progress, for example, when the breakpoint is placed by an automated
        script (e.g. lx-symbols), which does something in response to the
        breakpoint and then continues the guest automatically.
        If the script execution takes enough time for another interrupt to
        arrive, the guest will be stuck on the same breakpoint RIP forever.
      
      * Normal single stepping is much more predictable, since it won't
        land the debugger into an interrupt handler.
      
      * RFLAGS.TF has less chance to be leaked to the guest:
      
        We set that flag behind the guest's back to do single stepping
        but if single step lands us into an interrupt/exception handler
        it will be leaked to the guest in the form of being pushed
        to the stack.
        This doesn't completely eliminate this problem as exceptions
        can still happen, but at least this reduces the chances
        of this happening.
      
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210811122927.900604-6-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
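      As a rough illustration of how a debugger might request this behaviour,
      the sketch below builds the `control` bitmask that userspace passes in
      struct kvm_guest_debug. The flag values are copied from my reading of the
      x86 uapi headers and should be checked against your kernel headers; the
      helper name guest_debug_control is hypothetical.

```python
# Flag values assumed from the x86 uapi headers (linux/kvm.h, asm/kvm.h);
# verify them against your kernel before relying on them.
KVM_GUESTDBG_ENABLE = 0x00000001
KVM_GUESTDBG_SINGLESTEP = 0x00000002
KVM_GUESTDBG_BLOCKIRQ = 0x00100000


def guest_debug_control(single_step: bool, block_irq: bool) -> int:
    """Build the `control` field of struct kvm_guest_debug (hypothetical helper)."""
    control = KVM_GUESTDBG_ENABLE
    if single_step:
        control |= KVM_GUESTDBG_SINGLESTEP
    if block_irq:
        # Ask KVM to hold back interrupts while the guest is being debugged.
        control |= KVM_GUESTDBG_BLOCKIRQ
    return control
```

      A debugger would typically set block_irq together with single_step, so
      that a step does not land in an interrupt handler.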
    • KVM: stats: Update doc for histogram statistics · 0176ec51
      Jing Zhang authored
      
      Add documentation for linear and logarithmic histogram statistics.
      
      Signed-off-by: Jing Zhang <jingzhangos@google.com>
      Message-Id: <20210802165633.1866976-3-jingzhangos@google.com>
      [Small changes to the phrasing. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
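      The commit message does not spell out the bucket layout, so the following
      is only a sketch under an assumption: that linear buckets have a fixed
      width with an open-ended last bucket, and that logarithmic buckets cover
      [0, 1), [1, 2), [2, 4), and so on, again with an open-ended last bucket.
      The function names are hypothetical.

```python
def linear_hist_index(value: int, bucket_size: int, num_buckets: int) -> int:
    """Bucket i covers [bucket_size*i, bucket_size*(i+1)); the last one is open-ended."""
    return min(value // bucket_size, num_buckets - 1)


def log_hist_index(value: int, num_buckets: int) -> int:
    """Bucket 0 covers [0, 1); bucket i >= 1 covers [2**(i-1), 2**i); the last is open-ended."""
    # int.bit_length() gives the position of the highest set bit (0 for value 0).
    return min(value.bit_length(), num_buckets - 1)
```

      With this layout a reader can turn a bucket index back into a value range
      without any per-bucket metadata beyond the bucket size and count.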
  9. Jul 26, 2021
  10. Jul 25, 2021
  11. Jun 24, 2021
    • kvm: x86: Allow userspace to handle emulation errors · 19238e75
      Aaron Lewis authored
      
      Add a fallback mechanism to the in-kernel instruction emulator that
      allows userspace the opportunity to process an instruction the emulator
      was unable to.  When the in-kernel instruction emulator fails to process
      an instruction it will either inject a #UD into the guest or exit to
      userspace with exit reason KVM_INTERNAL_ERROR.  This is because it does
      not know how to proceed in an appropriate manner.  This feature lets
      userspace get involved to see if it can figure out a better path
      forward.
      
      Signed-off-by: Aaron Lewis <aaronlewis@google.com>
      Reviewed-by: David Edmondson <david.edmondson@oracle.com>
      Message-Id: <20210510144834.658457-2-aaronlewis@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is broken · 63f5a190
      Sean Christopherson authored
      
      Warn userspace that KVM_SET_CPUID{,2} after KVM_RUN "may" cause guest
      instability.  Initialize last_vmentry_cpu to -1 and use it to detect if
      the vCPU has been run at least once when its CPUID model is changed.
      
      KVM does not correctly handle changes to paging related settings in the
      guest's vCPU model after KVM_RUN, e.g. MAXPHYADDR, GBPAGES, etc...  KVM
      could theoretically zap all shadow pages, but actually making that happen
      is a mess due to lock inversion (vcpu->mutex is held).  And even then,
      updating paging settings on the fly would only work if all vCPUs are
      stopped, updated in concert with identical settings, then restarted.
      
      To support running vCPUs with different vCPU models (that affect paging),
      KVM would need to track all relevant information in kvm_mmu_page_role.
      Note, that's the _page_ role, not the full mmu_role.  Updating mmu_role
      isn't sufficient as a vCPU can reuse a shadow page translation that was
      created by a vCPU with different settings and thus completely skip the
      reserved bit checks (that are tied to CPUID).
      
      Tracking CPUID state in kvm_mmu_page_role is _extremely_ undesirable as
      it would require doubling gfn_track from a u16 to a u32, i.e. would
      increase KVM's memory footprint by 2 bytes for every 4 KiB of guest memory.
      E.g. MAXPHYADDR (6 bits), GBPAGES, AMD vs. INTEL = 1 bit, and SEV C-BIT
      would all need to be tracked.
      
      In practice, there is no remotely sane use case for changing any paging
      related CPUID entries on the fly, so just sweep it under the rug (after
      yelling at userspace).
      
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-8-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
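      To make the two mechanical points of this message concrete, here is a
      small model, not kernel code. It works through the "2 bytes per 4 KiB
      page" footprint arithmetic and shows the -1 sentinel used to tell whether
      a vCPU has ever run. The class and function names are hypothetical.

```python
PAGE_SIZE = 4096  # 4 KiB guest pages, as in the commit message


def extra_tracking_bytes(guest_memory_bytes: int) -> int:
    """Extra bytes needed if each per-page tracking entry grew from u16 to u32."""
    pages = guest_memory_bytes // PAGE_SIZE
    return pages * (4 - 2)


class VcpuModel:
    """Hypothetical stand-in for a vCPU; only the sentinel logic is modelled."""

    def __init__(self) -> None:
        self.last_vmentry_cpu = -1  # -1 means "has never entered the guest"

    def run_on(self, cpu: int) -> None:
        self.last_vmentry_cpu = cpu

    def cpuid_change_is_suspect(self) -> bool:
        # A CPUID update after the first run is the case the commit warns about.
        return self.last_vmentry_cpu != -1
```

      For a guest with 1 TiB of memory, extra_tracking_bytes returns 512 MiB,
      which gives a sense of why the commit calls the extra tracking
      undesirable.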
    • KVM: stats: Add documentation for binary statistics interface · fdc09ddd
      Jing Zhang authored
      
      This new API provides a file descriptor for every VM and VCPU to read
      KVM statistics data in binary format.
      It is meant to provide a lightweight, flexible, scalable and efficient
      lock-free solution for user space telemetry applications to pull the
      statistics data periodically for large scale systems. The pulling
      frequency could be as high as a few times per second.
      The statistics descriptors are defined by KVM in the kernel and can be
      used by userspace to discover VM/VCPU statistics during the one-time setup
      stage.
      The statistics data itself could be read out by userspace telemetry
      periodically without any extra parsing or setup effort.
      There are a few existing interface protocols and definitions, but none
      of them fulfils all the requirements this interface implements, which are
      listed below:
      1. During high frequency periodic stats reading, there should be no
         extra work beyond the stats data read itself.
      2. Support stats annotation, like type (cumulative, instantaneous,
         peak, histogram, etc) and unit (counter, time, size, cycles, etc).
      3. The stats data reading should be free of lock/synchronization. We
         don't care about the consistency between all the stats data; not all
         stats data can be read out at exactly the same time anyway. What we
         really care about is the change or trend of the stats data. The
         lock-free solution is not just for efficiency and scalability, but
         also for the accuracy and usability of the stats data. For example, if
         all the stats data readings were protected by a global lock and one
         VCPU died somehow with that lock held, then all stats data reading
         would be blocked, and we would have no way to tell from the stats data
         which VCPU had died.
      4. The stats data reading workload can be handed over to another
         unprivileged process.
      
      Reviewed-by: David Matlack <dmatlack@google.com>
      Reviewed-by: Ricardo Koller <ricarkol@google.com>
      Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Reviewed-by: Fuad Tabba <tabba@google.com>
      Signed-off-by: Jing Zhang <jingzhangos@google.com>
      Message-Id: <20210618222709.1858088-6-jingzhangos@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
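      The split between a one-time descriptor pass and cheap periodic reads can
      be sketched as follows. This is a model over a synthetic byte string, not
      a reader for a real stats file descriptor. The header and descriptor
      layouts are my assumption of what struct kvm_stats_header and struct
      kvm_stats_desc look like and should be verified against linux/kvm.h; the
      stat names and helper names are hypothetical.

```python
import struct

# Layouts assumed from the uapi definitions; verify against linux/kvm.h.
HEADER = struct.Struct("=6I")   # flags, name_size, num_desc, id_offset, desc_offset, data_offset
DESC = struct.Struct("=IhHII")  # flags, exponent, size, offset, unused (later bucket_size)


def parse_layout(blob):
    """One-time setup: return (data_offset, {name: (offset, count)})."""
    _flags, name_size, num_desc, _id_off, desc_off, data_off = HEADER.unpack_from(blob, 0)
    stride = DESC.size + name_size  # each descriptor is followed by its name
    layout = {}
    for i in range(num_desc):
        base = desc_off + i * stride
        _dflags, _exponent, count, offset, _unused = DESC.unpack_from(blob, base)
        raw_name = blob[base + DESC.size:base + stride]
        layout[raw_name.split(b"\0", 1)[0].decode()] = (offset, count)
    return data_off, layout


def read_stat(blob, data_off, layout, name):
    """Periodic path: a fixed-offset read of u64 values, no descriptor parsing."""
    offset, count = layout[name]
    return list(struct.unpack_from(f"={count}Q", blob, data_off + offset))


def build_sample():
    """Synthetic bytes standing in for what a stats file descriptor returns."""
    name_size = 16
    stats = [(b"exits", 1), (b"halt_hist", 3)]  # hypothetical stat names
    id_off = HEADER.size
    desc_off = id_off + name_size
    data_off = desc_off + len(stats) * (DESC.size + name_size)
    out = HEADER.pack(0, name_size, len(stats), id_off, desc_off, data_off)
    out += b"kvm-sample".ljust(name_size, b"\0")
    offset = 0
    for name, count in stats:
        out += DESC.pack(0, 0, count, offset, 0) + name.ljust(name_size, b"\0")
        offset += 8 * count
    return out + struct.pack("=4Q", 42, 5, 6, 7)
```

      A telemetry agent following this pattern would call parse_layout once per
      VM or VCPU and then only repeat the read_stat step, which is what
      requirement 1 asks for.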
  12. Jun 22, 2021
  13. Jun 17, 2021
  14. May 20, 2021
  15. May 07, 2021
  16. Apr 26, 2021
  17. Apr 21, 2021
    • KVM: x86: Support KVM VMs sharing SEV context · 54526d1f
      Nathan Tempelman authored
      
      Add a capability for userspace to mirror SEV encryption context from
      one VM to another. On our side, this is intended to support a
      Migration Helper vCPU, but it can also be used generically to support
      other in-guest workloads scheduled by the host. The intention is for
      the primary guest and the mirror to have nearly identical memslots.
      
      The primary benefits of this are that:
      1) The VMs do not share KVM contexts (think APIC/MSRs/etc), so they
      can't accidentally clobber each other.
      2) The VMs can have different memory-views, which is necessary for post-copy
      migration (the migration vCPUs on the target need to read and write to
      pages, when the primary guest would VMEXIT).
      
      This does not change the threat model for AMD SEV. Any memory involved
      is still owned by the primary guest and its initial state is still
      attested to through the normal SEV_LAUNCH_* flows. If userspace wanted
      to circumvent SEV, they could achieve the same effect by simply attaching
      a vCPU to the primary VM.
      This patch deliberately leaves userspace in charge of the memslots for the
      mirror, as it already has the power to mess with them in the primary guest.
      
      This patch does not support SEV-ES (much less SNP), as it does not
      handle handing off attested VMSAs to the mirror.
      
      For additional context, we need a Migration Helper because SEV PSP
      migration is far too slow for our live migration on its own. Using
      an in-guest migrator lets us speed this up significantly.
      
      Signed-off-by: Nathan Tempelman <natet@google.com>
      Message-Id: <20210408223214.2582277-1-natet@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  18. Apr 20, 2021
  19. Apr 17, 2021
  20. Apr 09, 2021
  21. Apr 07, 2021
  22. Mar 19, 2021
  23. Mar 18, 2021
    • KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish · b318e8de
      Sean Christopherson authored
      
      Fix a plethora of issues with MSR filtering by installing the resulting
      filter as an atomic bundle instead of updating the live filter one range
      at a time.  The KVM_X86_SET_MSR_FILTER ioctl() isn't truly atomic, as
      the hardware MSR bitmaps won't be updated until the next VM-Enter, but
      the relevant software struct is atomically updated, which is what KVM
      really needs.
      
      Similar to the approach used for modifying memslots, make arch.msr_filter
      a SRCU-protected pointer, do all the work configuring the new filter
      outside of kvm->lock, and then acquire kvm->lock only when the new filter
      has been vetted and created.  That way vCPU readers either see the old
      filter or the new filter in their entirety, not some half-baked state.
      
      Yuan Yao pointed out a use-after-free in kvm_msr_allowed() due to a
      TOCTOU bug [*], but that's just the tip of the iceberg...
      
        - Nothing is __rcu annotated, making it nigh impossible to audit the
          code for correctness.
        - kvm_add_msr_filter() has an unpaired smp_wmb().  Violation of kernel
          coding style aside, the lack of a smp_rmb() anywhere casts all code
          into doubt.
        - kvm_clear_msr_filter() has a double free TOCTOU bug, as it grabs
          count before taking the lock.
        - kvm_clear_msr_filter() also has a memory leak due to the same TOCTOU bug.
      
      The entire approach of updating the live filter is also flawed.  While
      installing a new filter is inherently racy if vCPUs are running, fixing
      the above issues also makes it trivial to ensure certain behavior is
      deterministic, e.g. KVM can provide deterministic behavior for MSRs with
      identical settings in the old and new filters.  An atomic update of the
      filter also prevents KVM from getting into a half-baked state, e.g. if
      installing a filter fails, the existing approach would leave the filter
      in a half-baked state, having already committed whatever bits of the
      filter were already processed.
      
      [*] https://lkml.kernel.org/r/20210312083157.25403-1-yaoyuan0329os@gmail.com
      
      Fixes: 1a155254 ("KVM: x86: Introduce MSR filtering")
      Cc: stable@vger.kernel.org
      Cc: Alexander Graf <graf@amazon.com>
      Reported-by: Yuan Yao <yaoyuan0329os@gmail.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210316184436.2544875-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
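      The "build everything off to the side, then publish in one step" idea can
      be modelled in a few lines. This is a sketch in Python, not the kernel's
      SRCU code: the lock stands in for kvm->lock, garbage collection stands in
      for deferred freeing of the old filter, and the allow-list semantics and
      all names are simplifying assumptions.

```python
import threading


class MsrFilterHolder:
    """Hypothetical model: readers see the old or the new filter, never a mix."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # stands in for kvm->lock
        self._filter = ()              # currently published filter (immutable)

    def msr_allowed(self, msr: int) -> bool:
        # Reader side: take one snapshot of the reference and use only that.
        snapshot = self._filter
        if not snapshot:
            return True  # no filter installed: allow everything
        return any(lo <= msr < hi for lo, hi in snapshot)

    def set_filter(self, ranges) -> None:
        # Build and vet the whole replacement before touching shared state.
        staged = []
        for base, count in ranges:
            if count <= 0:
                raise ValueError("empty MSR range")  # published filter is untouched
            staged.append((base, base + count))
        new_filter = tuple(staged)
        with self._lock:
            self._filter = new_filter  # single publish step
```

      Because validation happens before the publish step, a failed update
      leaves the previously installed filter fully in place, which is the
      behaviour the commit contrasts with the old range-by-range update.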
  24. Mar 12, 2021
    • KVM: arm64: Reject VM creation when the default IPA size is unsupported · 7d717558
      Marc Zyngier authored
      
      KVM/arm64 has forever used a 40bit default IPA space, partially
      due to its 32bit heritage (where the only choice is 40bit).
      
      However, there are implementations in the wild that have a *cough*
      much smaller *cough* IPA space, which leads to a misprogramming of
      VTCR_EL2, and a guest that is stuck on its first memory access
      if userspace dares to ask for the default IPA setting (which most
      VMMs do).
      
      Instead, bluntly reject the creation of such a VM, as we can't
      satisfy the requirements from userspace (with a one-off warning).
      Also clarify the boot warning, and document that the VM creation
      will fail when an unsupported IPA size is provided.
      
      Although this is an ABI change, it doesn't really change much
      for userspace:
      
      - the guest couldn't run before this change, but no error was
        returned. At least userspace knows what is happening.
      
      - a memory slot that was accepted because it did fit the default
        IPA space now doesn't even get a chance to be registered.
      
      The other thing that is left to do is to convince userspace to
      actually use the IPA space setting instead of relying on the
      antiquated default.
      
      Fixes: 233a7cb2 ("kvm: arm64: Allow tuning the physical address size for VM")
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Cc: stable@vger.kernel.org
      Reviewed-by: Andrew Jones <drjones@redhat.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Link: https://lore.kernel.org/r/20210311100016.3830038-2-maz@kernel.org
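      The check described above can be modelled roughly as follows. This is a
      sketch, not the kernel implementation: it assumes the requested IPA size
      is carried in the low byte of the VM type with 0 meaning the 40-bit
      default, and that 32 bits is the smallest accepted size. The function
      name and the host_ipa_limit parameter are hypothetical.

```python
import errno

DEFAULT_IPA_BITS = 40  # legacy default selected by a VM type of 0
MIN_IPA_BITS = 32      # assumed lower bound; check the KVM API documentation


def check_vm_ipa_size(vm_type: int, host_ipa_limit: int) -> int:
    """Return 0 if the requested IPA size is usable, else -EINVAL."""
    requested = vm_type & 0xFF  # low byte carries the IPA size in bits
    if requested == 0:
        requested = DEFAULT_IPA_BITS
    if requested < MIN_IPA_BITS or requested > host_ipa_limit:
        # Fail at creation time instead of producing a guest that hangs later.
        return -errno.EINVAL
    return 0
```

      On a host limited to 36 bits, a VM type of 0 is rejected under this
      model, while an explicit request for 36 bits succeeds, which is why the
      commit urges userspace to set the IPA size explicitly.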
  25. Mar 09, 2021
  26. Mar 02, 2021