  1. Apr 02, 2022
      KVM: MIPS: remove reference to trap&emulate virtualization · fe5f6914
      Paolo Bonzini authored
      
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20220313140522.1307751-1-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: x86: document limitations of MSR filtering · ce2f72e2
      Paolo Bonzini authored
      
      
      MSR filtering requires an exit to userspace that is hard to implement and
      would be very slow in the case of nested VMX vmexit and vmentry MSR
      accesses.  Document the limitation.
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: Remove dirty handling from gfn_to_pfn_cache completely · cf1d88b3
      David Woodhouse authored
      
      
      It isn't OK to cache the dirty status of a page in internal structures
      for an indefinite period of time.
      
      Any time a vCPU exits the run loop to userspace might be its last; the
      VMM might do its final check of the dirty log, flush the last remaining
      dirty pages to the destination and complete a live migration. If we
      have internal 'dirty' state which doesn't get flushed until the vCPU
      is finally destroyed on the source after migration is complete, then
      we have lost data because that will escape the final copy.
      
      This problem already exists with the use of kvm_vcpu_unmap() to mark
      pages dirty in e.g. VMX nesting.
      
      Note that the actual Linux MM already considers the page to be dirty
      since we have a writeable mapping of it. This is just about the KVM
      dirty logging.
      
      For the nesting-style use cases (KVM_GUEST_USES_PFN) we will need to
      track which gfn_to_pfn_caches have been used and explicitly mark the
      corresponding pages dirty before returning to userspace. But we would
      have needed external tracking of that anyway, rather than walking the
      full list of GPCs to find those belonging to this vCPU which are dirty.
      
      So let's rely *solely* on that external tracking, and keep it simple
      rather than laying a tempting trap for callers to fall into.
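The external-tracking idea above can be sketched as a small self-contained model (toy code, not kernel internals; all names here are illustrative): the caller records which cached pages it wrote through, and flushes those to the dirty log before every return to userspace, so no dirty state ever lingers in internal structures.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 8

static bool dirty_log[NPAGES];     /* stands in for KVM's dirty bitmap */
static bool used_this_run[NPAGES]; /* external tracking kept by the caller */

static void write_through_cache(unsigned gfn)
{
    /* The write itself does NOT touch the dirty log... */
    used_this_run[gfn] = true; /* ...the caller just records the use. */
}

static void flush_before_userspace_exit(void)
{
    /* Mark every page used since the last exit dirty, then reset.
     * After this, no dirty state survives in internal structures. */
    for (unsigned i = 0; i < NPAGES; i++) {
        if (used_this_run[i]) {
            dirty_log[i] = true;
            used_this_run[i] = false;
        }
    }
}
```

The key property is that a userspace exit can happen at any time and still observe a complete dirty log.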
      
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20220303154127.202856-3-dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: Don't actually set a request when evicting vCPUs for GFN cache invd · df06dae3
      Sean Christopherson authored
      
      
      Don't actually set a request bit in vcpu->requests when making a request
      purely to force a vCPU to exit the guest.  Logging a request but not
      actually consuming it would cause the vCPU to get stuck in an infinite
      loop during KVM_RUN because KVM would see the pending request and bail
      from VM-Enter to service the request.
      
      Note, it's currently impossible for KVM to set KVM_REQ_GPC_INVALIDATE as
      nothing in KVM is wired up to set guest_uses_pa=true.  But, it'd be all
      too easy for arch code to introduce use of kvm_gfn_to_pfn_cache_init()
      without implementing handling of the request, especially since getting
      test coverage of MMU notifier interaction with specific KVM features
      usually requires a directed test.
      
      Opportunistically rename gfn_to_pfn_cache_invalidate_start()'s wake_vcpus
      to evict_vcpus.  The purpose of the request is to get vCPUs out of guest
      mode, it's supposed to _avoid_ waking vCPUs that are blocking.
      
      Opportunistically rename KVM_REQ_GPC_INVALIDATE to be more specific as to
      what it wants to accomplish, and to genericize the name so that it can be
      used for similar but unrelated scenarios, should they arise in the future.
      Add a comment and documentation to explain why the "no action" request
      exists.
      
      Add compile-time assertions to help detect improper usage.  Use the inner
      assertless helper in the one s390 path that makes requests without a
      hardcoded request.
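The "no action" request can be modeled in a few lines (a self-contained sketch; the flag and mask names are illustrative stand-ins, not the kernel's): a request carrying a NO_ACTION flag kicks the vCPU out of guest mode but never records a bit in vcpu->requests, so the run loop cannot get stuck waiting on a request nobody consumes.

```c
#include <assert.h>

#define REQ_NO_ACTION   (1u << 31)   /* flag: do not record the request */
#define REQ_ARCH_MASK   0x00ffffffu  /* low bits: the request number */

static unsigned long vcpu_requests; /* stands in for vcpu->requests */

static void make_request(unsigned req)
{
    if (!(req & REQ_NO_ACTION))
        vcpu_requests |= 1ul << (req & REQ_ARCH_MASK);
    /* Either way, the caller would also kick the vCPU (e.g. via IPI)
     * to force it out of guest mode. */
}
```

With the flag set, the vCPU is evicted but sees no pending request on re-entry, which is exactly the behavior the commit wants.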
      
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220223165302.3205276-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. Mar 29, 2022
  3. Mar 21, 2022
      KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2 · 6d849191
      Oliver Upton authored
      
      
      KVM_CAP_DISABLE_QUIRKS is irrevocably broken. The capability does not
      advertise the set of quirks which may be disabled to userspace, so it is
      impossible to predict the behavior of KVM. Worse yet,
      KVM_CAP_DISABLE_QUIRKS will tolerate any value for cap->args[0], meaning
      it fails to reject attempts to set invalid quirk bits.
      
      The only valid workaround for the quirky quirks API is to add a new CAP.
      Actually advertise the set of quirks that can be disabled to userspace
      so it can predict KVM's behavior. Reject values for cap->args[0] that
      contain invalid bits.
      
      Finally, add documentation for the new capability and describe the
      existing quirks.
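The validation the new capability performs can be sketched as follows (self-contained illustration; the mask value is a stand-in, not the real KVM_X86_VALID_QUIRKS): advertise a fixed set of disableable quirks and reject any args[0] containing bits outside that set.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define VALID_QUIRKS 0x1fu /* illustrative stand-in for the real mask */

static int enable_disable_quirks2(uint64_t args0, uint64_t *disabled)
{
    /* Unlike the old capability, unknown quirk bits are refused. */
    if (args0 & ~(uint64_t)VALID_QUIRKS)
        return -EINVAL;
    *disabled = args0;
    return 0;
}
```

Because the valid mask is also what the capability check returns, userspace can predict exactly which bits will be accepted.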
      
      Signed-off-by: Oliver Upton <oupton@google.com>
      Message-Id: <20220301060351.442881-5-oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. Mar 09, 2022
  5. Mar 01, 2022
  6. Feb 25, 2022
  7. Feb 22, 2022
  8. Feb 21, 2022
      KVM: arm64: Indicate SYSTEM_RESET2 in kvm_run::system_event flags field · 34739fd9
      Will Deacon authored
      
      
      When handling reset and power-off PSCI calls from the guest, we
      initialise X0 to PSCI_RET_INTERNAL_FAILURE in case the VMM tries to
      re-run the vCPU after issuing the call.
      
      Unfortunately, this also means that the VMM cannot see which PSCI call
      was issued and therefore cannot distinguish between PSCI SYSTEM_RESET
      and SYSTEM_RESET2 calls, which is necessary in order to determine the
      validity of the "reset_type" in X1.
      
      Allocate bit 0 of the previously unused 'flags' field of the
      system_event structure so that we can indicate the PSCI call used to
      initiate the reset.
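On the VMM side, consuming the new bit might look like this (a self-contained sketch modeled on the uapi, not taken from <linux/kvm.h>; the flag name is illustrative): bit 0 of system_event.flags distinguishes SYSTEM_RESET2 from plain SYSTEM_RESET.

```c
#include <assert.h>
#include <stdint.h>

#define SYSTEM_EVENT_FLAG_PSCI_RESET2 (1ull << 0) /* bit 0, per the commit */

struct system_event {
    uint32_t type;  /* e.g. a reset/shutdown event type */
    uint64_t flags; /* previously unused; bit 0 now carries the PSCI call */
};

static int is_psci_reset2(const struct system_event *ev)
{
    /* A VMM would check this before interpreting the "reset_type" in X1. */
    return (ev->flags & SYSTEM_EVENT_FLAG_PSCI_RESET2) != 0;
}
```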
      
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Alexandru Elisei <alexandru.elisei@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20220221153524.15397-4-will@kernel.org
  9. Feb 17, 2022
  10. Feb 14, 2022
  11. Feb 08, 2022
      KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU · 583cda1b
      Alexandru Elisei authored
      
      
      Userspace can assign a PMU to a VCPU with the KVM_ARM_VCPU_PMU_V3_SET_PMU
      device ioctl. If the VCPU is scheduled on a physical CPU which has a
      different PMU, the perf events needed to emulate a guest PMU won't be
      scheduled in and the guest performance counters will stop counting. Treat
      this as a userspace error and refuse to run the VCPU in this situation.
      
      Suggested-by: Marc Zyngier <maz@kernel.org>
      Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20220127161759.53553-7-alexandru.elisei@arm.com
      KVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attribute · 6ee7fca2
      Alexandru Elisei authored
      
      
      When KVM creates an event and there are more than one PMUs present on the
      system, perf_init_event() will go through the list of available PMUs and
      will choose the first one that can create the event. The order of the PMUs
      in this list depends on the probe order, which can change under various
      circumstances, for example if the order of the PMU nodes change in the DTB
      or if asynchronous driver probing is enabled on the kernel command line
      (with the driver_async_probe=armv8-pmu option).
      
      Another consequence of this approach is that on heterogeneous systems all
      virtual machines that KVM creates will use the same PMU. This might cause
      unexpected behaviour for userspace: when a VCPU is executing on the
      physical CPU that uses this default PMU, PMU events in the guest work
      correctly; but when the same VCPU executes on another CPU, PMU events in
      the guest will suddenly stop counting.
      
      Fortunately, the perf core allows the user to specify on which PMU to create an
      event by using the perf_event_attr->type field, which is used by
      perf_init_event() as an index in the radix tree of available PMUs.
      
      Add the KVM_ARM_VCPU_PMU_V3_CTRL(KVM_ARM_VCPU_PMU_V3_SET_PMU) VCPU
      attribute to allow userspace to specify the arm_pmu that KVM will use when
      creating events for that VCPU. KVM will make no attempt to run the VCPU on
      the physical CPUs that share the PMU, leaving it up to userspace to manage
      the VCPU threads' affinity accordingly.
      
      To ensure that KVM doesn't expose an asymmetric system to the guest, the
      PMU set for one VCPU will be used by all other VCPUs. Once a VCPU has run,
      the PMU cannot be changed in order to avoid changing the list of available
      events for a VCPU, or to change the semantics of existing events.
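From userspace, setting the attribute would go through the usual device-attribute path (a real VMM takes struct kvm_device_attr and the KVM_ARM_VCPU_PMU_V3_* constants from <linux/kvm.h> and issues ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr); the struct and numeric values below are local stand-ins for illustration only):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins; real values come from <linux/kvm.h>. */
#define MY_PMU_V3_CTRL    0
#define MY_PMU_V3_SET_PMU 3

struct my_device_attr {
    uint32_t group;
    uint64_t attr;
    uint64_t addr; /* userspace pointer to the chosen PMU id */
};

static struct my_device_attr make_set_pmu_attr(int *pmu_id)
{
    struct my_device_attr a = {
        .group = MY_PMU_V3_CTRL,
        .attr  = MY_PMU_V3_SET_PMU,
        .addr  = (uint64_t)(uintptr_t)pmu_id,
    };
    return a;
}
```

The pmu_id would be the perf type index of the desired arm_pmu, as found under /sys/bus/event_source/devices/.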
      
      Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20220127161759.53553-6-alexandru.elisei@arm.com
      KVM: arm64: Do not change the PMU event filter after a VCPU has run · 5177fe91
      Marc Zyngier authored
      
      
      Userspace can specify which events a guest is allowed to use with the
      KVM_ARM_VCPU_PMU_V3_FILTER attribute. The list of allowed events can be
      identified by a guest from reading the PMCEID{0,1}_EL0 registers.
      
      Changing the PMU event filter after a VCPU has run can cause reads of the
      registers performed before the filter is changed to return different values
      than reads performed with the new event filter in place. The architecture
      defines the two registers as read-only, and this behaviour contradicts
      that.
      
      Keep track of when the first VCPU has run and deny subsequent changes to
      the PMU event filter to prevent this from happening.
      
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      [ Alexandru E: Added commit message, updated ioctl documentation ]
      Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20220127161759.53553-2-alexandru.elisei@arm.com
  12. Jan 28, 2022
      KVM: x86: add system attribute to retrieve full set of supported xsave states · dd6e6312
      Paolo Bonzini authored
      
      
      Because KVM_GET_SUPPORTED_CPUID is meant to be passed (by simple-minded
      VMMs) to KVM_SET_CPUID2, it cannot include any dynamic xsave states that
      have not been enabled.  Probing those, for example so that they can be
      passed to ARCH_REQ_XCOMP_GUEST_PERM, requires a new ioctl or arch_prctl.
      The latter is in fact worse, even though that is what the rest of the
      API uses, because it would require supported_xcr0 to be moved from the
      KVM module to the kernel just for this use.  In addition, the value
      would be nonsensical (or an error would have to be returned) until
      the KVM module is loaded.
      
      Therefore, to limit the growth of system ioctls, add a /dev/kvm
      variant of KVM_{GET,HAS}_DEVICE_ATTR, and implement it in x86
      with just one group (0) and attribute (KVM_X86_XCOMP_GUEST_SUPP).
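The lookup the new system-scope attribute performs can be modeled in a few lines (self-contained sketch; the real interface is ioctl(kvm_fd, KVM_GET_DEVICE_ATTR, ...) on /dev/kvm, and the attribute number and mask value below are illustrative stand-ins):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define XCOMP_GUEST_SUPP_ATTR 0 /* stand-in for KVM_X86_XCOMP_GUEST_SUPP */

/* Example mask only; the real value is KVM's supported_xcr0. */
static const uint64_t supported_xstate_mask = 0x602e7;

static int get_system_attr(uint32_t group, uint64_t attr, uint64_t *out)
{
    if (group != 0 || attr != XCOMP_GUEST_SUPP_ATTR)
        return -ENXIO; /* unknown group/attribute */
    *out = supported_xstate_mask;
    return 0;
}
```

Userspace can then pass the returned mask to ARCH_REQ_XCOMP_GUEST_PERM before querying KVM_GET_SUPPORTED_CPUID.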
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  13. Jan 20, 2022
  14. Jan 14, 2022
  15. Jan 07, 2022
  16. Dec 17, 2021
      crypto: ccp - Add SEV_INIT_EX support · 3d725965
      David Rientjes authored
      
      
      Add a new module parameter to allow users to use SEV_INIT_EX instead of
      SEV_INIT. This helps users who lock their SPI bus to use the PSP for SEV
      functionality. The 'init_ex_path' parameter defaults to NULL, which means
      the kernel will use SEV_INIT; if a path is specified, SEV_INIT_EX will be
      used with the data found at that path. On certain PSP commands this
      file is written to as the PSP updates the NV memory region. Depending on
      file system initialization, opening this file may fail during module init,
      but the CCP driver for SEV already has sufficient retries for platform
      initialization. During normal operation of PSP system and SEV commands,
      if the PSP has not been initialized, it is initialized at run time. If the
      file at 'init_ex_path' does not exist, the PSP will not be initialized. The
      user must create the file prior to use, filled with 32KB of 0xFF bytes per
      the spec.
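Creating that NV blob is straightforward; a minimal sketch (the path is an example, and the only requirement taken from the text above is 32KB of 0xFF bytes):

```c
#include <stdio.h>
#include <string.h>

#define SEV_NV_SIZE (32 * 1024) /* 32KB of 0xFF, per the spec */

static int write_sev_init_ex_file(const char *path)
{
    unsigned char buf[SEV_NV_SIZE];
    FILE *f = fopen(path, "wb");

    if (!f)
        return -1;
    memset(buf, 0xFF, sizeof(buf));
    /* Write the whole blob in one go and report short writes as failure. */
    size_t n = fwrite(buf, 1, sizeof(buf), f);
    fclose(f);
    return n == sizeof(buf) ? 0 : -1;
}
```

The driver then reads and rewrites this file as the PSP updates its NV region.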
      
      Signed-off-by: David Rientjes <rientjes@google.com>
      Co-developed-by: Peter Gonda <pgonda@google.com>
      Signed-off-by: Peter Gonda <pgonda@google.com>
      Reviewed-by: Marc Orr <marcorr@google.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Acked-by: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Marc Orr <marcorr@google.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: David Rientjes <rientjes@google.com>
      Cc: John Allen <john.allen@amd.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: linux-crypto@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  17. Dec 08, 2021
  18. Dec 07, 2021
  19. Nov 11, 2021
      KVM: SEV: Add support for SEV intra host migration · b5663931
      Peter Gonda authored
      
      
      For SEV to work with intra host migration, contents of the SEV info struct
      such as the ASID (used to index the encryption key in the AMD SP) and
      the list of memory regions need to be transferred to the target VM.
      This change adds a command for a target VMM to get a source SEV VM's SEV
      info.
      
      Signed-off-by: Peter Gonda <pgonda@google.com>
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Marc Orr <marcorr@google.com>
      Cc: Marc Orr <marcorr@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Message-Id: <20211021174303.385706-3-pgonda@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  20. Oct 18, 2021
  21. Oct 12, 2021
  22. Oct 04, 2021
  23. Sep 30, 2021
  24. Sep 14, 2021
  25. Aug 20, 2021
      KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ · 61e5f69e
      Maxim Levitsky authored
      
      
      KVM_GUESTDBG_BLOCKIRQ allows KVM to block all interrupts
      while the guest is running.
      
      This change is mostly intended for more robust single stepping
      of the guest and it has the following benefits when enabled:
      
      * Resuming from a breakpoint is much more reliable.
        When resuming execution from a breakpoint, with interrupts enabled,
        more often than not, KVM would inject an interrupt and make the CPU
        jump immediately to the interrupt handler and eventually return to
        the breakpoint, to trigger it again.
      
        From the user point of view it looks like the CPU never executed a
        single instruction and in some cases that can even prevent forward
        progress, for example, when the breakpoint is placed by an automated
  script (e.g. lx-symbols), which does something in response to the
        breakpoint and then continues the guest automatically.
        If the script execution takes enough time for another interrupt to
        arrive, the guest will be stuck on the same breakpoint RIP forever.
      
      * Normal single stepping is much more predictable, since it won't
        land the debugger into an interrupt handler.
      
      * RFLAGS.TF has less chance to be leaked to the guest:
      
        We set that flag behind the guest's back to do single stepping
        but if single step lands us into an interrupt/exception handler
        it will be leaked to the guest in the form of being pushed
        to the stack.
        This doesn't completely eliminate this problem as exceptions
        can still happen, but at least this reduces the chances
        of this happening.
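A debugger-style VMM would combine the new flag with the existing guest-debug controls (the bit values below are illustrative stand-ins; a real VMM takes KVM_GUESTDBG_ENABLE, KVM_GUESTDBG_SINGLESTEP, and KVM_GUESTDBG_BLOCKIRQ from <linux/kvm.h> and passes them via KVM_SET_GUEST_DEBUG):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for the real KVM_GUESTDBG_* bits. */
#define DBG_ENABLE     (1u << 0)
#define DBG_SINGLESTEP (1u << 1)
#define DBG_BLOCKIRQ   (1u << 2)

static uint32_t single_step_control(void)
{
    /* Step exactly one guest instruction, with interrupt injection
     * suppressed so the step cannot land in an interrupt handler. */
    return DBG_ENABLE | DBG_SINGLESTEP | DBG_BLOCKIRQ;
}
```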
      
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210811122927.900604-6-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      KVM: stats: Update doc for histogram statistics · 0176ec51
      Jing Zhang authored
      
      
      Add documentation for linear and logarithmic histogram statistics.
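The two bucketing schemes can be sketched concisely (self-contained illustration; bucket sizes and counts are examples): a linear histogram divides the value by a fixed bucket size, while a logarithmic histogram indexes by the position of the highest set bit, so bucket n covers [2^(n-1), 2^n).

```c
#include <stdint.h>

static unsigned linear_bucket(uint64_t value, uint64_t bucket_size,
                              unsigned nbuckets)
{
    unsigned b = (unsigned)(value / bucket_size);
    return b < nbuckets ? b : nbuckets - 1; /* last bucket catches overflow */
}

static unsigned log_bucket(uint64_t value, unsigned nbuckets)
{
    unsigned b = 0;

    while (value) { /* b ends as the highest-set-bit position + 1 */
        value >>= 1;
        b++;
    }
    /* value 0 -> bucket 0, 1 -> bucket 1, 2..3 -> bucket 2, 4..7 -> 3, ... */
    return b < nbuckets ? b : nbuckets - 1;
}
```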
      
      Signed-off-by: Jing Zhang <jingzhangos@google.com>
      Message-Id: <20210802165633.1866976-3-jingzhangos@google.com>
      [Small changes to the phrasing. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>