  1. Apr 02, 2022
    • KVM: MIPS: remove reference to trap&emulate virtualization · fe5f6914
      Paolo Bonzini authored
      
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20220313140522.1307751-1-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: document limitations of MSR filtering · ce2f72e2
      Paolo Bonzini authored
      
      
      MSR filtering requires an exit to userspace that is hard to implement and
      would be very slow in the case of nested VMX vmexit and vmentry MSR
      accesses.  Document the limitation.
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Remove dirty handling from gfn_to_pfn_cache completely · cf1d88b3
      David Woodhouse authored
      
      
      It isn't OK to cache the dirty status of a page in internal structures
      for an indefinite period of time.
      
      Any time a vCPU exits the run loop to userspace might be its last; the
      VMM might do its final check of the dirty log, flush the last remaining
      dirty pages to the destination and complete a live migration. If we
      have internal 'dirty' state which doesn't get flushed until the vCPU
      is finally destroyed on the source after migration is complete, then
      we have lost data because that will escape the final copy.
      
      This problem already exists with the use of kvm_vcpu_unmap() to mark
      pages dirty in e.g. VMX nesting.
      
      Note that the actual Linux MM already considers the page to be dirty
      since we have a writeable mapping of it. This is just about the KVM
      dirty logging.
      
      For the nesting-style use cases (KVM_GUEST_USES_PFN) we will need to
      track which gfn_to_pfn_caches have been used and explicitly mark the
      corresponding pages dirty before returning to userspace. But we would
      have needed external tracking of that anyway, rather than walking the
      full list of GPCs to find those belonging to this vCPU which are dirty.
      
      So let's rely *solely* on that external tracking, and keep it simple
      rather than laying a tempting trap for callers to fall into.
      
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20220303154127.202856-3-dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Don't actually set a request when evicting vCPUs for GFN cache invd · df06dae3
      Sean Christopherson authored
      
      
      Don't actually set a request bit in vcpu->requests when making a request
      purely to force a vCPU to exit the guest.  Logging a request but not
      actually consuming it would cause the vCPU to get stuck in an infinite
      loop during KVM_RUN because KVM would see the pending request and bail
      from VM-Enter to service the request.
      
      Note, it's currently impossible for KVM to set KVM_REQ_GPC_INVALIDATE as
      nothing in KVM is wired up to set guest_uses_pa=true.  But, it'd be all
      too easy for arch code to introduce use of kvm_gfn_to_pfn_cache_init()
      without implementing handling of the request, especially since getting
      test coverage of MMU notifier interaction with specific KVM features
      usually requires a directed test.
      
      Opportunistically rename gfn_to_pfn_cache_invalidate_start()'s wake_vcpus
      to evict_vcpus.  The purpose of the request is to get vCPUs out of guest
      mode; it's supposed to _avoid_ waking vCPUs that are blocking.
      
      Opportunistically rename KVM_REQ_GPC_INVALIDATE to be more specific as to
      what it wants to accomplish, and to genericize the name so that it can
      be used for similar but unrelated scenarios, should they arise in the future.
      Add a comment and documentation to explain why the "no action" request
      exists.
      
      Add compile-time assertions to help detect improper usage.  Use the inner
      assertless helper in the one s390 path that makes requests without a
      hardcoded request.
      
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220223165302.3205276-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
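The "no action" request described in the commit above can be illustrated with a small toy model. This is a sketch under stated assumptions, not KVM's code: every `toy_`/`TOY_` name and the flag layout are hypothetical. It shows why recording a request bit that nothing consumes would block every later guest entry, while a kick-only request does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag layout, chosen only for this illustration. */
#define TOY_REQUEST_MASK      0xffu       /* low bits carry the request number */
#define TOY_REQUEST_NO_ACTION (1u << 10)  /* kick only, never record a bit */

#define TOY_REQ_TLB_FLUSH 0u                            /* has a handler */
#define TOY_REQ_EVICT     (1u | TOY_REQUEST_NO_ACTION)  /* has no handler */

struct toy_vcpu {
    uint64_t requests; /* pending request bits */
    bool in_guest;     /* true while the vCPU "runs guest code" */
};

/* Force the vCPU out of guest mode; record a bit only if a handler exists. */
void toy_make_request(struct toy_vcpu *vcpu, unsigned int req)
{
    assert(vcpu != NULL);
    if (!(req & TOY_REQUEST_NO_ACTION))
        vcpu->requests |= 1ull << (req & TOY_REQUEST_MASK);
    vcpu->in_guest = false; /* models the kick out of the guest */
}

/* Guest entry is refused while any request bit is still pending. */
bool toy_try_enter_guest(struct toy_vcpu *vcpu)
{
    assert(vcpu != NULL);
    if (vcpu->requests != 0)
        return false;
    vcpu->in_guest = true;
    return true;
}
```

In this model an eviction leaves `requests` untouched, so the next entry attempt succeeds. A request with a handler stays pending until something clears its bit.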
  2. Mar 29, 2022
  3. Mar 21, 2022
    • KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2 · 6d849191
      Oliver Upton authored
      
      
      KVM_CAP_DISABLE_QUIRKS is irrevocably broken. The capability does not
      advertise the set of quirks which may be disabled to userspace, so it is
      impossible to predict the behavior of KVM. Worse yet,
      KVM_CAP_DISABLE_QUIRKS will tolerate any value for cap->args[0], meaning
      it fails to reject attempts to set invalid quirk bits.
      
      The only valid workaround for the quirky quirks API is to add a new CAP.
      Actually advertise the set of quirks that can be disabled to userspace
      so it can predict KVM's behavior. Reject values for cap->args[0] that
      contain invalid bits.
      
      Finally, add documentation for the new capability and describe the
      existing quirks.
      
      Signed-off-by: Oliver Upton <oupton@google.com>
      Message-Id: <20220301060351.442881-5-oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
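The two properties the new capability adds (advertising the quirk set and rejecting invalid bits in cap->args[0]) can be sketched as follows. This is a self-contained model, not the kernel implementation: the quirk bits, the struct, and all `toy_` names are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quirk bits; the real set comes from KVM's UAPI headers. */
#define TOY_QUIRK_A      (1ull << 0)
#define TOY_QUIRK_B      (1ull << 1)
#define TOY_VALID_QUIRKS (TOY_QUIRK_A | TOY_QUIRK_B)

struct toy_vm {
    uint64_t disabled_quirks;
};

/* Models the capability check: report which quirks may be disabled. */
uint64_t toy_check_quirks2(void)
{
    return TOY_VALID_QUIRKS;
}

/* Models enabling the capability: refuse any bit outside the advertised set. */
int toy_enable_quirks2(struct toy_vm *vm, uint64_t arg0)
{
    assert(vm != NULL);
    if (arg0 & ~TOY_VALID_QUIRKS)
        return -EINVAL;
    vm->disabled_quirks = arg0;
    return 0;
}
```

A caller would first query the advertised mask and only then request a subset of it, so the resulting behavior is predictable.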
  4. Mar 09, 2022
  5. Mar 01, 2022
  6. Feb 25, 2022
  7. Feb 22, 2022
  8. Feb 21, 2022
    • KVM: arm64: Indicate SYSTEM_RESET2 in kvm_run::system_event flags field · 34739fd9
      Will Deacon authored
      
      
      When handling reset and power-off PSCI calls from the guest, we
      initialise X0 to PSCI_RET_INTERNAL_FAILURE in case the VMM tries to
      re-run the vCPU after issuing the call.
      
      Unfortunately, this also means that the VMM cannot see which PSCI call
      was issued and therefore cannot distinguish between PSCI SYSTEM_RESET
      and SYSTEM_RESET2 calls, which is necessary in order to determine the
      validity of the "reset_type" in X1.
      
      Allocate bit 0 of the previously unused 'flags' field of the
      system_event structure so that we can indicate the PSCI call used to
      initiate the reset.
      
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Alexandru Elisei <alexandru.elisei@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20220221153524.15397-4-will@kernel.org
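From the VMM's side, the flag described above might be consumed roughly like this. The struct and the `TOY_` names are hypothetical stand-ins so the example is self-contained. Only the use of bit 0 of the flags field is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 0 of 'flags', as the commit message describes. The name is hypothetical. */
#define TOY_RESET_FLAG_PSCI_RESET2 (1ull << 0)

enum toy_reset_kind { TOY_SYSTEM_RESET, TOY_SYSTEM_RESET2 };

/* Hypothetical stand-in for the system_event data a VMM reads after an exit. */
struct toy_system_event {
    uint32_t type;
    uint64_t flags;
};

/* Tell the two PSCI reset calls apart using only the flags field. */
enum toy_reset_kind toy_classify_reset(const struct toy_system_event *ev)
{
    assert(ev != NULL);
    return (ev->flags & TOY_RESET_FLAG_PSCI_RESET2) ? TOY_SYSTEM_RESET2
                                                    : TOY_SYSTEM_RESET;
}

/* The "reset_type" in X1 is only meaningful for SYSTEM_RESET2. */
bool toy_reset_type_is_meaningful(const struct toy_system_event *ev)
{
    return toy_classify_reset(ev) == TOY_SYSTEM_RESET2;
}
```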
  9. Feb 18, 2022
  10. Feb 17, 2022
  11. Feb 14, 2022
  12. Feb 10, 2022
    • KVM: x86/mmu: Split huge pages mapped by the TDP MMU during KVM_CLEAR_DIRTY_LOG · cb00a70b
      David Matlack authored
      
      
      When using KVM_DIRTY_LOG_INITIALLY_SET, huge pages are not
      write-protected when dirty logging is enabled on the memslot. Instead
      they are write-protected once userspace invokes KVM_CLEAR_DIRTY_LOG for
      the first time and only for the specific sub-region being cleared.
      
      Enhance KVM_CLEAR_DIRTY_LOG to also try to split huge pages prior to
      write-protecting to avoid causing write-protection faults on vCPU
      threads. This also allows userspace to smear the cost of huge page
      splitting across multiple ioctls, rather than splitting the entire
      memslot as is the case when initially-all-set is not used.
      
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220119230739.2234394-17-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Split huge pages mapped by the TDP MMU when dirty logging is enabled · a3fe5dbd
      David Matlack authored
      
      
      When dirty logging is enabled without initially-all-set, try to split
      all huge pages in the memslot down to 4KB pages so that vCPUs do not
      have to take expensive write-protection faults to split huge pages.
      
      Eager page splitting is best-effort only. This commit only adds the
      support for the TDP MMU, and even there splitting may fail due to out
      of memory conditions. Failure to split a huge page is fine from a
      correctness standpoint because KVM will always follow up splitting by
      write-protecting any remaining huge pages.
      
      Eager page splitting moves the cost of splitting huge pages off of the
      vCPU threads and onto the thread enabling dirty logging on the memslot.
      This is useful because:
      
       1. Splitting on the vCPU thread interrupts vCPU execution and is
          disruptive to customers whereas splitting on VM ioctl threads can
          run in parallel with vCPU execution.
      
       2. Splitting all huge pages at once is more efficient because it does
          not require performing VM-exit handling or walking the page table for
          every 4KiB page in the memslot, and greatly reduces the amount of
          contention on the mmu_lock.
      
      For example, when running dirty_log_perf_test with 96 virtual CPUs, 1GiB
      per vCPU, and 1GiB HugeTLB memory, the time it takes vCPUs to write to
      all of their memory after dirty logging is enabled decreased by 95% from
      2.94s to 0.14s.
      
      Eager Page Splitting is over 100x more efficient than the current
      implementation of splitting on fault under the read lock. For example,
      taking the same workload as above, Eager Page Splitting reduced the CPU
      required to split all huge pages from ~270 CPU-seconds ((2.94s - 0.14s)
      * 96 vCPU threads) to only 1.55 CPU-seconds.
      
      Eager page splitting does increase the amount of time it takes to enable
      dirty logging since it has to split all huge pages. For example, the time
      it took to enable dirty logging in the 96GiB region of the
      aforementioned test increased from 0.001s to 1.55s.
      
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220119230739.2234394-16-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
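The figures quoted in the commit above can be rechecked with a few lines of arithmetic. The helper names below are made up for this sketch. The inputs (2.94s, 0.14s, 96 vCPU threads, 1.55 CPU-seconds) are the ones stated in the message.

```c
#include <assert.h>

/* CPU time spent splitting on fault: the extra wall time times the thread count. */
double toy_fault_split_cpu_seconds(double before_s, double after_s, int vcpus)
{
    assert(vcpus > 0);
    return (before_s - after_s) * vcpus;
}

/* Relative improvement of a measurement, in percent. */
double toy_percent_decrease(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}

/* How many times cheaper the eager approach is in CPU-seconds. */
double toy_efficiency_ratio(double fault_cpu_s, double eager_cpu_s)
{
    assert(eager_cpu_s > 0.0);
    return fault_cpu_s / eager_cpu_s;
}
```

With the quoted inputs, the first helper gives about 268.8 CPU-seconds (the message rounds this to ~270), the decrease from 2.94s to 0.14s is a little over 95%, and the ratio against 1.55 CPU-seconds is roughly 173, which is consistent with "over 100x".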
  13. Feb 08, 2022
    • KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU · 583cda1b
      Alexandru Elisei authored
      
      
      Userspace can assign a PMU to a VCPU with the KVM_ARM_VCPU_PMU_V3_SET_PMU
      device ioctl. If the VCPU is scheduled on a physical CPU which has a
      different PMU, the perf events needed to emulate a guest PMU won't be
      scheduled in and the guest performance counters will stop counting. Treat
      it as a userspace error and refuse to run the VCPU in this situation.
      
      Suggested-by: Marc Zyngier <maz@kernel.org>
      Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20220127161759.53553-7-alexandru.elisei@arm.com
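The check described above amounts to asking whether the physical CPU a VCPU lands on belongs to the set of CPUs served by the assigned PMU. A minimal model, assuming a 64-bit mask in place of a real CPU mask and using hypothetical `toy_` names throughout, could look like this. How KVM reports the refusal to userspace is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PMU description: bit n set means physical CPU n has this PMU. */
struct toy_pmu {
    uint64_t supported_cpus;
};

/* Hypothetical per-VCPU state holding the PMU chosen by userspace. */
struct toy_vcpu_pmu {
    const struct toy_pmu *pmu;
};

/* True only if the VCPU may run on 'cpu' without its counters going silent. */
bool toy_cpu_can_run_vcpu(const struct toy_vcpu_pmu *vcpu, unsigned int cpu)
{
    assert(vcpu != NULL && vcpu->pmu != NULL);
    if (cpu >= 64)
        return false; /* outside the range this toy mask can describe */
    return (vcpu->pmu->supported_cpus & (1ull << cpu)) != 0;
}
```

Userspace remains responsible for pinning the VCPU thread to CPUs for which this predicate holds.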
    • KVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attribute · 6ee7fca2
      Alexandru Elisei authored
      
      
      When KVM creates an event and there is more than one PMU present on the
      system, perf_init_event() will go through the list of available PMUs and
      will choose the first one that can create the event. The order of the PMUs
      in this list depends on the probe order, which can change under various
      circumstances, for example if the order of the PMU nodes changes in the DTB
      or if asynchronous driver probing is enabled on the kernel command line
      (with the driver_async_probe=armv8-pmu option).
      
      Another consequence of this approach is that on heterogeneous systems all
      virtual machines that KVM creates will use the same PMU. This might cause
      unexpected behaviour for userspace: when a VCPU is executing on the
      physical CPU that uses this default PMU, PMU events in the guest work
      correctly; but when the same VCPU executes on another CPU, PMU events in
      the guest will suddenly stop counting.
      
      Fortunately, perf core allows the user to specify on which PMU to create an
      event by using the perf_event_attr->type field, which is used by
      perf_init_event() as an index in the radix tree of available PMUs.
      
      Add the KVM_ARM_VCPU_PMU_V3_CTRL(KVM_ARM_VCPU_PMU_V3_SET_PMU) VCPU
      attribute to allow userspace to specify the arm_pmu that KVM will use when
      creating events for that VCPU. KVM will make no attempt to run the VCPU on
      the physical CPUs that share the PMU, leaving it up to userspace to manage
      the VCPU threads' affinity accordingly.
      
      To ensure that KVM doesn't expose an asymmetric system to the guest, the
      PMU set for one VCPU will be used by all other VCPUs. Once a VCPU has run,
      the PMU cannot be changed in order to avoid changing the list of available
      events for a VCPU, or changing the semantics of existing events.
      
      Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20220127161759.53553-6-alexandru.elisei@arm.com
    • KVM: arm64: Do not change the PMU event filter after a VCPU has run · 5177fe91
      Marc Zyngier authored
      
      
      Userspace can specify which events a guest is allowed to use with the
      KVM_ARM_VCPU_PMU_V3_FILTER attribute. The list of allowed events can be
      identified by a guest from reading the PMCEID{0,1}_EL0 registers.
      
      Changing the PMU event filter after a VCPU has run can cause reads of the
      registers performed before the filter is changed to return different values
      than reads performed with the new event filter in place. The architecture
      defines the two registers as read-only, and this behaviour contradicts
      that.
      
      Keep track of when the first VCPU has run and deny changes to the PMU event
      filter to prevent this from happening.
      
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      [ Alexandru E: Added commit message, updated ioctl documentation ]
      Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20220127161759.53553-2-alexandru.elisei@arm.com
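The "track the first run, then deny filter changes" rule from the commit above can be sketched as a small state machine. Everything here is an assumption made for illustration, including the `toy_` names, the bitmask representation of the filter, and the choice of -EBUSY as the error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-VM PMU state. */
struct toy_vm_pmu {
    bool ran_once;           /* set when the first VCPU has run */
    uint64_t allowed_events; /* toy filter: bit n set means event n is allowed */
};

/* Called on the first entry of any VCPU; from here on the filter is frozen. */
void toy_note_first_run(struct toy_vm_pmu *vm)
{
    assert(vm != NULL);
    vm->ran_once = true;
}

/* Accept a new filter only while no VCPU has run yet. */
int toy_set_event_filter(struct toy_vm_pmu *vm, uint64_t allowed)
{
    assert(vm != NULL);
    if (vm->ran_once)
        return -EBUSY; /* illustrative error choice */
    vm->allowed_events = allowed;
    return 0;
}
```

Freezing the filter this way keeps what the guest reads from the PMCEID{0,1}_EL0 registers stable for the lifetime of the VM.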
  14. Feb 03, 2022
  15. Feb 02, 2022
    • Revert "fbcon: Disable accelerated scrolling" · 87ab9f6b
      Helge Deller authored
      
      
      This reverts commit 39aead83.
      
      Revert the first (of 2) commits which disabled scrolling acceleration in
      fbcon/fbdev.  It introduced a regression for fbdev-supported graphic cards
      because of the performance penalty by doing screen scrolling by software
      instead of using the existing graphic card 2D hardware acceleration.
      
      Console scrolling acceleration was disabled by dropping code which
      checked at runtime the driver hardware capabilities for the
      FBINFO_HWACCEL_COPYAREA or FBINFO_HWACCEL_FILLRECT flags and if set, it
      enabled scrollmode SCROLL_MOVE which uses hardware acceleration to move
      screen contents.  After dropping those checks scrollmode was hard-wired
      to SCROLL_REDRAW instead, which forces all graphic cards to redraw every
      character at the new screen position when scrolling.
      
      This change effectively disabled all hardware-based scrolling acceleration for
      ALL drivers, because now all kinds of 2D hardware acceleration (bitblt,
      fillrect) in the drivers are no longer used.
      
      The original commit message mentions that only 3 DRM drivers (nouveau, omapdrm
      and gma500) used hardware acceleration in the past and thus code for checking
      and using scrolling acceleration is obsolete.
      
      This statement is NOT TRUE, because besides the DRM drivers there are around 35
      other fbdev drivers which depend on fbdev/fbcon and still provide hardware
      acceleration for fbdev/fbcon.
      
      The original commit message also states that syzbot found lots of bugs in fbcon
      and thus it's "often the solution to just delete code and remove features".
      This is true, and the bugs - which actually affected all users of fbcon,
      including DRM - were fixed, or code was dropped like e.g. the support for
      software scrollback in vgacon (commit 973c096f).
      
      So to further analyze which bugs were found by syzbot, I've looked through all
      patches in drivers/video which were tagged with syzbot or syzkaller back to
      year 2005. The vast majority fixed the reported issues on a higher level, e.g.
      when the screen is to be resized, or when the font size is to be changed. The few
      which touched driver code fixed a real driver bug, e.g. by adding a check.
      
      But NONE of those patches touched code of either the SCROLL_MOVE or the
      SCROLL_REDRAW case.
      
      That means there was no real reason why SCROLL_MOVE had to be ripped out and
      just SCROLL_REDRAW had to be used instead. The only reason I can imagine so far
      was that SCROLL_MOVE wasn't used by DRM and as such it was assumed that it
      could go away. That argument completely missed the fact that SCROLL_MOVE is
      still heavily used by fbdev (non-DRM) drivers.
      
      Some people mention that using memcpy() instead of the hardware acceleration is
      pretty much the same speed. But that's not true, at least not for older graphic
      cards and machines where we see speed decreases by a factor of 10 or more and thus
      this change leads to console responsiveness way worse than before.
      
      That's why the original commit is to be reverted. By reverting we
      reintroduce hardware-based scrolling acceleration and fix the
      performance regression for fbdev drivers.
      
      There isn't any impact on DRM when reverting those patches.
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Sven Schnelle <svens@stackframe.org>
      Cc: stable@vger.kernel.org # v5.10+
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220202135531.92183-3-deller@gmx.de
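The runtime check that the revert restores can be pictured with a simplified model. This follows only the description in the commit message (capability flags set means SCROLL_MOVE, otherwise SCROLL_REDRAW). The real fbcon logic considers more conditions, and the flag values, struct, and `toy_` names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits standing in for the driver's hardware flags. */
#define TOY_HWACCEL_COPYAREA (1u << 0)
#define TOY_HWACCEL_FILLRECT (1u << 1)

enum toy_scrollmode { TOY_SCROLL_MOVE, TOY_SCROLL_REDRAW };

/* Hypothetical stand-in for the per-device framebuffer info. */
struct toy_fb_info {
    uint32_t flags;
};

/* With the check in place: use accelerated moves when the driver offers them. */
enum toy_scrollmode toy_pick_scrollmode(const struct toy_fb_info *info)
{
    assert(info != NULL);
    if (info->flags & (TOY_HWACCEL_COPYAREA | TOY_HWACCEL_FILLRECT))
        return TOY_SCROLL_MOVE;
    return TOY_SCROLL_REDRAW;
}

/* With the check dropped: every device redraws, whatever it is capable of. */
enum toy_scrollmode toy_pick_scrollmode_hardwired(const struct toy_fb_info *info)
{
    assert(info != NULL);
    return TOY_SCROLL_REDRAW;
}
```

The second function is the behavior the commit message objects to: an accelerated device is treated exactly like an unaccelerated one.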
    • Revert "fbdev: Garbage collect fbdev scrolling acceleration, part 1 (from TODO list)" · 1148836f
      Helge Deller authored
      
      
      This reverts commit b3ec8cdf.
      
      Revert the second (of 2) commits which disabled scrolling acceleration
      in fbcon/fbdev.  It introduced a regression for fbdev-supported graphic
      cards because of the performance penalty by doing screen scrolling by
      software instead of using the existing graphic card 2D hardware
      acceleration.
      
      Console scrolling acceleration was disabled by dropping code which
      checked at runtime the driver hardware capabilities for the
      FBINFO_HWACCEL_COPYAREA or FBINFO_HWACCEL_FILLRECT flags and if set, it
      enabled scrollmode SCROLL_MOVE which uses hardware acceleration to move
      screen contents.  After dropping those checks scrollmode was hard-wired
      to SCROLL_REDRAW instead, which forces all graphic cards to redraw every
      character at the new screen position when scrolling.
      
      This change effectively disabled all hardware-based scrolling acceleration for
      ALL drivers, because now all kinds of 2D hardware acceleration (bitblt,
      fillrect) in the drivers are no longer used.
      
      The original commit message mentions that only 3 DRM drivers (nouveau, omapdrm
      and gma500) used hardware acceleration in the past and thus code for checking
      and using scrolling acceleration is obsolete.
      
      This statement is NOT TRUE, because besides the DRM drivers there are around 35
      other fbdev drivers which depend on fbdev/fbcon and still provide hardware
      acceleration for fbdev/fbcon.
      
      The original commit message also states that syzbot found lots of bugs in fbcon
      and thus it's "often the solution to just delete code and remove features".
      This is true, and the bugs - which actually affected all users of fbcon,
      including DRM - were fixed, or code was dropped like e.g. the support for
      software scrollback in vgacon (commit 973c096f).
      
      So to further analyze which bugs were found by syzbot, I've looked through all
      patches in drivers/video which were tagged with syzbot or syzkaller back to
      year 2005. The vast majority fixed the reported issues on a higher level, e.g.
      when the screen is to be resized, or when the font size is to be changed. The few
      which touched driver code fixed a real driver bug, e.g. by adding a check.
      
      But NONE of those patches touched code of either the SCROLL_MOVE or the
      SCROLL_REDRAW case.
      
      That means there was no real reason why SCROLL_MOVE had to be ripped out and
      just SCROLL_REDRAW had to be used instead. The only reason I can imagine so far
      was that SCROLL_MOVE wasn't used by DRM and as such it was assumed that it
      could go away. That argument completely missed the fact that SCROLL_MOVE is
      still heavily used by fbdev (non-DRM) drivers.
      
      Some people mention that using memcpy() instead of the hardware acceleration is
      pretty much the same speed. But that's not true, at least not for older graphic
      cards and machines where we see speed decreases by a factor of 10 or more and thus
      this change leads to console responsiveness way worse than before.
      
      That's why the original commit is to be reverted. By reverting we
      reintroduce hardware-based scrolling acceleration and fix the
      performance regression for fbdev drivers.
      
      There isn't any impact on DRM when reverting those patches.
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Sven Schnelle <svens@stackframe.org>
      Cc: stable@vger.kernel.org # v5.16+
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220202135531.92183-2-deller@gmx.de
  16. Feb 01, 2022
  17. Jan 28, 2022
  18. Jan 27, 2022