  1. Nov 17, 2022
    • KVM: Obey kvm.halt_poll_ns in VMs not using KVM_CAP_HALT_POLL · 9eb8ca04
      David Matlack authored

      Obey kvm.halt_poll_ns in VMs not using KVM_CAP_HALT_POLL on every halt,
      rather than just sampling the module parameter when the VM is first
      created. This restores the original behavior of kvm.halt_poll_ns for VMs
      that have not opted into KVM_CAP_HALT_POLL.
      
      Notably, this change restores the ability for admins to disable or
      change the maximum halt-polling time system wide for VMs not using
      KVM_CAP_HALT_POLL.
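
      With this fix, the effective max is recomputed on every halt. A sketch
      of the helper (the override flag is set when userspace enables
      KVM_CAP_HALT_POLL; exact names per this series):

        static unsigned int kvm_vcpu_max_halt_poll_ns(struct kvm_vcpu *vcpu)
        {
                struct kvm *kvm = vcpu->kvm;

                /* VMs that opted into KVM_CAP_HALT_POLL keep their own max. */
                if (kvm->override_halt_poll_ns)
                        return READ_ONCE(kvm->max_halt_poll_ns);

                /* All other VMs follow the module parameter on every halt. */
                return READ_ONCE(halt_poll_ns);
        }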
      
      Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Fixes: acd05785 ("kvm: add capability for halt polling")
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20221117001657.1067231-4-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Avoid re-reading kvm->max_halt_poll_ns during halt-polling · 175d5dc7
      David Matlack authored

      Avoid re-reading kvm->max_halt_poll_ns multiple times during
      halt-polling except when it is explicitly useful, e.g. to check if the
      max time changed across a halt. kvm->max_halt_poll_ns can be changed at
      any time by userspace via KVM_CAP_HALT_POLL.
      
      This bug is unlikely to cause any serious side-effects. In the worst
      case one halt polls for shorter or longer than it should, and then is
      fixed up on the next halt. Furthermore, this is still possible since
      writes to kvm->max_halt_poll_ns are not synchronized with halts.
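
      Concretely, the halt path now snapshots the value once and uses the
      local copy for all decisions within a single halt (sketch):

        /* Snapshot the max once per halt; userspace can change
         * kvm->max_halt_poll_ns at any time via KVM_CAP_HALT_POLL.
         */
        unsigned int max_halt_poll_ns = kvm_vcpu_max_halt_poll_ns(vcpu);

        /* ... all grow/shrink/cap decisions use max_halt_poll_ns ... */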
      
      Fixes: acd05785 ("kvm: add capability for halt polling")
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20221117001657.1067231-3-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Cap vcpu->halt_poll_ns before halting rather than after · 97b6847a
      David Matlack authored

      Cap vcpu->halt_poll_ns based on the max halt polling time just before
      halting, rather than after the last halt. This arguably provides better
      accuracy if an admin disables halt polling in between halts, although
      the improvement is nominal.
      
      A side-effect of this change is that grow_halt_poll_ns() no longer needs
      to access vcpu->kvm->max_halt_poll_ns, which will be useful in a future
      commit where the max halt polling time can come from the module parameter
      halt_poll_ns instead.
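
      Sketch of the reordering (simplified from kvm_vcpu_halt()):

        /* Cap the poll time immediately before halting, so a concurrent
         * change to the max takes effect on this halt, not the next one.
         */
        if (vcpu->halt_poll_ns > max_halt_poll_ns)
                vcpu->halt_poll_ns = max_halt_poll_ns;

        /* ... poll, then actually halt ... */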
      
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20221117001657.1067231-2-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. Oct 27, 2022
    • KVM: Reject attempts to consume or refresh inactive gfn_to_pfn_cache · ecbcf030
      Sean Christopherson authored

      Reject kvm_gpc_check() and kvm_gpc_refresh() if the cache is inactive.
      Not checking the active flag during refresh is particularly egregious, as
      KVM can end up with a valid, inactive cache, which can lead to a variety
      of use-after-free bugs, e.g. consuming a NULL kernel pointer or missing
      an mmu_notifier invalidation due to the cache not being on the list of
      gfns to invalidate.
      
      Note, "active" needs to be set if and only if the cache is on the list
      of caches, i.e. is reachable via mmu_notifier events.  If a relevant
      mmu_notifier event occurs while the cache is "active" but not on the
      list, KVM will not acquire the cache's lock and so will not serialize
      the mmu_notifier event with active users and/or kvm_gpc_refresh().
      
      A race between KVM_XEN_ATTR_TYPE_SHARED_INFO and KVM_XEN_HVM_EVTCHN_SEND
      can be exploited to trigger the bug.
      
      1. Deactivate shinfo cache:
      
      kvm_xen_hvm_set_attr
      case KVM_XEN_ATTR_TYPE_SHARED_INFO
       kvm_gpc_deactivate
        kvm_gpc_unmap
         gpc->valid = false
         gpc->khva = NULL
        gpc->active = false
      
      Result: active = false, valid = false
      
      2. Cause cache refresh:
      
      kvm_arch_vm_ioctl
      case KVM_XEN_HVM_EVTCHN_SEND
       kvm_xen_hvm_evtchn_send
        kvm_xen_set_evtchn
         kvm_xen_set_evtchn_fast
          kvm_gpc_check
          return -EWOULDBLOCK because !gpc->valid
         kvm_xen_set_evtchn_fast
          return -EWOULDBLOCK
         kvm_gpc_refresh
          hva_to_pfn_retry
           gpc->valid = true
           gpc->khva = not NULL
      
      Result: active = false, valid = true
      
      3. Race ioctl KVM_XEN_HVM_EVTCHN_SEND against ioctl
      KVM_XEN_ATTR_TYPE_SHARED_INFO:
      
      kvm_arch_vm_ioctl
      case KVM_XEN_HVM_EVTCHN_SEND
       kvm_xen_hvm_evtchn_send
        kvm_xen_set_evtchn
         kvm_xen_set_evtchn_fast
          read_lock gpc->lock
                                                kvm_xen_hvm_set_attr case
                                                KVM_XEN_ATTR_TYPE_SHARED_INFO
                                                 mutex_lock kvm->lock
                                                 kvm_xen_shared_info_init
                                                  kvm_gpc_activate
                                                   gpc->khva = NULL
          kvm_gpc_check
           [ Check passes because gpc->valid is
             still true, even though gpc->khva
             is already NULL. ]
          shinfo = gpc->khva
          pending_bits = shinfo->evtchn_pending
          CRASH: test_and_set_bit(..., pending_bits)
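
      The fix is to fail fast on inactive caches in both entry points,
      roughly (the real helpers take more parameters):

        /* In kvm_gpc_check(): an inactive cache must never be consumed. */
        if (!gpc->active)
                return false;

        /* In kvm_gpc_refresh(): refreshing an inactive cache would
         * re-validate it without putting it back on the invalidation
         * list, so reject that too.
         */
        if (!gpc->active)
                return -EINVAL;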
      
      Fixes: 982ed0de ("KVM: Reinstate gfn_to_pfn_cache with invalidation support")
      Cc: stable@vger.kernel.org
      Reported-by: Michal Luczaj <mhal@rbox.co>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20221013211234.1318131-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Initialize gfn_to_pfn_cache locks in dedicated helper · 52491a38
      Michal Luczaj authored

      Move the gfn_to_pfn_cache lock initialization to another helper and
      call the new helper during VM/vCPU creation.  There are race
      conditions possible due to kvm_gfn_to_pfn_cache_init()'s
      ability to re-initialize the cache's locks.
      
      For example: a race between ioctl(KVM_XEN_HVM_EVTCHN_SEND) and
      kvm_gfn_to_pfn_cache_init() leads to a corrupted shinfo gpc lock.
      
                      (thread 1)                |           (thread 2)
                                                |
       kvm_xen_set_evtchn_fast                  |
        read_lock_irqsave(&gpc->lock, ...)      |
                                                | kvm_gfn_to_pfn_cache_init
                                                |  rwlock_init(&gpc->lock)
        read_unlock_irqrestore(&gpc->lock, ...) |
      
      Rename "cache_init" and "cache_destroy" to activate+deactivate to
      avoid implying that the cache really is destroyed/freed.
      
      Note, there are more races in the newly named kvm_gpc_activate() that will
      be addressed separately.
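
      The dedicated helper amounts to a one-time initialization at VM/vCPU
      creation (sketch, close to the actual patch):

        void kvm_gpc_init(struct gfn_to_pfn_cache *gpc)
        {
                /* Initialize the locks exactly once, before the cache is
                 * reachable; activate/deactivate no longer touch them, so
                 * they can never be re-initialized under a holder.
                 */
                rwlock_init(&gpc->lock);
                mutex_init(&gpc->refresh_lock);
        }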
      
      Fixes: 982ed0de ("KVM: Reinstate gfn_to_pfn_cache with invalidation support")
      Cc: stable@vger.kernel.org
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Michal Luczaj <mhal@rbox.co>
      [sean: call out that this is a bug fix]
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20221013211234.1318131-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: debugfs: Return retval of simple_attr_open() if it fails · 180418e2
      Hou Wenlong authored

      Although simple_attr_open() fails only with -ENOMEM in the current
      code base, it is cleaner to return the retval of simple_attr_open()
      directly from kvm_debugfs_open().
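
      Sketch of the resulting error path in kvm_debugfs_open():

        ret = simple_attr_open(inode, file, get, set, fmt);
        if (ret) {
                kvm_put_kvm(stat_data->kvm);
                return ret;     /* previously a hard-coded -ENOMEM */
        }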
      
      No functional change intended.
      
      Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
      Message-Id: <69d64d93accd1f33691b8a383ae555baee80f943.1665975828.git.houwenlong.hwl@antgroup.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. Aug 19, 2022
    • KVM: Drop unnecessary initialization of "ops" in kvm_ioctl_create_device() · eceb6e1d
      Li kunyu authored

      The variable is initialized at declaration, but it is assigned before
      its first use, so the initialization is unnecessary.
      
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Li kunyu <kunyu@nfschina.com>
      Message-Id: <20220819021535.483702-1-kunyu@nfschina.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Drop unnecessary initialization of "npages" in hva_to_pfn_slow() · 28249139
      Li kunyu authored

      The variable is initialized at declaration, but it is assigned before
      its first use, so the initialization is unnecessary.
      
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Li kunyu <kunyu@nfschina.com>
      Message-Id: <20220819022804.483914-1-kunyu@nfschina.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Rename mmu_notifier_* to mmu_invalidate_* · 20ec3ebd
      Chao Peng authored

      The motivation for this renaming is to make these variables and
      related helper functions less mmu_notifier-bound, so that they can
      also be used for page invalidation that is not based on mmu_notifiers.
      mmu_invalidate_* was chosen to better describe the purpose these
      variables serve: 'invalidating' a page.
      
        - mmu_notifier_seq/range_start/range_end are renamed to
          mmu_invalidate_seq/range_start/range_end.
      
        - mmu_notifier_retry{_hva} helper functions are renamed to
          mmu_invalidate_retry{_hva}.
      
        - mmu_notifier_count is renamed to mmu_invalidate_in_progress to
          avoid confusion with mn_active_invalidate_count.
      
        - While here, also update kvm_inc/dec_notifier_count() to
          kvm_mmu_invalidate_begin/end() to match the change for
          mmu_notifier_count.
      
      No functional change intended.
      
      Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
      Message-Id: <20220816125322.1110439-3-chao.p.peng@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Move coalesced MMIO initialization (back) into kvm_create_vm() · c2b82397
      Sean Christopherson authored

      Invoke kvm_coalesced_mmio_init() from kvm_create_vm() now that allocating
      and initializing coalesced MMIO objects is separate from registering any
      associated devices.  Moving coalesced MMIO cleans up the last oddity
      where KVM does VM creation/initialization after kvm_create_vm(), and more
      importantly after kvm_arch_post_init_vm() is called and the VM is added
      to the global vm_list, i.e. after the VM is fully created as far as KVM
      is concerned.
      
      Originally, kvm_coalesced_mmio_init() was called by kvm_create_vm(), but
      the original implementation was completely devoid of error handling.
      Commit 6ce5a090 ("KVM: coalesced_mmio: fix kvm_coalesced_mmio_init()'s
      error handling" fixed the various bugs, and in doing so rightly moved the
      call to after kvm_create_vm() because kvm_coalesced_mmio_init() also
      registered the coalesced MMIO device.  Commit 2b3c246a ("KVM: Make
      coalesced mmio use a device per zone") cleaned up that mess by having
      each zone register a separate device, i.e. moved device registration to
      its logical home in kvm_vm_ioctl_register_coalesced_mmio().  As a result,
      kvm_coalesced_mmio_init() is now a "pure" initialization helper and can
      be safely called from kvm_create_vm().
      
      Opportunistically drop the #ifdef; KVM provides stubs for
      kvm_coalesced_mmio_{init,free}() when CONFIG_KVM_MMIO=n (s390).
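
      The call now sits with the rest of VM initialization, roughly (error
      label illustrative):

        static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
        {
                /* ... allocate and initialize the VM ... */

                r = kvm_coalesced_mmio_init(kvm);
                if (r < 0)
                        goto out_err;

                /* ... add to vm_list, create the VM fd ... */
        }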
      
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220816053937.2477106-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Unconditionally get a ref to /dev/kvm module when creating a VM · 405294f2
      Sean Christopherson authored

      Unconditionally get a reference to the /dev/kvm module when creating a VM
      instead of using try_get_module(), which will fail if the module is in
      the process of being forcefully unloaded.  The error handling when
      try_get_module() fails doesn't properly unwind all that has been done,
      e.g. doesn't call kvm_arch_pre_destroy_vm() and doesn't remove the VM
      from the global list.  Not removing VMs from the global list tends to be
      fatal, e.g. leads to use-after-free explosions.
      
      The obvious alternative would be to add proper unwinding, but the
      justification for using try_get_module(), "rmmod --wait", is completely
      bogus as support for "rmmod --wait", i.e. delete_module() without
      O_NONBLOCK, was removed by commit 3f2b9c9c ("module: remove rmmod
      --wait option.") nearly a decade ago.
      
      It's still possible for try_get_module() to fail due to the module dying
      (more like being killed), as the module will be tagged MODULE_STATE_GOING
      by "rmmod --force", i.e. delete_module(..., O_TRUNC), but playing nice
      with forced unloading is an exercise in futility and gives a false sense
      of security.  Using try_get_module() only prevents acquiring _new_
      references, it doesn't magically put the references held by other VMs,
      and forced unloading doesn't wait, i.e. "rmmod --force" on KVM is all but
      guaranteed to cause spectacular fireworks; the window where KVM will fail
      try_get_module() is tiny compared to the window where KVM is building and
      running the VM with an elevated module refcount.
      
      Addressing KVM's inability to play nice with "rmmod --force" is firmly
      out-of-scope.  Forcefully unloading any module taints the kernel (for
      obvious reasons) _and_ requires the kernel to be built with
      CONFIG_MODULE_FORCE_UNLOAD=y, which is off by default and comes with the
      amusing disclaimer that it's "mainly for kernel developers and desperate
      users".  In other words, KVM is free to scoff at bug reports due to using
      "rmmod --force" while VMs may be running.
      
      Fixes: 5f6de5cb ("KVM: Prevent module exit until all VMs are freed")
      Cc: stable@vger.kernel.org
      Cc: David Matlack <dmatlack@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220816053937.2477106-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Properly unwind VM creation if creating debugfs fails · 4ba4f419
      Sean Christopherson authored

      Properly unwind VM creation if kvm_create_vm_debugfs() fails.  A recent
      change to invoke kvm_create_vm_debugfs() in kvm_create_vm() was led
      astray by the buggy try_get_module() handling added by commit 5f6de5cb
      ("KVM: Prevent module exit until all VMs are freed").  The debugfs
      error path effectively inherits the bad error path of try_module_get(),
      e.g. KVM leaves the to-be-freed VM on vm_list even though KVM appears
      to do the right thing by calling module_put() and falling through.
      
      Opportunistically hoist kvm_create_vm_debugfs() above the call to
      kvm_arch_post_init_vm() so that the "post-init" arch hook is actually
      invoked after the VM is initialized (ignoring kvm_coalesced_mmio_init()
      for the moment).  x86 is the only non-nop implementation of the post-init
      hook, and it doesn't allocate/initialize any objects that are reachable
      via debugfs code (spawns a kthread worker for the NX huge page mitigation).
      
      Leave the buggy try_get_module() alone for now, it will be fixed in a
      separate commit.
      
      Fixes: b74ed7a6 ("KVM: Actually create debugfs in kvm_create_vm()")
      Reported-by: syzbot+744e173caec2e1627ee0@syzkaller.appspotmail.com
      Cc: Oliver Upton <oliver.upton@linux.dev>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
      Message-Id: <20220816053937.2477106-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. Jun 24, 2022
    • KVM: debugfs: expose pid of vcpu threads · e36de87d
      Vineeth Pillai authored

      Add a new debugfs file to expose the pid of each vcpu thread. This is
      very helpful for userland tools to get the vcpu pids without worrying
      about the thread naming conventions of the VMM.
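
      Sketch of the new per-vCPU attribute (close to the actual patch):

        static int vcpu_get_pid(void *data, u64 *val)
        {
                struct kvm_vcpu *vcpu = data;

                /* vcpu->pid is RCU-protected; report the numeric pid. */
                rcu_read_lock();
                *val = pid_nr(rcu_dereference(vcpu->pid));
                rcu_read_unlock();
                return 0;
        }

        DEFINE_SIMPLE_ATTRIBUTE(vcpu_get_pid_fops, vcpu_get_pid, NULL, "%llu\n");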
      
      Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
      Message-Id: <20220523190327.2658-1-vineeth@bitbyteword.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Allow for different capacities in kvm_mmu_memory_cache structs · 837f66c7
      David Matlack authored

      Allow the capacity of the kvm_mmu_memory_cache struct to be chosen at
      declaration time rather than being fixed for all declarations. This will
      be used in a follow-up commit to declare a cache in x86 with a capacity
      of 512+ objects without having to increase the capacity of all caches in
      KVM.
      
      This change requires that each cache now specify its capacity at runtime,
      since the cache struct itself no longer has a fixed capacity known at
      compile time. To protect against someone accidentally defining a
      kvm_mmu_memory_cache struct directly (without the extra storage), this
      commit includes a WARN_ON() in kvm_mmu_topup_memory_cache().
      
      In order to support different capacities, this commit changes the
      objects pointer array to be dynamically allocated the first time the
      cache is topped-up.
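
      Sketch of the lazy allocation in the top-up path (field names per
      this series):

        if (!mc->objects) {
                /* A plain struct without a declared capacity is a bug. */
                if (WARN_ON_ONCE(!mc->capacity))
                        return -EIO;

                mc->objects = kvmalloc_array(mc->capacity, sizeof(void *),
                                             GFP_KERNEL_ACCOUNT);
                if (!mc->objects)
                        return -ENOMEM;
        }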
      
      While here, opportunistically clean up the stack-allocated
      kvm_mmu_memory_cache structs in riscv and arm64 to use designated
      initializers.
      
      No functional change intended.
      
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220516232138.1783324-22-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. Jun 20, 2022
    • KVM: Do not zero initialize 'pfn' in hva_to_pfn() · 943dfea8
      Sean Christopherson authored

      Drop the unnecessary initialization of the local 'pfn' variable in
      hva_to_pfn().  First and foremost, '0' is not an invalid pfn, it's a
      perfectly valid pfn on most architectures.  I.e. if hva_to_pfn() were to
      return an "uninitializd" pfn, it would actually be interpeted as a legal
      pfn by most callers.
      
      Second, hva_to_pfn() can't return an uninitialized pfn as hva_to_pfn()
      explicitly sets pfn to an error value (or returns an error value directly)
      if a helper returns failure, and all helpers set the pfn on success.
      
      The zeroing of 'pfn' was introduced by commit 2fc84311 ("KVM:
      reorganize hva_to_pfn"), probably to avoid "uninitialized variable"
      warnings on statements that return pfn.  However, no compiler seems
      to produce them, making the initialization unnecessary.
      
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220429010416.2788472-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Rename/refactor kvm_is_reserved_pfn() to kvm_pfn_to_refcounted_page() · b14b2690
      Sean Christopherson authored

      Rename and refactor kvm_is_reserved_pfn() to kvm_pfn_to_refcounted_page()
      to better reflect what KVM is actually checking, and to eliminate extra
      pfn_to_page() lookups.  The kvm_release_pfn_*() and kvm_try_get_pfn()
      helpers in particular benefit from "refcounted" nomenclature, as it's not
      all that obvious why KVM needs to get/put refcounts for some PG_reserved
      pages (ZERO_PAGE and ZONE_DEVICE).
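
      The refactored helper returns the page if and only if the pfn is
      backed by a refcounted page (sketch; ZONE_DEVICE handling elided):

        struct page *kvm_pfn_to_refcounted_page(kvm_pfn_t pfn)
        {
                struct page *page;

                if (!pfn_valid(pfn))
                        return NULL;

                page = pfn_to_page(pfn);
                if (!PageReserved(page))
                        return page;

                /* The ZERO_PAGE(s) is marked PG_reserved, but is
                 * refcounted and thus an exception to the rule.
                 */
                if (is_zero_pfn(pfn))
                        return page;

                return NULL;
        }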
      
      Add a comment to call out that the list of exceptions to PG_reserved is
      all but guaranteed to be incomplete.  The list has mostly been compiled
      by people throwing noodles at KVM and finding out they stick a little too
      well, e.g. the ZERO_PAGE's refcount overflowed and ZONE_DEVICE pages
      didn't get freed.
      
      No functional change intended.
      
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220429010416.2788472-10-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Take a 'struct page', not a pfn in kvm_is_zone_device_page() · 284dc493
      Sean Christopherson authored

      Operate on a 'struct page' instead of a pfn when checking if a page is a
      ZONE_DEVICE page, and rename the helper accordingly.  Generally speaking,
      KVM doesn't actually care about ZONE_DEVICE memory, i.e. shouldn't do
      anything special for ZONE_DEVICE memory.  Rather, KVM wants to treat
      ZONE_DEVICE memory like regular memory, and the need to identify
      ZONE_DEVICE memory only arises as an exception to PG_reserved pages. In
      other words, KVM should only ever check for ZONE_DEVICE memory after KVM
      has already verified that there is a struct page associated with the pfn.
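
      Sketch of the reworked helper:

        static inline bool kvm_is_zone_device_page(struct page *page)
        {
                /* The metadata used to identify ZONE_DEVICE pages is
                 * unstable if the page isn't refcounted by the caller.
                 */
                if (WARN_ON_ONCE(!page_count(page)))
                        return false;

                return is_zone_device_page(page);
        }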
      
      No functional change intended.
      
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220429010416.2788472-9-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Remove kvm_vcpu_gfn_to_page() and kvm_vcpu_gpa_to_page() · b1624f99
      Sean Christopherson authored

      Drop helpers to convert a gfn/gpa to a 'struct page' in the context of a
      vCPU.  KVM doesn't require that guests be backed by 'struct page' memory,
      thus any use of helpers that assume 'struct page' is bound to be flawed,
      as was the case for the recently removed last user in x86's nested VMX.
      
      No functional change intended.
      
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220429010416.2788472-8-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Don't WARN if kvm_pfn_to_page() encounters a "reserved" pfn · 6573a691
      Sean Christopherson authored

      Drop a WARN_ON() if kvm_pfn_to_page() encounters a "reserved" pfn, which
      in this context means a struct page that has PG_reserved but is not a/the
      ZERO_PAGE and is not a ZONE_DEVICE page.  The usage, via gfn_to_page(),
      in x86 is safe as gfn_to_page() is used only to retrieve a page from a
      KVM-controlled memslot, but the usage in PPC and s390 operates on
      arbitrary gfns and thus memslots that can be backed by incompatible
      memory.
      
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220429010416.2788472-7-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Avoid pfn_to_page() and vice versa when releasing pages · 8e1c6914
      Sean Christopherson authored

      Invert the order of KVM's page/pfn release helpers so that the "inner"
      helper operates on a page instead of a pfn.  As pointed out by Linus[*],
      converting between struct page and a pfn isn't necessarily cheap, and
      that's not even counting the overhead of is_error_noslot_pfn() and
      kvm_is_reserved_pfn().  Even if the checks were dirt cheap, there's no
      reason to convert from a page to a pfn and back to a page, just to mark
      the page dirty/accessed or to put a reference to the page.
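
      Sketch of the inverted order, "clean" variant, as the helpers look by
      the end of this series:

        void kvm_release_page_clean(struct page *page)
        {
                /* Inner helper: operate directly on the page. */
                kvm_set_page_accessed(page);
                put_page(page);
        }

        void kvm_release_pfn_clean(kvm_pfn_t pfn)
        {
                struct page *page;

                if (is_error_noslot_pfn(pfn))
                        return;

                page = kvm_pfn_to_refcounted_page(pfn);
                if (page)
                        kvm_release_page_clean(page);
        }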
      
      Opportunistically drop a stale declaration of kvm_set_page_accessed()
      from kvm_host.h (there was no implementation).
      
      No functional change intended.
      
      [*] https://lore.kernel.org/all/CAHk-=wifQimj2d6npq-wCi5onYPjzQg4vyO4tFcPJJZr268cRw@mail.gmail.com

      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220429010416.2788472-5-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Don't set Accessed/Dirty bits for ZERO_PAGE · a1040b0d
      Sean Christopherson authored

      Don't set Accessed/Dirty bits for a struct page with PG_reserved set,
      i.e. don't set A/D bits for the ZERO_PAGE.  The ZERO_PAGE (or pages
      depending on the architecture) should obviously never be written, and
      similarly there's no point in marking it accessed as the page will never
      be swapped out or reclaimed.  The comment in page-flags.h is quite clear
      that PG_reserved pages should be managed only by their owner, and
      strictly following that mandate also simplifies KVM's logic.
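
      The gate reduces to a simple predicate (sketch, matching the naming
      in this patch):

        static bool kvm_is_ad_tracked_page(struct page *page)
        {
                /* Per page-flags.h, PG_reserved pages are managed solely
                 * by their owner; KVM must not touch their A/D state.
                 */
                return !PageReserved(page);
        }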
      
      Fixes: 7df003c8 ("KVM: fix overflow of zero page refcount with ksm running")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220429010416.2788472-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Drop bogus "pfn != 0" guard from kvm_release_pfn() · 28b85ae0
      Sean Christopherson authored

      Remove a check from kvm_release_pfn() to bail if the provided @pfn is
      zero.  Zero is a perfectly valid pfn on most architectures, and should
      not be used to indicate an error or an invalid pfn.  The bogus check was
      added by commit 91724814 ("x86/kvm: Cache gfn to pfn translation"),
      which also did the bad thing of zeroing the pfn and gfn to mark a cache
      invalid.  Thankfully, that bad behavior was axed by commit 357a18ad
      ("KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache").
      
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220429010416.2788472-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  6. Jun 09, 2022
    • KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blocking · 18869f26
      Maxim Levitsky authored

      On SVM, if preemption happens right after the call to finish_rcuwait
      but before the call to kvm_arch_vcpu_unblocking, the preemption
      itself will re-enable AVIC, and then kvm_arch_vcpu_unblocking will
      try to re-enable it again, which leads to a warning in
      __avic_vcpu_load.

      The same problem can happen if the vCPU is preempted right after the
      call to kvm_arch_vcpu_blocking but before the call to
      prepare_to_rcuwait; in that case we end up with AVIC enabled during
      sleep - Ooops.
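
      Sketch of the fix in the common halt path (simplified):

        preempt_disable();
        kvm_arch_vcpu_blocking(vcpu);
        prepare_to_rcuwait(wait);
        preempt_enable();

        /* ... sleep until a wake event ... */

        preempt_disable();
        finish_rcuwait(wait);
        kvm_arch_vcpu_unblocking(vcpu);
        preempt_enable();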
      
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220606180829.102503-7-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  7. Jun 08, 2022
    • KVM: Move kvm_arch_vcpu_precreate() under kvm->lock · 1d5e740d
      Zeng Guang authored

      kvm_arch_vcpu_precreate() prepares arch-specific VM resources prior
      to the actual creation of a vCPU. For example, x86 may need to do a
      per-VM allocation based on max_vcpu_ids at the first vCPU creation.
      Such an allocation requires concurrency control, as multiple vCPUs
      can be created simultaneously. From an architectural point of view,
      it is therefore necessary to execute kvm_arch_vcpu_precreate() under
      the protection of kvm->lock.

      Currently only arm64, x86 and s390 have non-nop implementations at
      the stage of vCPU pre-creation. Remove the lock acquisition from
      s390's implementation and make sure all architectures can run
      kvm_arch_vcpu_precreate() safely under kvm->lock without recursive
      locking issues.
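
      A sketch of the resulting call site in the generic creation path
      (simplified):

        mutex_lock(&kvm->lock);
        /* Serialized with other concurrent vCPU creations. */
        r = kvm_arch_vcpu_precreate(kvm, id);
        mutex_unlock(&kvm->lock);
        if (r)
                return r;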
      
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Zeng Guang <guang.zeng@intel.com>
      Message-Id: <20220419154409.11842-1-guang.zeng@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>