Skip to content
  1. Jun 17, 2021
  2. Jun 08, 2021
    • Lai Jiangshan's avatar
      KVM: X86: MMU: Use the correct inherited permissions to get shadow page · b1bd5cba
      Lai Jiangshan authored
      When computing the access permissions of a shadow page, use the effective
      permissions of the walk up to that point, i.e. the logic AND of its parents'
      permissions.  Two guest PxE entries that point at the same table gfn need to
      be shadowed with different shadow pages if their parents' permissions are
      different.  KVM currently uses the effective permissions of the last
      non-leaf entry for all non-leaf entries.  Because all non-leaf SPTEs have
      full ("uwx") permissions, and the effective permissions are recorded only
      in role.access and merged into the leaves, this can lead to incorrect
      reuse of a shadow page and eventually to a missing guest protection page
      fault.
      
      For example, here is a shared pagetable:
      
         pgd[]   pud[]        pmd[]            virtual address pointers
                           /->pmd1(u--)->pte1(uw-)->page1 <- ptr1 (u--)
              /->pud1(uw-)--->pmd2(uw-)->pte2(uw-)->page2 <- ptr2 (uw-)
         pgd-|           (shared pmd[] as above)
              \->pud2(u--)--->pmd1(u--)->pte1(uw-)->page1 <- ptr3 (u--)
                           \->pmd2(uw-)->pte2(uw-)->page2 <- ptr4 (u--)
      
        pud1 and pud2 point to the same pmd table, so:
        - ptr1 and ptr3 points to the same page.
        - ptr2 and ptr4 points to the same page.
      
      (pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd entries)
      
      - First, the guest reads from ptr1 first and KVM prepares a shadow
        page table with role.access=u--, from ptr1's pud1 and ptr1's pmd1.
        "u--" comes from the effective permissions of pgd, pud1 and
        pmd1, which are stored in pt->access.  "u--" is used also to get
        the pagetable for pud1, instead of "uw-".
      
      - Then the guest writes to ptr2 and KVM reuses pud1 which is present.
        The hypervisor set up a shadow page for ptr2 with pt->access is "uw-"
        even though the pud1 pmd (because of the incorrect argument to
        kvm_mmu_get_page in the previous step) has role.access="u--".
      
      - Then the guest reads from ptr3.  The hypervisor reuses pud1's
        shadow pmd for pud2, because both use "u--" for their permissions.
        Thus, the shadow pmd already includes entries for both pmd1 and pmd2.
      
      - At last, the guest writes to ptr4.  This causes no vmexit or pagefault,
        because pud1's shadow page structures included an "uw-" page even though
        its role.access was "u--".
      
      Any kind of shared pagetable might have the similar problem when in
      virtual machine without TDP enabled if the permissions are different
      from different ancestors.
      
      In order to fix the problem, we change pt->access to be an array, and
      any access in it will not include permissions ANDed from child ptes.
      
      The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/
      
      
      Remember to test it with TDP disabled.
      
      The problem had existed long before the commit 41074d07 ("KVM: MMU:
      Fix inherited permissions for emulated guest pte updates"), and it
      is hard to find which is the culprit.  So there is no fixes tag here.
      
      Signed-off-by: default avatarLai Jiangshan <laijs@linux.alibaba.com>
      Message-Id: <20210603052455.21023-1-jiangshanlai@gmail.com>
      Cc: stable@vger.kernel.org
      Fixes: cea0f0e7 ("[PATCH] KVM: MMU: Shadow page table caching")
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b1bd5cba
  3. May 27, 2021
  4. May 07, 2021
  5. Apr 26, 2021
  6. Apr 21, 2021
  7. Apr 20, 2021
  8. Apr 17, 2021
  9. Apr 09, 2021
  10. Apr 08, 2021
  11. Apr 07, 2021
  12. Apr 06, 2021
  13. Mar 24, 2021
  14. Mar 19, 2021
  15. Mar 18, 2021
    • Sean Christopherson's avatar
      KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish · b318e8de
      Sean Christopherson authored
      Fix a plethora of issues with MSR filtering by installing the resulting
      filter as an atomic bundle instead of updating the live filter one range
      at a time.  The KVM_X86_SET_MSR_FILTER ioctl() isn't truly atomic, as
      the hardware MSR bitmaps won't be updated until the next VM-Enter, but
      the relevant software struct is atomically updated, which is what KVM
      really needs.
      
      Similar to the approach used for modifying memslots, make arch.msr_filter
      a SRCU-protected pointer, do all the work configuring the new filter
      outside of kvm->lock, and then acquire kvm->lock only when the new filter
      has been vetted and created.  That way vCPU readers either see the old
      filter or the new filter in their entirety, not some half-baked state.
      
      Yuan Yao pointed out a use-after-free in ksm_msr_allowed() due to a
      TOCTOU bug, but that's just the tip of the iceberg...
      
        - Nothing is __rcu annotated, making it nigh impossible to audit the
          code for correctness.
        - kvm_add_msr_filter() has an unpaired smp_wmb().  Violation of kernel
          coding style aside, the lack of a smb_rmb() anywhere casts all code
          into doubt.
        - kvm_clear_msr_filter() has a double free TOCTOU bug, as it grabs
          count before taking the lock.
        - kvm_clear_msr_filter() also has memory leak due to the same TOCTOU bug.
      
      The entire approach of updating the live filter is also flawed.  While
      installing a new filter is inherently racy if vCPUs are running, fixing
      the above issues also makes it trivial to ensure certain behavior is
      deterministic, e.g. KVM can provide deterministic behavior for MSRs with
      identical settings in the old and new filters.  An atomic update of the
      filter also prevents KVM from getting into a half-baked state, e.g. if
      installing a filter fails, the existing approach would leave the filter
      in a half-baked state, having already committed whatever bits of the
      filter were already processed.
      
      [*] https://lkml.kernel.org/r/20210312083157.25403-1-yaoyuan0329os@gmail.com
      
      
      
      Fixes: 1a155254 ("KVM: x86: Introduce MSR filtering")
      Cc: stable@vger.kernel.org
      Cc: Alexander Graf <graf@amazon.com>
      Reported-by: default avatarYuan Yao <yaoyuan0329os@gmail.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Message-Id: <20210316184436.2544875-2-seanjc@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      b318e8de
  16. Mar 15, 2021
  17. Mar 12, 2021
    • Marc Zyngier's avatar
      KVM: arm64: Reject VM creation when the default IPA size is unsupported · 7d717558
      Marc Zyngier authored
      
      
      KVM/arm64 has forever used a 40bit default IPA space, partially
      due to its 32bit heritage (where the only choice is 40bit).
      
      However, there are implementations in the wild that have a *cough*
      much smaller *cough* IPA space, which leads to a misprogramming of
      VTCR_EL2, and a guest that is stuck on its first memory access
      if userspace dares to ask for the default IPA setting (which most
      VMMs do).
      
      Instead, blundly reject the creation of such VM, as we can't
      satisfy the requirements from userspace (with a one-off warning).
      Also clarify the boot warning, and document that the VM creation
      will fail when an unsupported IPA size is provided.
      
      Although this is an ABI change, it doesn't really change much
      for userspace:
      
      - the guest couldn't run before this change, but no error was
        returned. At least userspace knows what is happening.
      
      - a memory slot that was accepted because it did fit the default
        IPA space now doesn't even get a chance to be registered.
      
      The other thing that is left doing is to convince userspace to
      actually use the IPA space setting instead of relying on the
      antiquated default.
      
      Fixes: 233a7cb2 ("kvm: arm64: Allow tuning the physical address size for VM")
      Signed-off-by: default avatarMarc Zyngier <maz@kernel.org>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarAndrew Jones <drjones@redhat.com>
      Reviewed-by: default avatarEric Auger <eric.auger@redhat.com>
      Link: https://lore.kernel.org/r/20210311100016.3830038-2-maz@kernel.org
      7d717558
  18. Mar 09, 2021
  19. Mar 02, 2021
  20. Feb 26, 2021
  21. Feb 22, 2021
    • Lukas Bulwahn's avatar
      KVM: Documentation: rectify rst markup in KVM_GET_SUPPORTED_HV_CPUID · 356c7558
      Lukas Bulwahn authored
      
      
      Commit c21d54f0 ("KVM: x86: hyper-v: allow KVM_GET_SUPPORTED_HV_CPUID
      as a system ioctl") added an enumeration in the KVM_GET_SUPPORTED_HV_CPUID
      documentation improperly for rst, and caused new warnings in make htmldocs:
      
        Documentation/virt/kvm/api.rst:4536: WARNING: Unexpected indentation.
        Documentation/virt/kvm/api.rst:4538: WARNING: Block quote ends without a blank line; unexpected unindent.
      
      Fix that issue and another historic rst markup issue from the initial
      rst conversion in the KVM_GET_SUPPORTED_HV_CPUID documentation.
      
      Signed-off-by: default avatarLukas Bulwahn <lukas.bulwahn@gmail.com>
      Message-Id: <20210104095938.24838-1-lukas.bulwahn@gmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      356c7558
  22. Feb 10, 2021
Loading