  8. Aug 22, 2018
    • module: use relative references for __ksymtab entries · 7290d580
      Ard Biesheuvel authored
      An ordinary arm64 defconfig build has ~64 KB worth of __ksymtab entries,
      each consisting of two 64-bit fields containing absolute references, to
      the symbol itself and to a char array containing its name, respectively.
      
      When we build the same configuration with KASLR enabled, we end up with an
      additional ~192 KB of relocations in the .init section, i.e., one 24 byte
      entry for each absolute reference, which all need to be processed at boot
      time.
      
      Given how the struct kernel_symbol that describes each entry is completely
      local to module.c (except for the references emitted by EXPORT_SYMBOL()
      itself), we can easily modify it to contain two 32-bit relative references
      instead.  This reduces the size of the __ksymtab section by 50% for all
      64-bit architectures, and gets rid of the runtime relocations entirely for
      architectures implementing KASLR, either via standard PIE linking (arm64)
      or using custom host tools (x86).
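
      For illustration, a minimal sketch of the relative layout (simplified,
      not the exact diff):

        struct kernel_symbol {
                int value_offset;   /* signed 32-bit offset to the symbol */
                int name_offset;    /* signed 32-bit offset to the name string */
        };

        /* Each offset is relative to the address of the field itself. */
        static unsigned long kernel_symbol_value(const struct kernel_symbol *sym)
        {
                return (unsigned long)&sym->value_offset + sym->value_offset;
        }

        static const char *kernel_symbol_name(const struct kernel_symbol *sym)
        {
                return (const char *)&sym->name_offset + sym->name_offset;
        }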
      
      Note that the binary search involving __ksymtab contents relies on each
      section being sorted by symbol name.  This is implemented based on the
      input section names, not the names in the ksymtab entries, so this patch
      does not interfere with that.
      
      Given that the use of place-relative relocations requires support both in
      the toolchain and in the module loader, we cannot enable this feature for
      all architectures.  So make it dependent on whether
      CONFIG_HAVE_ARCH_PREL32_RELOCATIONS is defined.
      
      Link: http://lkml.kernel.org/r/20180704083651.24360-4-ard.biesheuvel@linaro.org
      
      
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Jessica Yu <jeyu@kernel.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morris <james.morris@microsoft.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7290d580
    • KVM: vmx: Add defines for SGX ENCLS exiting · 802ec461
      Sean Christopherson authored
      
      
      Hardware support for basic SGX virtualization adds a new execution
      control (ENCLS_EXITING), VMCS field (ENCLS_EXITING_BITMAP) and exit
      reason (ENCLS), which enable a VMM to intercept specific ENCLS leaf
      functions, e.g. to inject faults when the VMM isn't exposing SGX to
      a VM.  When ENCLS_EXITING is enabled, the VMM can set/clear bits in
      the bitmap to intercept/allow ENCLS leaf functions in non-root mode,
      e.g. setting bit 2 in the ENCLS_EXITING_BITMAP will cause ENCLS[EINIT]
      to VMExit(ENCLS).
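
      For illustration only (control/field/exit-reason values follow the SDM;
      the authoritative defines live in the KVM headers), a sketch of trapping
      ENCLS[EINIT]:

        #define SECONDARY_EXEC_ENCLS_EXITING    0x00008000
        #define ENCLS_EXITING_BITMAP            0x0000202eull
        #define EXIT_REASON_ENCLS               60

        /* Hypothetical helper: intercept ENCLS leaf 2 (EINIT) in non-root mode. */
        static void vmx_intercept_einit(void)
        {
                u64 bitmap = vmcs_read64(ENCLS_EXITING_BITMAP);

                bitmap |= 1ULL << 2;    /* bit n intercepts ENCLS leaf n */
                vmcs_write64(ENCLS_EXITING_BITMAP, bitmap);
        }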
      
      Note: EXIT_REASON_ENCLS was previously added by commit 1f519992
      ("KVM: VMX: add missing exit reasons").
      
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20180814163334.25724-2-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      802ec461
  10. Aug 17, 2018
    • x86/speculation/l1tf: Exempt zeroed PTEs from inversion · f19f5c49
      Sean Christopherson authored
      
      
      It turns out that we should *not* invert all not-present mappings,
      because the all zeroes case is obviously special.
      
      clear_page() does not undergo the XOR logic to invert the address bits,
      i.e. PTE, PMD and PUD entries that have not been individually written
      will have val=0 and so will trigger __pte_needs_invert(). As a result,
      {pte,pmd,pud}_pfn() will return the wrong PFN value, i.e. all ones
      (adjusted by the max PFN mask) instead of zero. A zeroed entry is ok
      because the page at physical address 0 is reserved early in boot
      specifically to mitigate L1TF, so explicitly exempt zeroed entries from the
      inversion when reading the PFN.
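
      The resulting check is essentially (a sketch of the fixed logic):

        static inline bool __pte_needs_invert(u64 val)
        {
                /* A fully zeroed (never written) entry stays un-inverted. */
                return val && !(val & _PAGE_PRESENT);
        }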
      
      Manifested as an unexpected mprotect(..., PROT_NONE) failure when called
      on a VMA that has VM_PFNMAP and was mmap'd as something other than
      PROT_NONE but never used. mprotect() sends the PROT_NONE request down
      prot_none_walk(), which walks the PTEs to check the PFNs.
      prot_none_pte_entry() gets the bogus PFN from pte_pfn() and returns
      -EACCES because it thinks mprotect() is trying to adjust a high MMIO
      address.
      
      [ This is a very modified version of Sean's original patch, but all
        credit goes to Sean for doing this and also pointing out that
        sometimes the __pte_needs_invert() function only gets the protection
        bits, not the full eventual pte.  But zero remains special even in
        just protection bits, so that's ok.   - Linus ]
      
      Fixes: f22cc87f ("x86/speculation/l1tf: Invert all not present mappings")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f19f5c49
  12. Aug 10, 2018
    • x86/mm/pti: Move user W+X check into pti_finalize() · d878efce
      Joerg Roedel authored
      
      
      The user page-table gets the updated kernel mappings in pti_finalize(),
      which runs after the RO+NX permissions have been applied to the kernel page-table
      in mark_readonly().
      
      But with CONFIG_DEBUG_WX enabled, the user page-table is already checked in
      mark_readonly() for insecure mappings.  This causes false-positive
      warnings, because the user page-table did not get the updated mappings yet.
      
      Move the W+X check for the user page-table into pti_finalize() after it
      updated all required mappings.
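
      Roughly, the resulting ordering looks like this (helper names as in the
      PTI code, slightly simplified):

        void pti_finalize(void)
        {
                /* Mirror the final kernel mappings into the user page-table. */
                pti_clone_entry_text();
                pti_clone_kernel_text();

                /* Only now is a W+X scan of the user page-table meaningful. */
                debug_checkwx_user();
        }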
      
      [ tglx: Folded !NX supported fix ]
      
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Waiman Long <llong@redhat.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
      Cc: joro@8bytes.org
      Link: https://lkml.kernel.org/r/1533727000-9172-1-git-send-email-joro@8bytes.org
      d878efce
  15. Aug 06, 2018
    • x86/mm/init: Remove freed kernel image areas from alias mapping · c40a56a7
      Dave Hansen authored
      
      
      The kernel image is mapped into two places in the virtual address space
      (addresses without KASLR, of course):
      
      	1. The kernel direct map (0xffff880000000000)
      	2. The "high kernel map" (0xffffffff81000000)
      
      We actually execute out of #2.  If we get the address of a kernel symbol,
      it points to #2, but almost all physical-to-virtual translations point
      to #1.

      Parts of the "high kernel map" alias are mapped in the userspace page
      tables with the Global bit for performance reasons.  The parts that we map
      to userspace do not (er, should not) have secrets. When PTI is enabled then
      the global bit is usually not set in the high mapping and just used to
      compensate for poor performance on systems which lack PCID.
      
      This is fine, except that some areas in the kernel image that are adjacent
      to the non-secret-containing areas are unused holes.  We free these holes
      back into the normal page allocator and reuse them as normal kernel memory.
      The memory will, of course, get *used* via the normal map, but the alias
      mapping is kept.
      
      This otherwise unused alias mapping of the holes will, by default, keep the
      Global bit, be mapped out to userspace, and be vulnerable to Meltdown.
      
      Remove the alias mapping of these pages entirely.  This is likely to
      fracture the 2M page mapping the kernel image near these areas, but this
      should affect a minority of the area.
      
      The pageattr code changes *all* aliases mapping the physical pages that it
      operates on (by default).  We only want to modify a single alias, so we
      need to tweak its behavior.
      
      This unmapping behavior is currently dependent on PTI being in place.
      Going forward, we should at least consider doing this for all
      configurations.  Having an extra read-write alias for memory is not exactly
      ideal for debugging things like random memory corruption and this does
      undercut features like DEBUG_PAGEALLOC or future work like eXclusive Page
      Frame Ownership (XPFO).
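
      A minimal sketch of the approach, assuming a set_memory_np_noalias()-style
      helper that clears _PAGE_PRESENT on only the high-map alias rather than on
      every alias:

        static void free_kernel_image_pages(void *begin, void *end)
        {
                unsigned long begin_ul  = (unsigned long)begin;
                unsigned long end_ul    = (unsigned long)end;
                unsigned long len_pages = (end_ul - begin_ul) >> PAGE_SHIFT;

                free_init_pages("unused kernel image", begin_ul, end_ul);

                /*
                 * Drop the freed range from the high kernel mapping so the
                 * stale, possibly-Global alias cannot leak into user
                 * page-tables.
                 */
                if (IS_ENABLED(CONFIG_X86_64))
                        set_memory_np_noalias(begin_ul, len_pages);
        }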
      
      Before this patch:
      
      current_kernel:---[ High Kernel Mapping ]---
      current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
      current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
      current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
      current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
      current_kernel-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
      current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE         NX pmd
      current_kernel-0xffffffff82c00000-0xffffffff82e00000           2M     RW                     NX pte
      current_kernel-0xffffffff82e00000-0xffffffff83200000           4M     RW         PSE         NX pmd
      current_kernel-0xffffffff83200000-0xffffffffa0000000         462M                               pmd
      
        current_user:---[ High Kernel Mapping ]---
        current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
        current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
        current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
        current_user-0xffffffff81e11000-0xffffffff82000000        1980K     RW                     NX pte
        current_user-0xffffffff82000000-0xffffffff82600000           6M     ro         PSE     GLB NX pmd
        current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd
      
      After this patch:
      
      current_kernel:---[ High Kernel Mapping ]---
      current_kernel-0xffffffff80000000-0xffffffff81000000          16M                               pmd
      current_kernel-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
      current_kernel-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
      current_kernel-0xffffffff81e11000-0xffffffff82000000        1980K                               pte
      current_kernel-0xffffffff82000000-0xffffffff82400000           4M     ro         PSE     GLB NX pmd
      current_kernel-0xffffffff82400000-0xffffffff82488000         544K     ro                     NX pte
      current_kernel-0xffffffff82488000-0xffffffff82600000        1504K                               pte
      current_kernel-0xffffffff82600000-0xffffffff82c00000           6M     RW         PSE         NX pmd
      current_kernel-0xffffffff82c00000-0xffffffff82c0d000          52K     RW                     NX pte
      current_kernel-0xffffffff82c0d000-0xffffffff82dc0000        1740K                               pte
      
        current_user:---[ High Kernel Mapping ]---
        current_user-0xffffffff80000000-0xffffffff81000000          16M                               pmd
        current_user-0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
        current_user-0xffffffff81e00000-0xffffffff81e11000          68K     ro                 GLB x  pte
        current_user-0xffffffff81e11000-0xffffffff82000000        1980K                               pte
        current_user-0xffffffff82000000-0xffffffff82400000           4M     ro         PSE     GLB NX pmd
        current_user-0xffffffff82400000-0xffffffff82488000         544K     ro                     NX pte
        current_user-0xffffffff82488000-0xffffffff82600000        1504K                               pte
        current_user-0xffffffff82600000-0xffffffffa0000000         474M                               pmd
      
      [ tglx: Do not unmap on 32bit as there is only one mapping ]
      
      Fixes: 0f561fce ("x86/pti: Enable global pages for shared areas")
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Kees Cook <keescook@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Link: https://lkml.kernel.org/r/20180802225831.5F6A2BFC@viggo.jf.intel.com
      c40a56a7
    • KVM: X86: Implement "send IPI" hypercall · 4180bf1b
      Wanpeng Li authored
      Use a hypercall to send IPIs with a single vmexit, instead of one vmexit
      per IPI in xAPIC/x2APIC physical mode and one vmexit per cluster in
      x2APIC cluster mode.  An Intel guest can enter x2APIC cluster mode when
      interrupt remapping is enabled in QEMU; however, the latest AMD EPYC
      still supports only xAPIC mode, which benefits greatly from exit-less
      IPIs.  This patchset lets a guest send multicast IPIs, with at most 128
      destinations per hypercall in 64-bit mode and 64 vCPUs per hypercall in
      32-bit mode.
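
      Guest side, roughly (the argument layout is illustrative; see the KVM PV
      documentation added by this series for the final ABI):

        static void kvm_send_ipi_mask_sketch(unsigned long ipi_bitmap_low,
                                             unsigned long ipi_bitmap_high,
                                             unsigned long min_apic_id, int vector)
        {
                long ret;

                /* One vmexit delivers the IPI to every APIC ID in the bitmap. */
                ret = kvm_hypercall4(KVM_HC_SEND_IPI, ipi_bitmap_low,
                                     ipi_bitmap_high, min_apic_id,
                                     APIC_DM_FIXED | vector);
                WARN_ONCE(ret < 0, "KVM: PV IPI hypercall failed: %ld\n", ret);
        }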
      
      Hardware: Xeon Skylake 2.5GHz, 2 sockets, 40 cores, 80 threads; the VM
      has 80 vCPUs.  IPI microbenchmark (https://lkml.org/lkml/2017/12/19/141):
      
      x2apic cluster mode, vanilla
      
       Dry-run:                         0,            2392199 ns
       Self-IPI:                  6907514,           15027589 ns
       Normal IPI:              223910476,          251301666 ns
       Broadcast IPI:                   0,         9282161150 ns
       Broadcast lock:                  0,         8812934104 ns
      
      x2apic cluster mode, pv-ipi
      
       Dry-run:                         0,            2449341 ns
       Self-IPI:                  6720360,           15028732 ns
       Normal IPI:              228643307,          255708477 ns
       Broadcast IPI:                   0,         7572293590 ns  => 22% performance boost
       Broadcast lock:                  0,         8316124651 ns
      
      x2apic physical mode, vanilla
      
       Dry-run:                         0,            3135933 ns
       Self-IPI:                  8572670,           17901757 ns
       Normal IPI:              226444334,          255421709 ns
       Broadcast IPI:                   0,        19845070887 ns
       Broadcast lock:                  0,        19827383656 ns
      
      x2apic physical mode, pv-ipi
      
       Dry-run:                         0,            2446381 ns
       Self-IPI:                  6788217,           15021056 ns
       Normal IPI:              219454441,          249583458 ns
       Broadcast IPI:                   0,         7806540019 ns  => 154% performance boost
       Broadcast lock:                  0,         9143618799 ns
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4180bf1b
    • KVM: x86: Add tlb remote flush callback in kvm_x86_ops. · b08660e5
      Tianyu Lan authored
      
      
      Provide a way for platforms to register an hv TLB remote flush callback;
      this helps optimize TLB flushing among vCPUs in the nested
      virtualization case.
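
      Sketch of the hook (simplified): kvm_x86_ops gains an optional
      tlb_remote_flush(struct kvm *) callback, which the generic code tries
      first before falling back to the IPI-based flush:

        static inline int kvm_arch_flush_remote_tlb(struct kvm *kvm)
        {
                if (kvm_x86_ops->tlb_remote_flush)
                        return kvm_x86_ops->tlb_remote_flush(kvm);

                return -ENOTSUPP;
        }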
      
      Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b08660e5
    • X86/Hyper-V: Add hyperv_nested_flush_guest_mapping ftrace support · 60cfce4c
      Tianyu Lan authored
      
      
      Add hyperv_nested_flush_guest_mapping ftrace support to trace the
      HvFlushGuestPhysicalAddressSpace hypercall.
      
      Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
      Acked-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      60cfce4c
    • X86/Hyper-V: Add flush HvFlushGuestPhysicalAddressSpace hypercall support · eb914cfe
      Tianyu Lan authored
      
      
      Hyper-V provides the paravirtual hypercall HvFlushGuestPhysicalAddressSpace
      to flush the nested VM address space mapping in the L1 hypervisor, which
      reduces the overhead of flushing the EPT TLB among vCPUs.  Implement it.
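
      Roughly (the hypercall input layout follows the Hyper-V TLFS; error
      handling and the per-CPU hypercall input page are elided in this sketch):

        struct hv_guest_mapping_flush {
                u64 address_space;      /* EPT pointer of the nested guest */
                u64 flags;
        };

        static int hyperv_flush_guest_mapping_sketch(u64 address_space)
        {
                struct hv_guest_mapping_flush flush = {
                        .address_space = address_space,
                        .flags         = 0,
                };
                u64 status;

                /* The real code builds the input in the per-CPU hypercall page. */
                status = hv_do_hypercall(HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE,
                                         &flush, NULL);

                return (status & HV_HYPERCALL_RESULT_MASK) == HV_STATUS_SUCCESS ?
                        0 : -EFAULT;
        }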
      
      Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
      Acked-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      eb914cfe
    • kvm: x86: Remove CR3_PCID_INVD flag · 208320ba
      Junaid Shahid authored
      
      
      It is a duplicate of X86_CR3_PCID_NOFLUSH. So just use that instead.
      
      Signed-off-by: Junaid Shahid <junaids@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      208320ba
    • kvm: x86: Add multi-entry LRU cache for previous CR3s · b94742c9
      Junaid Shahid authored
      
      
      Adds support for storing multiple previous CR3/root_hpa pairs maintained
      as an LRU cache, so that the lockless CR3 switch path can be used when
      switching back to any of them.
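
      Conceptually (sizes and field names approximate the patch; the lookup
      helper is hypothetical):

        #define KVM_MMU_NUM_PREV_ROOTS 3

        struct kvm_mmu_root_info {
                gpa_t cr3;      /* guest CR3 the root was built for */
                hpa_t hpa;      /* corresponding shadow root */
        };

        /* Promote a matching previous root to the front (simple swap here,
         * rather than a full LRU shift). */
        static bool cached_root_available(struct kvm_mmu_root_info *prev_roots,
                                          gpa_t new_cr3)
        {
                struct kvm_mmu_root_info tmp = prev_roots[0];
                unsigned int i;

                for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
                        if (prev_roots[i].cr3 == new_cr3) {
                                prev_roots[0] = prev_roots[i];
                                prev_roots[i] = tmp;
                                return true;
                        }
                }
                return false;
        }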
      
      Signed-off-by: Junaid Shahid <junaids@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b94742c9
    • kvm: x86: Flush only affected TLB entries in kvm_mmu_invlpg* · faff8758
      Junaid Shahid authored
      
      
      kvm_mmu_invlpg() and kvm_mmu_invpcid_gva() only need to flush the TLB
      entries for the specific guest virtual address, instead of flushing all
      TLB entries associated with the VM.
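
      Sketch of the resulting flow (callback names approximate this series):

        void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
        {
                struct kvm_mmu *mmu = &vcpu->arch.mmu;

                /* Zap only the shadow entries backing this gva ... */
                mmu->invlpg(vcpu, gva, mmu->root_hpa);

                /* ... and flush just that address, not the whole guest TLB. */
                kvm_x86_ops->tlb_flush_gva(vcpu, gva);
                ++vcpu->stat.invlpg;
        }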
      
      Signed-off-by: Junaid Shahid <junaids@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      faff8758
    • kvm: x86: Support selectively freeing either current or previous MMU root · 08fb59d8
      Junaid Shahid authored
      
      
      kvm_mmu_free_roots() now takes a mask specifying which roots to free, so
      that either one of the roots (active/previous) can be individually freed
      when needed.
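
      Illustrative mask values (the actual defines live in the KVM headers):

        #define KVM_MMU_ROOT_CURRENT    BIT(0)
        #define KVM_MMU_ROOT_PREVIOUS   BIT(1)
        #define KVM_MMU_ROOTS_ALL       (KVM_MMU_ROOT_CURRENT | KVM_MMU_ROOT_PREVIOUS)

        /* e.g. kvm_mmu_free_roots(vcpu, KVM_MMU_ROOT_PREVIOUS) drops only the
         * previous root and leaves the active one intact. */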
      
      Signed-off-by: Junaid Shahid <junaids@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      08fb59d8
    • kvm: x86: Add a root_hpa parameter to kvm_mmu->invlpg() · 7eb77e9f
      Junaid Shahid authored
      
      
      This allows invlpg() to be called using either the active root_hpa
      or the prev_root_hpa.
      
      Signed-off-by: Junaid Shahid <junaids@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7eb77e9f
    • kvm: x86: Skip TLB flush on fast CR3 switch when indicated by guest · ade61e28
      Junaid Shahid authored
      
      
      When PCIDs are enabled, the MSb of the source operand for a MOV-to-CR3
      instruction indicates that the TLB doesn't need to be flushed.
      
      This change enables this optimization for MOV-to-CR3s in the guest
      that have been intercepted by KVM for shadow paging and are handled
      within the fast CR3 switch path.
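
      The guest-visible encoding boils down to bit 63 (X86_CR3_PCID_NOFLUSH);
      a hypothetical helper to split it out:

        static unsigned long decode_guest_cr3(unsigned long cr3,
                                              bool *skip_tlb_flush)
        {
                *skip_tlb_flush = cr3 & X86_CR3_PCID_NOFLUSH;

                /* The hint bit is not part of the page-table address. */
                return cr3 & ~X86_CR3_PCID_NOFLUSH;
        }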
      
      Signed-off-by: Junaid Shahid <junaids@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ade61e28
    • kvm: vmx: Support INVPCID in shadow paging mode · eb4b248e
      Junaid Shahid authored
      
      
      Implement support for INVPCID in shadow paging mode as well.
      
      Signed-off-by: Junaid Shahid <junaids@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      eb4b248e
    • kvm: x86: Introduce KVM_REQ_LOAD_CR3 · 6e42782f
      Junaid Shahid authored
      
      
      The KVM_REQ_LOAD_CR3 request loads the hardware CR3 using the
      current root_hpa.
      
      Signed-off-by: Junaid Shahid <junaids@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6e42782f
    • kvm: x86: Add fast CR3 switch code path · 7c390d35
      Junaid Shahid authored
      
      
      When using shadow paging, a CR3 switch in the guest results in a VM Exit.
      In the common case, that VM exit doesn't require much processing by KVM.
      However, it does acquire the MMU lock, which can start showing signs of
      contention under some workloads even on a 2 VCPU VM when the guest is
      using KPTI. Therefore, we add a fast path that avoids acquiring the MMU
      lock in the most common cases, e.g. when switching back and forth between
      the kernel and user mode CR3s used by KPTI with no guest page table
      changes in between.
      
      For now, this fast path is implemented only for 64-bit guests and hosts
      to avoid the handling of PDPTEs, but it can be extended later to 32-bit
      guests and/or hosts as well.
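
      In outline (heavily simplified; the real function also validates the new
      root and handles the PCID no-flush hint):

        static bool fast_cr3_switch_sketch(struct kvm_vcpu *vcpu, gpa_t new_cr3)
        {
                struct kvm_mmu *mmu = &vcpu->arch.mmu;

                /* Reuse a cached root only if it was built for this exact CR3. */
                if (mmu->prev_root.cr3 != new_cr3)
                        return false;

                /* Swap in the cached root and remember the outgoing CR3;
                 * no MMU lock, no shadow page rebuild. */
                swap(mmu->root_hpa, mmu->prev_root.hpa);
                mmu->prev_root.cr3 = kvm_read_cr3(vcpu);

                return true;
        }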
      
      Signed-off-by: Junaid Shahid <junaids@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7c390d35