  2. May 15, 2019
    • Revert "KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU" · f93f7ede
      Sean Christopherson authored

      The RDPMC-exiting control is dependent on the existence of the RDPMC
      instruction itself, i.e. is not tied to the "Architectural Performance
      Monitoring" feature.  For all intents and purposes, the control exists
      on all CPUs with VMX support since RDPMC also exists on all CPUs
      with VMX support.  Per Intel's SDM:
      
        The RDPMC instruction was introduced into the IA-32 Architecture in
        the Pentium Pro processor and the Pentium processor with MMX technology.
        The earlier Pentium processors have performance-monitoring counters, but
        they must be read with the RDMSR instruction.
      
      Because RDPMC-exiting always exists, KVM requires the control and refuses
      to load if it's not available.  As a result, hiding the PMU from a guest
      breaks nested virtualization if the guest attempts to use KVM.
      
      While it's not explicitly stated in the RDPMC pseudocode, the VM-Exit
      check for RDPMC-exiting follows standard fault vs. VM-Exit prioritization
      for privileged instructions, e.g. occurs after the CPL/CR0.PE/CR4.PCE
      checks, but before the counter referenced in ECX is checked for validity.
      
      In other words, the original KVM behavior of injecting a #GP was correct,
      and the KVM unit test needs to be adjusted accordingly, e.g. eat the #GP
      when the unit test guest (L3 in this case) executes RDPMC without
      RDPMC-exiting set in the unit test host (L2).
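
      The check ordering described above can be sketched in C.  This is
      an illustrative model only, not KVM's actual emulation code, and
      the struct/function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical vCPU state for illustrating RDPMC fault vs. VM-Exit
 * prioritization; the names are not KVM's. */
struct vcpu_state {
	int cpl;
	bool cr0_pe;		/* protected mode enabled */
	bool cr4_pce;		/* RDPMC permitted at CPL > 0 */
	bool rdpmc_exiting;	/* "RDPMC exiting" control in the VMCS */
	unsigned int ecx;	/* counter index referenced by RDPMC */
	unsigned int nr_counters;
};

enum rdpmc_result { RDPMC_OK, RDPMC_GP, RDPMC_VMEXIT };

static enum rdpmc_result emulate_rdpmc(const struct vcpu_state *v)
{
	/* 1. Privilege checks: #GP unless CR4.PCE=1, CPL=0, or CR0.PE=0. */
	if (!(v->cr4_pce || v->cpl == 0 || !v->cr0_pe))
		return RDPMC_GP;

	/* 2. RDPMC-exiting is checked after the privilege checks... */
	if (v->rdpmc_exiting)
		return RDPMC_VMEXIT;

	/* 3. ...but before the ECX validity check. */
	if (v->ecx >= v->nr_counters)
		return RDPMC_GP;

	return RDPMC_OK;
}
```

      In this model a CPL-3 guest with CR4.PCE clear takes the #GP even
      when RDPMC-exiting is set, which is the behavior the revert
      restores.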
      
      This reverts commit e51bfdb6.
      
      Fixes: e51bfdb6 ("KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU")
      Reported-by: David Hill <hilld@binarystorm.net>
      Cc: Saar Amar <saaramar@microsoft.com>
      Cc: Mihai Carabas <mihai.carabas@oracle.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible · d69129b4
      Sean Christopherson authored

      If L1 is using an MSR bitmap, unconditionally merge the MSR bitmaps from
      L0 and L1 for MSR_{KERNEL,}_{FS,GS}_BASE.  KVM unconditionally exposes
      MSRs L1.  If KVM is also running in L1 then it's highly likely L1 is
      also exposing the MSRs to L2, i.e. KVM doesn't need to intercept L2
      accesses.
      
      Based on code from Jintack Lim.
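
      The merge rule for a single MSR-bitmap bit can be sketched as
      follows; this is a conceptual model with an illustrative function
      name, not KVM's actual bitmap-merging code:

```c
#include <assert.h>
#include <stdbool.h>

/* In the VMX MSR bitmap a set bit means "intercept", i.e. cause a
 * VM-Exit.  The merged vmcs02 bitmap must therefore intercept an MSR
 * whenever either L0 or L1 wants an exit for it. */
static bool vmcs02_intercept(bool l0_intercept, bool l1_intercept)
{
	return l0_intercept || l1_intercept;
}
```

      Because L0 (KVM) does not intercept the FS/GS base MSRs, the
      merged bit reduces to L1's choice, so the vmcs02 intercept can be
      disabled whenever L1 disables it.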
      
      Cc: Jintack Lim <jintack@xxxxxxxxxxxxxxx>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. Apr 27, 2019
    • KVM: VMX: Move RSB stuffing to before the first RET after VM-Exit · f2fde6a5
      Rick Edgecombe authored

      The not-so-recent change to move VMX's VM-Exit handling to a dedicated
      "function" unintentionally exposed KVM to a speculative attack from the
      guest by executing a RET prior to stuffing the RSB.  Make RSB stuffing
      happen immediately after VM-Exit, before any unpaired returns.
      
      Alternatively, the VM-Exit path could postpone full RSB stuffing until
      its current location by stuffing the RSB only as needed, or by avoiding
      returns in the VM-Exit path entirely, but both alternatives are beyond
      ugly since vmx_vmexit() has multiple indirect callers (by way of
      vmx_vmenter()).  And putting the RSB stuffing immediately after VM-Exit
      makes it much less likely to be re-broken in the future.
      
      Note, the cost of PUSH/POP could be avoided in the normal flow by
      pairing the PUSH RAX with the POP RAX in __vmx_vcpu_run() and adding
      a POP to nested_vmx_check_vmentry_hw(), but such a weird/subtle
      dependency is likely to cause problems in the long run, and PUSH/POP
      will take all of a few cycles, which is peanuts compared to the number
      of cycles required to fill the RSB.
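
      The hazard can be shown with a toy model of the Return Stack
      Buffer; the names, depth, and behavior here are illustrative
      only, not a model of any specific microarchitecture:

```c
#include <assert.h>

/* Toy RSB: a circular stack of predicted return targets.  A guest can
 * leave poisoned entries in it across VM-Exit, so the host must
 * overwrite ("stuff") every entry before executing any RET. */
#define RSB_DEPTH 32

struct rsb {
	unsigned long slot[RSB_DEPTH];
	int top;
};

static void rsb_push(struct rsb *r, unsigned long target)	/* a CALL */
{
	r->top = (r->top + 1) % RSB_DEPTH;
	r->slot[r->top] = target;
}

static unsigned long rsb_predict(struct rsb *r)			/* a RET */
{
	unsigned long target = r->slot[r->top];
	r->top = (r->top + RSB_DEPTH - 1) % RSB_DEPTH;
	return target;
}

static void rsb_stuff(struct rsb *r, unsigned long benign)
{
	for (int i = 0; i < RSB_DEPTH; i++)
		r->slot[i] = benign;
	r->top = RSB_DEPTH - 1;
}
```

      A RET executed before rsb_stuff() consumes whatever prediction
      the guest left behind; stuffing immediately after VM-Exit
      guarantees every later RET predicts a benign target.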
      
      Fixes: 453eafbe ("KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines")
      Reported-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  6. Apr 18, 2019
    • KVM: lapic: Track lapic timer advance per vCPU · 39497d76
      Sean Christopherson authored

      Automatically adjusting the globally-shared timer advancement could
      corrupt the timer, e.g. if multiple vCPUs are concurrently adjusting
      the advancement value.  That could be partially fixed by using a local
      variable for the arithmetic, but it would still be susceptible to a
      race when setting timer_advance_adjust_done.
      
      And because virtual_tsc_khz and tsc_scaling_ratio are per-vCPU, the
      correct calibration for a given vCPU may not apply to all vCPUs.
      
      Furthermore, lapic_timer_advance_ns is marked __read_mostly, which is
      effectively violated when finding a stable advancement takes an extended
      amount of time.
      
      Opportunistically change the definition of lapic_timer_advance_ns to
      a u32 so that it matches the style of struct kvm_timer.  Explicitly
      pass the param to kvm_create_lapic() so that it doesn't have to be
      exposed to lapic.c, thus reducing the probability of unintentionally
      using the global value instead of the per-vCPU value.
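
      A minimal sketch of the per-vCPU layout and a damped calibration
      step is below; the field and function names, and the adjustment
      constants, are illustrative rather than KVM's exact ones:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Keeping the advancement and its "done" flag per vCPU avoids the race
 * of multiple vCPUs concurrently adjusting one global value, and lets
 * each vCPU calibrate against its own TSC frequency. */
struct kvm_timer {
	uint32_t timer_advance_ns;	/* u32, matching struct kvm_timer style */
	bool timer_advance_adjust_done;
};

/* guest_early_ns: how early (positive) or late (negative) the guest
 * observed the timer interrupt on the last expiration. */
static void adjust_timer_advance(struct kvm_timer *t, long guest_early_ns)
{
	if (t->timer_advance_adjust_done)
		return;
	/* Nudge the advancement toward the observed error. */
	t->timer_advance_ns += guest_early_ns / 8;
	/* Declare calibration finished once the error is small. */
	if (labs(guest_early_ns) < 100)
		t->timer_advance_adjust_done = true;
}
```

      Because the state lives in the vCPU's own struct, no other vCPU's
      calibration can corrupt it, and the __read_mostly concern for the
      module parameter disappears.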
      
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Cc: stable@vger.kernel.org
      Fixes: 3b8a5df6 ("KVM: LAPIC: Tune lapic_timer_advance_ns automatically")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>