  1. Mar 17, 2021
  2. Mar 03, 2021
  3. Mar 02, 2021
    • KVM: SVM: Clear the CR4 register on reset · 9e46f6c6
      Babu Moger authored
      
      
      This problem was reported on an SVM guest while executing kexec.
      Kexec fails to load the new kernel when the PCID feature is enabled.
      
      When kexec starts loading the new kernel, it begins by resetting the
      vCPUs and then bringing each vCPU online one by one. The vCPU reset is
      supposed to reset all register state before the vCPUs are brought
      online. However, the CR4 register is not reset during this process. If
      the register was set up during the previous boot, all of its flags
      remain intact. The X86_CR4_PCIDE bit can only be enabled in long mode,
      so it must be enabled much later in SMP initialization. Having the
      X86_CR4_PCIDE bit set during SMP boot can cause boot failures.
      
      Fix the issue by resetting the CR4 register in init_vmcb().
      
      Signed-off-by: Babu Moger <babu.moger@amd.com>
      Message-Id: <161471109108.30811.6392805173629704166.stgit@bmoger-ubuntu>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
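The failure mode described above can be sketched in a few lines of C. This is a toy model, not the KVM code: `struct toy_vcpu` and the `toy_*` helpers are hypothetical names, and only the CR4.PCIDE bit position and the CR0 reset value are taken from the x86 architecture.

```c
#include <stdbool.h>
#include <stdint.h>

/* CR4.PCIDE is bit 17 of CR4. */
#define X86_CR4_PCIDE (1ULL << 17)
/* Architectural CR0 value after reset (CD | NW | ET). */
#define TOY_CR0_RESET 0x60000010ULL

/* Hypothetical, heavily simplified vCPU register state. */
struct toy_vcpu {
    uint64_t cr0;
    uint64_t cr4;
};

/* Models the reported behaviour: CR0 is reset, CR4 keeps its old flags. */
void toy_reset_without_cr4(struct toy_vcpu *v)
{
    v->cr0 = TOY_CR0_RESET;
}

/* Models the fix: the reset path also clears CR4. */
void toy_reset_with_cr4(struct toy_vcpu *v)
{
    v->cr0 = TOY_CR0_RESET;
    v->cr4 = 0;
}

/* True if PCIDE survived the reset and would be seen during early boot. */
bool toy_pcide_stale(const struct toy_vcpu *v)
{
    return (v->cr4 & X86_CR4_PCIDE) != 0;
}
```

With a vCPU whose CR4 still has PCIDE set from the previous boot, `toy_pcide_stale()` stays true after `toy_reset_without_cr4()` and becomes false after `toy_reset_with_cr4()`.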
    • KVM: x86/xen: Add support for vCPU runstate information · 30b5c851
      David Woodhouse authored
      
      
      This is how Xen guests do steal time accounting. The hypervisor records
      the amount of time spent in each of running/runnable/blocked/offline
      states.
      
      In the Xen accounting, a vCPU is still in state RUNSTATE_running while
      in Xen for a hypercall or I/O trap, etc. Only if Xen explicitly schedules
      does the state become RUNSTATE_blocked. In KVM this means that even when
      the vCPU exits the kvm_run loop, the state remains RUNSTATE_running.
      
      The VMM can explicitly set the vCPU to RUNSTATE_blocked by using the
      KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT attribute, and can also use
      KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST to retrospectively add a given
      amount of time to the blocked state and subtract it from the running
      state.
      
      The state_entry_time corresponds to get_kvmclock_ns() at the time the
      vCPU entered the current state, and the total times of all four states
      should always add up to state_entry_time.
      
      Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
      Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20210301125309.874953-2-dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
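The accounting rule in this message (the four per-state totals always sum to state_entry_time) can be illustrated with a small model. The sketch below assumes a clock that starts at zero and an adjustment that only moves time from running to blocked; `struct toy_runstate` and the `toy_*` functions are hypothetical and do not reflect the KVM attribute layout. The RUNSTATE_* values follow the Xen public headers.

```c
#include <stdint.h>

enum toy_runstate_id {
    RUNSTATE_running  = 0,
    RUNSTATE_runnable = 1,
    RUNSTATE_blocked  = 2,
    RUNSTATE_offline  = 3,
    TOY_RUNSTATE_COUNT = 4
};

/* Hypothetical per-vCPU record: current state, entry time, per-state totals. */
struct toy_runstate {
    int state;
    uint64_t state_entry_time;
    uint64_t time[TOY_RUNSTATE_COUNT];
};

/* Charge the elapsed time to the state being left, then enter new_state. */
void toy_runstate_update(struct toy_runstate *rs, int new_state, uint64_t now)
{
    rs->time[rs->state] += now - rs->state_entry_time;
    rs->state = new_state;
    rs->state_entry_time = now;
}

/* Retrospectively move ns nanoseconds from running to blocked. */
int toy_runstate_adjust(struct toy_runstate *rs, uint64_t ns)
{
    if (ns > rs->time[RUNSTATE_running])
        return -1; /* would make the running total negative */
    rs->time[RUNSTATE_running] -= ns;
    rs->time[RUNSTATE_blocked] += ns;
    return 0;
}

/* Sum of all four totals; should equal state_entry_time. */
uint64_t toy_runstate_total(const struct toy_runstate *rs)
{
    uint64_t sum = 0;
    for (int i = 0; i < TOY_RUNSTATE_COUNT; i++)
        sum += rs->time[i];
    return sum;
}
```

Because the adjustment subtracts from one bucket exactly what it adds to another, the invariant holds both before and after it is applied.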
    • KVM: x86/xen: Fix return code when clearing vcpu_info and vcpu_time_info · 7d7c5f76
      David Woodhouse authored
      
      
      When clearing the per-vCPU shared regions, set the return value to zero
      to indicate success. This was causing spurious errors to be returned to
      userspace on soft reset.
      
      Also add a paranoid BUILD_BUG_ON() for compat structure compatibility.
      
      Fixes: 0c165b3c ("KVM: x86/xen: Allow reset of Xen attributes")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20210301125309.874953-1-dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
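A common way for this kind of bug to arise is a setter that starts with a pessimistic error code and forgets to overwrite it on one success path. The sketch below is an assumption about the general pattern, not the actual KVM function: `toy_set_shared`, `TOY_GPA_INVALID`, and the `with_fix` switch are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "clear the shared region". */
#define TOY_GPA_INVALID UINT64_MAX

struct toy_shared {
    bool set;
    uint64_t gpa;
};

/*
 * Set or clear a per-vCPU shared region. with_fix selects whether the
 * clearing path reports success, so both behaviours can be compared.
 */
int toy_set_shared(struct toy_shared *s, uint64_t gpa, bool with_fix)
{
    int r = -EINVAL; /* pessimistic default, overwritten on success */

    if (gpa == TOY_GPA_INVALID) {
        s->set = false;
        if (with_fix)
            r = 0; /* clearing succeeded, so report success */
        return r;
    }

    s->gpa = gpa;
    s->set = true;
    return 0;
}
```

Without the fix, the region is cleared correctly but the caller still sees `-EINVAL`, which matches the "spurious errors on soft reset" symptom.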
    • KVM: x86: allow compiling out the Xen hypercall interface · b59b153d
      Paolo Bonzini authored
      
      
      The Xen hypercall interface adds to the attack surface of the hypervisor
      and will be used quite rarely.  Allow compiling it out.
      
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
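Compiling a feature out is typically done by guarding the implementation with a configuration macro and providing inline stubs for the disabled case. The sketch below shows that general pattern only; `TOY_CONFIG_KVM_XEN` and the `toy_xen_*` functions are hypothetical and are not the symbols used by the kernel.

```c
#include <stdbool.h>

/* Build with -DTOY_CONFIG_KVM_XEN to compile the feature in (hypothetical). */
#ifdef TOY_CONFIG_KVM_XEN

static inline bool toy_xen_enabled(void)
{
    return true;
}

/* Stand-in for real hypercall handling: echo the low byte of the number. */
static inline int toy_xen_hypercall(unsigned long nr)
{
    return (int)(nr & 0xff);
}

#else /* feature compiled out: callers still build, code is not reachable */

static inline bool toy_xen_enabled(void)
{
    return false;
}

static inline int toy_xen_hypercall(unsigned long nr)
{
    (void)nr;
    return -1; /* hypothetical "not supported" result */
}

#endif
```

Callers do not need their own `#ifdef` blocks, and with the macro undefined none of the real handling code ends up in the binary.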
  4. Feb 26, 2021
  5. Feb 25, 2021
    • KVM: SVM: Fix nested VM-Exit on #GP interception handling · 2df8d380
      Sean Christopherson authored
      
      
      Fix the interpretation of nested_svm_vmexit()'s return value when
      synthesizing a nested VM-Exit after intercepting an SVM instruction while
      L2 was running.  The helper returns '0' on success, whereas a return
      value of '0' in the exit handler path means "exit to userspace".  The
      incorrect return value causes KVM to exit to userspace without filling
      the run state, e.g. QEMU logs "KVM: unknown exit, hardware reason 0".
      
      Fixes: 14c2bf81 ("KVM: SVM: Fix #GP handling for doubly-nested virtualization")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210224005627.657028-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
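The mismatch between the two return conventions can be shown with a minimal sketch. The functions below are hypothetical stand-ins: the helper returns 0 on success, while the handler is expected to return 1 to resume the guest, 0 to exit to userspace, and a negative value on error.

```c
/* Hypothetical helper: 0 on success, negative on failure. */
int toy_nested_vmexit(int fail)
{
    return fail ? -1 : 0;
}

/*
 * Buggy handler: forwards the helper's result unchanged, so a successful
 * nested VM-Exit (0) is read by the caller as "exit to userspace".
 */
int toy_handler_buggy(int fail)
{
    return toy_nested_vmexit(fail);
}

/*
 * Fixed handler: propagate errors, otherwise translate success into
 * the handler convention's "resume the guest" value.
 */
int toy_handler_fixed(int fail)
{
    int ret = toy_nested_vmexit(fail);

    if (ret)
        return ret;
    return 1;
}
```

On success the buggy handler yields 0 and the fixed one yields 1; on failure both pass the negative error code through.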
  6. Feb 23, 2021
  7. Feb 22, 2021
    • KVM: x86/mmu: Consider the hva in mmu_notifier retry · 4a42d848
      David Stevens authored
      
      
      Track the range being invalidated by mmu_notifier and skip page fault
      retries if the fault address is not affected by the in-progress
      invalidation. Handle concurrent invalidations by finding the minimal
      range which includes all ranges being invalidated. Although the combined
      range may include unrelated addresses and cannot be shrunk as individual
      invalidation operations complete, it is unlikely the marginal gains of
      proper range tracking are worth the additional complexity.
      
      The primary benefit of this change is a lower likelihood of extreme
      latency when handling a page fault because another thread was
      preempted while modifying host virtual addresses.
      
      Signed-off-by: David Stevens <stevensd@chromium.org>
      Message-Id: <20210222024522.1751719-3-stevensd@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
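The range tracking described in this message can be sketched as follows. This is a simplified model under stated assumptions: `struct toy_notifier` and the `toy_*` functions are hypothetical, ranges are half-open `[start, end)`, and the sequence-counter comparison that the real retry check also performs is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracker: number of in-flight invalidations and their hull. */
struct toy_notifier {
    int count;
    uint64_t start; /* inclusive */
    uint64_t end;   /* exclusive */
};

/* Begin an invalidation: widen the tracked range to cover [start, end). */
void toy_invalidate_begin(struct toy_notifier *n, uint64_t start, uint64_t end)
{
    if (n->count == 0) {
        n->start = start;
        n->end = end;
    } else {
        if (start < n->start)
            n->start = start;
        if (end > n->end)
            n->end = end;
    }
    n->count++;
}

/* End an invalidation: the range is deliberately not shrunk. */
void toy_invalidate_end(struct toy_notifier *n)
{
    n->count--;
}

/* A fault must be retried only if its hva lies in the tracked range. */
bool toy_retry_hva(const struct toy_notifier *n, uint64_t hva)
{
    return n->count > 0 && hva >= n->start && hva < n->end;
}
```

With two concurrent invalidations at `[0x1000, 0x2000)` and `[0x5000, 0x6000)`, a fault at `0x3000` still retries because it falls inside the merged range, while a fault at `0x8000` proceeds. That is the imprecision the message accepts in exchange for simplicity.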
    • KVM: x86/mmu: Skip mmu_notifier check when handling MMIO page fault · 5f8a7cf2
      Sean Christopherson authored
      
      
      Don't retry a page fault due to an mmu_notifier invalidation when
      handling a page fault for a GPA that did not resolve to a memslot, i.e.
      an MMIO page fault.  Invalidations from the mmu_notifier signal a change
      in a host virtual address (HVA) mapping; without a memslot, there is no
      HVA and thus no possibility that the invalidation is relevant to the
      page fault being handled.
      
      Note, the MMIO vs. memslot generation checks handle the case where a
      pending memslot update will create a memslot overlapping the faulting
      GPA.  The mmu_notifier checks are orthogonal to memslot updates.
      
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210222024522.1751719-2-stevensd@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
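The reasoning above (no memslot means no HVA, so an invalidation cannot be relevant) can be expressed as a simple lookup. The sketch assumes a flat array of slots; `struct toy_memslot` and the `toy_*` functions are hypothetical and much simpler than KVM's memslot handling.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memslot: a run of guest frames backed by host memory. */
struct toy_memslot {
    uint64_t base_gfn;
    uint64_t npages;
    uint64_t userspace_addr; /* HVA of the first page */
};

/* Return the slot containing gfn, or NULL if the gfn is MMIO. */
const struct toy_memslot *toy_find_slot(const struct toy_memslot *slots,
                                        size_t n, uint64_t gfn)
{
    for (size_t i = 0; i < n; i++) {
        if (gfn >= slots[i].base_gfn &&
            gfn - slots[i].base_gfn < slots[i].npages)
            return &slots[i];
    }
    return NULL;
}

/* Only faults that resolve to a memslot have an HVA that can be invalidated. */
bool toy_needs_notifier_check(const struct toy_memslot *slots, size_t n,
                              uint64_t gfn)
{
    return toy_find_slot(slots, n, gfn) != NULL;
}
```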
    • KVM: nSVM: prepare guest save area while is_guest_mode is true · d2df592f
      Paolo Bonzini authored
      Right now, enter_svm_guest_mode is calling nested_prepare_vmcb_save and
      nested_prepare_vmcb_control.  This results in is_guest_mode being false
      until the end of nested_prepare_vmcb_control.
      
      This is a problem because nested_prepare_vmcb_save can in turn cause
      changes to the intercepts and these have to be applied to the "host VMCB"
      (stored in svm->nested.hsave) and then merged with the VMCB12 intercepts
      into svm->vmcb.
      
      In particular, without this change we forget to set the CR0 read and CR0
      write intercepts when running a real mode L2 guest with NPT disabled.
      The guest is therefore able to see the CR0.PG bit that KVM sets to
      enable "paged real mode".  This patch fixes the svm.flat mode_switch
      test case with npt=0.  There are no other problematic calls in
      nested_prepare_vmcb_save.
      
      Moving is_guest_mode to the end has been done since commit 06fc7772
      ("KVM: SVM: Activate nested state only when guest state is complete",
      2010-04-25).  However, back then KVM didn't grab a different VMCB
      when updating the intercepts; it had already copied/merged L1's stuff
      to L0's VMCB, and then updated L0's VMCB regardless of is_nested().
      Later recalc_intercepts was introduced in commit 384c6368
      ("KVM: SVM: Add function to recalculate intercept masks", 2011-01-12).
      This introduced the bug, because recalc_intercepts now throws away
      the intercept manipulations that svm_set_cr0 had done in the meanwhile
      to svm->vmcb.
      
      [1] https://lore.kernel.org/kvm/1266493115-28386-1-git-send-email-joerg.roedel@amd.com/
      
      
      
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
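The ordering problem can be reproduced in a toy model. The sketch below assumes that intercept updates go to the host copy when the vCPU is in guest mode and to the active VMCB otherwise, and that recalculation rebuilds the active set as the union of the host and VMCB12 sets. `struct toy_svm`, the bit definitions, and the `toy_*` functions are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_INTERCEPT_CR0_READ  (1u << 0)
#define TOY_INTERCEPT_CR0_WRITE (1u << 1)

/* Hypothetical intercept state, one bitmask per VMCB involved. */
struct toy_svm {
    bool guest_mode;
    uint32_t active; /* intercepts in the VMCB that actually runs */
    uint32_t hsave;  /* host (L0-for-L1) intercepts saved on nested entry */
    uint32_t vmcb12; /* intercepts requested by L1 for L2 */
};

/* In guest mode, the active set is rebuilt from hsave and vmcb12. */
void toy_recalc(struct toy_svm *s)
{
    if (!s->guest_mode)
        return;
    s->active = s->hsave | s->vmcb12;
}

/* Updates target the host copy in guest mode, the active VMCB otherwise. */
void toy_set_intercept(struct toy_svm *s, uint32_t bits)
{
    uint32_t *host = s->guest_mode ? &s->hsave : &s->active;

    *host |= bits;
    toy_recalc(s);
}

/* Stand-in for the save-area setup that ends up requesting CR0 intercepts. */
void toy_prepare_save(struct toy_svm *s)
{
    toy_set_intercept(s, TOY_INTERCEPT_CR0_READ | TOY_INTERCEPT_CR0_WRITE);
}

/* Old order: guest mode is entered only after the save area is prepared. */
uint32_t toy_enter_buggy(struct toy_svm *s)
{
    s->hsave = s->active;
    toy_prepare_save(s);  /* lands in s->active, not in s->hsave */
    s->guest_mode = true;
    toy_recalc(s);        /* rebuilds s->active and drops the CR0 bits */
    return s->active;
}

/* New order: guest mode is set first, so the update reaches hsave. */
uint32_t toy_enter_fixed(struct toy_svm *s)
{
    s->hsave = s->active;
    s->guest_mode = true;
    toy_prepare_save(s);
    toy_recalc(s);
    return s->active;
}
```

Starting from an all-zero state, the buggy order returns an active set without the CR0 intercepts, while the fixed order keeps both.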
    • arch: setup PF_IO_WORKER threads like PF_KTHREAD · 4727dc20
      Jens Axboe authored
      
      
      PF_IO_WORKER are kernel threads too, but they aren't PF_KTHREAD in the
      sense that we don't assign ->set_child_tid with our own structure. Just
      ensure that every arch sets up the PF_IO_WORKER threads like kthreads
      in the arch implementation of copy_thread().
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
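In an arch-level thread setup routine, the change amounts to widening the flag test that selects the kernel-thread path. The sketch below is illustrative only: `struct toy_task` and `toy_setup_as_kernel_thread` are hypothetical, and the flag values should be treated as placeholders rather than as authoritative kernel definitions.

```c
#include <stdbool.h>

/* Illustrative flag values; check the kernel headers for the real ones. */
#define PF_IO_WORKER 0x00000010u
#define PF_KTHREAD   0x00200000u

/* Hypothetical task with only the field this sketch needs. */
struct toy_task {
    unsigned int flags;
};

/*
 * Decide whether a new thread takes the kernel-thread setup path.
 * Testing both flags treats I/O workers like kthreads.
 */
bool toy_setup_as_kernel_thread(const struct toy_task *p)
{
    return (p->flags & (PF_KTHREAD | PF_IO_WORKER)) != 0;
}
```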
  8. Feb 21, 2021
  9. Feb 19, 2021