- Mar 28, 2019
-
-
Paolo Bonzini authored
The documentation does not mention how to delete a slot; add that information.
Reported-by: Nathaniel McCallum <npmccallum@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
The series to add memcg accounting to KVM allocations[1] states:

  There are many KVM kernel memory allocations which are tied to the life of the VM process and should be charged to the VM process's cgroup.

While it is correct to account KVM kernel allocations to the cgroup of the process that created the VM, it's technically incorrect to state that the KVM kernel memory allocations are tied to the life of the VM process. This is because the VM itself, i.e. struct kvm, is not tied to the life of the process which created it; rather, it is tied to the life of its associated file descriptor. In other words, kvm_destroy_vm() is not invoked until fput() decrements its associated file's refcount to zero. A simple example is to fork() in Qemu and have the child sleep indefinitely; kvm_destroy_vm() isn't called until Qemu closes its file descriptor *and* the rogue child is killed.

The allocations are guaranteed to be *accounted* to the process which created the VM, but only because KVM's per-{VM,vCPU} ioctls reject the ioctl() with -EIO if kvm->mm != current->mm. I.e. the child can keep the VM "alive" but can't do anything useful with its reference. Note that because 'struct kvm' also holds a reference to the mm_struct of its owner, the above behavior also applies to userspace allocations.

Given that mucking with a VM's file descriptor can lead to subtle and undesirable behavior, e.g. memcg charges persisting after a VM is shut down, explicitly document a VM's lifecycle and its impact on the VM's resources.

Alternatively, KVM could aggressively free resources when the creating process exits, e.g. via mmu_notifier->release(). However, mmu_notifier isn't guaranteed to be available, and freeing resources when the creator exits is likely to be error-prone and fragile as KVM would need to ensure that it only freed resources that are truly out of reach. In practice, the existing behavior shouldn't be problematic as a properly configured system will prevent a child process from being moved out of the appropriate cgroup hierarchy, i.e. prevent hiding the process from the OOM killer, and will prevent an unprivileged user from being able to hold a reference to struct kvm via another method, e.g. debugfs.

[1] https://patchwork.kernel.org/patch/10806707/

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
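To make the lifecycle concrete, here is a minimal stand-alone sketch (not QEMU itself; the names and the lack of error handling are illustrative only) showing how a forked child's copy of the VM file descriptor keeps struct kvm alive:

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int kvm = open("/dev/kvm", O_RDWR);
    int vm  = ioctl(kvm, KVM_CREATE_VM, 0);   /* struct kvm is tied to this fd */

    if (fork() == 0) {
        /* The child inherits a reference to the VM fd; as long as it
         * sleeps here, kvm_destroy_vm() cannot run even if the parent
         * exits.  The child's per-VM ioctls would fail with -EIO,
         * because kvm->mm != current->mm after fork(). */
        pause();
        _exit(0);
    }

    close(vm);    /* drops only the parent's reference */
    close(kvm);
    return 0;     /* the VM (and its memcg charges) persists until the child dies */
}
```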
-
Sean Christopherson authored
KVM's API requires that ioctls be issued from the same process that created the VM. In other words, userspace can play games with a VM's file descriptors, e.g. fork(), SCM_RIGHTS, etc., but only the creator can do anything useful. Explicitly reject device ioctls that are issued by a process other than the VM's creator, and update KVM's API documentation to extend its requirements to device ioctls.

Fixes: 852b6d57 ("kvm: add device control API")
Cc: <stable@vger.kernel.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Per Paolo[1], instantiating multiple VMs in a single process is legal; but this conflicts with KVM's API documentation, which states:

  The only supported use is one virtual machine per process, and one vcpu per thread.

However, an earlier section in the documentation states:

  Only run VM ioctls from the same process (address space) that was used to create the VM.

and:

  Only run vcpu ioctls from the same thread that was used to create the vcpu.

This suggests that the conflicting documentation is simply an incorrect ordering of words, i.e. what's really meant is that a virtual machine can't be shared across multiple processes and a vCPU can't be shared across multiple threads. Tweak the blurb on issuing ioctls to use a more assertive tone, and rewrite the "supported use" sentence to reference said blurb instead of poorly restating it in different terms. Opportunistically add missing punctuation.

[1] https://lkml.kernel.org/r/f23265d4-528e-3bd4-011f-4d7b8f3281db@redhat.com

Fixes: 9c1b96e3 ("KVM: Document basic API")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
[Improve notes on asynchronous ioctl]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
The cr4_pae flag is a bit of a misnomer; its purpose is really to track whether the guest PTE that is being shadowed is a 4-byte entry or an 8-byte entry. Prior to supporting nested EPT, the size of the gpte was reflected purely by CR4.PAE. KVM fudged things a bit for direct sptes, but it was mostly harmless since the size of the gpte never mattered. Now that a spte may be tracking an indirect EPT entry, relying on CR4.PAE is wrong and ill-named.

For direct shadow pages, force the gpte_size to '1' as they are always 8-byte entries; EPT entries can only be 8 bytes and KVM always uses 8-byte entries for NPT and its identity map (when running with EPT but not unrestricted guest). Likewise, nested EPT entries are always 8 bytes.

Nested EPT presents a unique scenario as the size of the entries is not dictated by CR4.PAE, but neither is the shadow page a direct map. To handle this scenario, set cr0_wp=1 and smap_andnot_wp=1, an otherwise impossible combination, to denote a nested EPT shadow page. Use the information to avoid incorrectly zapping an unsync'd indirect page in __kvm_sync_page().

Providing a consistent and accurate gpte_size fixes a bug reported by Vitaly where fast_cr3_switch() always fails when switching from L2 to L1 as kvm_mmu_get_page() would force role.cr4_pae=0 for direct pages, whereas kvm_calc_mmu_role_common() would set it according to CR4.PAE.

Fixes: 7dcd5755 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- Mar 06, 2019
-
-
Cornelia Huck authored
If something goes wrong in the kvm io bus handling, the virtio-ccw diagnose may return a negative error value in the cookie gpr. Document this.
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
- Feb 20, 2019
-
-
Nir Weiner authored
The hard-coded value 10000 in grow_halt_poll_ns() stands for the initial start value when raising up vcpu->halt_poll_ns. In effect, it sets the timeout of the first polling session. This value has a significant effect on how tolerant we are to outliers. In the common case, a higher value is better: we spend more time in the polling busy loop, handle events/interrupts faster, and get better performance. But on outliers it puts us in a busy loop that does nothing; even if the shrink factor is zero, we still waste time on the first iteration. The optimal value varies between workloads, depending on the outlier rate and the length of polling sessions. As this value has a significant effect on the dynamic halt-polling algorithm, it should be configurable and exposed.
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Nir Weiner <nir.weiner@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
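For reference, a stand-alone model of the grow path as this series shapes it (a sketch, not the kernel code itself; it assumes the new tunable is exposed as the halt_poll_ns_grow_start module parameter in place of the hard-coded 10000):

```c
#include <stdio.h>

/* Model of the grow logic; the real code lives in virt/kvm/kvm_main.c
 * and reads these values from module parameters. */
static unsigned int halt_poll_ns_grow       = 2;      /* existing multiplier */
static unsigned int halt_poll_ns_grow_start = 10000;  /* ns, now configurable */

static unsigned int grow_halt_poll_ns(unsigned int val)
{
    if (!halt_poll_ns_grow)
        return val;                       /* growing disabled */

    val *= halt_poll_ns_grow;             /* scale up after a useful poll */
    if (val < halt_poll_ns_grow_start)
        val = halt_poll_ns_grow_start;    /* first session starts here */
    return val;
}

int main(void)
{
    unsigned int ns = 0;
    for (int i = 0; i < 4; i++) {
        ns = grow_halt_poll_ns(ns);
        printf("halt_poll_ns after grow %d: %u\n", i + 1, ns);
    }
    return 0;
}
```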
-
Sean Christopherson authored
Remove x86 KVM's fast invalidate mechanism, i.e. revert all patches from the original series[1]. Though not explicitly stated, for all intents and purposes the fast invalidate mechanism was added to speed up the scenario where removing a memslot, e.g. as part of reading PCI ROM, caused KVM to flush all shadow entries[1]. Now that the memslot case flushes only shadow entries belonging to the memslot, i.e. doesn't use the fast invalidate mechanism, the only remaining usages of the mechanism are when the VM is being destroyed and when the MMIO generation rolls over.

When a VM is being destroyed, either there are no active vcpus, i.e. there's no lock contention, or the VM has ungracefully terminated, in which case we want to reclaim its pages as quickly as possible, i.e. not release the MMU lock if there are still CPUs executing in the VM. The MMIO generation scenario is almost literally a one-in-a-million occurrence, i.e. is not a performance sensitive scenario.

Given that lock-breaking is not desirable (VM teardown) or irrelevant (MMIO generation overflow), remove the fast invalidate mechanism to simplify the code (a small amount) and to discourage future code from zapping all pages, as using such a big hammer should be a last resort.

This reverts commit f6f8adee.

[1] https://lkml.kernel.org/r/1369960590-14138-1-git-send-email-xiaoguangrong@linux.vnet.ibm.com

Cc: Xiao Guangrong <guangrong.xiao@gmail.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
...now that KVM won't explode by moving it out of bit 0. Using bit 63 eliminates the need to jump over bit 0, e.g. when calculating a new memslots generation or when propagating the memslots generation to an MMIO spte.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- Jan 11, 2019
-
-
Christophe de Dinechin authored
The URL of [api-spec] in Documentation/virtual/kvm/amd-memory-encryption.rst is no longer valid; fix it by replacing a space with an underscore.
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
- Dec 14, 2018
-
-
Vitaly Kuznetsov authored
With every new Hyper-V enlightenment we implement, we're forced to add a KVM_CAP_HYPERV_* capability. While this approach works, it is fairly inconvenient: the majority of the enlightenments we implement have corresponding CPUID feature bit(s), and userspace has to know about these anyway to be able to expose the feature to the guest.

Add a KVM_GET_SUPPORTED_HV_CPUID ioctl (backed by KVM_CAP_HYPERV_CPUID, "one cap to rule them all!") returning all Hyper-V CPUID feature leaves. Using the existing KVM_GET_SUPPORTED_CPUID doesn't seem to be possible: Hyper-V CPUID feature leaves intersect with KVM's (e.g. 0x40000000, 0x40000001) and we would probably confuse userspace if we decided to return these twice.

KVM_CAP_HYPERV_CPUID's number is interim: we intend to drop KVM_CAP_HYPERV_STIMER_DIRECT and use its number instead.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
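A minimal sketch of how userspace might call the new ioctl (error handling elided; it assumes the vCPU ioctl form and a fixed entry count, which are illustrative choices):

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>

int main(void)
{
    int kvm  = open("/dev/kvm", O_RDWR);
    int vm   = ioctl(kvm, KVM_CREATE_VM, 0);
    int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);

    if (ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_HYPERV_CPUID) <= 0)
        return 1;                      /* not supported by this kernel */

    int nent = 64;                     /* arbitrary upper bound for the sketch */
    struct kvm_cpuid2 *cpuid =
        calloc(1, sizeof(*cpuid) + nent * sizeof(struct kvm_cpuid_entry2));
    cpuid->nent = nent;

    /* Returns all Hyper-V CPUID leaves KVM can expose (0x40000000...). */
    ioctl(vcpu, KVM_GET_SUPPORTED_HV_CPUID, cpuid);

    for (unsigned int i = 0; i < cpuid->nent; i++)
        printf("leaf 0x%x: eax=0x%x ebx=0x%x ecx=0x%x edx=0x%x\n",
               cpuid->entries[i].function, cpuid->entries[i].eax,
               cpuid->entries[i].ebx, cpuid->entries[i].ecx,
               cpuid->entries[i].edx);
    free(cpuid);
    return 0;
}
```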
-
Paolo Bonzini authored
There are two problems with KVM_GET_DIRTY_LOG. First, and less important, it can take kvm->mmu_lock for an extended period of time. Second, its user can actually see many false positives in some cases. The latter is due to a benign race like this:

  1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects them.
  2. The guest modifies the pages, causing them to be marked dirty.
  3. Userspace actually copies the pages.
  4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though they were not written to since (3).

This is especially a problem for large guests, where the time between (1) and (3) can be substantial. This patch introduces a new capability which, when enabled, makes KVM_GET_DIRTY_LOG not write-protect the pages it returns. Instead, userspace has to explicitly clear the dirty log bits just before using the content of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a 64-page granularity rather than requiring to sync a full memslot; this way, the mmu_lock is taken for small amounts of time, and only a small amount of time will pass between write protection of pages and the sending of their content.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
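A hedged sketch of the intended flow (structure and capability names as found in the uapi header; the VM fd, memslot 0, and bitmap are assumed to be set up elsewhere):

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Assumes 'vm_fd' refers to an existing VM and slot 0 was created with
 * KVM_MEM_LOG_DIRTY_PAGES; 'bitmap' is large enough for the slot. */
void sync_and_clear_slot0(int vm_fd, void *bitmap, __u64 npages)
{
    /* Opt in: KVM_GET_DIRTY_LOG no longer write-protects pages. */
    struct kvm_enable_cap cap = {
        .cap  = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT,
        .args = { 1 },
    };
    ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

    struct kvm_dirty_log log = { .slot = 0, .dirty_bitmap = bitmap };
    ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);

    /* ... copy out the pages marked dirty in 'bitmap' ... */

    /* Re-protect only what was actually consumed, in 64-page granularity. */
    struct kvm_clear_dirty_log clear = {
        .slot         = 0,
        .first_page   = 0,
        .num_pages    = npages,
        .dirty_bitmap = bitmap,
    };
    ioctl(vm_fd, KVM_CLEAR_DIRTY_LOG, &clear);
}
```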
-
Paolo Bonzini authored
The first such capability to be handled in virt/kvm/ will be manual dirty page reprotection.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- Oct 17, 2018
-
-
Jim Mattson authored
This is a per-VM capability which can be enabled by userspace so that the faulting linear address will be included with the information about a pending #PF in L2, and the "new DR6 bits" will be included with the information about a pending #DB in L2. With this capability enabled, the L1 hypervisor can now intercept #PF before CR2 is modified. Under VMX, the L1 hypervisor can now intercept #DB before DR6 and DR7 are modified.

When userspace has enabled KVM_CAP_EXCEPTION_PAYLOAD, it should generally provide an appropriate payload when injecting a #PF or #DB exception via KVM_SET_VCPU_EVENTS. However, to support restoring old checkpoints, this payload is not required.

Note that bit 16 of the "new DR6 bits" is set to indicate that a debug exception (#DB) or a breakpoint exception (#BP) occurred inside an RTM region while advanced debugging of RTM transactional regions was enabled. This is the reverse of DR6.RTM, which is cleared in this scenario.

This capability also enables exception.pending in struct kvm_vcpu_events, which allows userspace to distinguish between pending and injected exceptions.

Reported-by: Jim Mattson <jmattson@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
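A sketch of how userspace could opt in and supply a #PF payload (field names follow the x86 uapi struct kvm_vcpu_events; the fds and the chosen fault address are assumptions of the example):

```c
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/* Assumes 'vm_fd' and 'vcpu_fd' already exist. */
void inject_pf_with_cr2(int vm_fd, int vcpu_fd, __u64 fault_addr)
{
    struct kvm_enable_cap cap = { .cap = KVM_CAP_EXCEPTION_PAYLOAD };
    ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

    struct kvm_vcpu_events events;
    memset(&events, 0, sizeof(events));
    ioctl(vcpu_fd, KVM_GET_VCPU_EVENTS, &events);

    events.flags |= KVM_VCPUEVENT_VALID_PAYLOAD;
    events.exception.pending = 1;          /* pending, not yet injected */
    events.exception.nr = 14;              /* #PF */
    events.exception.has_error_code = 1;
    events.exception.error_code = 0;
    events.exception_has_payload = 1;
    events.exception_payload = fault_addr; /* becomes CR2 on delivery */

    ioctl(vcpu_fd, KVM_SET_VCPU_EVENTS, &events);
}
```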
-
Jim Mattson authored
The per-VM capability KVM_CAP_EXCEPTION_PAYLOAD (to be introduced in a later commit) adds the following fields to struct kvm_vcpu_events: exception_has_payload, exception_payload, and exception.pending. With this capability set, all of the details of vcpu->arch.exception, including the payload for a pending exception, are reported to userspace in response to KVM_GET_VCPU_EVENTS. With this capability clear, the original ABI is preserved, and the exception.injected field is set for either pending or injected exceptions.

When userspace calls KVM_SET_VCPU_EVENTS with KVM_CAP_EXCEPTION_PAYLOAD clear, exception.injected is no longer translated to exception.pending. KVM_SET_VCPU_EVENTS can now only establish a pending exception when KVM_CAP_EXCEPTION_PAYLOAD is set.

Reported-by: Jim Mattson <jmattson@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- Oct 16, 2018
-
-
Jim Mattson authored
The header file indicates that there are 36 reserved bytes at the end of this structure. Adjust the documentation to agree with the header file.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Peng Hao authored
Coalesced PIO is based on coalesced MMIO and can be used for ports such as the RTC port, the PCI host config port, and so on. The RTC is a notable case: some versions of Windows guests access the RTC frequently because it provides the system tick. A guest accesses the RTC like this: write the register index to port 0x70, then write or read data at port 0x71. The write to port 0x70 only selects the index and does nothing else, so we can use coalesced PIO to handle this pattern and reduce VM-exit time. A virtual machine also accesses the PCI host config port frequently while starting and shutting down, so marking that port as coalesced PIO can reduce startup and shutdown time.

VM-exit times for accessing RTC port 0x70 and PIIX port 0xcf8, measured with perf (guest OS: Windows 7 64-bit), without this patch:

  IO Port Access  Samples  Samples%  Time%   Min Time  Max Time  Avg time
  0x70:POUT       86       30.99%    74.59%  9us       29us      10.75us (+- 3.41%)
  0xcf8:POUT      1119     2.60%     2.12%   2.79us    56.83us   3.41us  (+- 2.23%)

With this patch:

  IO Port Access  Samples  Samples%  Time%   Min Time  Max Time  Avg time
  0x70:POUT       106      32.02%    29.47%  0us       10us      1.57us  (+- 7.38%)
  0xcf8:POUT      1065     1.67%     0.28%   0.41us    65.44us   0.66us  (+- 10.55%)

Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
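A minimal sketch of registering the RTC index port as a coalesced PIO zone (assuming the host advertises KVM_CAP_COALESCED_PIO and 'vm_fd' already exists):

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Mark I/O port 0x70 (RTC index) as coalesced, so writes to it are
 * batched in the coalesced ring instead of exiting to userspace. */
int coalesce_rtc_index_port(int vm_fd)
{
    struct kvm_coalesced_mmio_zone zone = {
        .addr = 0x70,   /* an I/O port number here, not an MMIO address */
        .size = 1,
        .pio  = 1,      /* distinguishes PIO zones from MMIO zones */
    };
    return ioctl(vm_fd, KVM_REGISTER_COALESCED_MMIO, &zone);
}
```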
-
Peng Hao authored
Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vitaly Kuznetsov authored
Using a hypercall for sending IPIs is faster because it allows specifying any number of vCPUs (even more than 64, with a sparse CPU set), and the whole procedure takes only one VMEXIT. The current Hyper-V TLFS (v5.0b) claims that the HvCallSendSyntheticClusterIpi hypercall can't be 'fast' (passing parameters through registers), but apparently this is not true; Windows always uses it as 'fast', so we need to support that.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- Oct 09, 2018
-
-
Paul Mackerras authored
This adds a KVM_PPC_NO_HASH flag to the flags field of the kvm_ppc_smmu_info struct, and arranges for it to be set when running as a nested hypervisor, as an unambiguous indication to userspace that HPT guests are not supported. Reporting the KVM_CAP_PPC_MMU_HASH_V3 capability as false could be taken as indicating only that the new HPT features in ISA V3.0 are not supported, leaving it ambiguous whether pre-V3.0 HPT features are supported.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Paul Mackerras authored
With this, userspace can enable a KVM-HV guest to run nested guests under it. The administrator can control whether any nested guests can be run; setting the "nested" module parameter to false prevents any guests becoming nested hypervisors (that is, any attempt to enable the nested capability on a guest will fail). Guests which are already nested hypervisors will continue to be so.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
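A minimal sketch of how userspace might turn this on for a VM (assuming 'vm_fd' refers to a KVM-HV VM and the kvm-hv "nested" module parameter allows it):

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Enable running nested guests under this KVM-HV guest. */
int enable_nested_hv(int vm_fd)
{
    struct kvm_enable_cap cap = { .cap = KVM_CAP_PPC_NESTED_HV };

    if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_NESTED_HV) <= 0)
        return -1;              /* host cannot act as a nested hypervisor */

    return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}
```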
-
Paul Mackerras authored
This adds a one-reg register identifier which can be used to read and set the virtual PTCR for the guest. This register identifies the address and size of the virtual partition table for the guest, which contains information about the nested guests under this guest. Migrating this value is the only extra requirement for migrating a guest which has nested guests (assuming of course that the destination host supports nested virtualization in the kvm-hv module).
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
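A sketch of reading the register with the one-reg API (assuming 'vcpu_fd' exists and the KVM_REG_PPC_PTCR identifier from the powerpc uapi header; error handling elided):

```c
#include <linux/kvm.h>   /* pulls in the powerpc <asm/kvm.h> one-reg IDs */
#include <sys/ioctl.h>

/* Read the guest's virtual PTCR so it can be transferred on migration. */
int get_virtual_ptcr(int vcpu_fd, __u64 *ptcr)
{
    struct kvm_one_reg reg = {
        .id   = KVM_REG_PPC_PTCR,
        .addr = (__u64)(unsigned long)ptcr,
    };
    return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
}
```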
-
- Oct 03, 2018
-
-
Suzuki K Poulose authored
Allow specifying the physical address size limit for a new VM via the kvm_type argument for the KVM_CREATE_VM ioctl. This allows us to finalise the stage2 page table as early as possible and hence perform the right checks on the memory slots without complication. The size is encoded as Log2(PA_Size) in bits[7:0] of the type field. For backward compatibility the value 0 is reserved and implies 40 bits. Also, lift the limit of the IPA to the host limit and allow lower IPA sizes (e.g. 32). Userspace can check the extension KVM_CAP_ARM_VM_IPA_SIZE for the availability of this feature; the cap check returns the maximum physical address shift supported by the host.
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <cdall@kernel.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
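A sketch of how userspace could request a specific IPA size at VM creation (the fallback path and helper name are assumptions of the example):

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Create an arm64 VM requesting 'ipa_bits' of IPA space; fall back to
 * the legacy 40-bit default when the extension is absent or the value
 * exceeds what the host supports. */
int create_vm_with_ipa(int kvm_fd, int ipa_bits)
{
    int max = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_VM_IPA_SIZE);

    if (max <= 0 || ipa_bits > max)
        return ioctl(kvm_fd, KVM_CREATE_VM, 0);   /* type 0 == 40 bits */

    /* Log2(PA_Size) goes in bits[7:0] of the machine type. */
    return ioctl(kvm_fd, KVM_CREATE_VM, KVM_VM_TYPE_ARM_IPA_SIZE(ipa_bits));
}
```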
-
- Sep 19, 2018
-
-
Drew Schmitt authored
Add KVM_CAP_MSR_PLATFORM_INFO so that userspace can disable guest access to reads of MSR_PLATFORM_INFO. Disabling access to reads of this MSR gives userspace the control to "expose" this platform-dependent information to guests in a clear way. As it exists today, guests that read this MSR would get unpopulated information if userspace hadn't already set it (and prior to this patch series, only the CPUID faulting information could have been populated). This existing interface could be confusing if guests don't handle the potential for incorrect/incomplete information gracefully (e.g. zero reported for base frequency).
Signed-off-by: Drew Schmitt <dasch@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
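A minimal sketch of disabling the reads from userspace (my reading of the args[0] semantics; 'vm_fd' is assumed to exist):

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Ask KVM to refuse guest reads of MSR_PLATFORM_INFO rather than let the
 * guest see possibly unpopulated contents. */
int disable_platform_info_reads(int vm_fd)
{
    struct kvm_enable_cap cap = {
        .cap  = KVM_CAP_MSR_PLATFORM_INFO,
        .args = { 0 },          /* 0 = disallow guest reads */
    };
    return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}
```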
-
- Sep 12, 2018
-
-
Janosch Frank authored
We currently do not notify all gmaps when using gmap_pmdp_xchg(), due to locking constraints. This makes ucontrol VMs, the only VM type that creates multiple gmaps, incompatible with huge pages. Also, we would need to hold the guest_table_lock of all gmaps that have this vmaddr mapped to synchronize access to the pmd. ucontrol VMs are rather exotic and creating a new locking concept is no easy task, hence we return EINVAL when trying to activate KVM_CAP_S390_HPAGE_1M and report it as not available when checking for it.
Fixes: a4499382 ("KVM: s390: Add huge page enablement control")
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-Id: <20180801112508.138159-1-frankja@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
-
- Sep 09, 2018
-
-
Henrik Austad authored
This is a respin with a wider audience (everyone that get_maintainer returned), and I know this spams a *lot* of people. Not sure what would be the correct way, so my apologies for ruining your inbox.

The 00-INDEX files are supposed to give a summary of all files present in a directory, but these files are horribly out of date and their usefulness is brought into question. Often a simple "ls" would reveal the same information, as the filenames are generally quite descriptive as a short introduction to what the file covers (it should not surprise anyone what Documentation/sched/sched-design-CFS.txt covers).

A few years back it was mentioned that these files were no longer really needed, and they have since then grown further out of date, so perhaps it is time to just throw them out. A short status yields the following _outdated_ 00-INDEX files; the first counter is files listed in 00-INDEX but missing in the directory, the last is files present but not listed in 00-INDEX.

List of outdated 00-INDEX:
  Documentation: (4/10)
  Documentation/sysctl: (0/1)
  Documentation/timers: (1/0)
  Documentation/blockdev: (3/1)
  Documentation/w1/slaves: (0/1)
  Documentation/locking: (0/1)
  Documentation/devicetree: (0/5)
  Documentation/power: (1/1)
  Documentation/powerpc: (0/5)
  Documentation/arm: (1/0)
  Documentation/x86: (0/9)
  Documentation/x86/x86_64: (1/1)
  Documentation/scsi: (4/4)
  Documentation/filesystems: (2/9)
  Documentation/filesystems/nfs: (0/2)
  Documentation/cgroup-v1: (0/2)
  Documentation/kbuild: (0/4)
  Documentation/spi: (1/0)
  Documentation/virtual/kvm: (1/0)
  Documentation/scheduler: (0/2)
  Documentation/fb: (0/1)
  Documentation/block: (0/1)
  Documentation/networking: (6/37)
  Documentation/vm: (1/3)

Then there are 364 subdirectories in Documentation/ with several files that are missing 00-INDEX altogether (and another 120 with a single file and no 00-INDEX).

I don't really have an opinion on whether or not we /should/ have 00-INDEX, but the above 00-INDEX files should either be removed or be kept up to date. If we should keep the files, I can try to keep them updated, but I'd rather not if we just want to delete them anyway. As a starting point, remove all index files and references to 00-INDEX and see where the discussion goes.

Signed-off-by: Henrik Austad <henrik@austad.us>
Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Just-do-it-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: [Almost everybody else]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
-
- Aug 22, 2018
-
-
Dongjiu Geng authored
In the documentation, this capability is named KVM_CAP_ARM_SET_SERROR_ESR, but in the header file it is named KVM_CAP_ARM_INJECT_SERROR_ESR; change the documentation so the two match.
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Reported-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- Aug 06, 2018
-
-
Wanpeng Li authored
Use a hypercall to send IPIs with a single vmexit, instead of one vmexit per destination in xAPIC/x2APIC physical mode and one vmexit per cluster in x2APIC cluster mode. Intel guests can enter x2APIC cluster mode when interrupt remapping is enabled in qemu; however, the latest AMD EPYC still only supports xAPIC mode, which benefits greatly from exit-less IPIs. This patchset lets a guest send multicast IPIs, with at most 128 destinations per hypercall in 64-bit mode and 64 vCPUs per hypercall in 32-bit mode.

Hardware: Xeon Skylake 2.5GHz, 2 sockets, 40 cores, 80 threads; the VM has 80 vCPUs. IPI microbenchmark (https://lkml.org/lkml/2017/12/19/141):

  x2apic cluster mode, vanilla
    Dry-run:        0,         2392199 ns
    Self-IPI:       6907514,   15027589 ns
    Normal IPI:     223910476, 251301666 ns
    Broadcast IPI:  0,         9282161150 ns
    Broadcast lock: 0,         8812934104 ns

  x2apic cluster mode, pv-ipi
    Dry-run:        0,         2449341 ns
    Self-IPI:       6720360,   15028732 ns
    Normal IPI:     228643307, 255708477 ns
    Broadcast IPI:  0,         7572293590 ns  => 22% performance boost
    Broadcast lock: 0,         8316124651 ns

  x2apic physical mode, vanilla
    Dry-run:        0,         3135933 ns
    Self-IPI:       8572670,   17901757 ns
    Normal IPI:     226444334, 255421709 ns
    Broadcast IPI:  0,         19845070887 ns
    Broadcast lock: 0,         19827383656 ns

  x2apic physical mode, pv-ipi
    Dry-run:        0,         2446381 ns
    Self-IPI:       6788217,   15021056 ns
    Normal IPI:     219454441, 249583458 ns
    Broadcast IPI:  0,         7806540019 ns  => 154% performance boost
    Broadcast lock: 0,         9143618799 ns

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Jim Mattson authored
For nested virtualization, L0 KVM is managing a bit of state for L2 guests; this state cannot be captured through the currently available IOCTLs. In fact the state captured through all of these IOCTLs is usually a mix of L1 and L2 state. It is also dependent on whether the L2 guest was running at the moment when the process was interrupted to save its state.

With this capability, there are two new vcpu ioctls: KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE. These can be used for saving and restoring a VM that is in VMX operation.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Jim Mattson <jmattson@google.com>
[karahmed@ - rename structs and functions and make them ready for AMD and
   address previous comments.
 - handle nested.smm state.
 - rebase & a bit of refactoring.
 - Merge 7/8 and 8/8 into one patch. ]
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
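A hedged sketch of the save side (it assumes KVM_CHECK_EXTENSION(KVM_CAP_NESTED_STATE) reports the maximum state size; error handling elided):

```c
#include <linux/kvm.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Save the L0-managed nested (VMX) state of a vCPU so it can be restored
 * with KVM_SET_NESTED_STATE after migration.  'kvm_fd' and 'vcpu_fd' are
 * assumed to exist. */
struct kvm_nested_state *save_nested_state(int kvm_fd, int vcpu_fd)
{
    /* KVM_CHECK_EXTENSION reports the maximum state size in bytes. */
    int size = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_NESTED_STATE);
    if (size <= 0)
        return NULL;

    struct kvm_nested_state *state = calloc(1, size);
    state->size = size;                 /* tell KVM how big the buffer is */
    ioctl(vcpu_fd, KVM_GET_NESTED_STATE, state);
    return state;                       /* pass to KVM_SET_NESTED_STATE on resume */
}
```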
-
- Jul 30, 2018
-
-
Janosch Frank authored
General KVM huge page support on s390 has to be enabled via the kvm.hpage module parameter. Either nested or hpage can be enabled, as we currently do not support vSIE for huge backed guests. Once vSIE support is added we will either drop the parameter or enable it as default. For a guest the feature has to be enabled through the new KVM_CAP_S390_HPAGE_1M capability and the hpage module parameter. Enabling it means that cmm can't be enabled for the vm and disables pfmf and storage key interpretation. This is due to the fact that in some cases, in upcoming patches, we have to split huge pages in the guest mapping to be able to set more granular memory protection on 4k pages. These split pages have fake page tables that are not visible to the Linux memory management, which subsequently will not manage its PGSTEs, while the SIE will. Disabling these features lets us manage PGSTE data in a consistent manner and solves that problem.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
-
- Jul 21, 2018
-
-
James Morse authored
arm64's new use of KVM's get_events/set_events API calls isn't just for RAS; it allows an SError that has been made pending by KVM as part of its device emulation to be migrated. Wire this up for 32-bit too. We only need to read/write the HCR_VA bit, and check that no esr has been provided, as we don't yet support VDFSR.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
-
Dongjiu Geng authored
With the arm64 RAS Extension, user space can inject a virtual SError with a specified ESR, so user space needs to know whether KVM supports such injection. This interface adds a query for that capability: KVM checks whether the system supports the RAS Extension and, if so, returns true to user space, otherwise false.
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
[expanded documentation wording]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
-
Dongjiu Geng authored
When migrating VMs, user space may need to know the exception state. For example, if KVM has made an SError pending on machine A, then after migration to machine B, KVM also needs to make that SError pending. This new IOCTL exports user-invisible states related to SError. Together with appropriate user space changes, user space can get/set the SError exception state to do migrate/snapshot/suspend.
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
[expanded documentation wording]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
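A sketch of what the migration flow looks like with the new ioctls (arm64 field names from the uapi header; the two vcpu fds stand in for the source and destination sides):

```c
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/* Transfer a pending SError during migration; error handling elided. */
void migrate_serror(int src_vcpu_fd, int dst_vcpu_fd)
{
    struct kvm_vcpu_events ev;

    memset(&ev, 0, sizeof(ev));
    ioctl(src_vcpu_fd, KVM_GET_VCPU_EVENTS, &ev);     /* on machine A */

    if (ev.exception.serror_pending)
        /* serror_has_esr/serror_esr are only meaningful with the RAS
         * Extension (see KVM_CAP_ARM_INJECT_SERROR_ESR). */
        ioctl(dst_vcpu_fd, KVM_SET_VCPU_EVENTS, &ev); /* on machine B */
}
```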
-
Christoffer Dall authored
Update the documentation to reflect the ordering requirements of restoring the GICD_IIDR register before any other registers and the effects this has on restoring the interrupt groups for an emulated GICv2 instance. Also remove some outdated limitations in the documentation while we're at it.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
-
- Jun 22, 2018
-
-
Vitaly Kuznetsov authored
KVM_CAP_HYPERV_TLBFLUSH collided with KVM_CAP_S390_PSW-BPB; its paragraph number should now be 8.18.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
- Jun 01, 2018
-
-
Liran Alon authored
We can document other "Known Limitations", but the ones currently referenced don't hold anymore...
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Liran Alon authored
Fix the outdated statement that KVM is not able to expose SLAT (Second-Level Address Translation) to guests. This was implemented a long time ago...
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- May 26, 2018
-
-
Liran Alon authored
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Jim Mattson authored
Document the subtle nuances that KVM_CAP_X86_DISABLE_EXITS induces in the KVM_GET_SUPPORTED_CPUID API.
Fixes: 4d5422ce ("KVM: X86: Provide a capability to disable MWAIT intercepts")
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
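For context, a minimal sketch of how userspace enables the capability whose CPUID interaction is being documented (MWAIT only; constants from the uapi header; the fds are assumed to exist):

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Stop intercepting MWAIT for this VM; the documented nuance concerns how
 * this capability interacts with what KVM_GET_SUPPORTED_CPUID reports for
 * MONITOR/MWAIT, so userspace probes before enabling. */
int disable_mwait_exits(int kvm_fd, int vm_fd)
{
    int allowed = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_X86_DISABLE_EXITS);
    if (allowed < 0 || !(allowed & KVM_X86_DISABLE_EXITS_MWAIT))
        return -1;                      /* MWAIT exits cannot be disabled */

    struct kvm_enable_cap cap = {
        .cap  = KVM_CAP_X86_DISABLE_EXITS,
        .args = { KVM_X86_DISABLE_EXITS_MWAIT },
    };
    return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}
```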
-