- Apr 21, 2015
-
-
Paul Mackerras authored
This creates a debugfs directory for each HV guest (assuming debugfs is enabled in the kernel config), and within that directory, a file by which the contents of the guest's HPT (hashed page table) can be read. The directory is named vmnnnn, where nnnn is the PID of the process that created the guest. The file is named "htab". This is intended to help in debugging problems in the host's management of guest memory.
The contents of the file consist of a series of lines like this:
  3f48 4000d032bf003505 0000000bd7ff1196 00000003b5c71196
The first field is the index of the entry in the HPT, the second and third are the HPT entry, so the third field contains the real page number that is mapped by the entry if the entry's valid bit is set. The fourth field is the guest's view of the second doubleword of the entry, so it contains the guest physical address. (The format of the second through fourth fields is described in the Power ISA and also in arch/powerpc/include/asm/mmu-hash64.h.) Signed-off-by:
Paul Mackerras <paulus@samba.org> Signed-off-by:
Alexander Graf <agraf@suse.de>
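The line format described in the commit above can be decoded from userspace with a few lines of C. This is only a sketch: the struct and function names are made up for illustration, and it assumes the valid bit is the least significant bit of the first doubleword (as HPTE_V_VALID is defined in mmu-hash64.h).

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* One decoded line of the "htab" file (hypothetical field names). */
struct htab_line {
	unsigned long index; /* index of the entry in the HPT */
	uint64_t v;          /* first doubleword of the HPT entry */
	uint64_t r;          /* second doubleword, host view (real page number) */
	uint64_t guest_r;    /* guest view of the second doubleword */
};

/* Assumption: valid bit is bit 0 of the first doubleword. */
#define HTAB_V_VALID UINT64_C(0x1)

/* Parse one line of four hex fields; returns 0 on success, -1 otherwise. */
int htab_parse_line(const char *line, struct htab_line *out)
{
	int n = sscanf(line, "%lx %" SCNx64 " %" SCNx64 " %" SCNx64,
		       &out->index, &out->v, &out->r, &out->guest_r);

	return n == 4 ? 0 : -1;
}

/* Non-zero if the entry's valid bit is set. */
int htab_entry_valid(const struct htab_line *e)
{
	return (e->v & HTAB_V_VALID) != 0;
}
```

A caller would read /sys/kernel/debug/kvm/vmnnnn/htab line by line (the exact path depends on where debugfs is mounted) and pass each line to htab_parse_line().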
-
- Apr 10, 2015
-
-
Radim Krčmář authored
kvm_write_guest_cached() does not mark all written pages as dirty, and code comments in kvm_gfn_to_hva_cache_init() talk about a NULL memslot with cross-page accesses. Fix both the easy way. The check is '<= 1' to have the same result for a 'len = 0' cache anywhere in the page. (nr_pages_needed is 0 on a page boundary.) Fixes: 8f964525 ("KVM: Allow cross page reads and writes from cached translations.") Signed-off-by:
Radim Krčmář <rkrcmar@redhat.com> Message-Id: <20150408121648.GA3519@potion.brq.redhat.com> Reviewed-by:
Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
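The remark that nr_pages_needed is 0 on a page boundary follows from unsigned arithmetic. The sketch below models the computation in userspace; it assumes 4 KiB pages, and the helper names are illustrative rather than the kernel's.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Number of pages touched by an access of len bytes starting at gpa. */
uint64_t nr_pages_needed(uint64_t gpa, uint64_t len)
{
	uint64_t start_gfn = gpa >> SKETCH_PAGE_SHIFT;
	uint64_t end_gfn = (gpa + len - 1) >> SKETCH_PAGE_SHIFT;

	/* For len == 0 on a page boundary, end_gfn is start_gfn - 1,
	 * so the result is 0; elsewhere in the page it is 1. */
	return end_gfn - start_gfn + 1;
}

/* The '<= 1' check treats both len == 0 cases the same way. */
int fits_in_one_page(uint64_t gpa, uint64_t len)
{
	return nr_pages_needed(gpa, len) <= 1;
}
```

With a '== 1' check, a zero-length cache placed exactly on a page boundary would be handled differently from one placed anywhere else in the page.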
-
- Apr 08, 2015
-
-
Paolo Bonzini authored
The corresponding write functions just use __copy_to_user. Do the same on the read side. This reverts what's left of commit 86ab8cff (KVM: introduce gfn_to_hva_read/kvm_read_hva/kvm_read_hva_atomic, 2012-08-21) Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1427976500-28533-1-git-send-email-pbonzini@redhat.com>
-
- Mar 31, 2015
-
-
Jens Freimann authored
We introduced struct kvm_s390_irq a while ago, which allows injecting all kinds of interrupts as defined in the Principles of Operation. Add an ioctl to inject interrupts with the extended struct kvm_s390_irq. Signed-off-by:
Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by:
Christian Borntraeger <borntraeger@de.ibm.com> Acked-by:
Cornelia Huck <cornelia.huck@de.ibm.com>
-
- Mar 30, 2015
-
-
Andre Przywara authored
Currently we have struct kvm_exit_mmio for encapsulating MMIO abort data to be passed on from syndrome decoding all the way down to the VGIC register handlers. Now that we are switching the MMIO handling to be routed through the KVM MMIO bus, it no longer makes sense to use that structure right from the beginning. So we keep the data in local variables until we put them into the kvm_io_bus framework. Then we fill kvm_exit_mmio in the VGIC only, making it a VGIC private structure. Along the way we replace the data buffer in that structure with a pointer to a single location in a local variable, so we get rid of some copying. With all of the virtual GIC emulation code now being registered with the kvm_io_bus, we can remove all of the old MMIO handling code and its dispatching functionality. I didn't bother to rename kvm_exit_mmio (to vgic_mmio or something), because that touches a lot of code lines without any good reason. This is based on an original patch by Nikolay. Signed-off-by:
Andre Przywara <andre.przywara@arm.com> Cc: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com> Reviewed-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Andre Przywara authored
Using the framework provided by the recent vgic.c changes, we register a kvm_io_bus device on mapping the virtual GICv3 resources. The distributor mapping is pretty straightforward, but the redistributors need some more love, since they need to be tagged with the respective redistributor (read: VCPU) they are connected with. We use the kvm_io_bus framework to register one device per VCPU. Signed-off-by:
Andre Przywara <andre.przywara@arm.com> Reviewed-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Andre Przywara authored
Currently we handle the redistributor registers in two separate MMIO regions, one for the overall behaviour and SPIs and one for the SGIs/PPIs. The latter forces the creation of _two_ KVM I/O bus devices for each redistributor. Since the spec mandates those two pages to be contiguous, we could as well merge them and save the churn with the second KVM I/O bus device. Signed-off-by:
Andre Przywara <andre.przywara@arm.com> Reviewed-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
- Mar 26, 2015
-
-
Andre Przywara authored
Using the framework provided by the recent vgic.c changes we register a kvm_io_bus device when initializing the virtual GICv2. Signed-off-by:
Andre Przywara <andre.przywara@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Andre Przywara authored
Currently we use a lot of VGIC specific code to do the MMIO dispatching. Use the previous reworks to add kvm_io_bus style MMIO handlers. Those are not yet called by the MMIO abort handler, and the actual VGIC emulator functions do not make use of them yet, but they will be enabled with the following patches. Signed-off-by:
Andre Przywara <andre.przywara@arm.com> Reviewed-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Andre Przywara authored
The vgic_find_range() function in vgic.c takes a struct kvm_exit_mmio argument, but actually only uses the length field in there. Since we need to get rid of that structure in that part of the code anyway, let's rework the function (and its callers) to pass the length argument to the function directly. Signed-off-by:
Andre Przywara <andre.przywara@arm.com> Reviewed-by:
Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
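A range lookup that takes the access length directly, as the commit above describes, could look roughly like the following. This is a userspace model with hypothetical names (struct io_range, find_range), not the kernel's actual definition; it assumes the table is terminated by an entry with len == 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a VGIC register range descriptor. */
struct io_range {
	uint64_t base; /* offset of the range within the MMIO region */
	uint64_t len;  /* length in bytes; 0 terminates the table */
};

/* Return the range that fully contains [offset, offset + len), or NULL. */
const struct io_range *find_range(const struct io_range *ranges,
				  uint64_t len, uint64_t offset)
{
	for (; ranges->len; ranges++) {
		if (offset >= ranges->base &&
		    offset + len <= ranges->base + ranges->len)
			return ranges;
	}
	return NULL;
}
```

Passing only the length keeps the lookup independent of whatever structure carries the rest of the MMIO access data.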
-
Andre Przywara authored
The name "kvm_mmio_range" is a bit bold, given that it only covers the VGIC's MMIO ranges. To avoid confusion with kvm_io_range, rename it to vgic_io_range. Signed-off-by:
Andre Przywara <andre.przywara@arm.com> Acked-by:
Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Andre Przywara authored
iodev.h contains definitions for the kvm_io_bus framework. This is needed both by the generic KVM code in virt/kvm as well as by architecture specific code under arch/. Putting the header file in virt/kvm and using local includes in the architecture part seems at least dodgy to me, so let's move the file into include/kvm, so that a more natural "#include <kvm/iodev.h>" can be used by all of the code. This also solves a problem later when using struct kvm_io_device in arm_vgic.h. Also fix up the FSF address in the GPL header and a wrong include path along the way. Signed-off-by:
Andre Przywara <andre.przywara@arm.com> Acked-by:
Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by:
Marc Zyngier <marc.zyngier@arm.com> Reviewed-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Nikolay Nikolaev authored
This is needed in e.g. ARM vGIC emulation, where the MMIO handling depends on the VCPU that does the access. Signed-off-by:
Nikolay Nikolaev <n.nikolaev@virtualopensystems.com> Signed-off-by:
Andre Przywara <andre.przywara@arm.com> Acked-by:
Paolo Bonzini <pbonzini@redhat.com> Acked-by:
Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
- Mar 24, 2015
-
-
Igor Mammedov authored
A KVM guest can fail to start up with the following trace on the host:
  qemu-system-x86: page allocation failure: order:4, mode:0x40d0
  Call Trace:
  dump_stack+0x47/0x67
  warn_alloc_failed+0xee/0x150
  __alloc_pages_direct_compact+0x14a/0x150
  __alloc_pages_nodemask+0x776/0xb80
  alloc_kmem_pages+0x3a/0x110
  kmalloc_order+0x13/0x50
  kmemdup+0x1b/0x40
  __kvm_set_memory_region+0x24a/0x9f0 [kvm]
  kvm_set_ioapic+0x130/0x130 [kvm]
  kvm_set_memory_region+0x21/0x40 [kvm]
  kvm_vm_ioctl+0x43f/0x750 [kvm]
The failure happens when attempting to allocate pages for 'struct kvm_memslots'. However, it doesn't have to be present in physically contiguous (kmalloc-ed) address space, so change the allocation to kvm_kvzalloc() so that it will be vmalloc-ed when its size is more than a page. Signed-off-by:
Igor Mammedov <imammedo@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
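The "order:4" in the trace above means a request for 2^4 = 16 physically contiguous pages, which is what makes the allocation fragile. The sketch below shows that arithmetic and the size rule stated in the commit message; the function and enum names are illustrative, and the 4 KiB page size used in the usage note is an assumption.

```c
#include <stddef.h>

/* Smallest order such that (page_size << order) >= size. */
unsigned int alloc_order(size_t size, size_t page_size)
{
	unsigned int order = 0;

	while ((page_size << order) < size)
		order++;
	return order;
}

enum alloc_kind { ALLOC_KMALLOC, ALLOC_VMALLOC };

/* Rule from the commit message: vmalloc when larger than one page. */
enum alloc_kind pick_allocator(size_t size, size_t page_size)
{
	return size > page_size ? ALLOC_VMALLOC : ALLOC_KMALLOC;
}
```

For example, with 4 KiB pages any structure between 32 KiB + 1 and 64 KiB needs an order-4 block from a contiguous allocator, while under the size rule it is backed by virtually contiguous pages instead.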
-
- Mar 19, 2015
-
-
Takuya Yoshikawa authored
When no bits in mask are set, kvm_arch_mmu_enable_log_dirty_pt_masked() has nothing to do. But since it needs to be called from the generic code, it cannot be inlined, and a few function calls, two when PML is enabled, are wasted. Since it is common to see many pages remain clean, e.g. framebuffers can stay calm for a long time, it is worth eliminating this overhead. Signed-off-by:
Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
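The optimization described above amounts to testing the mask in the caller before making the out-of-line call. A minimal userspace model, with a hypothetical hook and a call counter added purely for illustration:

```c
#include <stddef.h>

/* Counts how often the (hypothetical) arch hook is invoked. */
unsigned long arch_hook_calls;

/* Stand-in for the non-inlinable architecture hook. */
void arch_enable_log_dirty_masked(unsigned long offset, unsigned long mask)
{
	(void)offset;
	(void)mask;
	arch_hook_calls++;
}

/* Walk a dirty bitmap word by word, skipping words with no bits set. */
void process_dirty_bitmap(const unsigned long *bitmap, size_t nwords)
{
	for (size_t i = 0; i < nwords; i++) {
		unsigned long mask = bitmap[i];

		if (!mask)
			continue; /* nothing dirty here: avoid the call */
		arch_enable_log_dirty_masked(i * 8 * sizeof(unsigned long), mask);
	}
}
```

When most words are zero, as with an idle framebuffer, nearly all iterations take the cheap path.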
-
- Mar 14, 2015
-
-
Christoffer Dall authored
When a VCPU is no longer running, we currently check to see if it has a timer scheduled in the future, and if it does, we schedule a host hrtimer to notify us in case the timer expires while the VCPU is still not running. When the hrtimer fires, we mask the guest's timer and inject the timer IRQ (still relying on the guest unmasking the timer when it receives the IRQ). This is all good and fine, but when migrating a VM (checkpoint/restore) this introduces a race. It is unlikely, but possible, for the following sequence of events to happen:
1. Userspace stops the VM
2. Hrtimer for VCPU is scheduled
3. Userspace checkpoints the VGIC state (no pending timer interrupts)
4. The hrtimer fires, schedules work in a workqueue
5. Workqueue function runs, masks the timer and injects timer interrupt
6. Userspace checkpoints the timer state (timer masked)
At restore time, you end up with a masked timer without any timer interrupts, and your guest halts, never receiving timer interrupts. Fix this by only kicking the VCPU in the workqueue function, sampling the expired state of the timer when entering the guest again, and injecting the interrupt and masking the timer only then. Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by:
Alex Bennée <alex.bennee@linaro.org> Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org>
-
Christoffer Dall authored
Migrating active interrupts causes the active state to be lost completely. This implements some additional bitmaps to track the active state on the distributor and export this to user space. Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by:
Alex Bennée <alex.bennee@linaro.org> Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org>
-
Alex Bennée authored
This helps re-factor away some of the repetitive code and makes the code flow more nicely. Signed-off-by:
Alex Bennée <alex.bennee@linaro.org> Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org>
-
Christoffer Dall authored
There is an interesting bug in the vgic code, which manifests itself when the KVM run loop has a signal pending or needs a vmid generation rollover after having disabled interrupts but before actually switching to the guest. In this case, we flush the vgic as usual, but we sync back the vgic state and exit to userspace before entering the guest. The consequence is that we will be syncing the list registers back to the software model using the GICH_ELRSR and GICH_EISR from the last execution of the guest, potentially overwriting a list register containing an interrupt. This showed up during migration testing where we would capture a state where the VM has masked the arch timer but there were no interrupts, resulting in a hung test. Cc: Marc Zyngier <marc.zyngier@arm.com> Reported-by:
Alex Bennee <alex.bennee@linaro.org> Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by:
Alex Bennée <alex.bennee@linaro.org> Acked-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org>
-
- Mar 13, 2015
-
-
Wei Yongjun authored
Add the missing unlock before return from function kvm_vgic_create() in the error handling case. Signed-off-by:
Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org>
-
- Mar 12, 2015
-
-
Eric Auger authored
This patch enables irqfd on arm/arm64. Both irqfd and resamplefd are supported. Injection is implemented in vgic.c without routing. This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD. KVM_CAP_IRQFD is now advertised. The KVM_CAP_IRQFD_RESAMPLE capability is automatically advertised as soon as CONFIG_HAVE_KVM_IRQFD is set. Irqfd injection is restricted to SPI. The rationale behind not supporting PPI irqfd injection is that any device using a PPI would be a private-to-the-CPU device (timer for instance), so its state would have to be context-switched along with the VCPU and would require in-kernel wiring anyhow. It is not a relevant use case for irqfds. Signed-off-by:
Eric Auger <eric.auger@linaro.org> Reviewed-by:
Christoffer Dall <christoffer.dall@linaro.org> Acked-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org>
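The SPI-only restriction above can be pictured with the GIC interrupt ID ranges: SGIs use IDs 0-15, PPIs 16-31, and SPIs 32-1019. The sketch below classifies an interrupt ID and applies the restriction; all names are illustrative, and how a given irqfd GSI number maps onto a GIC interrupt ID is left out.

```c
#define GIC_NR_SGIS    16   /* SGIs: interrupt IDs 0..15 */
#define GIC_NR_PRIVATE 32   /* SGIs + PPIs: interrupt IDs 0..31 */
#define GIC_MAX_SPI_ID 1019 /* SPIs: interrupt IDs 32..1019 */

enum gic_irq_class { GIC_SGI, GIC_PPI, GIC_SPI, GIC_INVALID };

/* Classify a GIC interrupt ID by its architectural range. */
enum gic_irq_class gic_classify(unsigned int intid)
{
	if (intid < GIC_NR_SGIS)
		return GIC_SGI;
	if (intid < GIC_NR_PRIVATE)
		return GIC_PPI;
	if (intid <= GIC_MAX_SPI_ID)
		return GIC_SPI;
	return GIC_INVALID;
}

/* Model of the restriction: only shared peripheral interrupts qualify. */
int irqfd_injectable(unsigned int intid)
{
	return gic_classify(intid) == GIC_SPI;
}
```

Private interrupts (SGIs and PPIs) are per-CPU, which is why they fall outside what an eventfd shared by the whole VM can sensibly target.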
-
Eric Auger authored
To prepare for irqfd addition, coarse grain locking is removed at kvm_vgic_sync_hwstate level and finer grain locking is introduced in vgic_process_maintenance only. Signed-off-by:
Eric Auger <eric.auger@linaro.org> Acked-by:
Christoffer Dall <christoffer.dall@linaro.org> Acked-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org>
-
Eric Auger authored
Introduce the __KVM_HAVE_ARCH_INTC_INITIALIZED define and the associated kvm_arch_intc_initialized function. The latter allows testing whether the virtual interrupt controller is initialized and ready to accept virtual IRQ injection. On some architectures, the virtual interrupt controller is dynamically instantiated, justifying that kind of check. The new function can now be used by irqfd to check whether the virtual interrupt controller is ready on a KVM_IRQFD request. If not, KVM_IRQFD returns -EAGAIN. Signed-off-by:
Eric Auger <eric.auger@linaro.org> Acked-by:
Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by:
Andre Przywara <andre.przywara@arm.com> Acked-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org>
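The readiness check described above can be modelled as a guard at the top of the irqfd assignment path. This is a simplified userspace sketch: struct vm_model and both functions are hypothetical stand-ins, and only the -EAGAIN behaviour is taken from the commit message.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-VM state. */
struct vm_model {
	bool intc_initialized; /* virtual interrupt controller ready? */
};

/* Models the role of kvm_arch_intc_initialized(). */
bool arch_intc_initialized(const struct vm_model *vm)
{
	return vm->intc_initialized;
}

/* Models a KVM_IRQFD request: refuse until the controller is ready. */
int irqfd_assign(const struct vm_model *vm)
{
	if (!arch_intc_initialized(vm))
		return -EAGAIN;
	return 0; /* the real code would go on to set up the eventfd */
}
```

Userspace that receives EAGAIN can finish creating the interrupt controller and retry the request.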
-
- Mar 11, 2015
-
-
Mark Rutland authored
Several dts only list "arm,cortex-a7-gic" or "arm,gic-400" in their GIC compatible list, and while this is correct (and supported by the GIC driver), KVM will fail to detect that it can support these cases. This patch adds the missing strings to the VGIC code. The of_device_id entries are padded to keep the probe function data aligned. Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michal Simek <monstr@monstr.eu> Acked-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org>
-
Paolo Bonzini authored
POWER supports irqfds but forgot to advertise them. Some userspace does not check for the capability, but other userspace does, and thus works on x86 and s390 but not POWER. To keep other architectures from making the same mistake in the future, let common code handle KVM_CAP_IRQFD the same way as KVM_CAP_IRQFD_RESAMPLE. Reported-and-tested-by:
Greg Kurz <gkurz@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Fixes: 297e2105 Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
- Mar 10, 2015
-
-
Xiubo Li authored
WARNING: Prefer [subsystem eg: netdev]_info([subsystem]dev, ... then dev_info(dev, ... then pr_info(... to printk(KERN_INFO ...
+ printk(KERN_INFO "kvm: exiting hardware virtualization\n");
WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ...
+ printk(KERN_ERR "kvm: misc device register failed\n");
Signed-off-by:
Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Xiubo Li authored
ERROR: code indent should use tabs where possible
+ const struct kvm_io_range *r2)$
WARNING: please, no spaces at the start of a line
+ const struct kvm_io_range *r2)$
This patch fixes this ERROR & WARNING to reduce noise when checking new patches in kvm_main.c. Signed-off-by:
Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Xiubo Li authored
WARNING: please, no space before tabs
+ * ^I^Ikvm->lock --> kvm->slots_lock --> kvm->irq_lock$
WARNING: please, no space before tabs
+^I^I * ^I- gfn_to_hva (kvm_read_guest, gfn_to_pfn)$
WARNING: please, no space before tabs
+^I^I * ^I- kvm_is_visible_gfn (mmu_check_roots)$
This patch fixes these warnings to reduce noise when checking new patches in kvm_main.c. Signed-off-by:
Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Xiubo Li authored
There are many warnings like this:
WARNING: Missing a blank line after declarations
+ struct kvm_coalesced_mmio_zone zone;
+ r = -EFAULT;
This patch fixes these warnings to reduce noise when checking new patches in kvm_main.c. Signed-off-by:
Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Xiubo Li authored
WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
+EXPORT_SYMBOL_GPL(gfn_to_page);
This patch fixes these warnings to reduce noise when checking new patches in kvm_main.c. Signed-off-by:
Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Xiubo Li authored
ERROR: do not initialise statics to 0 or NULL
+static int kvm_usage_count = 0;
kvm_usage_count will be placed in the .bss segment when linking, so there is obviously no need to set it to 0 here. This patch fixes this ERROR to reduce noise when checking new patches in kvm_main.c. Signed-off-by:
Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Xiubo Li authored
WARNING: labels should not be indented
+ out_free_irq_routing:
This patch fixes this WARNING to reduce noise when checking new patches in kvm_main.c. Signed-off-by:
Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Xiubo Li authored
There are many WARNINGs like this:
WARNING: sizeof tr should be sizeof(tr)
+ if (copy_from_user(&tr, argp, sizeof tr))
In kvm_main.c many places use 'sizeof(X)' and others use 'sizeof X', while the kernel recommends 'sizeof(X)', so this patch replaces all 'sizeof X' with 'sizeof(X)' to make them consistent and, at the same time, to reduce the WARNING noise when checking new patches. Signed-off-by:
Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
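Both spellings flagged above are valid C and yield the same value when the operand is an expression, so the change is purely stylistic. A small illustration, using a made-up struct in place of the one from the warning:

```c
#include <stddef.h>

/* Illustrative structure; not the kernel's definition. */
struct translation {
	unsigned long linear;
	unsigned long physical;
	unsigned char valid;
};

/* 'sizeof tr': parentheses are optional for an expression operand. */
size_t size_without_parens(void)
{
	struct translation tr;

	return sizeof tr;
}

/* 'sizeof(tr)': the form the kernel style prefers. */
size_t size_with_parens(void)
{
	struct translation tr;

	return sizeof(tr);
}
```

Note that a type name as operand, such as sizeof(struct translation), always requires the parentheses, which is one argument for using them everywhere.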
-
Thomas Huth authored
kvm_kvfree() provides exactly the same functionality as the new common kvfree() function - so let's simply replace the kvm function with the common function. Signed-off-by:
Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Christian Borntraeger authored
halt_poll_ns is used only locally. Make it static. Signed-off-by:
Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Kevin Mulvey authored
Fix whitespace around while Signed-off-by:
Kevin Mulvey <kmulvey@linux.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Kevin Mulvey authored
Better alignment of loop using tabs rather than spaces, this makes checkpatch.pl happier. Signed-off-by:
Kevin Mulvey <kmulvey@linux.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
- Feb 12, 2015
-
-
Andrea Arcangeli authored
Use the more generic get_user_pages_unlocked which has the additional benefit of passing FAULT_FLAG_ALLOW_RETRY at the very first page fault (which allows the first page fault in an unmapped area to be always able to block indefinitely by being allowed to release the mmap_sem). Signed-off-by:
Andrea Arcangeli <aarcange@redhat.com> Reviewed-by:
Andres Lagar-Cavilla <andreslc@google.com> Reviewed-by:
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Peter Feiner <pfeiner@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Feb 09, 2015
-
-
Christian Borntraeger authored
We never had a 31bit QEMU/kuli running. We would need to review several ioctls to check if this creates holes, bugs or whatever to make it work. Let's just disable compat support for KVM on s390. Signed-off-by:
Christian Borntraeger <borntraeger@de.ibm.com> Acked-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Feb 06, 2015
-
-
Paolo Bonzini authored
This patch introduces a new module parameter for the KVM module; when it is present, KVM attempts a bit of polling on every HLT before scheduling itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads; in particular I tested it with O_DSYNC writes with a battery-backed disk in the host. In this case, writes are fast (because the data doesn't have to go all the way to the platters) but they cannot be merged by either the host or the guest. KVM's performance here is usually around 30% of bare metal, or 50% if you use cache=directsync or cache=writethrough (these parameters keep the guest from sending pointless flush requests, and at the same time they are not slow because of the battery-backed cache). The bad performance happens because on every halt the host CPU decides to halt itself too. When the interrupt comes, the vCPU thread is then migrated to a new physical CPU, and in general the latency is horrible because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more importantly, 99% of what you get if you use idle=poll in the guest. This means that the tunable gets rid of this particular bottleneck, and more work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPU is interrupted, it will poll unnecessarily and thus impose a little load on the host. The above results were obtained with a mostly random value of the parameter (500000), and the load was around 1.5-2.5% CPU usage on one of the host's cores for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll, that can be used to tune the parameter. It counts how many HLT instructions received an interrupt during the polling period; each successful poll keeps Linux from scheduling the VCPU thread out and back in, and may also avoid a likely trip to C1 and back for the physical CPU.
While the VM is idle, a 4-VCPU Linux VM halts around 10 times per second. Of these halts, almost all are failed polls. During the benchmark, instead, basically all halts end within the polling period, except a more or less constant stream of 50 per second coming from vCPUs that are not running the benchmark. The wasted time is thus very low. Things may be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency test for the TSC deadline timer. Though of course a non-RT kernel has awful latency bounds, the latency of the timer is around 8000-10000 clock cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC deadline timer, thus, the effect is both a smaller average latency and a smaller variance. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
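The polling idea in the commit above can be sketched in userspace as a bounded busy-wait that runs before falling back to blocking. This is a simplified model, not the kernel code: the callback interface and function names are assumptions, the counter merely mirrors the role of the halt_successful_poll stat, and unlike a real implementation it always checks for a wakeup at least once even when the polling window is zero.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Models the halt_successful_poll statistic. */
unsigned long halt_successful_poll;

/* Monotonic time in nanoseconds. */
uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/* Example wakeup source: a flag some other context would set. */
bool flag_is_set(void *arg)
{
	return *(const volatile bool *)arg;
}

/*
 * Poll for up to halt_poll_ns nanoseconds. Returns true if a wakeup
 * arrived while polling; false means the caller should now block
 * (the step the commit message attributes to kvm_vcpu_block).
 */
bool vcpu_halt_poll(bool (*wakeup_pending)(void *), void *arg,
		    uint64_t halt_poll_ns)
{
	uint64_t deadline = now_ns() + halt_poll_ns;

	do {
		if (wakeup_pending(arg)) {
			halt_successful_poll++;
			return true;
		}
	} while (now_ns() < deadline);

	return false;
}
```

The trade-off is the one the commit message describes: a successful poll avoids a schedule-out and schedule-in, while a failed poll costs up to halt_poll_ns of host CPU time before blocking anyway.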
-