- Jul 10, 2015
-
-
Paolo Bonzini authored
If there are no assigned devices, the guest PAT are not providing any useful information and can be overridden to writeback; VMX always does this because it has the "IPAT" bit in its extended page table entries, but SVM does not have anything similar. Hook into VFIO and legacy device assignment so that they provide this information to KVM. Reviewed-by:
Alex Williamson <alex.williamson@redhat.com> Tested-by:
Joerg Roedel <jroedel@suse.de> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Jul 03, 2015
-
-
Peter Zijlstra authored
Commit 1cde2930 ("sched/preempt: Add static_key() to preempt_notifiers") had two problems. First, the preempt-notifier API needs to sleep with the addition of the static_key, we do however need to hold off preemption while modifying the preempt notifier list, otherwise a preemption could observe an inconsistent list state. KVM correctly registers and unregisters preempt notifiers with preemption disabled, so the sleep caused dmesg splats. Second, KVM registers and unregisters preemption notifiers very often (in vcpu_load/vcpu_put). With a single uniprocessor guest the static key would move between 0 and 1 continuously, hitting the slow path on every userspace exit. To fix this, wrap the static_key inc/dec in a new API, and call it from KVM. Fixes: 1cde2930 ("sched/preempt: Add static_key() to preempt_notifiers") Reported-by:
Pontus Fuchs <pontus.fuchs@gmail.com> Reported-by:
Takashi Iwai <tiwai@suse.de> Tested-by:
Takashi Iwai <tiwai@suse.de> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Jun 19, 2015
-
-
Kevin Mulvey authored
Tabs rather than spaces Signed-off-by:
Kevin Mulvey <kmulvey@linux.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Kevin Mulvey authored
fix brace spacing Signed-off-by:
Kevin Mulvey <kmulvey@linux.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Joerg Roedel authored
The allocation size of the kvm_irq_routing_table depends on the number of irq routing entries because they are all allocated with one kzalloc call. When the irq routing table gets bigger this requires high order allocations which fail from time to time: qemu-kvm: page allocation failure: order:4, mode:0xd0 This patch fixes this issue by breaking up the allocation of the table and its entries into individual kzalloc calls. These could all be satisfied with order-0 allocations, which are less likely to fail. The downside of this change is the lower performance, because of more calls to kzalloc. But given how often kvm_set_irq_routing is called in the lifetime of a guest, it doesn't really matter much. Signed-off-by:
Joerg Roedel <jroedel@suse.de> [Avoid sparse warning through rcu_access_pointer. - Paolo] Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Jun 18, 2015
-
-
Marc Zyngier authored
Back in the days, vgic.c used to have an intimate knowledge of the actual GICv2. These days, this has been abstracted away into hardware-specific backends. Remove the now useless arm-gic.h #include directive, making it clear that GICv2 specific code doesn't belong here. Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
- Jun 17, 2015
-
-
Marc Zyngier authored
Commit fd1d0ddf (KVM: arm/arm64: check IRQ number on userland injection) rightly limited the range of interrupts userspace can inject in a guest, but failed to consider the (unlikely) case where a guest is configured with 1024 interrupts. In this case, interrupts ranging from 1020 to 1023 are unuseable, as they have a special meaning for the GIC CPU interface. Make sure that these number cannot be used as an IRQ. Also delete a redundant (and similarily buggy) check in kvm_set_irq. Reported-by:
Peter Maydell <peter.maydell@linaro.org> Cc: Andre Przywara <andre.przywara@arm.com> Cc: <stable@vger.kernel.org> # 4.1, 4.0, 3.19, 3.18 Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Marc Zyngier authored
If a GICv3-enabled guest tries to configure Group0, we print a warning on the console (because we don't support Group0 interrupts). This is fairly pointless, and would allow a guest to spam the console. Let's just drop the warning. Acked-by:
Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
- Jun 12, 2015
-
-
Marc Zyngier authored
So far, we configured the world-switch by having a small array of pointers to the save and restore functions, depending on the GIC used on the platform. Loading these values each time is a bit silly (they never change), and it makes sense to rely on the instruction patching instead. This leads to a nice cleanup of the code. Acked-by:
Will Deacon <will.deacon@arm.com> Reviewed-by:
Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com>
-
- Jun 09, 2015
-
-
Andre Przywara authored
Commit 47a98b15 ("arm/arm64: KVM: support for un-queuing active IRQs") introduced handling of the GICD_I[SC]ACTIVER registers, but only for the GICv2 emulation. For the sake of completeness and as this is a pre-requisite for save/restore of the GICv3 distributor state, we should also emulate their handling in the distributor and redistributor frames of an emulated GICv3. Acked-by:
Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by:
Andre Przywara <andre.przywara@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
- Jun 05, 2015
-
-
Paolo Bonzini authored
Only two ioctls have to be modified; the address space id is placed in the higher 16 bits of their slot id argument. As of this patch, no architecture defines more than one address space; x86 will be the first. Reviewed-by:
Radim Krčmář <rkrcmar@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
We need to hide SMRAM from guests not running in SMM. Therefore, all uses of kvm_read_guest* and kvm_write_guest* must be changed to use different address spaces, depending on whether the VCPU is in system management mode. We need to introduce a new family of functions for this purpose. For now, the VCPU-based functions have the same behavior as the existing per-VM ones, they just accept a different type for the first argument. Later however they will be changed to use one of many "struct kvm_memslots" stored in struct kvm, through an architecture hook. VM-based functions will unconditionally use the first memslots pointer. Whenever possible, this patch introduces slot-based functions with an __ prefix, with two wrappers for generic and vcpu-based actions. The exceptions are kvm_read_guest and kvm_write_guest, which are copied into the new functions kvm_vcpu_read_guest and kvm_vcpu_write_guest. Reviewed-by:
Radim Krčmář <rkrcmar@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- May 28, 2015
-
-
Paolo Bonzini authored
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Most of the function that wrap it can be rewritten without it, except for gfn_to_pfn_prot. Just inline it into gfn_to_pfn_prot, and rewrite the other function on top of gfn_to_pfn_memslot*. Reviewed-by:
Radim Krcmar <rkrcmar@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
The memory slot is already available from gfn_to_memslot_dirty_bitmap. Isn't it a shame to look it up again? Plus, it makes gfn_to_page_many_atomic agnostic of multiple VCPU address spaces. Reviewed-by:
Radim Krcmar <rkrcmar@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
This lets the function access the new memory slot without going through kvm_memslots and id_to_memslot. It will simplify the code when more than one address space will be supported. Unfortunately, the "const"ness of the new argument must be casted away in two places. Fixing KVM to accept const struct kvm_memory_slot pointers would require modifications in pretty much all architectures, and is left for later. Reviewed-by:
Radim Krcmar <rkrcmar@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- May 26, 2015
-
-
Paolo Bonzini authored
Prepare for the case of multiple address spaces. Reviewed-by:
Radim Krcmar <rkrcmar@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Architecture-specific helpers are not supposed to muck with struct kvm_userspace_memory_region contents. Add const to enforce this. In order to eliminate the only write in __kvm_set_memory_region, the cleaning of deleted slots is pulled up from update_memslots to __kvm_set_memory_region. Reviewed-by:
Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Reviewed-by:
Radim Krcmar <rkrcmar@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
kvm_memslots provides lockdep checking. Use it consistently instead of explicit dereferencing of kvm->memslots. Reviewed-by:
Radim Krcmar <rkrcmar@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
kvm_alloc_memslots is extracted out of previously scattered code that was in kvm_init_memslots_id and kvm_create_vm. kvm_free_memslot and kvm_free_memslots are new names of kvm_free_physmem and kvm_free_physmem_slot, but they also take an explicit pointer to struct kvm_memslots. This will simplify the transition to multiple address spaces, each represented by one pointer to struct kvm_memslots. Reviewed-by:
Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Reviewed-by:
Radim Krcmar <rkrcmar@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- May 19, 2015
-
-
Paolo Bonzini authored
gfn_to_pfn_async is used in just one place, and because of x86-specific treatment that place will need to look at the memory slot. Hence inline it into try_async_pf and export __gfn_to_pfn_memslot. The patch also switches the subsequent call to gfn_to_pfn_prot to use __gfn_to_pfn_memslot. This is a small optimization. Finally, remove the now-unused async argument of __gfn_to_pfn. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- May 08, 2015
-
-
Heiko Carstens authored
On cpu hotplug only KVM emits an unconditional message that its notifier has been called. It certainly can be assumed that calling cpu hotplug notifiers work, therefore there is no added value if KVM prints a message. If an error happens on cpu online KVM will still emit a warning. So let's remove this superfluous message. Signed-off-by:
Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Radim Krčmář authored
Caching memslot value and using mark_page_dirty_in_slot() avoids another O(log N) search when dirtying the page. Signed-off-by:
Radim Krčmář <rkrcmar@redhat.com> Message-Id: <1428695247-27603-1-git-send-email-rkrcmar@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Apr 22, 2015
-
-
Andre Przywara authored
When userland injects a SPI via the KVM_IRQ_LINE ioctl we currently only check it against a fixed limit, which historically is set to 127. With the new dynamic IRQ allocation the effective limit may actually be smaller (64). So when now a malicious or buggy userland injects a SPI in that range, we spill over on our VGIC bitmaps and bytemaps memory. I could trigger a host kernel NULL pointer dereference with current mainline by injecting some bogus IRQ number from a hacked kvmtool: ----------------- .... DEBUG: kvm_vgic_inject_irq(kvm, cpu=0, irq=114, level=1) DEBUG: vgic_update_irq_pending(kvm, cpu=0, irq=114, level=1) DEBUG: IRQ #114 still in the game, writing to bytemap now... Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = ffffffc07652e000 [00000000] *pgd=00000000f658b003, *pud=00000000f658b003, *pmd=0000000000000000 Internal error: Oops: 96000006 [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 1053 Comm: lkvm-msi-irqinj Not tainted 4.0.0-rc7+ #3027 Hardware name: FVP Base (DT) task: ffffffc0774e9680 ti: ffffffc0765a8000 task.ti: ffffffc0765a8000 PC is at kvm_vgic_inject_irq+0x234/0x310 LR is at kvm_vgic_inject_irq+0x30c/0x310 pc : [<ffffffc0000ae0a8>] lr : [<ffffffc0000ae180>] pstate: 80000145 ..... So this patch fixes this by checking the SPI number against the actual limit. Also we remove the former legacy hard limit of 127 in the ioctl code. Signed-off-by:
Andre Przywara <andre.przywara@arm.com> Reviewed-by:
Christoffer Dall <christoffer.dall@linaro.org> CC: <stable@vger.kernel.org> # 4.0, 3.19, 3.18 [maz: wrap KVM_ARM_IRQ_GIC_MAX with #ifndef __KERNEL__, as suggested by Christopher Covington] Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Eric Auger authored
irqfd/arm curently does not support routing. kvm_irq_map_gsi is supposed to return all the routing entries associated with the provided gsi and return the number of those entries. We should return 0 at this point. Signed-off-by:
Eric Auger <eric.auger@linaro.org> Acked-by:
Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
- Apr 21, 2015
-
-
Paul Mackerras authored
This creates a debugfs directory for each HV guest (assuming debugfs is enabled in the kernel config), and within that directory, a file by which the contents of the guest's HPT (hashed page table) can be read. The directory is named vmnnnn, where nnnn is the PID of the process that created the guest. The file is named "htab". This is intended to help in debugging problems in the host's management of guest memory. The contents of the file consist of a series of lines like this: 3f48 4000d032bf003505 0000000bd7ff1196 00000003b5c71196 The first field is the index of the entry in the HPT, the second and third are the HPT entry, so the third entry contains the real page number that is mapped by the entry if the entry's valid bit is set. The fourth field is the guest's view of the second doubleword of the entry, so it contains the guest physical address. (The format of the second through fourth fields are described in the Power ISA and also in arch/powerpc/include/asm/mmu-hash64.h.) Signed-off-by:
Paul Mackerras <paulus@samba.org> Signed-off-by:
Alexander Graf <agraf@suse.de>
-
- Apr 10, 2015
-
-
Radim Krčmář authored
kvm_write_guest_cached() does not mark all written pages as dirty and code comments in kvm_gfn_to_hva_cache_init() talk about NULL memslot with cross page accesses. Fix all the easy way. The check is '<= 1' to have the same result for 'len = 0' cache anywhere in the page. (nr_pages_needed is 0 on page boundary.) Fixes: 8f964525 ("KVM: Allow cross page reads and writes from cached translations.") Signed-off-by:
Radim Krčmář <rkrcmar@redhat.com> Message-Id: <20150408121648.GA3519@potion.brq.redhat.com> Reviewed-by:
Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Apr 08, 2015
-
-
Paolo Bonzini authored
The corresponding write functions just use __copy_to_user. Do the same on the read side. This reverts what's left of commit 86ab8cff (KVM: introduce gfn_to_hva_read/kvm_read_hva/kvm_read_hva_atomic, 2012-08-21) Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1427976500-28533-1-git-send-email-pbonzini@redhat.com>
-
- Mar 31, 2015
-
-
Jens Freimann authored
We have introduced struct kvm_s390_irq a while ago which allows to inject all kinds of interrupts as defined in the Principles of Operation. Add ioctl to inject interrupts with the extended struct kvm_s390_irq Signed-off-by:
Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by:
Christian Borntraeger <borntraeger@de.ibm.com> Acked-by:
Cornelia Huck <cornelia.huck@de.ibm.com>
-
- Mar 30, 2015
-
-
Andre Przywara authored
Currently we have struct kvm_exit_mmio for encapsulating MMIO abort data to be passed on from syndrome decoding all the way down to the VGIC register handlers. Now as we switch the MMIO handling to be routed through the KVM MMIO bus, it does not make sense anymore to use that structure already from the beginning. So we keep the data in local variables until we put them into the kvm_io_bus framework. Then we fill kvm_exit_mmio in the VGIC only, making it a VGIC private structure. On that way we replace the data buffer in that structure with a pointer pointing to a single location in a local variable, so we get rid of some copying on the way. With all of the virtual GIC emulation code now being registered with the kvm_io_bus, we can remove all of the old MMIO handling code and its dispatching functionality. I didn't bother to rename kvm_exit_mmio (to vgic_mmio or something), because that touches a lot of code lines without any good reason. This is based on an original patch by Nikolay. Signed-off-by:
Andre Przywara <andre.przywara@arm.com> Cc: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com> Reviewed-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Andre Przywara authored
Using the framework provided by the recent vgic.c changes, we register a kvm_io_bus device on mapping the virtual GICv3 resources. The distributor mapping is pretty straight forward, but the redistributors need some more love, since they need to be tagged with the respective redistributor (read: VCPU) they are connected with. We use the kvm_io_bus framework to register one devices per VCPU. Signed-off-by:
Andre Przywara <andre.przywara@arm.com> Reviewed-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Andre Przywara authored
Currently we handle the redistributor registers in two separate MMIO regions, one for the overall behaviour and SPIs and one for the SGIs/PPIs. That latter forces the creation of _two_ KVM I/O bus devices for each redistributor. Since the spec mandates those two pages to be contigious, we could as well merge them and save the churn with the second KVM I/O bus device. Signed-off-by:
Andre Przywara <andre.przywara@arm.com> Reviewed-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
- Mar 26, 2015
-
-
Andre Przywara authored
Using the framework provided by the recent vgic.c changes we register a kvm_io_bus device when initializing the virtual GICv2. Signed-off-by:
Andre Przywara <andre.przywara@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Andre Przywara authored
Currently we use a lot of VGIC specific code to do the MMIO dispatching. Use the previous reworks to add kvm_io_bus style MMIO handlers. Those are not yet called by the MMIO abort handler, also the actual VGIC emulator function do not make use of it yet, but will be enabled with the following patches. Signed-off-by:
Andre Przywara <andre.przywara@arm.com> Reviewed-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Andre Przywara authored
The vgic_find_range() function in vgic.c takes a struct kvm_exit_mmio argument, but actually only used the length field in there. Since we need to get rid of that structure in that part of the code anyway, let's rework the function (and it's callers) to pass the length argument to the function directly. Signed-off-by:
Andre Przywara <andre.przywara@arm.com> Reviewed-by:
Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Andre Przywara authored
The name "kvm_mmio_range" is a bit bold, given that it only covers the VGIC's MMIO ranges. To avoid confusion with kvm_io_range, rename it to vgic_io_range. Signed-off-by:
Andre Przywara <andre.przywara@arm.com> Acked-by:
Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Andre Przywara authored
iodev.h contains definitions for the kvm_io_bus framework. This is needed both by the generic KVM code in virt/kvm as well as by architecture specific code under arch/. Putting the header file in virt/kvm and using local includes in the architecture part seems at least dodgy to me, so let's move the file into include/kvm, so that a more natural "#include <kvm/iodev.h>" can be used by all of the code. This also solves a problem later when using struct kvm_io_device in arm_vgic.h. Fixing up the FSF address in the GPL header and a wrong include path on the way. Signed-off-by:
Andre Przywara <andre.przywara@arm.com> Acked-by:
Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by:
Marc Zyngier <marc.zyngier@arm.com> Reviewed-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Nikolay Nikolaev authored
This is needed in e.g. ARM vGIC emulation, where the MMIO handling depends on the VCPU that does the access. Signed-off-by:
Nikolay Nikolaev <n.nikolaev@virtualopensystems.com> Signed-off-by:
Andre Przywara <andre.przywara@arm.com> Acked-by:
Paolo Bonzini <pbonzini@redhat.com> Acked-by:
Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
- Mar 24, 2015
-
-
Igor Mammedov authored
KVM guest can fail to startup with following trace on host: qemu-system-x86: page allocation failure: order:4, mode:0x40d0 Call Trace: dump_stack+0x47/0x67 warn_alloc_failed+0xee/0x150 __alloc_pages_direct_compact+0x14a/0x150 __alloc_pages_nodemask+0x776/0xb80 alloc_kmem_pages+0x3a/0x110 kmalloc_order+0x13/0x50 kmemdup+0x1b/0x40 __kvm_set_memory_region+0x24a/0x9f0 [kvm] kvm_set_ioapic+0x130/0x130 [kvm] kvm_set_memory_region+0x21/0x40 [kvm] kvm_vm_ioctl+0x43f/0x750 [kvm] Failure happens when attempting to allocate pages for 'struct kvm_memslots', however it doesn't have to be present in physically contiguous (kmalloc-ed) address space, change allocation to kvm_kvzalloc() so that it will be vmalloc-ed when its size is more then a page. Signed-off-by:
Igor Mammedov <imammedo@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
- Mar 19, 2015
-
-
Takuya Yoshikawa authored
When all bits in mask are not set, kvm_arch_mmu_enable_log_dirty_pt_masked() has nothing to do. But since it needs to be called from the generic code, it cannot be inlined, and a few function calls, two when PML is enabled, are wasted. Since it is common to see many pages remain clean, e.g. framebuffers can stay calm for a long time, it is worth eliminating this overhead. Signed-off-by:
Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-