- Dec 27, 2009
-
-
Heiko Carstens authored
arch/s390/kvm/../../../virt/kvm/kvm_main.c: In function 'kvm_create_vm':
arch/s390/kvm/../../../virt/kvm/kvm_main.c:409: warning: label 'out_err' defined but not used
Signed-off-by:
Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Sheng Yang authored
One possible order is: KVM_CREATE_IRQCHIP ioctl (takes kvm->lock) -> kvm_iobus_register_dev() -> down_write(kvm->slots_lock). The other one is in kvm_vm_ioctl_assign_device(), which takes kvm->slots_lock first, then kvm->lock. Update the lock-order comment as well. Observed through kernel lock debugging warnings. Cc: stable@kernel.org Signed-off-by:
Sheng Yang <sheng@linux.intel.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
- Dec 22, 2009
-
-
Roland Dreier authored
It seems a couple places such as arch/ia64/kernel/perfmon.c and drivers/infiniband/core/uverbs_main.c could use anon_inode_getfile() instead of a private pseudo-fs + alloc_file(), if only there were a way to get a read-only file. So provide this by having anon_inode_getfile() create a read-only file if we pass O_RDONLY in flags. Signed-off-by:
Roland Dreier <rolandd@cisco.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Dec 03, 2009
-
-
Avi Kivity authored
Usually userspace will freeze the guest so we can inspect it, but some internal state is not available. Add extra data to internal error reporting so we can expose it to the debugger. Extra data is specific to the suberror. Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Marcelo Tosatti authored
Otherwise kvm might attempt to dereference a NULL pointer. Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Arnd Bergmann authored
With big endian userspace, we can't quite figure out if a pointer is 32 bit (shifted >> 32) or 64 bit when we read a 64 bit pointer. This is what happens with dirty logging. To get the pointer interpreted correctly, we thus need Arnd's patch to implement a compat layer for the ioctl: A better way to do this is to add a separate compat_ioctl() method that converts this for you. Based on initial patch from Arnd Bergmann. Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Alexander Graf <agraf@suse.de> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Marcelo Tosatti authored
find_first_zero_bit works with bit numbers, not bytes. Fixes https://sourceforge.net/tracker/?func=detail&aid=2847560&group_id=180599&atid=893831 Reported-by:
"Xu, Jiajun" <jiajun.xu@intel.com> Cc: stable@kernel.org Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
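The bits-versus-bytes distinction behind the find_first_zero_bit fix above can be illustrated with a minimal userspace sketch. This is not the kernel implementation: `first_zero_bit`, `mark_used`, and `ID_BITS` are hypothetical names chosen for illustration, and the linear scan is only meant to show what the size argument means.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)
#define ID_BITS          1024   /* hypothetical number of slots */

/* Userspace stand-in for a "find first zero bit" helper; `size` counts BITS. */
static size_t first_zero_bit(const unsigned long *map, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        if (!(map[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG))))
            return i;
    }
    return size; /* no zero bit within the first `size` bits */
}

/* Mark slot `nr` as used. */
static void mark_used(unsigned long *map, size_t nr)
{
    assert(nr < ID_BITS);
    map[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}
```

With the first 200 slots of a 1024-bit map marked as used, `first_zero_bit(map, ID_BITS)` finds slot 200. Passing `sizeof(map)` instead (128 bytes) restricts the search to the first 128 bits, so the call wrongly reports that nothing is free.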
-
Zhai, Edwin authored
Introduce kvm_vcpu_on_spin, to be used by VMX/SVM to yield processing once the cpu detects pause-based looping. Signed-off-by:
"Zhai, Edwin" <edwin.zhai@intel.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Jiri Slaby authored
Stanse found 2 lock imbalances in kvm_request_irq_source_id and kvm_free_irq_source_id. They fail to unlock kvm->irq_lock on failure paths. Fix that by adding unlock labels at the end of the functions and jumping there from the failure paths. Signed-off-by:
Jiri Slaby <jirislaby@gmail.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
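The "unlock label" pattern described above can be sketched in userspace as follows. Everything here is an assumption made for illustration: `toy_lock`, `irq_state`, and `request_source_id` are hypothetical, and the lock is a plain flag so that an imbalance is easy to observe, not a real mutex.

```c
#include <assert.h>
#include <errno.h>

#define MAX_IDS 4                      /* hypothetical number of source ids */

struct toy_lock { int held; };         /* stand-in for a real mutex */

static void toy_acquire(struct toy_lock *l) { assert(!l->held); l->held = 1; }
static void toy_release(struct toy_lock *l) { assert(l->held);  l->held = 0; }

struct irq_state {
    struct toy_lock lock;
    unsigned long used;                /* bit n set => id n is taken */
};

/* Returns a free id, or -EBUSY when all ids are taken. */
static int request_source_id(struct irq_state *s)
{
    int id;

    toy_acquire(&s->lock);
    for (id = 0; id < MAX_IDS; id++)
        if (!(s->used & (1UL << id)))
            break;
    if (id == MAX_IDS) {
        id = -EBUSY;
        goto unlock;                   /* failure path still drops the lock */
    }
    s->used |= 1UL << id;
unlock:
    toy_release(&s->lock);
    return id;
}
```

Routing every exit through the single `unlock` label means a newly added failure path cannot return with the lock still held.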
-
Alexander Graf authored
X86 CPUs need some magic to enable the virtualization extensions on them. This magic can have unpleasant results for users, like blocking other VMMs from working (vmx) or using invalid TLB entries (svm). Currently KVM activates virtualization when the respective kernel module is loaded. This blocks us from autoloading KVM modules without breaking other VMMs. To circumvent this problem at least a bit, this patch introduces on-demand activation of virtualization. This means that virtualization is instead enabled on creation of the first virtual machine and disabled on destruction of the last one. So using this, KVM can be easily autoloaded, while keeping other hypervisors usable. Signed-off-by:
Alexander Graf <agraf@suse.de> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
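The "enable on first VM, disable on last VM" idea above amounts to a usage counter. The sketch below assumes hypothetical names (`hv_state`, `vm_create`, `vm_destroy`) and replaces the hardware enable and disable steps with a flag; it also omits the locking that real code would need around the counter.

```c
#include <assert.h>

struct hv_state {
    int usage_count;   /* number of live VMs */
    int hw_enabled;    /* 1 while the (pretend) extensions are switched on */
};

/* Stand-ins for the per-CPU hardware enable/disable steps. */
static void hw_enable(struct hv_state *hv)  { hv->hw_enabled = 1; }
static void hw_disable(struct hv_state *hv) { hv->hw_enabled = 0; }

/* The first VM to be created switches virtualization on. */
static void vm_create(struct hv_state *hv)
{
    if (hv->usage_count++ == 0)
        hw_enable(hv);
}

/* The last VM to be destroyed switches it off again. */
static void vm_destroy(struct hv_state *hv)
{
    assert(hv->usage_count > 0);
    if (--hv->usage_count == 0)
        hw_disable(hv);
}
```

Merely having the state present (the analogue of loading the module) leaves `hw_enabled` at 0, which is what lets another hypervisor use the hardware until a VM is actually created.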
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Gleb Natapov authored
The only thing it protects now is interrupt injection into lapic and this can work lockless. Even now with kvm->irq_lock in place access to lapic is not entirely serialized since vcpu access doesn't take kvm->irq_lock. Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Gleb Natapov authored
This allows removal of irq_lock from the injection path. Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Gleb Natapov authored
Use RCU locking for mask/ack notifiers lists. Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Gleb Natapov authored
Mask irq notifier list is already there. Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Gleb Natapov authored
Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Gleb Natapov authored
Maintain back mapping from irqchip/pin to gsi to speed up interrupt acknowledgment notifications. [avi: build fix on non-x86/ia64] Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Gleb Natapov authored
Use a gsi-indexed array instead of scanning all entries on each interrupt injection. Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
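The difference between scanning every routing entry and indexing by gsi can be sketched as below. The layout is an assumption for illustration only: `route`, `route_table`, and the fixed `MAX_GSI` / `MAX_PER_GSI` bounds are hypothetical and do not reflect the actual KVM data structures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_GSI      8   /* hypothetical table size */
#define MAX_PER_GSI  2   /* hypothetical routes allowed per gsi */

struct route { unsigned gsi, chip, pin; };

/* One bucket per gsi, so injection never walks unrelated entries. */
struct route_table {
    struct route slot[MAX_GSI][MAX_PER_GSI];
    size_t count[MAX_GSI];
};

static int table_add(struct route_table *t, struct route r)
{
    if (r.gsi >= MAX_GSI)
        return -EINVAL;
    if (t->count[r.gsi] >= MAX_PER_GSI)
        return -ENOSPC;
    t->slot[r.gsi][t->count[r.gsi]++] = r;
    return 0;
}

/* Returns the number of routes for `gsi`; *out points at the first one. */
static size_t table_lookup(const struct route_table *t, unsigned gsi,
                           const struct route **out)
{
    assert(out != NULL);
    if (gsi >= MAX_GSI) {
        *out = NULL;
        return 0;
    }
    *out = t->slot[gsi];
    return t->count[gsi];
}
```

A lookup touches only the bucket for the requested gsi, so its cost no longer grows with the total number of routing entries.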
-
Gleb Natapov authored
This removes assumptions that max GSIs is smaller than number of pins. Sharing is tracked on pin level not GSI level. [avi: no PIC on ia64] Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Avi Kivity authored
Preemption notifiers will do that for us automatically. Signed-off-by:
Avi Kivity <avi@redhat.com>
-
- Nov 05, 2009
-
-
Alexander Graf authored
We currently use host endian long types to store information in the dirty bitmap. This works reasonably well on Little Endian targets, because the u32 after the first contains the next 32 bits. On Big Endian this breaks completely though, forcing us to be inventive here. So Ben suggested to always use Little Endian, which looks reasonable. We only have dirty bitmap implemented in Little Endian targets so far and since PowerPC would be the first Big Endian platform, we can just as well switch to Little Endian always with little effort without breaking existing targets. Signed-off-by:
Alexander Graf <agraf@suse.de> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
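A host-independent little-endian bitmap, as proposed above, can be sketched by addressing the bitmap byte by byte. The helper names `set_bit_le` and `test_bit_le` and the explicit `nbits` bound are assumptions made for this example, not the kernel's own helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bit `nr` lives in byte nr / 8, at position nr % 8, on every host.
 * Writing through bytes avoids depending on how the host lays out
 * an unsigned long in memory.
 */
static void set_bit_le(uint8_t *bitmap, size_t nbits, size_t nr)
{
    assert(nr < nbits);
    bitmap[nr / 8] |= (uint8_t)(1u << (nr % 8));
}

static int test_bit_le(const uint8_t *bitmap, size_t nbits, size_t nr)
{
    assert(nr < nbits);
    return (bitmap[nr / 8] >> (nr % 8)) & 1;
}
```

Because bit 33 always lands in byte 4, a reader that walks the buffer in 32-bit or 64-bit little-endian units sees the same bit numbering regardless of the endianness of the machine that wrote it.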
-
- Oct 16, 2009
-
-
Darrick J. Wong authored
I'm seeing an oops condition when kvm-intel and kvm-amd are modprobe'd during boot (say on an Intel system) and then rmmod'd:

   # modprobe kvm-intel
     kvm_init()
     kvm_init_debug()
     kvm_arch_init()  <-- stores debugfs dentries internally
     (success, etc)

   # modprobe kvm-amd
     kvm_init()
     kvm_init_debug() <-- second initialization clobbers kvm's
                          internal pointers to dentries
     kvm_arch_init()
     kvm_exit_debug() <-- and frees them

   # rmmod kvm-intel
     kvm_exit()
     kvm_exit_debug() <-- double free of debugfs files!

     *BOOM*

If execution gets to the end of kvm_init(), then the calling module has been established as the kvm provider. Move the debugfs initialization to the end of the function, and remove the now-unnecessary call to kvm_exit_debug() from the error path. That way we avoid trampling on the debugfs entries and freeing them twice.

Cc: stable@kernel.org Signed-off-by:
Darrick J. Wong <djwong@us.ibm.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
- Oct 04, 2009
-
-
Izik Eidus authored
This is needed for kvm if it wants ksm to directly map pages into its shadow page tables. [marcelo: cast pfn assignment to u64] Signed-off-by:
Izik Eidus <ieidus@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
- Oct 01, 2009
-
-
Alexey Dobriyan authored
[akpm@linux-foundation.org: fix KVM] Signed-off-by:
Alexey Dobriyan <adobriyan@gmail.com> Acked-by:
Mike Frysinger <vapier@gentoo.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Sep 27, 2009
-
-
Alexey Dobriyan authored
* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code

But leave TTM code alone, something is fishy there with global vm_ops being used. Signed-off-by:
Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Sep 24, 2009
-
-
Li Zefan authored
Remove open-coded zalloc_cpumask_var() and zalloc_cpumask_var_node(). Signed-off-by:
Li Zefan <lizf@cn.fujitsu.com> Signed-off-by:
Rusty Russell <rusty@rustcorp.com.au>
-
- Sep 10, 2009
-
-
Julia Lawall authored
This code is not executed before file has been initialized to the result of calling eventfd_fget. This function returns an ERR_PTR value in an error case instead of NULL. Thus the test that file is not NULL is always true.

A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@match exists@
expression x, E;
statement S1, S2;
@@

x = eventfd_fget(...)
... when != x = E
(
*  if (x == NULL || ...) S1 else S2
|
*  if (x == NULL && ...) S1 else S2
)
// </smpl>

Signed-off-by:
Julia Lawall <julia@diku.dk> Signed-off-by:
Avi Kivity <avi@redhat.com>
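Why a NULL test is always true for an ERR_PTR-style return value can be shown with a small userspace sketch. The three helpers below imitate the kernel's error-pointer convention, while `toy_file`, `fake_fget`, and `open_and_check` are hypothetical stand-ins invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitation of the error-pointer helpers. */
static void *ERR_PTR(intptr_t error)    { return (void *)error; }
static intptr_t PTR_ERR(const void *p)  { return (intptr_t)p; }
static int IS_ERR(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct toy_file { int fd; };            /* hypothetical stand-in */

/* Fails with an encoded errno, never with NULL. */
static struct toy_file *fake_fget(int fd)
{
    static struct toy_file f;

    if (fd < 0)
        return ERR_PTR(-EBADF);
    f.fd = fd;
    return &f;
}

/* Correct caller: tests IS_ERR(), since the failure value is not NULL. */
static int open_and_check(int fd)
{
    struct toy_file *f = fake_fget(fd);

    if (IS_ERR(f))
        return (int)PTR_ERR(f);
    assert(f != NULL);
    return 0;
}
```

A caller that wrote `if (f)` instead would take the success branch even for `fake_fget(-1)`, because the encoded `-EBADF` is a non-NULL pointer value.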
-
Heiko Carstens authored
  CC      arch/s390/kvm/../../../virt/kvm/kvm_main.o
arch/s390/kvm/../../../virt/kvm/kvm_main.c: In function '__kvm_set_memory_region':
arch/s390/kvm/../../../virt/kvm/kvm_main.c:485: warning: unused variable 'j'
arch/s390/kvm/../../../virt/kvm/kvm_main.c:484: warning: unused variable 'lpages'
arch/s390/kvm/../../../virt/kvm/kvm_main.c:483: warning: unused variable 'ugfn'
Cc: Carsten Otte <cotte@de.ibm.com> Signed-off-by:
Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Gleb Natapov authored
This bug was introduced by b4a2f5e7. Cc: stable@kernel.org Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Avi Kivity authored
The symbol only controls irq routing, not MSI-X. Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Xiao Guangrong authored
Remove the debugfs file if kvm_arch_init() returns an error. Signed-off-by:
Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Jan Kiszka authored
spin_lock disables preemption, so we can simply read the current cpu. Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Gleb Natapov authored
Remove kvm_cpu_has_interrupt() and kvm_arch_interrupt_allowed() from interface between general code and arch code. kvm_arch_vcpu_runnable() checks for interrupts instead. Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Gregory Haskins authored
ioeventfd is a mechanism to register PIO/MMIO regions to trigger an eventfd signal when written to by a guest. Host userspace can register any arbitrary IO address with a corresponding eventfd and then pass the eventfd to a specific end-point of interest for handling.

Normal IO requires a blocking round-trip since the operation may cause side-effects in the emulated model or may return data to the caller. Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM "heavy-weight" exit back to userspace, and is ultimately serviced by qemu's device model synchronously before returning control back to the vcpu.

However, there is a subclass of IO which acts purely as a trigger for other IO (such as to kick off an out-of-band DMA request, etc). For these patterns, the synchronous call is particularly expensive since we really only want to get our notification transmitted asynchronously and return as quickly as possible. All the synchronous infrastructure that ensures proper data-dependencies are met in the normal IO case is just unnecessary overhead for signalling. This adds additional computational load on the system, as well as latency to the signalling path.

Therefore, we provide a mechanism for registration of an in-kernel trigger point that allows the VCPU to only require a very brief, lightweight exit just long enough to signal an eventfd. This also means that any clients compatible with the eventfd interface (which includes userspace and kernelspace equally well) can now register to be notified. The end result should be a more flexible and higher performance notification API for the backend KVM hypervisor and peripheral components.

To test this theory, we built a test-harness called "doorbell". This module has a function called "doorbell_ring()" which simply increments a counter each time the doorbell is signaled. It supports signalling from either an eventfd, or an ioctl().

We then wired up two paths to the doorbell: one goes via QEMU, through a registered io region and the doorbell ioctl(). The other is direct via ioeventfd.

You can download this test harness here:

ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2

The measured results are as follows:

qemu-mmio:      110000 iops, 9.09us rtt
ioeventfd-mmio: 200100 iops, 5.00us rtt
ioeventfd-pio:  367300 iops, 2.72us rtt

I didn't measure qemu-pio, because I have to figure out how to register a PIO region with qemu's device model, and I got lazy. However, extrapolating from the NULLIO runs (+2.56us for MMIO, -350ns for HC), we get:

qemu-pio:     153139 iops, 6.53us rtt
ioeventfd-hc: 412585 iops, 2.37us rtt

These are just for fun, for now, until I can gather more data.

Here is a graph for your convenience:

http://developer.novell.com/wiki/images/7/76/Iofd-chart.png

The conclusion to draw is that we save about 4us by skipping the userspace hop.

--------------------
Signed-off-by:
Gregory Haskins <ghaskins@novell.com> Acked-by:
Michael S. Tsirkin <mst@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Gregory Haskins authored
Today kvm_io_bus_register_dev() returns void and will internally BUG_ON if it fails. We want to create dynamic MMIO/PIO entries driven from userspace later in the series, so we need to enhance the code to be more robust with the following changes:

1) Add a return value to the registration function
2) Fix up all the callsites to check the return code, handle any failures, and percolate the error up to the caller.
3) Add an unregister function that collapses holes in the array

Signed-off-by:
Gregory Haskins <ghaskins@novell.com> Acked-by:
Michael S. Tsirkin <mst@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
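The three changes listed above (a return value on registration, error propagation, and an unregister that collapses holes) can be sketched on a fixed-size array. The names `io_bus`, `bus_register`, `bus_unregister`, and the bound `BUS_MAX_DEVS` are hypothetical and chosen only for this illustration; the error codes are likewise an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define BUS_MAX_DEVS 4   /* hypothetical capacity */

struct io_bus {
    int dev_count;
    const void *devs[BUS_MAX_DEVS];
};

/* Reports failure to the caller instead of crashing when the bus is full. */
static int bus_register(struct io_bus *bus, const void *dev)
{
    assert(dev != NULL);
    if (bus->dev_count >= BUS_MAX_DEVS)
        return -ENOSPC;
    bus->devs[bus->dev_count++] = dev;
    return 0;
}

/* Removes `dev` and shifts later entries down so no hole remains. */
static int bus_unregister(struct io_bus *bus, const void *dev)
{
    for (int i = 0; i < bus->dev_count; i++) {
        if (bus->devs[i] != dev)
            continue;
        memmove(&bus->devs[i], &bus->devs[i + 1],
                (size_t)(bus->dev_count - i - 1) * sizeof(bus->devs[0]));
        bus->dev_count--;
        return 0;
    }
    return -ENOENT;
}
```

Keeping the array dense after removal means lookups can keep iterating over `devs[0..dev_count)` without checking for empty slots.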
-
Gleb Natapov authored
Add tracepoint in msi/ioapic/pic set_irq() functions, in IPI sending and in the point where IRQ is placed into apic's IRR. Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Michael S. Tsirkin authored
Irqfd sets level for interrupt to 1 and then to 0. For MSI, check level so that a single message is sent. Signed-off-by:
Michael S. Tsirkin <mst@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
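The level check described above can be sketched as follows. This is a toy model under stated assumptions: `msi_sink`, `msi_set_irq`, and `irqfd_inject` are hypothetical names, and "sending a message" is reduced to incrementing a counter.

```c
#include <assert.h>

struct msi_sink { int messages_sent; };

/*
 * MSI is message based: only the assertion (level 1) produces a message,
 * the de-assertion (level 0) is ignored.
 * Returns 1 if a message was sent, 0 otherwise.
 */
static int msi_set_irq(struct msi_sink *s, int level)
{
    assert(level == 0 || level == 1);
    if (!level)
        return 0;
    s->messages_sent++;
    return 1;
}

/* Models the injection sequence from the text: level 1, then level 0. */
static void irqfd_inject(struct msi_sink *s)
{
    msi_set_irq(s, 1);
    msi_set_irq(s, 0);
}
```

Without the `level` test in `msi_set_irq`, each call to `irqfd_inject` would count two messages instead of one.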
-
Gleb Natapov authored
Cosmetic only. No logic is changed by this patch. Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Jiri Slaby authored
There is a missing unlock on one failure path in ioapic_mmio_write; fix that. Signed-off-by:
Jiri Slaby <jirislaby@gmail.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-