Commits · 51975db0b7333cf389b64b5040c2a910341d241a · jan.koester / Linux

Dec 27, 2009

KVM: get rid of kvm_create_vm() unused label warning on s390 · b4329db0

Heiko Carstens authored Dec 18, 2009

arch/s390/kvm/../../../virt/kvm/kvm_main.c: In function 'kvm_create_vm':
arch/s390/kvm/../../../virt/kvm/kvm_main.c:409: warning: label 'out_err' defined but not used

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

b4329db0

KVM: Fix possible circular locking in kvm_vm_ioctl_assign_device() · fae3a353

Sheng Yang authored Dec 15, 2009



One possible order is:

KVM_CREATE_IRQCHIP ioctl(took kvm->lock) -> kvm_iobus_register_dev() ->
down_write(kvm->slots_lock).

The other one is in kvm_vm_ioctl_assign_device(), which takes
kvm->slots_lock first, then kvm->lock.

Update the comment of lock order as well.

Observed via kernel locking debug warnings.

Cc: stable@kernel.org
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

fae3a353
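
The inversion described in fae3a353 can be illustrated with a small userspace sketch. It is not the kernel's lockdep and does not use real locks: the ranks, the tracker struct and the two path functions are hypothetical names, and the sketch assumes the kvm->lock then slots_lock order of the first path is the canonical one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranks: a lower rank must be taken before a higher one. */
enum lock_rank { RANK_NONE = 0, RANK_KVM_LOCK = 1, RANK_SLOTS_LOCK = 2 };

struct lock_tracker {
    enum lock_rank held_max;  /* highest rank currently held */
    bool violation;           /* set when an out-of-order acquire is seen */
};

static void tracker_init(struct lock_tracker *t)
{
    t->held_max = RANK_NONE;
    t->violation = false;
}

/* Record an acquisition; flag it if an equal or higher rank is already held. */
static void tracker_acquire(struct lock_tracker *t, enum lock_rank rank)
{
    assert(rank != RANK_NONE);
    if (rank <= t->held_max)
        t->violation = true;
    else
        t->held_max = rank;
}

/* Path resembling KVM_CREATE_IRQCHIP: kvm->lock, then slots_lock. */
static bool path_create_irqchip(void)
{
    struct lock_tracker t;

    tracker_init(&t);
    tracker_acquire(&t, RANK_KVM_LOCK);
    tracker_acquire(&t, RANK_SLOTS_LOCK);
    return t.violation;
}

/* Path resembling the old assign_device code: slots_lock, then kvm->lock. */
static bool path_assign_device_old(void)
{
    struct lock_tracker t;

    tracker_init(&t);
    tracker_acquire(&t, RANK_SLOTS_LOCK);
    tracker_acquire(&t, RANK_KVM_LOCK);
    return t.violation;
}
```

Only the second path reports a violation, which is the pattern that can deadlock when both paths run concurrently.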

Dec 22, 2009

anonfd: Allow making anon files read-only · 628ff7c1

Roland Dreier authored Dec 18, 2009



It seems a couple places such as arch/ia64/kernel/perfmon.c and
drivers/infiniband/core/uverbs_main.c could use anon_inode_getfile()
instead of a private pseudo-fs + alloc_file(), if only there were a way
to get a read-only file.  So provide this by having anon_inode_getfile()
create a read-only file if we pass O_RDONLY in flags.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

628ff7c1

Dec 03, 2009

KVM: Allow internal errors reported to userspace to carry extra data · a9c7399d

Avi Kivity authored Nov 04, 2009



Usually userspace will freeze the guest so we can inspect it, but some
internal state is not available.  Add extra data to internal error
reporting so we can expose it to the debugger.  Extra data is specific
to the suberror.

Signed-off-by: Avi Kivity <avi@redhat.com>

a9c7399d

KVM: only clear irq_source_id if irqchip is present · e50212bb

Marcelo Tosatti authored Oct 29, 2009



Otherwise kvm might attempt to dereference a NULL pointer.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

e50212bb

KVM: Enable 32bit dirty log pointers on 64bit host · 6ff5894c

Arnd Bergmann authored Oct 22, 2009



With big endian userspace, we can't quite figure out if a pointer
is 32 bit (shifted >> 32) or 64 bit when we read a 64 bit pointer.

This is what happens with dirty logging. To get the pointer interpreted
correctly, we thus need Arnd's patch to implement a compat layer for
the ioctl:

A better way to do this is to add a separate compat_ioctl() method that
converts this for you.

Based on initial patch from Arnd Bergmann.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

6ff5894c
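
The ambiguity described in 6ff5894c can be shown at the byte level. This is only an illustration, assuming a 32-bit big-endian process writes its pointer into the first four bytes of an 8-byte field; the helper names are hypothetical and this is not the kernel's compat_ioctl() code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Interpret 8 bytes as a big-endian 64-bit value, independent of the host. */
static uint64_t load_be64(const uint8_t field[8])
{
    uint64_t v = 0;
    int i;

    for (i = 0; i < 8; i++)
        v = (v << 8) | field[i];
    return v;
}

/* Store a 32-bit pointer value the way a 32-bit big-endian process would:
 * in the first four bytes of the 8-byte field, leaving the rest zero. */
static void store_be32(uint8_t field[8], uint32_t ptr)
{
    assert(field != NULL);
    memset(field, 0, 8);
    field[0] = (uint8_t)(ptr >> 24);
    field[1] = (uint8_t)(ptr >> 16);
    field[2] = (uint8_t)(ptr >> 8);
    field[3] = (uint8_t)ptr;
}

/* Compat-style conversion: the caller is known to be 32 bit, so take the
 * value from the upper half of the 64-bit read. */
static uint64_t compat_widen(const uint8_t field[8])
{
    return load_be64(field) >> 32;
}
```

A plain 64-bit read sees the pointer shifted into the upper half, so the kernel has to know which kind of caller it is dealing with before it can interpret the field.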

KVM: fix irq_source_id size verification · cd5a2685

Marcelo Tosatti authored Oct 17, 2009

find_first_zero_bit works with bit numbers, not bytes.

Fixes

https://sourceforge.net/tracker/?func=detail&aid=2847560&group_id=180599&atid=893831



Reported-by: "Xu, Jiajun" <jiajun.xu@intel.com>
Cc: stable@kernel.org
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

cd5a2685
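
A minimal userspace sketch of the bits-versus-bytes mix-up fixed in cd5a2685. first_zero_bit() is a single-word stand-in written for this example, not the kernel's find_first_zero_bit(), and the two validity checks are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Return the index of the first clear bit, or BITS_PER_WORD if none. */
static unsigned long first_zero_bit(unsigned long word)
{
    unsigned long bit;

    for (bit = 0; bit < BITS_PER_WORD; bit++)
        if (!(word & (1UL << bit)))
            return bit;
    return BITS_PER_WORD;
}

/* Wrong: compares a bit index against a size in bytes. */
static bool id_valid_bytes(unsigned long id)
{
    return id < sizeof(unsigned long);
}

/* Right: compares a bit index against a size in bits. */
static bool id_valid_bits(unsigned long id)
{
    assert(BITS_PER_WORD >= 32);
    return id < BITS_PER_WORD;
}
```

With the byte-based check, any free id at or above sizeof(unsigned long) is rejected even though the bitmap still has room.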

KVM: introduce kvm_vcpu_on_spin · d255f4f2

Zhai, Edwin authored Oct 09, 2009



Introduce kvm_vcpu_on_spin, to be used by VMX/SVM to yield processing
once the cpu detects pause-based looping.

Signed-off-by: "Zhai, Edwin" <edwin.zhai@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

d255f4f2

KVM: fix lock imbalance in kvm_*_irq_source_id() · 0c6ddceb

Jiri Slaby authored Sep 25, 2009



Stanse found 2 lock imbalances in kvm_request_irq_source_id and
kvm_free_irq_source_id. They omit to unlock kvm->irq_lock on fail paths.

Fix that by adding unlock labels at the end of the functions and jump
there from the fail paths.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

0c6ddceb
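
The unlock-label pattern from 0c6ddceb can be sketched in userspace as follows. A counter stands in for the real mutex so that lock balance can be checked; the struct, the function names and the 8-entry bitmap are assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>

/* Toy lock: depth is 1 while held and 0 when released. */
struct toy_lock { int depth; };

static void toy_acquire(struct toy_lock *l)
{
    l->depth++;
}

static void toy_release(struct toy_lock *l)
{
    assert(l->depth > 0);
    l->depth--;
}

/* Allocate the first free id from an 8-bit bitmap.
 * Returns the id, or -EBUSY when all ids are taken. */
static int request_source_id(struct toy_lock *l, unsigned int *bitmap)
{
    int id, ret;

    toy_acquire(l);
    for (id = 0; id < 8; id++)
        if (!(*bitmap & (1u << id)))
            break;
    if (id == 8) {
        ret = -EBUSY;   /* fail path: jump to the label, do not return */
        goto unlock;
    }
    *bitmap |= 1u << id;
    ret = id;
unlock:
    toy_release(l);
    return ret;
}
```

Every exit, successful or not, passes through the single unlock label, so the lock depth is back to zero when the function returns.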

KVM: Activate Virtualization On Demand · 10474ae8

Alexander Graf authored Sep 15, 2009



X86 CPUs need to have some magic happening to enable the virtualization
extensions on them. This magic can result in unpleasant results for
users, like blocking other VMMs from working (vmx) or using invalid TLB
entries (svm).

Currently KVM activates virtualization when the respective kernel module
is loaded. This blocks us from autoloading KVM modules without breaking
other VMMs.

To circumvent this problem at least a bit, this patch introduces
on-demand activation of virtualization. This means that virtualization
is instead enabled on creation of the first virtual machine and
disabled on destruction of the last one.

So using this, KVM can be easily autoloaded, while keeping other
hypervisors usable.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

10474ae8
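
The on-demand scheme in 10474ae8 amounts to a usage count. The sketch below is a userspace model under that assumption; hw_state, vm_create() and vm_destroy() are hypothetical names, and the real code also has to deal with locking and per-CPU state, which is left out here.

```c
#include <assert.h>
#include <stdbool.h>

struct hw_state {
    int users;          /* number of live virtual machines */
    bool enabled;       /* whether virtualization is currently on */
    int enable_calls;   /* how often the hardware was switched on */
    int disable_calls;  /* how often the hardware was switched off */
};

static void hw_enable(struct hw_state *hw)
{
    hw->enabled = true;
    hw->enable_calls++;
}

static void hw_disable(struct hw_state *hw)
{
    hw->enabled = false;
    hw->disable_calls++;
}

/* Enable virtualization only when the first VM is created. */
static void vm_create(struct hw_state *hw)
{
    if (hw->users++ == 0)
        hw_enable(hw);
}

/* Disable virtualization only when the last VM is destroyed. */
static void vm_destroy(struct hw_state *hw)
{
    assert(hw->users > 0);
    if (--hw->users == 0)
        hw_disable(hw);
}
```

While no VM exists the hardware stays untouched, which is what leaves room for other hypervisors.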

KVM: Move assigned device code to own file · bfd99ff5

Avi Kivity authored Aug 26, 2009

Signed-off-by: Avi Kivity <avi@redhat.com>

bfd99ff5

KVM: Drop kvm->irq_lock lock from irq injection path · 680b3648

Gleb Natapov authored Aug 24, 2009



The only thing it protects now is interrupt injection into lapic and
this can work lockless. Even now with kvm->irq_lock in place access
to lapic is not entirely serialized since vcpu access doesn't take
kvm->irq_lock.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

680b3648

KVM: Move IO APIC to its own lock · eba0226b

Gleb Natapov authored Aug 24, 2009



This allows removal of irq_lock from the injection path.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

eba0226b

KVM: Convert irq notifiers lists to RCU locking · 280aa177

Gleb Natapov authored Aug 24, 2009



Use RCU locking for mask/ack notifiers lists.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

280aa177

KVM: Move irq ack notifier list to arch independent code · 136bdfee

Gleb Natapov authored Aug 24, 2009



Mask irq notifier list is already there.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

136bdfee

KVM: Move irq routing data structure to rcu locking · e42bba90

Gleb Natapov authored Aug 24, 2009

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

e42bba90

KVM: Maintain back mapping from irqchip/pin to gsi · 3e71f88b

Gleb Natapov authored Aug 24, 2009



Maintain back mapping from irqchip/pin to gsi to speed up
interrupt acknowledgment notifications.

[avi: build fix on non-x86/ia64]

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

3e71f88b

KVM: Change irq routing table to use gsi indexed array · 46e624b9

Gleb Natapov authored Aug 24, 2009



Use gsi indexed array instead of scanning all entries on each interrupt
injection.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

46e624b9

KVM: Move irq sharing information to irqchip level · 1a6e4a8c

Gleb Natapov authored Aug 24, 2009



This removes assumptions that max GSIs is smaller than number of pins.
Sharing is tracked on pin level not GSI level.

[avi: no PIC on ia64]

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

1a6e4a8c

KVM: Don't wrap schedule() with vcpu_put()/vcpu_load() · 45ec431c

Avi Kivity authored Aug 23, 2009



Preemption notifiers will do that for us automatically.

Signed-off-by: Avi Kivity <avi@redhat.com>

45ec431c

Nov 05, 2009

Use Little Endian for Dirty Bitmap · c8240bd6

Alexander Graf authored Oct 30, 2009



We currently use host endian long types to store information
in the dirty bitmap.

This works reasonably well on Little Endian targets, because the
u32 after the first contains the next 32 bits. On Big Endian this
breaks completely though, forcing us to be inventive here.

So Ben suggested to always use Little Endian, which looks reasonable.

We only have dirty bitmap implemented in Little Endian targets so far
and since PowerPC would be the first Big Endian platform, we can just
as well switch to Little Endian always with little effort without
breaking existing targets.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

c8240bd6
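
A minimal sketch of a bitmap with a fixed little-endian layout, as proposed in c8240bd6. It addresses the bitmap byte by byte so the result does not depend on host endianness; the helper names and the explicit nbits bound are assumptions for this example, not the kernel's helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Set bit nr: it lives in byte nr / 8, at bit position nr % 8. */
static void set_bit_le(uint8_t *map, size_t nbits, size_t nr)
{
    assert(nr < nbits);
    map[nr / 8] |= (uint8_t)(1u << (nr % 8));
}

/* Return 1 if bit nr is set, 0 otherwise. */
static int test_bit_le(const uint8_t *map, size_t nbits, size_t nr)
{
    assert(nr < nbits);
    return (map[nr / 8] >> (nr % 8)) & 1;
}
```

Because bit 33 always ends up in byte 4, a consumer that walks the bitmap in 32-bit steps finds bits 32 to 63 in the second word on any host.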

Oct 16, 2009

KVM: Prevent kvm_init from corrupting debugfs structures · 0ea4ed8e

Darrick J. Wong authored Oct 14, 2009



I'm seeing an oops condition when kvm-intel and kvm-amd are modprobe'd
during boot (say on an Intel system) and then rmmod'd:

   # modprobe kvm-intel
     kvm_init()
     kvm_init_debug()
     kvm_arch_init()  <-- stores debugfs dentries internally
     (success, etc)

   # modprobe kvm-amd
     kvm_init()
     kvm_init_debug() <-- second initialization clobbers kvm's
                          internal pointers to dentries
     kvm_arch_init()
     kvm_exit_debug() <-- and frees them

   # rmmod kvm-intel
     kvm_exit()
     kvm_exit_debug() <-- double free of debugfs files!

     *BOOM*

If execution gets to the end of kvm_init(), then the calling module has been
established as the kvm provider.  Move the debugfs initialization to the end of
the function, and remove the now-unnecessary call to kvm_exit_debug() from the
error path.  That way we avoid trampling on the debugfs entries and freeing
them twice.

Cc: stable@kernel.org
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

0ea4ed8e

Oct 04, 2009

KVM: add support for change_pte mmu notifiers · 3da0dd43

Izik Eidus authored Sep 23, 2009



This is needed for KVM if it wants KSM to directly map pages into its
shadow page tables.

[marcelo: cast pfn assignment to u64]

Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

3da0dd43

Oct 01, 2009

const: constify remaining file_operations · 828c0950

Alexey Dobriyan authored Oct 01, 2009



[akpm@linux-foundation.org: fix KVM]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

828c0950

Sep 27, 2009

const: mark struct vm_struct_operations · f0f37e2f

Alexey Dobriyan authored Sep 27, 2009



* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code

But leave TTM code alone, something is fishy there with global vm_ops
being used.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f0f37e2f

Sep 24, 2009

cpumask: use zalloc_cpumask_var() where possible · 79f55997

Li Zefan authored Jun 15, 2009



Remove open-coded zalloc_cpumask_var() and zalloc_cpumask_var_node().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

79f55997

Sep 10, 2009

KVM: correct error-handling code · 6223011f

Julia Lawall authored Jul 28, 2009

This code is not executed before file has been initialized to the result of
calling eventfd_fget.  This function returns an ERR_PTR value in an error
case instead of NULL.  Thus the test that file is not NULL is always true.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@match exists@
expression x, E;
statement S1, S2;
@@

x = eventfd_fget(...)
... when != x = E
(
*  if (x == NULL || ...) S1 else S2
|
*  if (x == NULL && ...) S1 else S2
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Avi Kivity <avi@redhat.com>

6223011f
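
The bug class fixed in 6223011f can be reproduced in userspace with stand-ins for the kernel's error-pointer helpers. err_ptr(), is_err() and ptr_err() below are written for this example and only mimic the idea of encoding a negative errno in a pointer value; lookup() is a hypothetical function that, like eventfd_fget(), never returns NULL on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *err_ptr(long err)
{
    assert(err < 0 && err >= -MAX_ERRNO);
    return (void *)(intptr_t)err;
}

/* True if the pointer falls in the range reserved for error codes. */
static bool is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Decode the errno value from an error pointer. */
static long ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* Hypothetical lookup: returns storage on success, an error pointer on
 * failure, and never NULL. */
static void *lookup(int fd, int *storage)
{
    if (fd < 0)
        return err_ptr(-EBADF);
    return storage;
}
```

A test such as `if (p)` is true even for the failed lookup, so only an is_err() style check tells the two cases apart.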

KVM: fix compile warnings on s390 · 28bcb112

Heiko Carstens authored Sep 03, 2009

CC arch/s390/kvm/../../../virt/kvm/kvm_main.o
arch/s390/kvm/../../../virt/kvm/kvm_main.c: In function '__kvm_set_memory_region':
arch/s390/kvm/../../../virt/kvm/kvm_main.c:485: warning: unused variable 'j'
arch/s390/kvm/../../../virt/kvm/kvm_main.c:484: warning: unused variable 'lpages'
arch/s390/kvm/../../../virt/kvm/kvm_main.c:483: warning: unused variable 'ugfn'

Cc: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

28bcb112

KVM: Fix coalesced interrupt reporting in IOAPIC · 65a82211

Gleb Natapov authored Sep 03, 2009



This bug was introduced by b4a2f5e7.

Cc: stable@kernel.org
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

65a82211

KVM: Move #endif KVM_CAP_IRQ_ROUTING to correct place · 6621fbc2

Avi Kivity authored Aug 10, 2009

The symbol only controls irq routing, not MSI-X.

Signed-off-by: Avi Kivity <avi@redhat.com>

6621fbc2

KVM: fix kvm_init() error handling · aed665f7

Xiao Guangrong authored Aug 03, 2009



Remove debugfs file if kvm_arch_init() returns an error.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

aed665f7

KVM: Drop obsolete cpu_get/put in make_all_cpus_request · e601e3be

Jan Kiszka authored Jul 20, 2009



spin_lock disables preemption, so we can simply read the current cpu.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e601e3be

KVM: Reduce runnability interface with arch support code · a1b37100

Gleb Natapov authored Jul 09, 2009



Remove kvm_cpu_has_interrupt() and kvm_arch_interrupt_allowed() from
interface between general code and arch code. kvm_arch_vcpu_runnable()
checks for interrupts instead.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

a1b37100

KVM: add ioeventfd support · d34e6b17

Gregory Haskins authored Jul 07, 2009

ioeventfd is a mechanism to register PIO/MMIO regions to trigger an eventfd
signal when written to by a guest.  Host userspace can register any
arbitrary IO address with a corresponding eventfd and then pass the eventfd
to a specific end-point of interest for handling.

Normal IO requires a blocking round-trip since the operation may cause
side-effects in the emulated model or may return data to the caller.
Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM
"heavy-weight" exit back to userspace, and is ultimately serviced by qemu's
device model synchronously before returning control back to the vcpu.

However, there is a subclass of IO which acts purely as a trigger for
other IO (such as to kick off an out-of-band DMA request, etc).  For these
patterns, the synchronous call is particularly expensive since we really
only want to get our notification transmitted asynchronously and return
as quickly as possible.  All the synchronous infrastructure that ensures
proper data-dependencies are met in the normal IO case is just unnecessary
overhead for signalling.  This adds additional computational load on the
system, as well as latency to the signalling path.

Therefore, we provide a mechanism for registration of an in-kernel trigger
point that allows the VCPU to only require a very brief, lightweight
exit just long enough to signal an eventfd.  This also means that any
clients compatible with the eventfd interface (which includes userspace
and kernelspace equally well) can now register to be notified. The end
result should be a more flexible and higher performance notification API
for the backend KVM hypervisor and peripheral components.

To test this theory, we built a test-harness called "doorbell".  This
module has a function called "doorbell_ring()" which simply increments a
counter for each time the doorbell is signaled.  It supports signalling
from either an eventfd, or an ioctl().

We then wired up two paths to the doorbell: One via QEMU via a registered
io region and through the doorbell ioctl().  The other is direct via
ioeventfd.

You can download this test harness here:

ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2

The measured results are as follows:

qemu-mmio:      110000 iops, 9.09us rtt
ioeventfd-mmio: 200100 iops, 5.00us rtt
ioeventfd-pio:  367300 iops, 2.72us rtt

I didn't measure qemu-pio, because I have to figure out how to register a
PIO region with qemu's device model, and I got lazy.  However, for now we
can extrapolate from the NULLIO runs (+2.56us for MMIO, and -350ns for
HC), which gives:

qemu-pio:     153139 iops, 6.53us rtt
ioeventfd-hc: 412585 iops, 2.37us rtt

These are just for fun, for now, until I can gather more data.

Here is a graph for your convenience:

http://developer.novell.com/wiki/images/7/76/Iofd-chart.png



The conclusion to draw is that we save about 4us by skipping the userspace
hop.

--------------------

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

d34e6b17
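
For readers unfamiliar with eventfd, the following userspace sketch shows only the signalling primitive that d34e6b17 builds on. It uses the Linux eventfd(2) call; ring_doorbell() is a hypothetical helper named after the test harness above, and nothing here registers an IO region with KVM.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Signal an eventfd `rings` times, then read back the accumulated counter.
 * Returns the counter value, or -1 on error. */
static long ring_doorbell(unsigned int rings)
{
    uint64_t one = 1, count = 0;
    unsigned int i;
    int fd;

    assert(rings > 0);  /* a read on a zero counter would block */

    fd = eventfd(0, 0);
    if (fd < 0)
        return -1;

    /* Each write adds its 8-byte value to the kernel-side counter. */
    for (i = 0; i < rings; i++) {
        if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
            close(fd);
            return -1;
        }
    }

    /* A read returns the counter and resets it to zero. */
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        close(fd);
        return -1;
    }

    close(fd);
    return (long)count;
}
```

The writer never waits for the reader, which is the property that makes an eventfd suitable as a trigger that needs no synchronous round trip.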

KVM: make io_bus interface more robust · 090b7aff

Gregory Haskins authored Jul 07, 2009



Today kvm_io_bus_register_dev() returns void and will internally BUG_ON
if it fails.  We want to create dynamic MMIO/PIO entries driven from
userspace later in the series, so we need to enhance the code to be more
robust with the following changes:

   1) Add a return value to the registration function
   2) Fix up all the callsites to check the return code, handle any
      failures, and percolate the error up to the caller.
   3) Add an unregister function that collapses holes in the array

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

090b7aff
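
The three changes listed in 090b7aff can be sketched with a toy bus in userspace. The struct, the capacity of 4 and the use of plain integers as devices are assumptions for this example; only the shape of the interface follows the description above.

```c
#include <assert.h>
#include <errno.h>

#define BUS_MAX_DEVS 4  /* hypothetical capacity */

struct toy_bus {
    int dev_count;
    int devs[BUS_MAX_DEVS];
};

/* Returns 0 on success or -ENOSPC when the bus is full, instead of
 * aborting, so the caller can handle and propagate the failure. */
static int bus_register(struct toy_bus *bus, int dev)
{
    assert(bus->dev_count >= 0);
    if (bus->dev_count >= BUS_MAX_DEVS)
        return -ENOSPC;
    bus->devs[bus->dev_count++] = dev;
    return 0;
}

/* Remove dev and collapse the hole by moving the last entry into it.
 * Returns 0 on success or -ENOENT if dev was not registered. */
static int bus_unregister(struct toy_bus *bus, int dev)
{
    int i;

    for (i = 0; i < bus->dev_count; i++) {
        if (bus->devs[i] == dev) {
            bus->devs[i] = bus->devs[--bus->dev_count];
            return 0;
        }
    }
    return -ENOENT;
}
```

Moving the last entry into the freed slot keeps the array dense without shifting every later element, at the cost of not preserving registration order.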

KVM: Add trace points in irqchip code · 1000ff8d

Gleb Natapov authored Jul 07, 2009



Add tracepoint in msi/ioapic/pic set_irq() functions,
in IPI sending and in the point where IRQ is placed into
apic's IRR.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

1000ff8d

KVM: ignore msi request if !level · 07fb8bb2

Michael S. Tsirkin authored Jul 05, 2009



Irqfd sets level for interrupt to 1 and then to 0.
For MSI, check level so that a single message is sent.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

07fb8bb2
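
The level check from 07fb8bb2 can be modelled in a few lines. This is a toy model, not the kernel code: msi_line, msi_set_irq() and irqfd_pulse() are hypothetical names, and "sending a message" is reduced to incrementing a counter.

```c
#include <assert.h>
#include <stddef.h>

struct msi_line {
    int messages_sent;
};

/* Send an MSI only when the level is raised; a level of 0 is ignored.
 * Returns 0 if a message was sent, -1 if the request was ignored. */
static int msi_set_irq(struct msi_line *line, int level)
{
    assert(line != NULL);
    if (!level)
        return -1;
    line->messages_sent++;
    return 0;
}

/* An irqfd-style pulse: set the level to 1 and then back to 0. */
static void irqfd_pulse(struct msi_line *line)
{
    msi_set_irq(line, 1);
    msi_set_irq(line, 0);
}
```

Without the level check, each pulse would produce two messages instead of one.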

KVM: Use temporary variable to shorten lines. · 70f93dae

Gleb Natapov authored Jul 05, 2009



Cosmetic only. No logic is changed by this patch.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

70f93dae

KVM: Trace irq level and source id · ae8c1c40

Avi Kivity authored Jul 01, 2009

Signed-off-by: Avi Kivity <avi@redhat.com>

ae8c1c40

KVM: fix lock imbalance · 27c4ba60

Jiri Slaby authored Jun 29, 2009



There is a missing unlock on one fail path in ioapic_mmio_write,
fix that.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

27c4ba60