Commits · f9e0554deca54a42fb2cf7f68c05a4a37461c205 · jan.koester / Linux

Jul 12, 2011

KVM: PPC: Pass init/destroy vm and prepare/commit memory region ops down · f9e0554d

Paul Mackerras authored Jun 29, 2011



This arranges for the top-level arch/powerpc/kvm/powerpc.c file to
pass down some of the calls it gets to the lower-level subarchitecture
specific code.  The lower-level implementations (in booke.c and book3s.c)
are no-ops.  The coming book3s_hv.c will need this.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

f9e0554d

KVM: PPC: Deliver program interrupts right away instead of queueing them · 3cf658b6

Paul Mackerras authored Jun 29, 2011

Doing so means that we don't have to save the flags anywhere and gets
rid of the last reference to to_book3s(vcpu) in arch/powerpc/kvm/book3s.c.

Doing so is OK because a program interrupt won't be generated at the
same time as any other synchronous interrupt. If a program interrupt
and an asynchronous interrupt (external or decrementer) are generated
at the same time, the program interrupt will be delivered, which is
correct because it has a higher priority, and then the asynchronous
interrupt will be masked.

We don't ever generate system reset or machine check interrupts to the
guest, but if we did, then we would need to make sure they got delivered
rather than the program interrupt. The current code would be wrong in
this situation anyway since it would deliver the program interrupt as
well as the reset/machine check interrupt.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

3cf658b6

powerpc, KVM: Rework KVM checks in first-level interrupt handlers · b01c8b54

Paul Mackerras authored Jun 29, 2011



Instead of branching out-of-line with the DO_KVM macro to check if we
are in a KVM guest at the time of an interrupt, this moves the KVM
check inline in the first-level interrupt handlers.  This speeds up
the non-KVM case and makes sure that none of the interrupt handlers
are missing the check.

Because the first-level interrupt handlers are now larger, some things
had to be move out of line in exceptions-64s.S.

This all necessitated some minor changes to the interrupt entry code
in KVM.  This also streamlines the book3s_32 KVM test.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

b01c8b54

KVM: PPC: Split out code from book3s.c into book3s_pr.c · f05ed4d5

Paul Mackerras authored Jun 29, 2011



In preparation for adding code to enable KVM to use hypervisor mode
on 64-bit Book 3S processors, this splits book3s.c into two files,
book3s.c and book3s_pr.c, where book3s_pr.c contains the code that is
specific to running the guest in problem state (user mode) and book3s.c
contains code which should apply to all Book 3S processors.

In doing this, we abstract some details, namely the interrupt offset,
updating the interrupt pending flag, and detecting if the guest is
in a critical section.  These are all things that will be different
when we use hypervisor mode.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

f05ed4d5

KVM: PPC: Move fields between struct kvm_vcpu_arch and kvmppc_vcpu_book3s · c4befc58

Paul Mackerras authored Jun 29, 2011

This moves the slb field, which represents the state of the emulated
SLB, from the kvmppc_vcpu_book3s struct to the kvm_vcpu_arch, and the
hpte_hash_[v]pte[_long] fields from kvm_vcpu_arch to kvmppc_vcpu_book3s.
This is in accord with the principle that the kvm_vcpu_arch struct
represents the state of the emulated CPU, and the kvmppc_vcpu_book3s
struct holds the auxiliary data structures used in the emulation.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

c4befc58

KVM: PPC: Fix machine checks on 32-bit Book3S · 149dbdb1

Paul Mackerras authored Jun 29, 2011



Commit 69acc0d3ba ("KVM: PPC: Resolve real-mode handlers through
function exports") resulted in vcpu->arch.trampoline_lowmem and
vcpu->arch.trampoline_enter ending up with kernel virtual addresses
rather than physical addresses.  This is OK on 64-bit Book3S machines,
which ignore the top 4 bits of the effective address in real mode,
but on 32-bit Book3S machines, accessing these addresses in real mode
causes machine check interrupts, as the hardware uses the whole
effective address as the physical address in real mode.

This fixes the problem by using __pa() to convert these addresses
to physical addresses.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>

149dbdb1

KVM: PPC: e500: Don't search over the entire TLB0. · 1aee47a0

Scott Wood authored Jun 14, 2011



Only look in the 4 entries that could possibly contain the
entry we're looking for.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

1aee47a0

KVM: PPC: e500: Add shadow PID support · dd9ebf1f

Liu Yu authored Jun 14, 2011



Dynamically assign host PIDs to guest PIDs, splitting each guest PID into
multiple host (shadow) PIDs based on kernel/user and MSR[IS/DS].  Use
both PID0 and PID1 so that the shadow PIDs for the right mode can be
selected, that correspond both to guest TID = zero and guest TID = guest
PID.

This allows us to significantly reduce the frequency of needing to
invalidate the entire TLB.  When the guest mode or PID changes, we just
update the host PID0/PID1.  And since the allocation of shadow PIDs is
global, multiple guests can share the TLB without conflict.

Note that KVM does not yet support the guest setting PID1 or PID2 to
a value other than zero.  This will need to be fixed for nested KVM
to work.  Until then, we enforce the requirement for guest PID1/PID2
to stay zero by failing the emulation if the guest tries to set them
to something else.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

dd9ebf1f

KVM: PPC: e500: Stop keeping shadow TLB · 08b7fa92

Liu Yu authored Jun 14, 2011



Instead of a fully separate set of TLB entries, keep just the
pfn and dirty status.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

08b7fa92

KVM: PPC: e500: enable magic page · a4cd8b23

Scott Wood authored Jun 14, 2011



This is a shared page used for paravirtualization.  It is always present
in the guest kernel's effective address space at the address indicated
by the hypercall that enables it.

The physical address specified by the hypercall is not used, as
e500 does not have real mode.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

a4cd8b23

KVM: PPC: e500: Support large page mappings of PFNMAP vmas. · 9973d54e

Scott Wood authored Jun 14, 2011



This allows large pages to be used on guest mappings backed by things like
/dev/mem, resulting in a significant speedup when guest memory
is mapped this way (it's useful for directly-assigned MMIO, too).

This is not a substitute for hugetlbfs integration, but is useful for
configurations where devices are directly assigned on chips without an
IOMMU -- in these cases, we need guest physical and true physical to
match, and be contiguous, so static reservation and mapping via /dev/mem
is the most straightforward way to set things up.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

9973d54e

KVM: PPC: e500: Eliminate shadow_pages[], and use pfns instead. · 59c1f4e3

Scott Wood authored Jun 14, 2011



This is in line with what other architectures do, and will allow us to
map things other than ordinary, unreserved kernel pages -- such as
dedicated devices, or large contiguous reserved regions.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

59c1f4e3

KVM: PPC: e500: don't use MAS0 as intermediate storage. · 0ef30995

Scott Wood authored Jun 14, 2011



This avoids races.  It also means that we use the shadow TLB way,
rather than the hardware hint -- if this is a problem, we could do
a tlbsx before inserting a TLB0 entry.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

0ef30995

KVM: PPC: e500: Disable preloading TLB1 in tlb_load(). · 6fc4d1eb

Scott Wood authored Jun 14, 2011



Since TLB1 loading doesn't check the shadow TLB before allocating another
entry, you can get duplicates.

Once shadow PIDs are enabled in a later patch, we won't need to
invalidate the TLB on every switch, so this optimization won't be
needed anyway.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

6fc4d1eb

KVM: PPC: e500: Save/restore SPE state · 4cd35f67

Scott Wood authored Jun 14, 2011



This is done lazily.  The SPE save will be done only if the guest has
used SPE since the last preemption or heavyweight exit.  Restore will be
done only on demand, when enabling MSR_SPE in the shadow MSR, in response
to an SPE fault or mtmsr emulation.

For SPEFSCR, Linux already switches it on context switch (non-lazily), so
the only remaining bit is to save it between qemu and the guest.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

4cd35f67

KVM: PPC: booke: use shadow_msr · ecee273f

Scott Wood authored Jun 14, 2011



Keep the guest MSR and the guest-mode true MSR separate, rather than
modifying the guest MSR on each guest entry to produce a true MSR.

Any bits which should be modified based on guest MSR must be explicitly
propagated from vcpu->arch.shared->msr to vcpu->arch.shadow_msr in
kvmppc_set_msr().

While we're modifying the guest entry code, reorder a few instructions
to bury some load latencies.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

ecee273f

powerpc/e500: SPE register saving: take arbitrary struct offset · c51584d5

Scott Wood authored Jun 14, 2011



Previously, these macros hardcoded THREAD_EVR0 as the base of the save
area, relative to the base register passed.  This base offset is now
passed as a separate macro parameter, allowing reuse with other SPE
save areas, such as used by KVM.

Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

c51584d5

powerpc/e500: Save SPEFCSR in flush_spe_to_thread() · 685659ee

yu liu authored Jun 14, 2011



giveup_spe() saves the SPE state which is protected by MSR[SPE].
However, modifying SPEFSCR does not trap when MSR[SPE]=0.
And since SPEFSCR is already saved/restored in _switch(),
not all the callers want to save SPEFSCR again.
Thus, saving SPEFSCR should not belong to giveup_spe().

This patch moves SPEFSCR saving to flush_spe_to_thread(),
and cleans up the caller that needs to save SPEFSCR accordingly.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

685659ee

KVM: PPC: Resolve real-mode handlers through function exports · a22a2dac

Alexander Graf authored Jun 07, 2011



Up until now, Book3S KVM had variables stored in the kernel that a kernel module
or the kvm code in the kernel could read from to figure out where some real mode
helper functions are located.

This is all unnecessary. The high bits of the EA get ignore in real mode, so we
can just use the pointer as is. Also, it's a lot easier on relocations when we
use the normal way of resolving the address to a function, instead of jumping
through hoops.

This patch fixes compilation with CONFIG_RELOCATABLE=y.

Signed-off-by: Alexander Graf <agraf@suse.de>

a22a2dac

KVM: PPC: fix partial application of "exit timing in ticks" · 24294b9a

Stuart Yoder authored May 17, 2011

When http://www.spinics.net/lists/kvm-ppc/msg02664.html


was applied to produce commit b51e7aa7ed6d8d134d02df78300ab0f91cfff4d2,
the removal of the conversion in add_exit_timing was left out.

Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

24294b9a

Jun 29, 2011

arch/powerpc: use printk_ratelimited instead of printk_ratelimit · 76462232

Christian Dietrich authored Jun 04, 2011

Since printk_ratelimit() shouldn't be used anymore (see comment in
include/linux/printk.h), replace it with printk_ratelimited.

Signed-off-by: Christian Dietrich <christian.dietrich@informatik.uni-erlangen.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

76462232

powerpc/rtas-rtc: remove sideeffects of printk_ratelimit · 9a8f99fa

Christian Dietrich authored Jun 04, 2011



Don't use printk_ratelimit() as an additional condition for returning
on an error. Because when the ratelimit is reached, printk_ratelimit
will return 0 and e.g. in rtas_get_boot_time won't check for an error
condition.

Signed-off-by: Christian Dietrich <christian.dietrich@informatik.uni-erlangen.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

9a8f99fa

Jun 28, 2011

powerpc/pseries: remove duplicate SCSI_BNX2_ISCSI in pseries_defconfig · 937c190c

Michael Neuling authored Jun 27, 2011



Remove duplicate assignment of SCSI_BNX2_ISCSI in pseries_defconfig
introduced by:
  37e0c21e powerpc/pseries: Enable iSCSI support for a number of cards

causes warning:
arch/powerpc/configs/pseries_defconfig:151:warning: override: reassigning to symbol SCSI_BNX2_ISCSI

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

937c190c

Jun 27, 2011

Fix node_start/end_pfn() definition for mm/page_cgroup.c · c6830c22

KAMEZAWA Hiroyuki authored Jun 16, 2011



commit 21a3c964 uses node_start/end_pfn(nid) for detection start/end
of nodes. But, it's not defined in linux/mmzone.h but defined in
/arch/???/include/mmzone.h which is included only under
CONFIG_NEED_MULTIPLE_NODES=y.

Then, we see
  mm/page_cgroup.c: In function 'page_cgroup_init':
  mm/page_cgroup.c:308: error: implicit declaration of function 'node_start_pfn'
  mm/page_cgroup.c:309: error: implicit declaration of function 'node_end_pfn'

So, fixiing page_cgroup.c is an idea...

But node_start_pfn()/node_end_pfn() is a very generic macro and
should be implemented in the same manner for all archs.
(m32r has different implementation...)

This patch removes definitions of node_start/end_pfn() in each archs
and defines a unified one in linux/mmzone.h. It's not under
CONFIG_NEED_MULTIPLE_NODES, now.

A result of macro expansion is here (mm/page_cgroup.c)

for !NUMA
 start_pfn = ((&contig_page_data)->node_start_pfn);
  end_pfn = ({ pg_data_t *__pgdat = (&contig_page_data); __pgdat->node_start_pfn + __pgdat->node_spanned_pages;});

for NUMA (x86-64)
  start_pfn = ((node_data[nid])->node_start_pfn);
  end_pfn = ({ pg_data_t *__pgdat = (node_data[nid]); __pgdat->node_start_pfn + __pgdat->node_spanned_pages;});

Changelog:
 - fixed to avoid using "nid" twice in node_end_pfn() macro.

Reported-and-acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Reported-and-tested-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c6830c22

Jun 22, 2011

powerpc/e500: fix breakage with fsl_rio_mcheck_exception · 82a9a480

Scott Wood authored Jun 16, 2011



The wrong MCSR bit was being used on e500mc.  MCSR_BUS_RBERR only exists
on e500v1/v2.  Use MCSR_LD on e500mc, and remove all MCSR checking
in fsl_rio_mcheck_exception as we now no longer call that function
if the appropriate bit in MCSR is not set.

If RIO support was enabled at compile-time, but was never probed, just
return from fsl_rio_mcheck_exception rather than dereference a NULL
pointer.

TODO: There is still a remaining, though comparitively minor, issue in
that this recovery mechanism will falsely engage if there's an unrelated
MCSR_LD event at the same time as a RIO error.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>

82a9a480

powerpc/p1022ds: fix audio-related properties in the device tree · f3fed682

Timur Tabi authored Jun 08, 2011



On the Freescale P1022DS reference board, the SSI audio controller is
connected in "asynchronous" mode to the codec's clocks, so the device tree
needs an "fsl,ssi-asynchronous" property.

Also remove the clock-frequency property from the wm8776 node, because
the clock is enabled only if U-Boot enables it, and U-Boot will set the
property if the clock is enabled.  A future version of the P1022DS audio
driver will configure the clock itself, but for now, the driver should
not be told that the clock is running when it isn't.

Also fix the FIFO depth to 15, instead of 16.

Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>

f3fed682

Jun 16, 2011

rtc: fix build warnings in defconfigs · c7cbb022

Wanlong Gao authored Jun 15, 2011



RTC_CLASS is changed to bool, so 'm' is invalid.

Signed-off-by: Wanlong Gao <wanlong.gao@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c7cbb022

Jun 09, 2011

powerpc: Force page alignment for initrd reserved memory · 307cfe71

Benjamin Herrenschmidt authored Jun 09, 2011



When using 64K pages with a separate cpio rootfs, U-Boot will align
the rootfs on a 4K page boundary. When the memory is reserved, and
subsequent early memblock_alloc is called, it will allocate memory
between the 64K page alignment and reserved memory. When the reserved
memory is subsequently freed, it is done so by pages, causing the
early memblock_alloc requests to be re-used, which in my case, caused
the device-tree to be clobbered.

This patch forces the reserved memory for initrd to be kernel page
aligned, and will move the device tree if it overlaps with the range
extension of initrd. This patch will also consolidate the identical
function free_initrd_mem() from mm/init_32.c, init_64.c to mm/mem.c,
and adds the same range extension when freeing initrd. free_initrd_mem()
is also moved to the __init section.

Many thanks to Milton Miller for his input on this patch.

[BenH: Fixed build without CONFIG_BLK_DEV_INITRD]

Signed-off-by: Dave Carroll <dcarroll@astekcorp.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

307cfe71

dtc/powerpc: remove obsolete .gitignore entries · c49f8789

Wolfram Sang authored Mar 12, 2011



dtc was moved and .gitignores have been added to the new location. So, we can
delete the old, forgotten ones.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>

c49f8789

Jun 03, 2011
- powerpc/85xx: fix race bug of calling request_irq after enable elbc interrupts · 704102a6
  Shaohui Xie authored Jun 03, 2011
  
  Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
  704102a6
Jun 02, 2011

powerpc/book3e: Fix CPU feature handling on e5500 in 32-bit mode · fb9be234

Kumar Gala authored Jun 02, 2011

We are missing FPU feature bit that user space may require. In the
64-bit mode this gets set since we pull it in via COMMON_USER_PPC64. We
just explicitly set it so user space will be happy again.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>

fb9be234

powerpc/fsl_rio: Fix compile error when CONFIG_FSL_RIO not set · a3623239

Kumar Gala authored Jun 02, 2011

arch/powerpc/kernel/built-in.o: In function `machine_check_e500mc':
arch/powerpc/kernel/traps.c:429: undefined reference to `fsl_rio_mcheck_exception'
arch/powerpc/kernel/built-in.o: In function `machine_check_e500':
arch/powerpc/kernel/traps.c:519: undefined reference to `fsl_rio_mcheck_exception'
make: *** [.tmp_vmlinux1] Error 1

Reported-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>

a3623239

May 31, 2011

powerpc/pmac: Don't register pmac PIC syscore ops when HW not present · 339dedf7

Benjamin Herrenschmidt authored May 31, 2011



The Apple custom PIC only exist in some earlier machine models,
anything with an MPIC will crash on suspend if we register those
syscore ops unconditionally.

This is a regression caused by commit f5a592f7 ("PM / PowerPC: Use
struct syscore_ops instead of sysdevs for PM")

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

339dedf7

May 28, 2011

ns: Wire up the setns system call · 7b21fddd

Eric W. Biederman authored May 27, 2011



32bit and 64bit on x86 are tested and working.  The rest I have looked
at closely and I can't find any problems.

setns is an easy system call to wire up.  It just takes two ints so I
don't expect any weird architecture porting problems.

While doing this I have noticed that we have some architectures that are
very slow to get new system calls.  cris seems to be the slowest where
the last system calls wired up were preadv and pwritev.  avr32 is weird
in that recvmmsg was wired up but never declared in unistd.h.  frv is
behind with perf_event_open being the last syscall wired up.  On h8300
the last system call wired up was epoll_wait.  On m32r the last system
call wired up was fallocate.  mn10300 has recvmmsg as the last system
call wired up.  The rest seem to at least have syncfs wired up which was
new in the 2.6.39.

v2: Most of the architecture support added by Daniel Lezcano <dlezcano@fr.ibm.com>
v3: ported to v2.6.36-rc4 by: Eric W. Biederman <ebiederm@xmission.com>
v4: Moved wiring up of the system call to another patch
v5: ported to v2.6.39-rc6
v6: rebased onto parisc-next and net-next to avoid syscall  conflicts.
v7: ported to Linus's latest post 2.6.39 tree.

>  arch/blackfin/include/asm/unistd.h     |    3 ++-
>  arch/blackfin/mach-common/entry.S      |    1 +
Acked-by: Mike Frysinger <vapier@gentoo.org>

Oh - ia64 wiring looks good.
Acked-by: Tony Luck <tony.luck@intel.com>

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7b21fddd

May 27, 2011

arch: remove CONFIG_GENERIC_FIND_{NEXT_BIT,BIT_LE,LAST_BIT} · 63e424c8

Akinobu Mita authored May 26, 2011

By the previous style change, CONFIG_GENERIC_FIND_NEXT_BIT,
CONFIG_GENERIC_FIND_BIT_LE, and CONFIG_GENERIC_FIND_LAST_BIT are not used
to test for existence of find bitops anymore.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

63e424c8

cgroup: remove the ns_cgroup · a77aea92

Daniel Lezcano authored May 26, 2011

The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier and
leads to some problems:

  * cgroup creation is out-of-control
  * cgroup name can conflict when pids are looping
  * it is not possible to have a single process handling a lot of
    namespaces without falling in a exponential creation time
  * we may want to create a namespace without creating a cgroup

  The ns_cgroup was replaced by a compatibility flag 'clone_children',
  where a newly created cgroup will copy the parent cgroup values.
  The userspace has to manually create a cgroup and add a task to
  the 'tasks' file.

This patch removes the ns_cgroup as suggested in the following thread:

https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html



The 'cgroup_clone' function is removed because it is no longer used.

This is a userspace-visible change.  Commit 45531757 ("cgroup: notify
ns_cgroup deprecated") (merged into 2.6.27) caused the kernel to emit a
printk warning users that the feature is planned for removal.  Since that
time we have heard from XXX users who were affected by this.

Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jamal Hadi Salim <hadi@cyberus.ca>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Acked-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a77aea92

May 26, 2011

powerpc/4xx: Adding PCIe MSI support · 3fb79338

Rupjyoti Sarmah authored Mar 29, 2011



This patch adds MSI support for 440SPe, 460Ex, 460Sx and 405Ex.

Signed-off-by: Rupjyoti Sarmah <rsarmah@apm.com>
Signed-off-by: Tirumala R Marri <tmarri@apm.com>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

3fb79338

powerpc: Fix irq_free_virt by adjusting bounds before loop · 4dd60290

Milton Miller authored May 24, 2011



Instead of looping over each irq and checking against the irq array
bounds, adjust the bounds before looping.

The old code will not free any irq if the irq + count is above
irq_virq_count because the test in the loop is testing irq + count
instead of irq + i.

This code checks the limits to avoid unsigned integer overflows.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

4dd60290

powerpc/irq: Protect irq_radix_revmap_lookup against irq_free_virt · 9b788251

Milton Miller authored May 24, 2011



The radix-tree code uses call_rcu when freeing internal elements.
We must protect against the elements being freed while we traverse
the tree, even if the returned pointer will still be valid.

While preparing a patch to expand the context in which
irq_radix_revmap_lookup will be called, I realized that the
radix tree was not locked.

When asked

    For a normal call_rcu usage, is it allowed to read the structure in
    irq_enter / irq_exit, without additional rcu_read_lock?  Could an
    element freed with call_rcu advance with the cpu still between
    irq_enter/irq_exit (and irq_disabled())?

Paul McKenney replied:

    Absolutely illegal to do so. OK for call_rcu_sched(), but a
    flaming bug for call_rcu().

    And thank you very much for finding this!!!

Further analysis:

In the current CONFIG_TREE_RCU implementation. CONFIG_TREE_PREEMPT_RCU
(and CONFIG_TINY_PREEMPT_RCU) uses explicit counters.

These counters are reflected from per-CPU to global in the
scheduling-clock-interrupt handler, so disabling irq does prevent the
grace period from completing. But there are real-time implementations
(such as the one use by the Concurrent guys) where disabling irq
does -not- prevent the grace period from completing.

While an alternative fix would be to switch radix-tree to rcu_sched, I
don't want to audit the other users of radix trees (nor put alternative
freeing in the library).  The normal overhead for rcu_read_lock and
unlock are a local counter increment and decrement.

This does not show up in the rcu lockdep because in 2.6.34 commit
2676a58c (radix-tree: Disable RCU lockdep checking in radix tree)
deemed it too hard to pass the condition of the protecting lock
to the library.

Signed-off-by: Milton Miller <miltonm@bga.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

9b788251

powerpc/irq: Check desc in handle_one_irq and expand generic_handle_irq · 2e455257

Milton Miller authored May 24, 2011



Look up the descriptor and check that it is found in handle_one_irq
before checking if we are on the irq stack, and call the handler
directly using the descriptor if we are on the stack.

We need check irq_to_desc finds the descriptor to avoid a NULL
pointer dereference.  It could have failed because the number from
ppc_md.get_irq was above NR_IRQS, or various exceptional conditions
with sparse irqs (eg race conditions while freeing an irq if its was
not shutdown in the controller).

fe12bc2c (genirq: Uninline and sanity check generic_handle_irq())
moved generic_handle_irq out of line to allow its use by interrupt
controllers in modules.  However, handle_one_irq is core arch code.
It already knows the details of struct irq_desc and handling irqs in
the nested irq case.  This will avoid the extra stack frame to return
the value we don't check.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

2e455257