Commits · 33b3f03dccc26c377e9689790ecc41079a0c9ca7 · jan.koester / Linux

Jul 30, 2008

powerpc: Correctly hookup PTRACE_GET/SETVSRREGS for 32 bit processes · 33b3f03d

Michael Neuling authored Jul 29, 2008

Fix bug where PTRACE_GET/SETVSRREGS are not connected for 32 bit processes.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

33b3f03d

powerpc: Allow non-hcall return values for lparcfg writes · 9ee07f91

Nathan Fontenot authored Jul 26, 2008



The code to handle writes to /proc/ppc64/lparcfg incorrectly
assumes that the return code from the helper routines to update
processor or memory entitlement return a hcall return value. It
then assumes any non-hcall return value is bad and sets the return
code for the write to be -EIO.

The update_[mp]pp routines can return values other than a hcall
return value. This patch removes the automatic setting of any
return code that is not an hcall return value from these routines
to -EIO.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

9ee07f91

Jul 29, 2008

powerpc/fsl: proliferate simple-bus compatibility to soc nodes · cf0d19fb

Kim Phillips authored Jul 29, 2008



add simple-bus compatible property to soc nodes for 83xx/85xx platforms
that were missing them.  Add same to platform probe code.

This fixes SoC device drivers (such as talitos) to succeed in matching
devices present in the soc node.

also update mpc836x_rdk dts to new SEC bindings (overlooked in commit
3fd44736: powerpc/fsl: update crypto node definition and device tree
instances).

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>

cf0d19fb

Jul 28, 2008

cpm2: Rework baud rate generators configuration to support external clocks. · dddb8d31

Laurent Pinchart authored Jul 22, 2008

The CPM2 BRG setup functions cpm_setbrg and cpm2_fastbrg don't support
external clocks. This patch adds a new exported __cpm2_setbrg function
that takes the clock rate and clock source as extra parameters, and moves
cpm_setbrg and cpm2_fastbrg to include/asm-powerpc/cpm2.h where they
become inline wrappers around __cpm2_setbrg.

Signed-off-by: Laurent Pinchart <laurentp@cse-semaphore.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>

dddb8d31

powerpc: rtc_cmos_setup: assign interrupts only if there is i8259 PIC · e517881e

Anton Vorontsov authored Jun 12, 2008



i8259 PIC is disabled on MPC8610HPCD boards, thus currently rtc-cmos
driver fails to probe.

To fix the issue, we lookup the device tree for "chrp,iic" and
"pnpPNP,000" compatible devices, and if not found we do not assign RTC
IRQ and assuming that i8259 was disabled.

Though this patch fixes RTC on some boards (and surely should not break
any other), the whole approach is still broken. We can't easily fix this
though, because old device trees do not specify i8259 interrupts for the
cmos rtc node.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>

e517881e

cpm_uart: Add generic clock API support to set baudrates · 80776554

Laurent Pinchart authored Jul 28, 2008

This patch introduces baudrate setting support via the generic clock API.
When present the optional device tree clock property is used instead of
fsl-cpm-brg. Platforms can then define complex clock schemes, to output
the serial clock on an external pin for instance.

Signed-off-by: Laurent Pinchart <laurentp@cse-semaphore.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>

80776554

powerpc: implement GPIO LIB API on CPM1 Freescale SoC. · dc2380ec

Jochen Friedrich authored Jul 03, 2008



This patch implement GPIO LIB support for the CPM1 GPIOs.

Signed-off-by: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>

dc2380ec

cpm2: Implement GPIO LIB API on CPM2 Freescale SoC. · e193325e

Laurent Pinchart authored Jul 28, 2008



This patch implement GPIO LIB support for the CPM2 GPIOs. The code can
also be used for CPM1 GPIO port E, as both cores are compatible at the
register level.

Based on earlier work by Laurent Pinchart.

Signed-off-by: Jochen Friedrich <jochen@scram.de>
Cc: Laurent Pinchart <laurentp@cse-semaphore.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>

e193325e

powerpc: Disable 64K hugetlb support when doing 64K SPU mappings · 00df438e

Benjamin Herrenschmidt authored Jul 28, 2008



The 64K SPU local store mapping feature is incompatible with the
64K huge pages support due to the inability of some parts of
the memory management to differenciate between them while they
use a different page table format.

For now, disable 64K huge pages when CONFIG_SPU_FS_64K_LS,
in the long run, this can be fixed by making this feature use
the hugetlb page table format.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

00df438e

powerpc/powermac: Fixup default serial port device for pmac_zilog · 025d7917

Benjamin Herrenschmidt authored Jul 28, 2008

This removes the non-working code in legacy_serial that tried to handle
the powermac SCC ports, and instead add a (now working) function to the
powermac platform code to find the default serial console if any.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

025d7917

powerpc/powermac: Use sane default baudrate for SCC debugging · f023bf0f

Benjamin Herrenschmidt authored Jul 28, 2008

When using the "sccdbg" option to route early kernel messages and
xmon to the SCC serial port on PowerMacs, when this wasn't the
configured output port of Open Firmware, we initialize the baudrate
to 57600bps. This isn't a very good default on some powermacs where
both the FW and pmac_zilog will default to 38400. This fixes it to
use the same logic as pmac_zilog to pick a default speed.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

f023bf0f

powerpc: Show processor cache information in sysfs · 124c27d3

Nathan Lynch authored Jul 27, 2008



Collect cache information from the OF device tree and display it in
the cpu hierarchy in sysfs.  This is intended to be compatible at the
userspace level with x86's implementation[1], hence some of the funny
attribute names.  The arrangement of cache info is not immediately
intuitive, but (again) it's for compatibility's sake.

The cache attributes exposed are:

type (Data, Instruction, or Unified)
level (1, 2, 3...)
size
coherency_line_size
number_of_sets
ways_of_associativity

All of these can be derived on platforms that follow the OF PowerPC
Processor binding.  The code "publishes" only those attributes for
which it is able to determine values; attributes for values which
cannot be determined are not created at all.

[1] arch/x86/kernel/cpu/intel_cacheinfo.c

BenH: Turned some printk's into pr_debug, added better NULL checking
in a couple of places.

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

124c27d3

powerpc: Make core id information available to userspace · e9efed3b

Nathan Lynch authored Jul 27, 2008

Existing Open Firmware practice is to report each processor core as a
separate node in the device tree. Report the value of the "reg" OF
property corresponding to a logical CPU's device node as the core_id
attribute in /sys/devices/system/cpu/cpu*/topology/core_id.

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

e9efed3b

powerpc: Make core sibling information available to userspace · 440a0857

Nathan Lynch authored Jul 27, 2008



Implement the notion of "core siblings" for powerpc.  This makes
/sys/devices/system/cpu/cpu*/topology/core_siblings present sensible
values, indicating online CPUs which share an L2 cache.

BenH: Made cpu_to_l2cache() use of_find_node_by_phandle() instead
of IBM-specific open coded search

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

440a0857

powerpc/vio: More fallout from dma_mapping_error API change · 0764bf63

Stephen Rothwell authored Jul 28, 2008

arch/powerpc/kernel/vio.c:533: error: too few arguments to function 'dma_mapping_error'

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

0764bf63

powerpc/pseries: Fix CMO sysdev attribute API change fallout · 3cee67f7

Stephen Rothwell authored Jul 28, 2008

Noticed due to these wanings:

arch/powerpc/platforms/pseries/cmm.c:298: warning: initialization from incompatible pointer type
arch/powerpc/platforms/pseries/cmm.c:299: warning: initialization from incompatible pointer type
arch/powerpc/platforms/pseries/cmm.c:320: warning: initialization from incompatible pointer type
arch/powerpc/platforms/pseries/cmm.c:320: warning: initialization from incompatible pointer type

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

3cee67f7

powerpc: Enable tracehook for the architecture · dec2b0d0

Roland McGrath authored Jul 27, 2008

The powerpc arch code has all the prerequisites, so set HAVE_ARCH_TRACEHOOK.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

dec2b0d0

powerpc: Add TIF_NOTIFY_RESUME support for tracehook · 7d6d637d

Roland McGrath authored Jul 27, 2008



This adds TIF_NOTIFY_RESUME support for powerpc.  When set,
we call tracehook_notify_resume() on the way to user mode.
This overloads do_signal() to do the work, but changes its
arguments to it has the TIF_* bits handy in a register and
drops the useless first argument that was always zero.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

7d6d637d

powerpc: Make syscall tracing use tracehook.h helpers · 4f72c427

Roland McGrath authored Jul 27, 2008



This changes powerpc syscall tracing to use the new tracehook.h entry
points.  There is no change, only cleanup.

In addition, the assembly changes allow do_syscall_trace_enter() to
abort the syscall without losing the information about the original
r0 value.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

4f72c427

powerpc: Call tracehook_signal_handler() when setting up signal frames · 6558ba2b

Roland McGrath authored Jul 27, 2008

This makes the powerpc signal handling code call tracehook_signal_handler()
after a handler is set up. This means that using PTRACE_SINGLESTEP to
enter a signal handler will report to ptrace on the first instruction of
the handler, instead of the second. This is consistent with what x86 and
other machines do, and what users and debuggers want.

BenH: Fixed up the test for the trap value.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

6558ba2b

powerpc: Update cpu_sibling_maps dynamically · e2075f79

Nathan Lynch authored Jul 27, 2008



Rather doing one initialization pass over all the per-cpu
cpu_sibling_maps at boot, update the maps at cpu online/offline time.

This is a behavior change -- the thread_siblings attribute now
reflects only online siblings, whereas it would display offline
siblings before.  The new behavior matches that of x86, and is
arguably more useful.

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

e2075f79

powerpc: register_cpu_online should be __cpuinit · 9ba1984e

Nathan Lynch authored Jul 27, 2008



It is called only in cpu online paths.

(caught by CONFIG_DEBUG_SECTION_MISMATCH=y)

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

9ba1984e

powerpc: kill useless SMT code in prom_hold_cpus · 7d2f6075

Nathan Lynch authored Jul 27, 2008



This piece of code is broken for >2 threads, and possibly in some
other subtle ways (such as comparing a value obtained from an
"ibm,ppc-interrupt-server#s" property to a value obtained from a
"reg" property) and doesn't seem to have any useful purpose in the
first place other than a dubious warning in case NR_CPUS is too
small, which probably isn't the right place to do so.

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

7d2f6075

powerpc: Fix vio build warnings · b9fa49a9

Nathan Lynch authored Jul 26, 2008

arch/powerpc/kernel/vio.c:1034: warning: function declaration isnât a prototype
arch/powerpc/kernel/vio.c:1035: warning: function declaration isnât a prototype

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

b9fa49a9

powerpc/booke: Clean up the hardware watchpoint support · 2325f0a0

Kumar Gala authored Jul 26, 2008



* CONFIG_BOOKE is selected by CONFIG_44x so we dont need both
* Fixed a few comments
* Go back to only using DBCR0_IDM to determine if we are using
  debug resources.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

2325f0a0

powerpc: Removed duplicated include in stacktrace.c · d3b06023

Huang Weiyi authored Jul 24, 2008



Removed duplicated include file <linux/module.h> in
arch/powerpc/kernel/stacktrace.c.

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

d3b06023

Jul 27, 2008

KVM: ppc: fix invalidation of large guest pages · cc04454f

Hollis Blanchard authored Jul 25, 2008



When guest invalidates a large tlb map, there may be more than one
corresponding shadow tlb maps that need to be invalidated. Use eaddr and eend
to find these shadow tlb maps.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

cc04454f

Jul 26, 2008

powerpc: use generic show_mem() · bda2fa53

Johannes Weiner authored Jul 25, 2008



Remove arch-specific show_mem() in favor of the generic version.

This also removes the following redundant information display:

	- pages in swapcache, printed by show_swap_cache_info()

where show_mem() calls show_free_areas(), which calls
show_swap_cache_info().

Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bda2fa53

SL*B: drop kmem cache argument from constructor · 51cc5068

Alexey Dobriyan authored Jul 25, 2008



Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexeres.  Nobody uses this "feature", nor does anybody uses
passed kmem cache in non-trivial way, so pass only pointer to object.

Non-trivial places are:
	arch/powerpc/mm/init_64.c
	arch/powerpc/mm/hugetlbpage.c

This is flag day, yes.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Matt Mackall <mpm@selenic.com>
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

51cc5068

kexec jump · 3ab83521

Huang Ying authored Jul 25, 2008

This patch provides an enhancement to kexec/kdump.  It implements the
following features:

- Backup/restore memory used by the original kernel before/after
  kexec.

- Save/restore CPU state before/after kexec.

The features of this patch can be used as a general method to call program in
physical mode (paging turning off).  This can be used to call BIOS code under
Linux.

kexec-tools needs to be patched to support kexec jump. The patches and
the precompiled kexec can be download from the following URL:

       source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
       patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
       binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10



Usage example of calling some physical mode code and return:

1. Compile and install patched kernel with following options selected:

CONFIG_X86_32=y
CONFIG_KEXEC=y
CONFIG_PM=y
CONFIG_KEXEC_JUMP=y

2. Build patched kexec-tool or download the pre-built one.

3. Build some physical mode executable named such as "phy_mode"

4. Boot kernel compiled in step 1.

5. Load physical mode executable with /sbin/kexec. The shell command
   line can be as follow:

   /sbin/kexec --load-preserve-context --args-none phy_mode

6. Call physical mode executable with following shell command line:

   /sbin/kexec -e

Implementation point:

To support jumping without reserving memory.  One shadow backup page (source
page) is allocated for each page used by kexeced code image (destination
page).  When do kexec_load, the image of kexeced code is loaded into source
pages, and before executing, the destination pages and the source pages are
swapped, so the contents of destination pages are backupped.  Before jumping
to the kexeced code image and after jumping back to the original kernel, the
destination pages and the source pages are swapped too.

C ABI (calling convention) is used as communication protocol between
kernel and called code.

A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to
indicate that the loaded kernel image is used for jumping back.

Now, only the i386 architecture is supported.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3ab83521

dma-mapping: add the device argument to dma_mapping_error() · 8d8bb39b

FUJITA Tomonori authored Jul 25, 2008

Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
architecture does:

This enables us to cleanly fix the Calgary IOMMU issue that some devices
are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423

).

I think that per-device dma_mapping_ops support would be also helpful for
KVM people to support PCI passthrough but Andi thinks that this makes it
difficult to support the PCI passthrough (see the above thread).  So I
CC'ed this to KVM camp.  Comments are appreciated.

A pointer to dma_mapping_ops to struct dev_archdata is added.  If the
pointer is non NULL, DMA operations in asm/dma-mapping.h use it.  If it's
NULL, the system-wide dma_ops pointer is used as before.

If it's useful for KVM people, I plan to implement a mechanism to register
a hook called when a new pci (or dma capable) device is created (it works
with hot plugging).  It enables IOMMUs to set up an appropriate
dma_mapping_ops per device.

The major obstacle is that dma_mapping_error doesn't take a pointer to the
device unlike other DMA operations.  So x86 can't have dma_mapping_ops per
device.  Note all the POWER IOMMUs use the same dma_mapping_error function
so this is not a problem for POWER but x86 IOMMUs use different
dma_mapping_error functions.

The first patch adds the device argument to dma_mapping_error.  The patch
is trivial but large since it touches lots of drivers and dma-mapping.h in
all the architecture.

This patch:

dma_mapping_error() doesn't take a pointer to the device unlike other DMA
operations.  So we can't have dma_mapping_ops per device.

Note that POWER already has dma_mapping_ops per device but all the POWER
IOMMUs use the same dma_mapping_error function.  x86 IOMMUs use device
argument.

[akpm@linux-foundation.org: fix sge]
[akpm@linux-foundation.org: fix svc_rdma]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix bnx2x]
[akpm@linux-foundation.org: fix s2io]
[akpm@linux-foundation.org: fix pasemi_mac]
[akpm@linux-foundation.org: fix sdhci]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix sparc]
[akpm@linux-foundation.org: fix ibmvscsi]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8d8bb39b

Jul 25, 2008

powerpc: Fix boot problem due to AT_BASE_PLATFORM change · fc532f81

Nathan Lynch authored Jul 25, 2008



Commit 9115d134 ("powerpc: Enable
AT_BASE_PLATFORM aux vector") broke boot on 32-bit powerpc systems; we
have to use PTRRELOC to initialize powerpc_base_platform this early in
boot.

Bug reported by Jon Smirl.

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

fc532f81

powerpc: clean up the Book-E HW watchpoint support · 0b21bb49

Kumar Gala authored Jul 25, 2008



* CONFIG_BOOKE is selected by CONFIG_44x so we dont need both
* Fixed a few comments
* Go back to only using DBCR0_IDM to determine if we are using
  debug resources.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>

0b21bb49

gpiolib: allow user-selection · 7444a72e

Michael Buesch authored Jul 25, 2008



This patch adds functionality to the gpio-lib subsystem to make it
possible to enable the gpio-lib code even if the architecture code didn't
request to get it built in.

The archtitecture code does still need to implement the gpiolib accessor
functions in its asm/gpio.h file.  This patch adds the implementations for
x86 and PPC.

With these changes it is possible to run generic GPIO expansion cards on
every architecture that implements the trivial wrapper functions.  Support
for more architectures can easily be added.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: David Brownell <david-b@pacbell.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Samuel Ortiz <sameo@openedhand.com>
Cc: Kumar Gala <galak@gate.crashing.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7444a72e

kprobes: improve kretprobe scalability with hashed locking · ef53d9c5

Srinivasa D S authored Jul 25, 2008



Currently list of kretprobe instances are stored in kretprobe object (as
used_instances,free_instances) and in kretprobe hash table.  We have one
global kretprobe lock to serialise the access to these lists.  This causes
only one kretprobe handler to execute at a time.  Hence affects system
performance, particularly on SMP systems and when return probe is set on
lot of functions (like on all systemcalls).

Solution proposed here gives fine-grain locks that performs better on SMP
system compared to present kretprobe implementation.

Solution:

 1) Instead of having one global lock to protect kretprobe instances
    present in kretprobe object and kretprobe hash table.  We will have
    two locks, one lock for protecting kretprobe hash table and another
    lock for kretporbe object.

 2) We hold lock present in kretprobe object while we modify kretprobe
    instance in kretprobe object and we hold per-hash-list lock while
    modifying kretprobe instances present in that hash list.  To prevent
    deadlock, we never grab a per-hash-list lock while holding a kretprobe
    lock.

 3) We can remove used_instances from struct kretprobe, as we can
    track used instances of kretprobe instances using kretprobe hash
    table.

Time duration for kernel compilation ("make -j 8") on a 8-way ppc64 system
with return probes set on all systemcalls looks like this.

cacheline              non-cacheline             Un-patched kernel
aligned patch 	       aligned patch
===============================================================================
real    9m46.784s       9m54.412s                  10m2.450s
user    40m5.715s       40m7.142s                  40m4.273s
sys     2m57.754s       2m58.583s                  3m17.430s
===========================================================

Time duration for kernel compilation ("make -j 8) on the same system, when
kernel is not probed.
=========================
real    9m26.389s
user    40m8.775s
sys     2m7.283s
=========================

Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ef53d9c5

introduce HAVE_EFFICIENT_UNALIGNED_ACCESS Kconfig symbol · 58340a07

Johannes Berg authored Jul 25, 2008



In many cases, especially in networking, it can be beneficial to know at
compile time whether the architecture can do unaligned accesses efficiently.
This patch introduces a new Kconfig symbol

	HAVE_EFFICIENT_UNALIGNED_ACCESS

for that purpose and adds it to the powerpc and x86 architectures.  Also add
some documentation about alignment and networking, and especially one intended
use of this symbol.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Ingo Molnar <mingo@elte.hu> [x86 architecture part]
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

58340a07

powerpc/pseries: Remove kmalloc call in handling writes to lparcfg · 16c14b46

Nathan Fontenot authored Jul 24, 2008



There are only 4 valid name=value pairs for writes to
/proc/ppc64/lparcfg.  Current code allocates a buffer to copy
this information in from the user.  Since the longest name=value
pair will easily fit into a buffer of 64 characters, simply
put the buffer on the stack instead of allocating the buffer.

Signed-off-by: Nathan Fotenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

16c14b46

powerpc/pseries: Update arch vector to indicate support for CMO · 8391e42a

Nathan Fontenot authored Jul 24, 2008



Update the architecture vector to indicate that Cooperative Memory
Overcommitment is supported if CONFIG_PPC_SMLPAR is set.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

8391e42a

powerpc/pseries: Verify CMO memory entitlement updates with virtual I/O · 22e1a4dd

Nathan Fontenot authored Jul 24, 2008



Verify memory entitlement updates can be handled by vio.

Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

22e1a4dd

powerpc/pseries: vio bus support for CMO · a90ab95a

Robert Jennings authored Jul 24, 2008



This is a large patch but the normal code path is not affected.  For
non-pSeries platforms the code is ifdef'ed out and for non-CMO enabled
pSeries systems this does not affect the normal code path.  Devices that
do not perform DMA operations do not need modification with this patch.
The function get_desired_dma was renamed from get_io_entitlement for
clarity.

Overview

Cooperative Memory Overcommitment (CMO) allows for a set of OS partitions
to be run with less RAM than the aggregate needs of the group of
partitions.  The firmware will balance memory between the partitions
and page in/out memory as needed.  Based on the number and type of IO
adpaters preset each partition is allocated an amount of memory for
DMA operations and this allocation will be guaranteed to the partition;
this is referred to as the partition's 'entitlement'.

Partitions running in a CMO environment can only have virtual IO devices
present.  The VIO bus layer will manage the IO entitlement for the system.
Accounting, at a system and per-device level, is tracked in the VIO bus
code and exposed via sysfs.  A set of dma_ops functions are added to
the bus to allow for this accounting.

Bus initialization

At initialization, the bus will calculate the minimum needs of the system
based on providing each device present with a standard minimum entitlement
along with a spare allocation for the bus to handle hotplug events.
If the minimum needs can not be met the system boot will be halted.

Device changes

The significant changes for devices while running under CMO are that the
devices must specify how much dedicated IO entitlement they desire and
must also handle DMA mapping errors that can occur due to constrained
IO memory.  The virtual IO drivers are modified to silence errors when
DMA mappings fail for CMO and handle these failures gracefully.

Each devices will be guaranteed a minimum entitlement that can always
be mapped.  Devices will specify how much entitlement they desire and
the VIO bus will attempt to provide for this.  Devices can change their
desired entitlement level at any point in time to address particular needs
(via vio_cmo_set_dev_desired()), not just at device probe time.

VIO bus changes

The system will have a particular entitlement level available from which
it can provide memory to the devices.  The bus defines two pools of memory
within this entitlement, the reserved and excess pools.  Each device is
provided with it's own entitlement no less than a system defined minimum
entitlement and no greater than what the device has specified as it's
desired entitlement.  The entitlement provided to devices comes from the
reserve pool.  The reserve pool can also contain a spare allocation as
large as the system defined minimum entitlement which is used for device
hotplug events.  Any entitlement not needed to fulfill the needs of a
reserve pool is placed in the excess pool.  Each device is guaranteed
that it can map up to it's entitled level; additional mapping are possible
as long as there is unmapped memory in the excess pool.

Bus probe

As the system starts, each device is given an entitlement equal only
to the system defined minimum entitlement.  The reserve pool is equal
to the sum of these entitlements, plus a spare allocation.  The VIO bus
also tracks the aggregate desired entitlement of all the devices.  If the
system desired entitlement is greater than the size of the reserve pool,
when devices unmap IO memory it will be reserved and a balance operation
will be scheduled for some time in the future.

Entitlement balancing

The balance function tries to fairly distribute entitlement between the
devices in the system with the goal of providing each device with it's
desired amount of entitlement.  Devices using more than what would be
ideal will have their entitled set-point adjusted; this will effectively
set a goal for lower IO memory usage as future mappings can fail and
deallocations will trigger a balance operation to distribute the newly
unmapped memory.  A fair distribution of entitlement can take several
balance operations to achieve.  Entitlement changes and device DLPAR
events will alter the state of CMO and will trigger balance operations.

Hotplug events

The VIO bus allows for changes in system entitlement at run-time via
'vio_cmo_entitlement_update()'.  When devices are added the hotplug
device event will be preceded by a system entitlement increase and this
is reversed when devices are removed.

The following changes are made that the VIO bus layer for CMO:
 * add IO memory accounting per device structure.
 * add IO memory entitlement query function to driver structure.
 * during vio bus probe, if CMO is enabled, check that driver has
   memory entitlement query function defined.  Fail if function not defined.
 * fail to register driver if io entitlement function not defined.
 * create set of dma_ops at vio level for CMO that will track allocations
   and return DMA failures once entitlement is reached.  Entitlement will
   limited by overall system entitlement.  Devices will have a reserved
   quantity of memory that is guaranteed, the rest can be used as available.
 * expose entitlement, current allocation, desired allocation, and the
   allocation error counter for devices to the user through sysfs
 * provide mechanism for changing a device's desired entitlement at run time
   for devices as an exported function and sysfs tunable
 * track any DMA failures for entitled IO memory for each vio device.
 * check entitlement against available system entitlement on device add
 * track entitlement metrics (high water mark, current usage)
 * provide function to reset high water mark
 * provide minimum and desired entitlement numbers at a bus level
 * provide drivers with a minimum guaranteed entitlement
 * balance available entitlement between devices to satisfy their needs
 * handle system entitlement changes and device hotplug

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

a90ab95a