  Jan 29, 2009
      powerpc/fsl-booke: Make CAM entries used for lowmem configurable · 96051465
      Trent Piepho authored
      
      
      On booke processors, the code that maps low memory only uses up to three
      CAM entries, even though there are sixteen and nothing else uses them.
      
      Make this number configurable in the advanced options menu along with max
      low memory size.  If one wants 1 GB of lowmem, then it's typically
      necessary to have four CAM entries.
      
      Signed-off-by: Trent Piepho <tpiepho@freescale.com>
      Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
      powerpc/fsl-booke: Allow larger CAM sizes than 256 MB · c8f3570b
      Trent Piepho authored
      
      
      The code that maps kernel low memory would only use page sizes up to 256
      MB.  On E500v2, pages up to 4 GB are supported.
      
      However, a page's base address must be a multiple of the page's size,
      i.e. a 256 MB page must be aligned to a 256 MB boundary.  This was
      enforced by a requirement that the physical and virtual addresses of the
      start of lowmem be aligned to 256 MB.  Clearly, requiring 1 GB or 4 GB
      alignment to allow pages of that size isn't acceptable.
      
      To solve this, I simply have adjust_total_lowmem() take alignment into
      account when it decides what size pages to use.  Give it PAGE_OFFSET =
      0x7000_0000, PHYSICAL_START = 0x3000_0000, and 2GB of RAM, and it will map
      pages like this:
      PA 0x3000_0000 VA 0x7000_0000 Size 256 MB
      PA 0x4000_0000 VA 0x8000_0000 Size 1 GB
      PA 0x8000_0000 VA 0xC000_0000 Size 256 MB
      PA 0x9000_0000 VA 0xD000_0000 Size 256 MB
      PA 0xA000_0000 VA 0xE000_0000 Size 256 MB
      
      Because the lowmem mapping code now takes alignment into account,
      PHYSICAL_ALIGN can be lowered from 256 MB to 64 MB.  Even lower might be
      possible.  The lowmem code will work down to 4 kB, but it's possible some
      of the boot code will fail before then.  Poor alignment forces small
      pages to be used, which, combined with the limited number of TLB1 entries
      available, results in very little memory getting mapped.  So alignments
      less than 64 MB probably aren't very useful anyway.
      
      Signed-off-by: Trent Piepho <tpiepho@freescale.com>
      Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
      powerpc/fsl-booke: Remove code duplication in lowmem mapping · f88747e7
      Trent Piepho authored
      
      
      The code to map lowmem uses three CAM (aka TLB1) entries to cover it.
      The size of each is stored in three globals named __cam0, __cam1, and
      __cam2.  All the code that uses them is duplicated three times, once for
      each of the three variables.
      
      We have these things called arrays and loops....
      
      Once converted to use an array, it will be easier to make the number of
      CAMs configurable.
      
      Signed-off-by: Trent Piepho <tpiepho@freescale.com>
      Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
  Jan 06, 2009
      mm: show node to memory section relationship with symlinks in sysfs · c04fc586
      Gary Hade authored
      
      
      Add /sys/devices/system/node/nodeX/memoryY symlinks for all
      the memory sections located on nodeX.  For example:
      /sys/devices/system/node/node1/memory135 -> ../../memory/memory135
      indicates that memory section 135 resides on node1.
      
      This also revises the documentation to cover the change and updates
      Documentation/ABI/testing/sysfs-devices-memory to describe the memory
      hot-remove files 'phys_device', 'phys_index', and 'state', which were
      previously not documented there.
      
      Beyond the general policy of giving users as much physical-location
      information as possible for resources that can be hot-added and/or
      hot-removed, this change provides the following user benefits (likely
      among others).
      Immediate:
        - Provides information needed to determine the specific node
          on which a defective DIMM is located.  This will reduce system
          downtime when the node or defective DIMM is swapped out.
        - Prevents unintended onlining of a memory section that was
          previously offlined due to a defective DIMM.  This could happen
          during node hot-add when the user or node hot-add assist script
          onlines _all_ offlined sections due to user or script inability
          to identify the specific memory sections located on the hot-added
          node.  The consequences of reintroducing the defective memory
          could be ugly.
        - Provides information needed to vary the amount and distribution
          of memory on specific nodes for testing or debugging purposes.
      Future:
        - Will provide information needed to identify the memory
          sections that need to be offlined prior to physical removal
          of a specific node.
      
      Symlink creation during boot was tested on 2-node x86_64, 2-node
      ppc64, and 2-node ia64 systems.  Symlink creation during physical
      memory hot-add tested on a 2-node x86_64 system.
      
      Signed-off-by: Gary Hade <garyhade@us.ibm.com>
      Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      mm: report the MMU pagesize in /proc/pid/smaps · 3340289d
      Mel Gorman authored
      
      
      The KernelPageSize entry in /proc/pid/smaps is the pagesize used by the
      kernel to back a VMA.  This matches the size used by the MMU in the
      majority of cases.  However, one counter-example occurs on PPC64 kernels,
      where a kernel using 64K as a base pagesize may still use 4K pages for
      the MMU on older processors.  To distinguish the two, this patch reports
      MMUPageSize as the pagesize used by the MMU in /proc/pid/smaps.
      
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  Dec 16, 2008
      powerpc/mm: Remove flush_HPTE() · f63837f0
      Benjamin Herrenschmidt authored
      
      
      The function flush_HPTE() is used in only one place, the implementation
      of DEBUG_PAGEALLOC on ppc32.
      
      It's actually a dup of flush_tlb_page(), though it's -slightly- more
      efficient on hash-based processors.  We remove it and replace it with a
      direct call to the hash flush code on those processors, and with
      flush_tlb_page() for everybody else.
      
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      powerpc/mm: Rename tlb_32.c and tlb_64.c to tlb_hash32.c and tlb_hash64.c · e41e811a
      Benjamin Herrenschmidt authored
      
      
      This renames the files to clarify that they are used by the hash-based
      family of CPUs (the 603 is an exception in that family, but is still
      handled by this code).
      
      This paves the way for the new tlb_nohash.c coming via a subsequent
      commit.
      
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Kumar Gala <galak@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      powerpc: Fix bootmem reservation on uninitialized node · a4c74ddd
      Dave Hansen authored
      careful_allocation() was calling into the bootmem allocator for
      nodes which had not been fully initialized, which caused a previous
      bug: http://patchwork.ozlabs.org/patch/10528/
      
      So, I merged a few broken-out loops in do_init_bootmem() to fix it.
      That changed the code ordering.
      
      I think this bug is triggered by having reserved areas for a node
      which are spanned by another node's contents.  In the
      mark_reserved_regions_for_nid() code, we attempt to reserve the
      area for a node before we have allocated the NODE_DATA() for that
      nid.  We do this since I reordered that loop.  I suck.
      
      This is causing crashes at bootup on some systems, as reported
      by Jon Tollefson.
      
      This may only show up on some systems that have 16GB pages reserved,
      but it can probably happen on any system that is trying to reserve
      large swaths of memory that happen to span other nodes' contents.
      
      This commit ensures that we do not touch bootmem for any node which
      has not been initialized, and also removes a compile warning about
      an unused variable.
      
      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      powerpc: Check for valid hugepage size in hugetlb_get_unmapped_area · 48f797de
      Brian King authored
      
      
      It looks like most of the hugetlb code is doing the correct thing if
      hugepages are not supported, but the mmap code is not.  If we get into
      the mmap code when hugepages are not supported, such as in an LPAR
      which is running Active Memory Sharing, we can oops the kernel.  This
      fixes the oops being seen in this path.
      
      oops: Kernel access of bad area, sig: 11 [#1]
      SMP NR_CPUS=1024 NUMA pSeries
      Modules linked in: nfs(N) lockd(N) nfs_acl(N) sunrpc(N) ipv6(N) fuse(N) loop(N)
      dm_mod(N) sg(N) ibmveth(N) sd_mod(N) crc_t10dif(N) ibmvscsic(N)
      scsi_transport_srp(N) scsi_tgt(N) scsi_mod(N)
      Supported: No
      NIP: c000000000038d60 LR: c00000000003945c CTR: c0000000000393f0
      REGS: c000000077e7b830 TRAP: 0300   Tainted: G
      (2.6.27.5-bz50170-2-ppc64)
      MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 44000448  XER: 20000001
      DAR: c000002000af90a8, DSISR: 0000000040000000
      TASK = c00000007c1b8600[4019] 'hugemmap01' THREAD: c000000077e78000 CPU: 6
      GPR00: 0000001fffffffe0 c000000077e7bab0 c0000000009a4e78 0000000000000000
      GPR04: 0000000000010000 0000000000000001 00000000ffffffff 0000000000000001
      GPR08: 0000000000000000 c000000000af90c8 0000000000000001 0000000000000000
      GPR12: 000000000000003f c000000000a73880 0000000000000000 0000000000000000
      GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000010000
      GPR20: 0000000000000000 0000000000000003 0000000000010000 0000000000000001
      GPR24: 0000000000000003 0000000000000000 0000000000000001 ffffffffffffffb5
      GPR28: c000000077ca2e80 0000000000000000 c00000000092af78 0000000000010000
      NIP [c000000000038d60] .slice_get_unmapped_area+0x6c/0x4e0
      LR [c00000000003945c] .hugetlb_get_unmapped_area+0x6c/0x80
      Call Trace:
      [c000000077e7bbc0] [c00000000003945c] .hugetlb_get_unmapped_area+0x6c/0x80
      [c000000077e7bc30] [c000000000107e30] .get_unmapped_area+0x64/0xd8
      [c000000077e7bcb0] [c00000000010b140] .do_mmap_pgoff+0x140/0x420
      [c000000077e7bd80] [c00000000000bf5c] .sys_mmap+0xc4/0x140
      [c000000077e7be30] [c0000000000086b4] syscall_exit+0x0/0x40
      Instruction dump:
      fac1ffb0 fae1ffb8 fb01ffc0 fb21ffc8 fb41ffd0 fb61ffd8 fb81ffe0 fbc1fff0
      fbe1fff8 f821fef1 f8c10158 f8e10160 <7d49002e> f9010168 e92d01b0 eb4902b0
      
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  Nov 30, 2008
      powerpc: Fix boot freeze on machine with empty memory node · 4a618669
      Dave Hansen authored
      
      
      I got a bug report about a distro kernel not booting on a particular
      machine.  It would freeze during boot:
      
      > ...
      > Could not find start_pfn for node 1
      > [boot]0015 Setup Done
      > Built 2 zonelists in Node order, mobility grouping on.  Total pages: 123783
      > Policy zone: DMA
      > Kernel command line:
      > [boot]0020 XICS Init
      > [boot]0021 XICS Done
      > PID hash table entries: 4096 (order: 12, 32768 bytes)
      > clocksource: timebase mult[7d0000] shift[22] registered
      > Console: colour dummy device 80x25
      > console handover: boot [udbg0] -> real [hvc0]
      > Dentry cache hash table entries: 1048576 (order: 7, 8388608 bytes)
      > Inode-cache hash table entries: 524288 (order: 6, 4194304 bytes)
      > freeing bootmem node 0
      
      I've reproduced this on 2.6.27.7.  It is caused by commit
      8f64e1f2 ("powerpc: Reserve in bootmem
      lmb reserved regions that cross NUMA nodes").
      
      The problem is that Jon took a loop which was (in pseudocode):
      
      	for_each_node(nid)
      		NODE_DATA(nid) = careful_alloc(nid);
      		setup_bootmem(nid);
      		reserve_node_bootmem(nid);
      
      and broke it up into:
      
      	for_each_node(nid)
      		NODE_DATA(nid) = careful_alloc(nid);
      		setup_bootmem(nid);
      	for_each_node(nid)
      		reserve_node_bootmem(nid);
      
      The issue comes in when the 'careful_alloc()' is called on a node with
      no memory.  It falls back to using bootmem from a previously-initialized
      node.  But, bootmem has not yet been reserved when Jon's patch is
      applied.  It gives back bogus memory (0xc000000000000000) and pukes
      later in boot.
      
      The following patch collapses the loop back together.  It also breaks
      the mark_reserved_regions_for_nid() code out into a function and adds
      some comments.  I think a huge part of what introduced this bug is that
      the for loop was too long and hard to read.
      
      The actual bug fix here is the:
      
      +		if (end_pfn <= node->node_start_pfn ||
      +		    start_pfn >= node_end_pfn)
      +			continue;
      
      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      powerpc set_huge_psize() false positive · 4ea8fb9c
      Al Viro authored
      
      
      It is called only from __init code and itself calls __init code, so the
      warning is a false positive.  Incidentally, it ought to be static in its
      file.
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>