  1. Apr 28, 2008
  2. Apr 27, 2008
    • s390: KVM preparation: host memory management changes for s390 kvm · 5b7baf05
      Christian Borntraeger authored
      
      
      This patch changes the s390 memory management definitions to use the pgste field
      for dirty and reference bit tracking of host and guest code. Usually on s390,
      dirty and referenced are tracked in storage keys, which belong to the physical
      page. This changes with virtualization: The guest and host dirty/reference bits
      are defined to be the logical OR of the values for the mapping and the physical
      page. This patch implements the necessary changes in pgtable.h for s390.
      
      There is a common code change in mm/rmap.c: the call to
      page_test_and_clear_young must be moved. This is a no-op for all
      architectures but s390. page_referenced checks the referenced bits for
      the physical page and for all mappings:
      o The physical page is checked with page_test_and_clear_young.
      o The mappings are checked with ptep_test_and_clear_young and friends.
      
      Without pgstes (the current implementation on Linux s390) the physical page
      check is implemented but the mapping callbacks are no-ops because dirty
      and referenced are not tracked in the s390 page tables. The pgstes introduce
      guest and host dirty and reference bits for s390 in the host mapping. These
      mappings must be checked before page_test_and_clear_young resets the reference
      bit.
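
The ordering constraint can be illustrated with a small user-space model. This is only a sketch under simplifying assumptions, not the kernel code: every `model_*` name is hypothetical, the storage key is reduced to one boolean, and each mapping's pgste is reduced to one boolean. A mapping counts as young if either its own bit or the physical page's bit is set, so the mappings have to be examined before the physical bit is cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a physical page: one referenced bit in its storage key. */
struct model_page {
    bool key_referenced;
};

/* Hypothetical model of a mapping: a pointer to the page plus a pgste referenced bit. */
struct model_mapping {
    struct model_page *page;
    bool pgste_referenced;
};

/* Mapping check: logical OR of the mapping bit and the physical page bit.
 * Only the mapping bit is cleared here. */
static bool model_ptep_test_and_clear_young(struct model_mapping *m)
{
    assert(m != NULL && m->page != NULL);
    bool young = m->pgste_referenced || m->page->key_referenced;
    m->pgste_referenced = false;
    return young;
}

/* Physical page check: reads and resets the storage-key bit. */
static bool model_page_test_and_clear_young(struct model_page *p)
{
    bool young = p->key_referenced;
    p->key_referenced = false;
    return young;
}

/* Mappings first, physical page last. If the physical bit were cleared
 * first, the mapping checks above could no longer see it. */
static int model_page_referenced(struct model_page *p,
                                 struct model_mapping *maps, size_t n)
{
    int referenced = 0;

    for (size_t i = 0; i < n; i++)
        if (model_ptep_test_and_clear_young(&maps[i]))
            referenced++;
    if (model_page_test_and_clear_young(p))
        referenced++;
    return referenced;
}
```

In this model, a page whose storage-key bit is set and which has two mappings is reported as referenced by both mappings and by the physical page on the first call, and by nobody on the second call.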
      
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Signed-off-by: Avi Kivity <avi@qumranet.com>
      5b7baf05
  3. Apr 26, 2008
    • x86_64/mm: check and print vmemmap allocation continuous · c2b91e2e
      Yinghai Lu authored
      
      
      On big systems with lots of memory, don't print out too much during
      bootup, and make it easy to find if it is continuous.
      
      On a 256G, 8-socket system we get:
       [ffffe20000000000-ffffe20002bfffff] PMD -> [ffff810001400000-ffff810003ffffff] on node 0
      [ffffe2001c700000-ffffe2001c7fffff] potential offnode page_structs
       [ffffe20002c00000-ffffe2001c7fffff] PMD -> [ffff81000c000000-ffff8100255fffff] on node 0
      [ffffe20038700000-ffffe200387fffff] potential offnode page_structs
       [ffffe2001c800000-ffffe200387fffff] PMD -> [ffff810820200000-ffff81083c1fffff] on node 1
       [ffffe20040000000-ffffe2007fffffff] PUD ->ffff811027a00000 on node 2
       [ffffe20038800000-ffffe2003fffffff] PMD -> [ffff811020200000-ffff8110279fffff] on node 2
      [ffffe20054700000-ffffe200547fffff] potential offnode page_structs
       [ffffe20040000000-ffffe200547fffff] PMD -> [ffff811027c00000-ffff81103c3fffff] on node 2
      [ffffe20070700000-ffffe200707fffff] potential offnode page_structs
       [ffffe20054800000-ffffe200707fffff] PMD -> [ffff811820200000-ffff81183c1fffff] on node 3
       [ffffe20080000000-ffffe200bfffffff] PUD ->ffff81202fa00000 on node 4
       [ffffe20070800000-ffffe2007fffffff] PMD -> [ffff812020200000-ffff81202f9fffff] on node 4
      [ffffe2008c700000-ffffe2008c7fffff] potential offnode page_structs
       [ffffe20080000000-ffffe2008c7fffff] PMD -> [ffff81202fc00000-ffff81203c3fffff] on node 4
      [ffffe200a8700000-ffffe200a87fffff] potential offnode page_structs
       [ffffe2008c800000-ffffe200a87fffff] PMD -> [ffff812820200000-ffff81283c1fffff] on node 5
       [ffffe200c0000000-ffffe200ffffffff] PUD ->ffff813037a00000 on node 6
       [ffffe200a8800000-ffffe200bfffffff] PMD -> [ffff813020200000-ffff8130379fffff] on node 6
      [ffffe200c4700000-ffffe200c47fffff] potential offnode page_structs
       [ffffe200c0000000-ffffe200c47fffff] PMD -> [ffff813037c00000-ffff81303c3fffff] on node 6
       [ffffe200c4800000-ffffe200e07fffff] PMD -> [ffff813820200000-ffff81383c1fffff] on node 7
      
      instead of a very long print out...
      
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      c2b91e2e
    • mm: allow reserve_bootmem() cross nodes · a5645a61
      Yinghai Lu authored
      
      
      Split reserve_bootmem_core() into two functions, one which checks
      for conflicts and one which sets the bits.
      
      Also make reserve_bootmem() loop over bdata_list so that a reservation
      can cross nodes.
      
      Users could be crashkernel and ramdisk..., in case the range provided
      by those externalities crosses the nodes.
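
A check-then-set reservation that spans nodes might look like the following user-space sketch. It is not the bootmem code: the `model_*` names, the fixed node size, and the boolean array standing in for the bootmem bitmap are all assumptions made for illustration. The point is that the first pass touches nothing, so a conflict on any node leaves every node unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGES_PER_NODE 8   /* assumption: tiny nodes to keep the example small */

/* Hypothetical stand-in for a per-node bootmem descriptor. */
struct model_node {
    unsigned long start_pfn;                  /* first page frame of the node */
    bool reserved[MODEL_PAGES_PER_NODE];      /* one flag per page */
};

static bool model_in_node(const struct model_node *n, unsigned long pfn)
{
    return pfn >= n->start_pfn && pfn < n->start_pfn + MODEL_PAGES_PER_NODE;
}

/* Pass 1: report a conflict if any page of [start, end) on this node is taken. */
static bool model_can_reserve(const struct model_node *n,
                              unsigned long start, unsigned long end)
{
    for (unsigned long pfn = start; pfn < end; pfn++)
        if (model_in_node(n, pfn) && n->reserved[pfn - n->start_pfn])
            return false;
    return true;
}

/* Pass 2: mark the part of [start, end) that lies on this node. */
static void model_do_reserve(struct model_node *n,
                             unsigned long start, unsigned long end)
{
    for (unsigned long pfn = start; pfn < end; pfn++)
        if (model_in_node(n, pfn))
            n->reserved[pfn - n->start_pfn] = true;
}

/* Walk all nodes twice: check everything, then set everything.
 * Returns 0 on success and -1 on conflict (nothing is modified then). */
static int model_reserve(struct model_node *nodes, size_t nr_nodes,
                         unsigned long start, unsigned long end)
{
    assert(start <= end);
    for (size_t i = 0; i < nr_nodes; i++)
        if (!model_can_reserve(&nodes[i], start, end))
            return -1;
    for (size_t i = 0; i < nr_nodes; i++)
        model_do_reserve(&nodes[i], start, end);
    return 0;
}
```

With two nodes starting at page frames 0 and 8, reserving frames 6 to 9 marks two pages on each node, and a later overlapping request fails without marking any further pages.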
      
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a5645a61
    • mm: offset align in alloc_bootmem() · 9a2dc04c
      Yinghai Lu authored
      
      
      An offset alignment is needed when node_boot_start's alignment is less than
      the alignment required.
      
      Use a local node_boot_start to match the alignment, so that no extra
      operation is added to the search loop.
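
The arithmetic behind the problem can be shown in a few lines. This is a sketch with hypothetical `model_*` helpers, not the allocator itself: aligning only the offset inside the node yields a misaligned address whenever the node's start is less strictly aligned than the request, while aligning the full address does not.

```c
#include <assert.h>

/* Round x up to the next multiple of align (align must be a power of two). */
static unsigned long model_align_up(unsigned long x, unsigned long align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Naive variant: aligns the offset relative to the node start only. */
static unsigned long model_alloc_naive(unsigned long node_start,
                                       unsigned long offset,
                                       unsigned long align)
{
    return node_start + model_align_up(offset, align);
}

/* Offset-aware variant: aligns the resulting address itself. */
static unsigned long model_alloc_aligned(unsigned long node_start,
                                         unsigned long offset,
                                         unsigned long align)
{
    return model_align_up(node_start + offset, align);
}
```

For a node starting at 0x1000 and a 0x4000-byte alignment request, the naive variant returns 0x1000, which is not 0x4000-aligned, whereas the offset-aware variant returns 0x4000.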
      
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      9a2dc04c
    • mm: fix alloc_bootmem_core to use fast searching for all nodes · ad09315c
      Yinghai Lu authored
      
      
      Make the nodes other than node 0 use bdata->last_success for fast
      search too.
      
      We need to use __alloc_bootmem_core() for vmemmap allocation for other
      nodes when numa and sparsemem/vmemmap are enabled.
      
      Also, make fail_block path increase i with incr only after ALIGN
      to avoid extra increase when size is larger than align.
      
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      ad09315c
    • mm: make mem_map allocation continuous · e123dd3f
      Yinghai Lu authored
      
      
      vmemmap allocation currently has this layout:
      
       [ffffe20000000000-ffffe200001fffff] PMD ->ffff810001400000 on node 0
       [ffffe20000200000-ffffe200003fffff] PMD ->ffff810001800000 on node 0
       [ffffe20000400000-ffffe200005fffff] PMD ->ffff810001c00000 on node 0
       [ffffe20000600000-ffffe200007fffff] PMD ->ffff810002000000 on node 0
       [ffffe20000800000-ffffe200009fffff] PMD ->ffff810002400000 on node 0
      ...
      
      note that there is a 2M hole between them - not optimal.
      
      the root cause is that usemap (24 bytes) will be allocated after every 2M
      mem_map, and it will push next vmemmap (2M) to the next (2M) alignment.
      
      solution: try to allocate the mem_map continuously.
      
      after the patch, we get:
      
       [ffffe20000000000-ffffe200001fffff] PMD ->ffff810001400000 on node 0
       [ffffe20000200000-ffffe200003fffff] PMD ->ffff810001600000 on node 0
       [ffffe20000400000-ffffe200005fffff] PMD ->ffff810001800000 on node 0
       [ffffe20000600000-ffffe200007fffff] PMD ->ffff810001a00000 on node 0
       [ffffe20000800000-ffffe200009fffff] PMD ->ffff810001c00000 on node 0
      ...
      
      which is the ideal layout.
      
      and the usemaps will share a page because they are allocated continuously too:
      
      sparse_early_usemap_alloc: usemap = ffff810024e00000 size = 24
      sparse_early_usemap_alloc: usemap = ffff810024e00080 size = 24
      sparse_early_usemap_alloc: usemap = ffff810024e00100 size = 24
      sparse_early_usemap_alloc: usemap = ffff810024e00180 size = 24
      ...
      
      so we make the bootmem allocation more compact and use less memory
      for usemap => mission accomplished ;-)
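
The 2M holes can be reproduced with a toy bump allocator. The sketch below is an illustration, not the sparsemem code: the `model_*` names are hypothetical, and the 128-byte usemap alignment is an assumption read off the 0x80 stride in the log above. Interleaving a 24-byte allocation after every 2M block pushes the next 2M-aligned block a full 2M further, while batching each kind of allocation keeps both dense.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MEMMAP_SIZE  (2UL << 20)  /* one 2M mem_map block */
#define MODEL_USEMAP_SIZE  24UL         /* usemap size from the log above */
#define MODEL_USEMAP_ALIGN 128UL        /* assumption: matches the 0x80 stride */

/* Hypothetical bump allocator: hands out increasing addresses. */
struct model_bump {
    unsigned long next;
};

static unsigned long model_alloc(struct model_bump *b,
                                 unsigned long size, unsigned long align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    unsigned long addr = (b->next + align - 1) & ~(align - 1);
    b->next = addr + size;
    return addr;
}

/* Old layout: each mem_map block is followed directly by its usemap. */
static void model_interleaved(struct model_bump *b,
                              unsigned long *memmap, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        memmap[i] = model_alloc(b, MODEL_MEMMAP_SIZE, MODEL_MEMMAP_SIZE);
        (void)model_alloc(b, MODEL_USEMAP_SIZE, MODEL_USEMAP_ALIGN);
    }
}

/* New layout: all mem_map blocks together, all usemaps together. */
static void model_batched(struct model_bump *b, unsigned long *memmap,
                          unsigned long *usemap, size_t n)
{
    for (size_t i = 0; i < n; i++)
        memmap[i] = model_alloc(b, MODEL_MEMMAP_SIZE, MODEL_MEMMAP_SIZE);
    for (size_t i = 0; i < n; i++)
        usemap[i] = model_alloc(b, MODEL_USEMAP_SIZE, MODEL_USEMAP_ALIGN);
}
```

Starting the model at 0x1400000, the interleaved variant places consecutive mem_map blocks 4M apart, and the batched variant places them 2M apart with the usemaps 128 bytes apart.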
      
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e123dd3f
  4. Apr 23, 2008
  5. Apr 21, 2008
  6. Apr 20, 2008
  7. Apr 19, 2008
    • nodemask: use new node_to_cpumask_ptr function · c5f59f08
      Mike Travis authored
      
      
        * Use new node_to_cpumask_ptr.  This creates a pointer to the
          cpumask for a given node.  This definition is in mm patch:
      
      	asm-generic-add-node_to_cpumask_ptr-macro.patch
      
        * Use new set_cpus_allowed_ptr function.
      
      Depends on:
      	[mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch
      	[sched-devel]: sched: add new set_cpus_allowed_ptr function
      	[x86/latest]: x86: add cpus_scnprintf function
      
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: Greg Banks <gnb@melbourne.sgi.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c5f59f08
    • cpuset: modify cpuset_set_cpus_allowed to use cpumask pointer · f9a86fcb
      Mike Travis authored
      
      
        * Modify cpuset_cpus_allowed to return the currently allowed cpuset
          via a pointer argument instead of as the function return value.
      
        * Use new set_cpus_allowed_ptr function.
      
        * Cleanup CPU_MASK_ALL and NODE_MASK_ALL uses.
      
      Depends on:
      	[sched-devel]: sched: add new set_cpus_allowed_ptr function
      
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f9a86fcb
    • cpumask: Cleanup more uses of CPU_MASK and NODE_MASK · d366f8cb
      Mike Travis authored
      
      
       *  Replace usages of CPU_MASK_NONE, CPU_MASK_ALL, NODE_MASK_NONE,
          NODE_MASK_ALL to reduce stack requirements for large NR_CPUS
          and MAXNODES counts.
      
       *  In some cases, the cpumask variable was initialized but then overwritten
          with another value.  This is the case for changes like this:
      
          -       cpumask_t oldmask = CPU_MASK_ALL;
          +       cpumask_t oldmask;
      
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d366f8cb
  8. Apr 17, 2008
  9. Apr 16, 2008
    • add "Isolate" migratetype name to /proc/pagetypeinfo · 91446b06
      KOSAKI Motohiro authored
      
      
      In a5d76b54 (memory unplug: page isolation by
      KAMEZAWA Hiroyuki), the "isolate" migratetype was added, but unfortunately
      the /proc/pagetypeinfo display logic was not updated for it.
      
      This patch adds "Isolate" to the pagetype name field.
      
      /proc/pagetypeinfo
      before:
      ------------------------------------------------------------------------------------------------------------------------
      Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
      Node    0, zone      DMA, type    Unmovable      1      2      2      2      1      2      2      1      1      0      0
      Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
      Node    0, zone      DMA, type      Movable      2      3      3      1      3      3      2      0      0      0      0
      Node    0, zone      DMA, type      Reserve      0      0      0      0      0      0      0      0      0      0      1
      Node    0, zone      DMA, type       <NULL>      0      0      0      0      0      0      0      0      0      0      0
      Node    0, zone   Normal, type    Unmovable      1      9      7      4      1      1      1      1      0      0      0
      Node    0, zone   Normal, type  Reclaimable      5      2      0      0      1      1      0      0      0      1      0
      Node    0, zone   Normal, type      Movable      0      1      1      0      0      0      1      0      0      1     60
      Node    0, zone   Normal, type      Reserve      0      0      0      0      0      0      0      0      0      0      1
      Node    0, zone   Normal, type       <NULL>      0      0      0      0      0      0      0      0      0      0      0
      Node    0, zone  HighMem, type    Unmovable      0      0      1      1      1      0      1      1      2      2      0
      Node    0, zone  HighMem, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
      Node    0, zone  HighMem, type      Movable    236     62      6      2      2      1      1      0      1      1     16
      Node    0, zone  HighMem, type      Reserve      0      0      0      0      0      0      0      0      0      0      1
      Node    0, zone  HighMem, type       <NULL>      0      0      0      0      0      0      0      0      0      0      0
      
      Number of blocks type     Unmovable  Reclaimable      Movable      Reserve       <NULL>
      Node 0, zone      DMA            1            0            2            1            0
      Node 0, zone   Normal           10           40          169            1            0
      Node 0, zone  HighMem            2            0          283            1            0
      
      after:
      ------------------------------------------------------------------------------------------------------------------------
      Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
      Node    0, zone      DMA, type    Unmovable      1      2      2      2      1      2      2      1      1      0      0
      Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
      Node    0, zone      DMA, type      Movable      2      3      3      1      3      3      2      0      0      0      0
      Node    0, zone      DMA, type      Reserve      0      0      0      0      0      0      0      0      0      0      1
      Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
      Node    0, zone   Normal, type    Unmovable      0      2      1      1      0      1      0      0      0      0      0
      Node    0, zone   Normal, type  Reclaimable      1      1      1      1      1      0      1      1      1      0      0
      Node    0, zone   Normal, type      Movable      0      1      1      1      0      1      0      1      0      0    196
      Node    0, zone   Normal, type      Reserve      0      0      0      0      0      0      0      0      0      0      1
      Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
      Node    0, zone  HighMem, type    Unmovable      0      1      0      0      0      1      1      1      2      2      0
      Node    0, zone  HighMem, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
      Node    0, zone  HighMem, type      Movable      1      0      1      1      0      0      0      0      1      0    200
      Node    0, zone  HighMem, type      Reserve      0      0      0      0      0      0      0      0      0      0      1
      Node    0, zone  HighMem, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
      
      Number of blocks type     Unmovable  Reclaimable      Movable      Reserve      Isolate
      Node 0, zone      DMA            1            0            2            1            0
      Node 0, zone   Normal            8            4          207            1            0
      Node 0, zone  HighMem            2            0          283            1            0
      
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      91446b06
    • memcg: fix oops in oom handling · e115f2d8
      Li Zefan authored
      
      
      When I used a test program to fork a large number of processes and immediately
      move them to a cgroup where the memory limit is low enough to trigger an oom
      kill, I got an oops:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000808
      IP: [<ffffffff8045c47f>] _spin_lock_irqsave+0x8/0x18
      PGD 4c95f067 PUD 4406c067 PMD 0
      Oops: 0002 [1] SMP
      CPU 2
      Modules linked in:
      
      Pid: 11973, comm: a.out Not tainted 2.6.25-rc7 #5
      RIP: 0010:[<ffffffff8045c47f>]  [<ffffffff8045c47f>] _spin_lock_irqsave+0x8/0x18
      RSP: 0018:ffff8100448c7c30  EFLAGS: 00010002
      RAX: 0000000000000202 RBX: 0000000000000009 RCX: 000000000001c9f3
      RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000808
      RBP: ffff81007e444080 R08: 0000000000000000 R09: ffff8100448c7900
      R10: ffff81000105f480 R11: 00000100ffffffff R12: ffff810067c84140
      R13: 0000000000000001 R14: ffff8100441d0018 R15: ffff81007da56200
      FS:  00007f70eb1856f0(0000) GS:ffff81007fbad3c0(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000808 CR3: 000000004498a000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process a.out (pid: 11973, threadinfo ffff8100448c6000, task ffff81007da533e0)
      Stack:  ffffffff8023ef5a 00000000000000d0 ffffffff80548dc0 00000000000000d0
       ffff810067c84140 ffff81007e444080 ffffffff8026cef9 00000000000000d0
       ffff8100441d0000 00000000000000d0 ffff8100441d0000 ffff8100505445c0
      Call Trace:
       [<ffffffff8023ef5a>] ? force_sig_info+0x25/0xb9
       [<ffffffff8026cef9>] ? oom_kill_task+0x77/0xe2
       [<ffffffff8026d696>] ? mem_cgroup_out_of_memory+0x55/0x67
       [<ffffffff802910ad>] ? mem_cgroup_charge_common+0xec/0x202
       [<ffffffff8027997b>] ? handle_mm_fault+0x24e/0x77f
       [<ffffffff8022c4af>] ? default_wake_function+0x0/0xe
       [<ffffffff8027a17a>] ? get_user_pages+0x2ce/0x3af
       [<ffffffff80290fee>] ? mem_cgroup_charge_common+0x2d/0x202
       [<ffffffff8027a441>] ? make_pages_present+0x8e/0xa4
       [<ffffffff8027d1ab>] ? mmap_region+0x373/0x429
       [<ffffffff8027d7eb>] ? do_mmap_pgoff+0x2ff/0x364
       [<ffffffff80210471>] ? sys_mmap+0xe5/0x111
       [<ffffffff8020bfc9>] ? tracesys+0xdc/0xe1
      
      Code: 00 00 01 48 8b 3c 24 e9 46 d4 dd ff f0 ff 07 48 8b 3c 24 e9 3a d4 dd ff fe 07 48 8b 3c 24 e9 2f d4 dd ff 9c 58 fa ba 00 01 00 00 <f0> 66 0f c1 17 38 f2 74 06 f3 90 8a 17 eb f6 c3 fa b8 00 01 00
      RIP  [<ffffffff8045c47f>] _spin_lock_irqsave+0x8/0x18
       RSP <ffff8100448c7c30>
      CR2: 0000000000000808
      ---[ end trace c3702fa668021ea4 ]---
      
      It's reproducible on an x86_64 box, but doesn't happen on x86_32.
      
      This is because tsk->sighand is not guarded by RCU, so we have to
      hold tasklist_lock, just as out_of_memory() does.
      
      Signed-off-by: Li Zefan <lizf@cn.fujitsu>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: David Rientjes <rientjes@cs.washington.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e115f2d8
    • mm: sparsemem memory_present() fix · bead9a3a
      Ingo Molnar authored
      
      
      Fix memory corruption and crash on 32-bit x86 systems.
      
      If a !PAE x86 kernel is booted on a 32-bit system with more than 4GB of
      RAM, then we call memory_present() with a start/end that goes outside
      the scope of MAX_PHYSMEM_BITS.
      
      That causes this loop to happily walk over the limit of the sparse
      memory section map:
      
          for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
                  unsigned long section = pfn_to_section_nr(pfn);
                  struct mem_section *ms;
      
                  sparse_index_init(section, nid);
                  set_section_nid(section, nid);
      
                  ms = __nr_to_section(section);
                  if (!ms->section_mem_map)
                          ms->section_mem_map = sparse_encode_early_nid(nid) |
                                                SECTION_MARKED_PRESENT;
          }
      
      'ms' will be out of bounds and we'll corrupt a small amount of memory by
      encoding the node ID and writing SECTION_MARKED_PRESENT (==0x1) over it.
      
      The corruption might happen when encoding a non-zero node ID, or due to
      the SECTION_MARKED_PRESENT which is 0x1:
      
      	mmzone.h:#define	SECTION_MARKED_PRESENT	(1UL<<0)
      
      The fix is to sanity check anything the architecture passes to
      sparsemem.
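
One way such a sanity check could look is sketched below in user-space C. It is an illustration under assumptions and not the actual patch: `model_clamp_pfn_range` and the two `MODEL_*` constants are hypothetical, with values chosen to resemble a 32-bit non-PAE configuration. Any part of the range beyond the largest representable page frame is trimmed, so a range that lies entirely above the limit becomes empty.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_PHYSMEM_BITS 32   /* assumption: 4GB of addressable memory */
#define MODEL_PAGE_SHIFT       12   /* assumption: 4K pages */

/* Trim [*start_pfn, *end_pfn) so that it never exceeds the section map. */
static void model_clamp_pfn_range(unsigned long *start_pfn,
                                  unsigned long *end_pfn)
{
    unsigned long max_pfn =
        1UL << (MODEL_MAX_PHYSMEM_BITS - MODEL_PAGE_SHIFT);

    assert(start_pfn != NULL && end_pfn != NULL);
    if (*start_pfn > max_pfn)
        *start_pfn = max_pfn;   /* range starts above the limit: make it empty */
    if (*end_pfn > max_pfn)
        *end_pfn = max_pfn;     /* range ends above the limit: cut the tail */
}
```

With these constants the limit is page frame 0x100000. A range of 0x0 to 0x180000 is cut to 0x0 to 0x100000, and a range of 0x120000 to 0x180000 collapses to an empty range, so a loop like the one quoted above would run zero times for it.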
      
      This bug seems to be rather old (as old as sparsemem support itself),
      but the exact incarnation depended on random details like configs, which
      made this bug more prominent in v2.6.25-to-be.
      
      An additional enhancement might be to print a warning about ignored or
      trimmed memory ranges.
      
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Tested-by: Christoph Lameter <clameter@sgi.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Yinghai Lu <Yinghai.Lu@sun.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bead9a3a
  10. Apr 14, 2008
  11. Apr 09, 2008
  12. Apr 04, 2008
    • memory controller: make memory resource control aware of boot options · 4077960e
      Balbir Singh authored
      
      
      A boot option for the memory controller was discussed on lkml.  It is a good
      idea to add it, since it saves memory for people who want to turn off the
      memory controller.
      
      By default the option is on for the following two reasons:
      
      1. It provides compatibility with the current scheme where the memory
         controller turns on if the config option is enabled
      2. It allows for wider testing of the memory controller, once the config
         option is enabled
      
      We still allow the create, destroy callbacks to succeed, since they are not
      aware of boot options.  We do not populate the directory with memory resource
      controller specific files.
      
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4077960e
  13. Apr 01, 2008
  14. Mar 30, 2008
  15. Mar 28, 2008
  16. Mar 26, 2008
    • hugetlb: fix potential livelock in return_unused_surplus_hugepages() · 11320d17
      Nishanth Aravamudan authored
      
      
      Running the counters testcase from libhugetlbfs on 2.6.25-rc5
      and 2.6.25-rc5-mm1 results in:
      
          BUG: soft lockup - CPU#3 stuck for 61s! [counters:10531]
          NIP: c0000000000d1f3c LR: c0000000000d1f2c CTR: c0000000001b5088
          REGS: c000005db12cb360 TRAP: 0901   Not tainted  (2.6.25-rc5-autokern1)
          MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 48008448  XER: 20000000
          TASK = c000005dbf3d6000[10531] 'counters' THREAD: c000005db12c8000 CPU: 3
          GPR00: 0000000000000004 c000005db12cb5e0 c000000000879228 0000000000000004
          GPR04: 0000000000000010 0000000000000000 0000000000200200 0000000000100100
          GPR08: c0000000008aba10 000000000000ffff 0000000000000004 0000000000000000
          GPR12: 0000000028000442 c000000000770080
          NIP [c0000000000d1f3c] .return_unused_surplus_pages+0x84/0x18c
          LR [c0000000000d1f2c] .return_unused_surplus_pages+0x74/0x18c
          Call Trace:
          [c000005db12cb5e0] [c000005db12cb670] 0xc000005db12cb670 (unreliable)
          [c000005db12cb670] [c0000000000d24c4] .hugetlb_acct_memory+0x2e0/0x354
          [c000005db12cb740] [c0000000001b5048] .truncate_hugepages+0x1d4/0x214
          [c000005db12cb890] [c0000000001b50a4] .hugetlbfs_delete_inode+0x1c/0x3c
          [c000005db12cb920] [c000000000103fd8] .generic_delete_inode+0xf8/0x1c0
          [c000005db12cb9b0] [c0000000001b5100] .hugetlbfs_drop_inode+0x3c/0x24c
          [c000005db12cba50] [c00000000010287c] .iput+0xdc/0xf8
          [c000005db12cbad0] [c0000000000fee54] .dentry_iput+0x12c/0x194
          [c000005db12cbb60] [c0000000000ff050] .d_kill+0x6c/0xa4
          [c000005db12cbbf0] [c0000000000ffb74] .dput+0x18c/0x1b0
          [c000005db12cbc70] [c0000000000e9e98] .__fput+0x1a4/0x1e8
          [c000005db12cbd10] [c0000000000e61ec] .filp_close+0xb8/0xe0
          [c000005db12cbda0] [c0000000000e62d0] .sys_close+0xbc/0x134
          [c000005db12cbe30] [c00000000000872c] syscall_exit+0x0/0x40
          Instruction dump:
          ebbe8038 38800010 e8bf0002 3bbd0008 7fa3eb78 38a50001 7ca507b4 4818df25
          60000000 38800010 38a00000 7c601b78 <7fa3eb78> 2f800010 409d0008 38000010
      
      This was tracked down to a potential livelock in
      return_unused_surplus_hugepages().  In the case where we have surplus
      pages on some node, but no free pages on the same node, we may never
      break out of the loop. To avoid this livelock, terminate the search if
      we iterate a number of times equal to the number of online nodes without
      freeing a page.
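
The termination rule described here can be sketched in user-space C. This is a simplified model with hypothetical names (`model_pool`, `model_return_unused_surplus`) and a fixed node count, not the hugetlb code: the loop walks the nodes round-robin and gives up once it has visited as many nodes in a row as there are nodes without freeing anything.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_NODES 4   /* assumption: stands in for the number of online nodes */

/* Hypothetical per-node pool state. */
struct model_pool {
    unsigned long surplus[MODEL_NR_NODES];     /* surplus huge pages per node */
    unsigned long free_pages[MODEL_NR_NODES];  /* free huge pages per node */
};

/* Try to return up to nr_pages surplus pages; returns how many were freed. */
static unsigned long model_return_unused_surplus(struct model_pool *p,
                                                 unsigned long nr_pages)
{
    unsigned long freed = 0;
    unsigned int nid = 0;
    unsigned int misses = 0;   /* consecutive nodes visited without freeing */

    assert(p != NULL);
    while (nr_pages > 0 && misses < MODEL_NR_NODES) {
        if (p->surplus[nid] > 0 && p->free_pages[nid] > 0) {
            /* A surplus page that is also free can be given back. */
            p->surplus[nid]--;
            p->free_pages[nid]--;
            nr_pages--;
            freed++;
            misses = 0;
        } else {
            /* Surplus without a free page (or no surplus): nothing to do here. */
            misses++;
        }
        nid = (nid + 1) % MODEL_NR_NODES;
    }
    return freed;
}
```

Without the `misses` bound, a pool where node 0 holds surplus pages but no free pages would keep this loop spinning for as long as `nr_pages` stays above zero. With the bound, the function returns after one fruitless pass over all nodes.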
      
      Thanks to Andy Whitcroft and Adam Litke for helping with debugging and
      the patch.
      
      Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      11320d17
    • hugetlb: indicate surplus huge page counts in per-node meminfo · a1de0919
      Nishanth Aravamudan authored
      
      
      Currently we show the surplus hugetlb pool state in /proc/meminfo, but
      not in the per-node meminfo files, even though we track the information
      on a per-node basis. Printing it there can help track down dynamic pool
      bugs including the one in the follow-on patch.
      
      Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a1de0919
    • slab: fix cache_cache bootstrap in kmem_cache_init() · ec1f5eee
      Daniel Yeisley authored
      
      
      Commit 556a169d ("slab: fix bootstrap on
      memoryless node") introduced bootstrap-time cache_cache list3s for all nodes
      but forgot that initkmem_list3 needs to be accessed by [somevalue + node]. This
      patch fixes list_add() corruption in mm/slab.c seen on the ES7000.
      
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Olaf Hering <olaf@aepfle.de>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Dan Yeisley <dan.yeisley@unisys.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      ec1f5eee