Skip to content
Snippets Groups Projects
  1. Sep 10, 2010
  2. Sep 09, 2010
    • Tejun Heo's avatar
      percpu: fix build breakage on s390 and cleanup build configuration tests · 3c9a024f
      Tejun Heo authored
      
      Commit bbddff05 (percpu: use percpu allocator on UP too) incorrectly
      excluded pcpu_build_alloc_info() on SMP configurations which use
      generic setup_per_cpu_area() like s390.  The config ifdefs are
      becoming confusing.  Fix and clean it up by,
      
      * Move pcpu_build_alloc_info() right on top of its two users -
        pcpu_{embed|page}_first_chunk() which are already in CONFIG_SMP
        block.
      
      * Define BUILD_{EMBED|PAGE}_FIRST_CHUNK which indicate whether each
        first chunk function needs to be included and use them to control
        inclusion of the three functions to reduce confusion.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarSachin Sant <sachinp@in.ibm.com>
      3c9a024f
  3. Sep 08, 2010
    • Tejun Heo's avatar
      percpu: use percpu allocator on UP too · bbddff05
      Tejun Heo authored
      
      On UP, percpu allocations were redirected to kmalloc.  This has the
      following problems.
      
      * For certain amount of allocations (determined by
        PERCPU_DYNAMIC_EARLY_SLOTS and PERCPU_DYNAMIC_EARLY_SIZE), percpu
        allocator can be used before the usual kernel memory allocator is
        brought online.  On SMP, this is used to initialize the kernel
        memory allocator.
      
      * percpu allocator honors alignment upto PAGE_SIZE but kmalloc()
        doesn't.  For example, workqueue makes use of larger alignments for
        cpu_workqueues.
      
      Currently, users of percpu allocators need to handle UP differently,
      which is somewhat fragile and ugly.  Other than small amount of
      memory, there isn't much to lose by enabling percpu allocator on UP.
      It can simply use kernel memory based chunk allocation which was added
      for SMP archs w/o MMUs.
      
      This patch removes mm/percpu_up.c, builds mm/percpu.c on UP too and
      makes UP build use percpu-km.  As percpu addresses and kernel
      addresses are always identity mapped and static percpu variables don't
      need any special treatment, nothing is arch dependent and mm/percpu.c
      implements generic setup_per_cpu_areas() for UP.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Acked-by: default avatarPekka Enberg <penberg@cs.helsinki.fi>
      bbddff05
    • Tejun Heo's avatar
      vmalloc: pcpu_get/free_vm_areas() aren't needed on UP · 4f8b02b4
      Tejun Heo authored
      
      These functions are used only by percpu memory allocator on SMP.
      Don't build them on UP.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Reviewed-by: default avatarChrsitoph Lameter <cl@linux.com>
      4f8b02b4
  4. Aug 28, 2010
    • Hugh Dickins's avatar
      mm: fix hang on anon_vma->root->lock · f1819427
      Hugh Dickins authored
      
      After several hours, kbuild tests hang with anon_vma_prepare() spinning on
      a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y
      (which makes this very much more likely, but it could happen without).
      
      The ever-subtle page_lock_anon_vma() now needs a further twist: since
      anon_vma_prepare() and anon_vma_fork() are liable to change the ->root
      of a reused anon_vma structure at any moment, page_lock_anon_vma()
      needs to check page_mapped() again before succeeding, otherwise
      page_unlock_anon_vma() might address a different root->lock.
      
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f1819427
  5. Aug 27, 2010
  6. Aug 24, 2010
    • Tony Luck's avatar
      guard page for stacks that grow upwards · 8ca3eb08
      Tony Luck authored
      
      pa-risc and ia64 have stacks that grow upwards. Check that
      they do not run into other mappings. By making VM_GROWSUP
      0x0 on architectures that do not ever use it, we can avoid
      some unpleasant #ifdefs in check_stack_guard_page().
      
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8ca3eb08
    • Dave Chinner's avatar
      writeback: write_cache_pages doesn't terminate at nr_to_write <= 0 · 546a1924
      Dave Chinner authored
      
      I noticed XFS writeback in 2.6.36-rc1 was much slower than it should have
      been. Enabling writeback tracing showed:
      
          flush-253:16-8516  [007] 1342952.351608: wbc_writepage: bdi 253:16: towrt=1024 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
          flush-253:16-8516  [007] 1342952.351654: wbc_writepage: bdi 253:16: towrt=1023 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
          flush-253:16-8516  [000] 1342952.369520: wbc_writepage: bdi 253:16: towrt=0 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
          flush-253:16-8516  [000] 1342952.369542: wbc_writepage: bdi 253:16: towrt=-1 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
          flush-253:16-8516  [000] 1342952.369549: wbc_writepage: bdi 253:16: towrt=-2 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
      
      Writeback is not terminating in background writeback if ->writepage is
      returning with wbc->nr_to_write == 0, resulting in sub-optimal single page
      writeback on XFS.
      
      Fix the write_cache_pages loop to terminate correctly when this situation
      occurs and so prevent this sub-optimal background writeback pattern. This
      improves sustained sequential buffered write performance from around
      250MB/s to 750MB/s for a 100GB file on an XFS filesystem on my 8p test VM.
      
      Cc:<stable@kernel.org>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarWu Fengguang <fengguang.wu@intel.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      546a1924
  7. Aug 22, 2010
  8. Aug 21, 2010
  9. Aug 20, 2010
  10. Aug 18, 2010
  11. Aug 15, 2010
    • Linus Torvalds's avatar
      mm: fix up some user-visible effects of the stack guard page · d7824370
      Linus Torvalds authored
      
      This commit makes the stack guard page somewhat less visible to user
      space. It does this by:
      
       - not showing the guard page in /proc/<pid>/maps
      
         It looks like lvm-tools will actually read /proc/self/maps to figure
         out where all its mappings are, and effectively do a specialized
         "mlockall()" in user space.  By not showing the guard page as part of
         the mapping (by just adding PAGE_SIZE to the start for grows-up
         pages), lvm-tools ends up not being aware of it.
      
       - by also teaching the _real_ mlock() functionality not to try to lock
         the guard page.
      
         That would just expand the mapping down to create a new guard page,
         so there really is no point in trying to lock it in place.
      
      It would perhaps be nice to show the guard page specially in
      /proc/<pid>/maps (or at least mark grow-down segments some way), but
      let's not open ourselves up to more breakage by user space from programs
      that depends on the exact deails of the 'maps' file.
      
      Special thanks to Henrique de Moraes Holschuh for diving into lvm-tools
      source code to see what was going on with the whole new warning.
      
      Reported-and-tested-by: default avatarFrançois Valenduc <francois.valenduc@tvcablenet.be>
      Reported-by: default avatarHenrique de Moraes Holschuh <hmh@hmh.eng.br>
      Cc: stable@kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d7824370
  12. Aug 14, 2010
  13. Aug 13, 2010
  14. Aug 12, 2010
    • Wu Fengguang's avatar
      writeback: add comment to the dirty limit functions · 1babe183
      Wu Fengguang authored
      
      Document global_dirty_limits() and bdi_dirty_limit().
      
      Signed-off-by: default avatarWu Fengguang <fengguang.wu@intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1babe183
    • Wu Fengguang's avatar
      writeback: avoid unnecessary calculation of bdi dirty thresholds · 16c4042f
      Wu Fengguang authored
      
      Split get_dirty_limits() into global_dirty_limits()+bdi_dirty_limit(), so
      that the latter can be avoided when under global dirty background
      threshold (which is the normal state for most systems).
      
      Signed-off-by: default avatarWu Fengguang <fengguang.wu@intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      16c4042f
    • Wu Fengguang's avatar
      writeback: balance_dirty_pages(): reduce calls to global_page_state · e50e3720
      Wu Fengguang authored
      
      Reducing the number of times balance_dirty_pages calls global_page_state
      reduces the cache references and so improves write performance on a
      variety of workloads.
      
      'perf stats' of simple fio write tests shows the reduction in cache
      access.  Where the test is fio 'write,mmap,600Mb,pre_read' on AMD AthlonX2
      with 3Gb memory (dirty_threshold approx 600 Mb) running each test 10
      times, dropping the fasted & slowest values then taking the average &
      standard deviation
      
      		average (s.d.) in millions (10^6)
      2.6.31-rc8	648.6 (14.6)
      +patch		620.1 (16.5)
      
      Achieving this reduction is by dropping clip_bdi_dirty_limit as it rereads
      the counters to apply the dirty_threshold and moving this check up into
      balance_dirty_pages where it has already read the counters.
      
      Also by rearrange the for loop to only contain one copy of the limit tests
      allows the pdflush test after the loop to use the local copies of the
      counters rather than rereading them.
      
      In the common case with no throttling it now calls global_page_state 5
      fewer times and bdi_stat 2 fewer.
      
      Fengguang:
      
      This patch slightly changes behavior by replacing clip_bdi_dirty_limit()
      with the explicit check (nr_reclaimable + nr_writeback >= dirty_thresh) to
      avoid exceeding the dirty limit.  Since the bdi dirty limit is mostly
      accurate we don't need to do routinely clip.  A simple dirty limit check
      would be enough.
      
      The check is necessary because, in principle we should throttle everything
      calling balance_dirty_pages() when we're over the total limit, as said by
      Peter.
      
      We now set and clear dirty_exceeded not only based on bdi dirty limits,
      but also on the global dirty limit.  The global limit check is added in
      place of clip_bdi_dirty_limit() for safety and not intended as a behavior
      change.  The bdi limits should be tight enough to keep all dirty pages
      under the global limit at most time; occasional small exceeding should be
      OK though.  The change makes the logic more obvious: the global limit is
      the ultimate goal and shall be always imposed.
      
      We may now start background writeback work based on outdated conditions.
      That's safe because the bdi flush thread will (and have to) double check
      the states.  It reduces overall overheads because the test based on old
      states still have good chance to be right.
      
      [akpm@linux-foundation.org] fix uninitialized dirty_exceeded
      Signed-off-by: default avatarRichard Kennedy <richard@rsk.demon.co.uk>
      Signed-off-by: default avatarWu Fengguang <fengguang.wu@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Acked-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e50e3720
    • Randy Dunlap's avatar
      mm: fix fatal kernel-doc error · 3c111a07
      Randy Dunlap authored
      
      Fix a fatal kernel-doc error due to a #define coming between a function's
      kernel-doc notation and the function signature.  (kernel-doc cannot handle
      this)
      
      Signed-off-by: default avatarRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3c111a07
  15. Aug 11, 2010
Loading