  1. Dec 13, 2011
    • cgroup: improve old cgroup handling in cgroup_attach_proc() · 134d3373
      Tejun Heo authored
      
      
      cgroup_attach_proc() behaves differently from cgroup_attach_task() in
      the following aspects.
      
      * All hooks are invoked even if no task is actually being moved.
      
      * ->can_attach_task() is called for all tasks in the group whether the
        new cgrp is different from the current cgrp or not; however,
        ->attach_task() is skipped if the new cgrp equals the current
        cgrp.  This makes the calls asymmetric.
      
      This patch improves old cgroup handling in cgroup_attach_proc() by
      looking up the current cgroup at the head, recording it in the flex
      array along with the task itself, and using it to remove the above two
      differences.  This will also ease further changes.
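
      The approach described above can be sketched as a small userspace
      model.  This is not the kernel code: struct task, struct cgroup,
      struct hook_counts and attach_group() are hypothetical stand-ins,
      and a plain array stands in for the flex array.  Only the name
      nr_migrating_tasks comes from the commit message.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct cgroup { int id; };
struct task { int pid; struct cgroup *cgrp; };

/* One array slot: the task plus the cgroup it was in at lookup time. */
struct task_and_cgroup {
	struct task *task;
	struct cgroup *old_cgrp;
};

/* Counters standing in for per-task subsystem hook invocations. */
struct hook_counts { int can_attach_task; int attach_task; };

/*
 * Move every task in group[] to new_cgrp and return how many tasks
 * were actually migrated.  out[] must have room for n entries.
 */
int attach_group(struct task **group, size_t n, struct cgroup *new_cgrp,
		 struct task_and_cgroup *out, struct hook_counts *hooks)
{
	size_t nr_migrating_tasks = 0;
	size_t i;

	/* Look up the current cgroup once, up front, and skip no-op moves. */
	for (i = 0; i < n; i++) {
		if (group[i]->cgrp == new_cgrp)
			continue;
		out[nr_migrating_tasks].task = group[i];
		out[nr_migrating_tasks].old_cgrp = group[i]->cgrp;
		nr_migrating_tasks++;
	}

	/* Nothing is being moved, so no hook is invoked at all. */
	if (nr_migrating_tasks == 0)
		return 0;

	/* Both hooks walk the same recorded set, so the calls pair up. */
	for (i = 0; i < nr_migrating_tasks; i++)
		hooks->can_attach_task++;
	for (i = 0; i < nr_migrating_tasks; i++) {
		out[i].task->cgrp = new_cgrp;
		hooks->attach_task++;
	}
	return (int)nr_migrating_tasks;
}
```

      Because the old cgroup is recorded next to each task, later steps
      never need to look it up again, and a second call with the same
      target becomes a no-op that triggers no hooks.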
      
      -v2: nr_todo renamed to nr_migrating_tasks as per Paul Menage's
           suggestion.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Paul Menage <paul@paulmenage.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      134d3373
    • cgroup: always lock threadgroup during migration · cd3d0952
      Tejun Heo authored
      
      
      Update cgroup to take advantage of the fact that threadgroup_lock()
      guarantees stable threadgroup.
      
      * Lock threadgroup even if the target is a single task.  This
        guarantees that the target tasks stay stable during migration
        regardless of the target type.
      
      * Remove PF_EXITING early exit optimization from attach_task_by_pid()
        and check it in cgroup_task_migrate() instead.  The optimization was
        for rather cold path to begin with and PF_EXITING state can be
        trusted throughout migration by checking it after locking
        threadgroup.
      
      * Don't add PF_EXITING tasks to target task array in
        cgroup_attach_proc().  This ensures that task migration is performed
        only for live tasks.
      
      * Remove -ESRCH failure path from cgroup_task_migrate().  With the
        above changes, it's guaranteed to be called only for live tasks.
      
      After the changes, only live tasks are migrated and they're guaranteed
      to stay alive until migration is complete.  This removes problems
      caused by exec and exit racing against cgroup migration including
      symmetry among cgroup attach methods and different cgroup methods
      racing each other.
      
      v2: Oleg pointed out that one more PF_EXITING check can be removed
          from cgroup_attach_proc().  Removed.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul Menage <paul@paulmenage.org>
      cd3d0952
    • threadgroup: extend threadgroup_lock() to cover exit and exec · 77e4ef99
      Tejun Heo authored
      
      
      threadgroup_lock() only protected against new addition to
      the threadgroup, which was inherently somewhat incomplete and
      problematic for its only user cgroup.  On-going migration could race
      against exec and exit leading to interesting problems - the symmetry
      between various attach methods, task exiting during method execution,
      ->exit() racing against attach methods, migrating task switching basic
      properties during exec and so on.
      
      This patch extends threadgroup_lock() such that it protects against
      all three threadgroup altering operations - fork, exit and exec.  For
      exit, threadgroup_change_begin/end() calls are added to exit_signals
      around assertion of PF_EXITING.  For exec, threadgroup_[un]lock() are
      updated to also grab and release cred_guard_mutex.
      
      With this change, threadgroup_lock() guarantees that the target
      threadgroup will remain stable - no new task will be added, no new
      PF_EXITING will be set and exec won't happen.
      
      The next patch will update cgroup so that it can take full advantage
      of this change.
      
      -v2: beefed up comment as suggested by Frederic.
      
      -v3: narrowed scope of protection in exit path as suggested by
           Frederic.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul Menage <paul@paulmenage.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      77e4ef99
    • threadgroup: rename signal->threadgroup_fork_lock to ->group_rwsem · 257058ae
      Tejun Heo authored
      
      
      Make the following renames to prepare for extension of threadgroup
      locking.
      
      * s/signal->threadgroup_fork_lock/signal->group_rwsem/
      * s/threadgroup_fork_read_lock()/threadgroup_change_begin()/
      * s/threadgroup_fork_read_unlock()/threadgroup_change_end()/
      * s/threadgroup_fork_write_lock()/threadgroup_lock()/
      * s/threadgroup_fork_write_unlock()/threadgroup_unlock()/
      
      This patch doesn't cause any behavior change.
      
      -v2: Rename threadgroup_change_done() to threadgroup_change_end() per
           KAMEZAWA's suggestion.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul Menage <paul@paulmenage.org>
      257058ae
    • cgroup: add cgroup_root_mutex · e25e2cbb
      Tejun Heo authored
      
      
      cgroup wants to make threadgroup stable while modifying cgroup
      hierarchies which will introduce locking dependency on
      cred_guard_mutex from cgroup_mutex.  This unfortunately completes
      circular dependency.
      
       A. cgroup_mutex -> cred_guard_mutex -> s_type->i_mutex_key -> namespace_sem
       B. namespace_sem -> cgroup_mutex
      
      B is from cgroup_show_options() and this patch breaks it by
      introducing another mutex cgroup_root_mutex which nests inside
      cgroup_mutex and protects cgroupfs_root.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      e25e2cbb
  2. Nov 23, 2011
    • PM / Hibernate: Do not leak memory in error/test code paths · bb58dd5d
      Rafael J. Wysocki authored
      
      
      The hibernation core code forgets to release memory preallocated
      for hibernation if there's an error in its early stages or if test
      modes causing hibernation_snapshot() to return early are used.  This
      causes the system to be hardly usable, because the amount of
      preallocated memory is usually huge.  Fix this problem.
      
      Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      bb58dd5d
    • freezer: kill unused set_freezable_with_signal() · 34b087e4
      Tejun Heo authored
      
      
      There's no in-kernel user of set_freezable_with_signal() left.  Mixing
      TIF_SIGPENDING with kernel threads can lead to nasty corner cases as
      kernel threads never travel signal delivery path on their own.
      
      e.g. the current implementation is buggy in the cancelation path of
      __thaw_task().  It calls recalc_sigpending_and_wake() in an attempt to
      clear TIF_SIGPENDING but the function never clears it regardless of
      sigpending state.  This means that signallable freezable kthreads may
      continue executing with !freezing() && stuck TIF_SIGPENDING, which can
      be troublesome.
      
      This patch removes set_freezable_with_signal() along with
      PF_FREEZER_NOSIG and recalc_sigpending*() calls in freezer.  User
      tasks get TIF_SIGPENDING, kernel tasks get woken up and the spurious
      sigpending is dealt with in the usual signal delivery path.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      34b087e4
  3. Nov 21, 2011
    • freezer: remove unused @sig_only from freeze_task() · 839e3407
      Tejun Heo authored
      
      
      After "freezer: make freezing() test freeze conditions in effect
      instead of TIF_FREEZE", freezing() returns authoritative answer on
      whether the current task should freeze or not and freeze_task()
      doesn't need or use @sig_only.  Remove it.
      
      While at it, rewrite function comment for freeze_task() and rename
      @sig_only to @user_only in try_to_freeze_tasks().
      
      This patch doesn't cause any functional change.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      839e3407
    • freezer: use lock_task_sighand() in fake_signal_wake_up() · 37ad8aca
      Tejun Heo authored
      
      
      cgroup_freezer calls freeze_task() without holding tasklist_lock and,
      if the task is exiting, its ->sighand may be gone by the time
      fake_signal_wake_up() is called.  Use lock_task_sighand() instead of
      accessing ->sighand directly.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Paul Menage <paul@paulmenage.org>
      37ad8aca
    • freezer: restructure __refrigerator() · 5ece3eae
      Tejun Heo authored
      
      
      If another freeze happens before all tasks leave FROZEN state after
      being thawed, the freezer can see the existing FROZEN and consider the
      tasks to be frozen but they can clear FROZEN without checking the new
      freezing().
      
      Oleg suggested restructuring __refrigerator() such that there's single
      condition check section inside freezer_lock and sigpending is cleared
      afterwards, which fixes the problem and simplifies the code.
      Restructure accordingly.
      
      -v2: Frozen loop exited without releasing freezer_lock.  Fixed.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      5ece3eae
    • freezer: fix set_freezable[_with_signal]() race · 96ee6d85
      Tejun Heo authored
      
      
      A kthread doing set_freezable*() may race with on-going PM freeze and
      the freezer might think all tasks are frozen while the new freezable
      kthread is merrily proceeding to execute code paths which aren't
      supposed to be executing during PM freeze.
      
      Reimplement set_freezable[_with_signal]() using __set_freezable() such
      that freezable PF flags are modified under freezer_lock and
      try_to_freeze() is called afterwards.  This eliminates race condition
      against freezing.
      
      Note: Separated out from larger patch to resolve fix order dependency
            Oleg pointed out.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      96ee6d85
    • freezer: remove should_send_signal() and update frozen() · 948246f7
      Tejun Heo authored
      
      
      should_send_signal() is only used in freezer.c.  Exporting it only
      increases chance of abuse.  Open code the two users and remove it.
      
      Update frozen() to return bool.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      948246f7
    • freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE · a3201227
      Tejun Heo authored
      
      
      Using TIF_FREEZE for freezing worked when there was only single
      freezing condition (the PM one); however, now there is also the
      cgroup_freezer and single bit flag is getting clumsy.
      thaw_processes() is already testing whether cgroup freezing is in
      effect to avoid thawing tasks which were frozen by both PM and cgroup
      freezers.
      
      This is racy (nothing prevents race against cgroup freezing) and
      fragile.  A much simpler way is to test actual freeze conditions from
      freezing() - ie. directly test whether PM or cgroup freezing is in
      effect.
      
      This patch adds variables to indicate whether and what type of
      freezing conditions are in effect and reimplements freezing() such
      that it directly tests whether any of the two freezing conditions is
      active and the task should freeze.  On fast path, freezing() is still
      very cheap - it only tests system_freezing_cnt.
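
      The reimplemented test can be modelled in userspace roughly as
      follows.  This is only a sketch: system_freezing_cnt is the name
      given above, while struct task, the flag value, pm_freezing,
      cgroup_freezing and the start/end helpers are illustrative
      assumptions rather than the kernel's definitions.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define PF_NOFREEZE 0x1u	/* illustrative value, not the kernel's */

struct task {
	unsigned int flags;
	bool cgroup_freezing;	/* stand-in for the task's cgroup freezer state */
};

atomic_int system_freezing_cnt;	/* > 0 while any freezing condition is active */
bool pm_freezing;		/* assumed flag: PM freezing is in effect */

/* Slow path: only reached while at least one freezing condition exists. */
bool freezing_slow_path(const struct task *p)
{
	if (p->flags & PF_NOFREEZE)
		return false;
	return pm_freezing || p->cgroup_freezing;
}

/* Fast path: a single counter read when nothing is freezing. */
bool freezing(const struct task *p)
{
	if (atomic_load(&system_freezing_cnt) == 0)
		return false;
	return freezing_slow_path(p);
}

/* State is updated first; nudging tasks would happen after this returns. */
void start_pm_freeze(void)
{
	atomic_fetch_add(&system_freezing_cnt, 1);
	pm_freezing = true;
}

void end_pm_freeze(void)
{
	pm_freezing = false;
	atomic_fetch_sub(&system_freezing_cnt, 1);
}
```

      In this model a task never needs a per-task "please freeze" bit:
      it asks freezing() and gets an answer derived from the conditions
      currently in effect.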
      
      This makes the clumsy dancing around TIF_FREEZE unnecessary and
      freeze/thaw operations more usual - updating state variables for the
      new state and nudging target tasks so that they notice the new state
      and comply.  As long as the nudging happens after state update, it's
      race-free.
      
      * This allows use of freezing() in freeze_task().  Replace the open
        coded tests with freezing().
      
      * p != current test is added to warning printing conditions in
        try_to_freeze_tasks() failure path.  This is necessary as freezing()
        is now true for the task which initiated freezing too.
      
      -v2: Oleg pointed out that re-freezing FROZEN cgroup could increment
           system_freezing_cnt.  Fixed.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Paul Menage <paul@paulmenage.org>  (for the cgroup portions)
      a3201227
    • cgroup_freezer: prepare for removal of TIF_FREEZE · 22b4e111
      Tejun Heo authored
      
      
      TIF_FREEZE will be removed soon and freezing() will directly test
      whether any freezing condition is in effect.  Make the following
      changes in preparation.
      
      * Rename cgroup_freezing_or_frozen() to cgroup_freezing() and make it
        return bool.
      
      * Make cgroup_freezing() access task_freezer() under rcu read lock
        instead of task_lock().  This makes the state dereferencing racy
        against task moving to another cgroup; however, it was already racy
        without this change as ->state dereference wasn't synchronized.
        This will be later dealt with using attach hooks.
      
      * freezer->state is now set before trying to push tasks into the
        target state.
      
      -v2: Oleg pointed out that freezer_change_state() was setting
           freezer->state incorrectly to CGROUP_FROZEN instead of
           CGROUP_FREEZING.  Fixed.
      
      -v3: Matt pointed out that setting CGROUP_FROZEN used to always invoke
           try_to_freeze_cgroup() regardless of the current state.  Patch
           updated such that the actual freeze/thaw operations are always
           performed on invocation.  This shouldn't make any difference
           unless something is broken.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Paul Menage <paul@paulmenage.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      22b4e111
    • freezer: clean up freeze_processes() failure path · 03afed8b
      Tejun Heo authored
      
      
      freeze_processes() failure path is rather messy.  Freezing is canceled
      for workqueues and tasks which aren't frozen yet but frozen tasks are
      left alone and should be thawed by the caller and of course some
      callers (xen and kexec) didn't do it.
      
      This patch updates __thaw_task() to handle cancelation correctly and
      makes freeze_processes() and freeze_kernel_threads() call
      thaw_processes() on failure instead so that the system is fully thawed
      on failure.  Unnecessary [suspend_]thaw_processes() calls are removed
      from kernel/power/hibernate.c, suspend.c and user.c.
      
      While at it, restructure error checking if clause in suspend_prepare()
      to be less weird.
      
      -v2: Srivatsa spotted missing removal of suspend_thaw_processes() in
           suspend_prepare() and error in commit message.  Updated.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      03afed8b
    • freezer: kill PF_FREEZING · 376fede8
      Tejun Heo authored
      
      
      With the previous changes, there's no meaningful difference between
      PF_FREEZING and PF_FROZEN.  Remove PF_FREEZING and use PF_FROZEN
      instead in task_contributes_to_load().
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      376fede8
    • freezer: test freezable conditions while holding freezer_lock · 85f1d476
      Tejun Heo authored
      
      
      try_to_freeze_tasks() and thaw_processes() use freezable() and
      frozen() as preliminary tests before initiating operations on a task.
      These are done without any synchronization and hinder
      synchronization cleanup without any real performance benefits.
      
      In try_to_freeze_tasks(), open code self test and move PF_NOFREEZE and
      frozen() tests inside freezer_lock in freeze_task().
      
      thaw_processes() can simply drop freezable() test as frozen() test in
      __thaw_task() is enough.
      
      Note: This used to be a part of larger patch to fix set_freezable()
            race.  Separated out to satisfy ordering among dependent fixes.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      85f1d476
    • freezer: make freezing indicate freeze condition in effect · 6907483b
      Tejun Heo authored
      
      
      Currently freezing (TIF_FREEZE) and frozen (PF_FROZEN) states are
      interlocked - freezing is set to request freeze and when the task
      actually freezes, it clears freezing and sets frozen.
      
      This interlocking makes things more complex than necessary - freezing
      doesn't mean there's freezing condition in effect and frozen doesn't
      match the task actually entering and leaving frozen state (it's
      cleared by the thawing task).
      
      This patch makes freezing indicate that freeze condition is in effect.
      A task enters and stays frozen if freezing.  This makes PF_FROZEN
      manipulation done only by the task itself and prevents wakeup from
      __thaw_task() leaking outside of refrigerator.
      
      The only place which needs to tell freezing && !frozen is
      try_to_freeze_tasks() to whine about tasks which don't enter frozen.
      It's updated to test the condition explicitly.
      
      With the change, frozen() state may linger after __thaw_task() until
      the task wakes up and exits fridge.  This can trigger BUG_ON() in
      update_if_frozen().  Work it around by testing freezing() && frozen()
      instead of frozen().
      
      -v2: Oleg pointed out missing re-check of freezing() when trying to
           clear FROZEN and possible spurious BUG_ON() trigger in
           update_if_frozen().  Both fixed.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Menage <paul@paulmenage.org>
      6907483b
    • freezer: use dedicated lock instead of task_lock() + memory barrier · 0c9af092
      Tejun Heo authored
      
      
      Freezer synchronization is needlessly complicated - it's by no means a
      hot path and the priority is staying unintrusive and safe.  This patch
      makes it simply use a dedicated lock instead of piggy-backing on
      task_lock() and playing with memory barriers.
      
      On the failure path of try_to_freeze_tasks(), locking is moved from it
      to cancel_freezing().  This makes the frozen() test racy but the race
      here is a non-issue as the warning is printed for tasks which failed
      to enter frozen for 20 seconds and race on PF_FROZEN at the last
      moment doesn't change anything.
      
      This simplifies freezer implementation and eases further changes
      including some race fixes.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      0c9af092
    • freezer: don't distinguish nosig tasks on thaw · 6cd8dedc
      Tejun Heo authored
      
      
      There's no point in thawing nosig tasks before others.  There's no
      ordering requirement between the two groups on thaw, which the staged
      thawing can't guarantee anyway.  Simplify thaw_processes() by removing
      the distinction and collapsing thaw_tasks() into thaw_processes().
      This will help further updates to freezer.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      6cd8dedc
    • freezer: remove racy clear_freeze_flag() and set PF_NOFREEZE on dead tasks · a585042f
      Tejun Heo authored
      
      
      clear_freeze_flag() in exit_mm() is racy.  Freezing can start
      afterwards.  Remove it.  Skipping freezer for exiting task will be
      properly implemented later.
      
      Also, freezable() was testing exit_state directly to make system
      freezer ignore dead tasks.  Let the exiting task set PF_NOFREEZE after
      entering TASK_DEAD instead.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      a585042f
    • freezer: rename thaw_process() to __thaw_task() and simplify the implementation · a5be2d0d
      Tejun Heo authored
      
      
      thaw_process() now has only internal users - system and cgroup
      freezers.  Remove the unnecessary return value, rename, unexport and
      collapse __thaw_process() into it.  This will help further updates to
      the freezer code.
      
      -v3: oom_kill grew a use of thaw_process() while this patch was
           pending.  Convert it to use __thaw_task() for now.  In the longer
           term, this should be handled by allowing tasks to die if killed
           even if it's frozen.
      
      -v2: minor style update as suggested by Matt.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      a5be2d0d
    • freezer: implement and use kthread_freezable_should_stop() · 8a32c441
      Tejun Heo authored
      
      
      Writeback and thinkpad_acpi have been using thaw_process() to prevent
      deadlock between the freezer and kthread_stop(); unfortunately, this
      is inherently racy - nothing prevents freezing from happening between
      thaw_process() and kthread_stop().
      
      This patch implements kthread_freezable_should_stop() which enters
      refrigerator if necessary but is guaranteed to return if
      kthread_stop() is invoked.  Both thaw_process() users are converted to
      use the new function.
      
      Note that this deadlock condition exists for many freezable
      kthreads.  They need to be converted to use the new should_stop or
      freezable workqueue.
      
      Tested with synthetic test case.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Henrique de Moraes Holschuh <ibm-acpi@hmh.eng.br>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      8a32c441
    • freezer: unexport refrigerator() and update try_to_freeze() slightly · a0acae0e
      Tejun Heo authored
      
      
      There is no reason to export two functions for entering the
      refrigerator.  Calling refrigerator() instead of try_to_freeze()
      doesn't save anything noticeable or remove any race condition.
      
      * Rename refrigerator() to __refrigerator() and make it return bool
        indicating whether it scheduled out for freezing.
      
      * Update try_to_freeze() to return bool and relay the return value of
        __refrigerator() if freezing().
      
      * Convert all refrigerator() users to try_to_freeze().
      
      * Update documentation accordingly.
      
      * While at it, add might_sleep() to try_to_freeze().
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Samuel Ortiz <samuel@sortiz.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Christoph Hellwig <hch@infradead.org>
      a0acae0e
    • freezer: fix current->state restoration race in refrigerator() · 50fb4f7f
      Tejun Heo authored
      
      
      refrigerator() saves current->state before entering frozen state and
      restores it before returning using __set_current_state(); however,
      this is racy.  For example, consider the following sequence.
      
      	set_current_state(TASK_INTERRUPTIBLE);
      	try_to_freeze();
      	if (kthread_should_stop())
      		break;
      	schedule();
      
      If kthread_stop() races with ->state restoration, the restoration can
      restore ->state to TASK_INTERRUPTIBLE after kthread_stop() sets it to
      TASK_RUNNING but kthread_should_stop() may still see zero
      ->should_stop because there's no memory barrier between restoring
      TASK_INTERRUPTIBLE and kthread_should_stop() test.
      
      This isn't restricted to kthread_should_stop().  current->state is
      often used in memory barrier based synchronization and silently
      restoring it w/o mb breaks them.
      
      Use set_current_state() instead.
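
      The difference between the two helpers can be illustrated with a
      userspace C11 model.  This is a sketch, not kernel code: the
      struct layout and every function name here are assumptions made
      for the example, and it relies on set_current_state() being a
      store followed by a full memory barrier while
      __set_current_state() is a plain store.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { TASK_RUNNING = 0, TASK_INTERRUPTIBLE = 1 };

/* Hypothetical stand-ins for the task and its kthread bookkeeping. */
struct task { atomic_int state; };
struct kthread { struct task task; atomic_bool should_stop; };

/* Model of __set_current_state(): a store with no ordering guarantee. */
void plain_set_state(struct task *t, int s)
{
	atomic_store_explicit(&t->state, s, memory_order_relaxed);
}

/* Model of set_current_state(): the same store plus a full barrier. */
void barrier_set_state(struct task *t, int s)
{
	atomic_store_explicit(&t->state, s, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
}

/*
 * Sleeper side: publish the sleeping state, then test the stop flag.
 * The barrier keeps the flag load from being ordered before the state
 * store.  Returns true if the caller should go on to sleep.
 */
bool prepare_to_sleep(struct kthread *k)
{
	barrier_set_state(&k->task, TASK_INTERRUPTIBLE);
	return !atomic_load_explicit(&k->should_stop, memory_order_relaxed);
}

/* Stopper side: set the flag first, then make the task runnable. */
void model_kthread_stop(struct kthread *k)
{
	atomic_store(&k->should_stop, true);
	atomic_store(&k->task.state, TASK_RUNNING);
}
```

      With plain_set_state() in prepare_to_sleep(), nothing would order
      the state store against the flag load, which is the window the
      commit message describes.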
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      50fb4f7f
  4. Nov 19, 2011
  5. Nov 18, 2011
    • PM / Hibernate: Fix the early termination of test modes · aa9a7b11
      Srivatsa S. Bhat authored
      
      
      Commit 2aede851
      (PM / Hibernate: Freeze kernel threads after preallocating memory)
      postponed the freezing of kernel threads to after preallocating memory
      for hibernation. But while doing that, the hibernation test TEST_FREEZER
      and the test mode HIBERNATION_TESTPROC were not moved accordingly.
      
      As a result, when using these test modes, it only goes up to the freezing of
      userspace and exits, when in fact it should go till the complete end of task
      freezing stage, namely the freezing of kernel threads as well.
      
      So, move these points of exit to appropriate places so that freezing of
      kernel threads is also tested while using these test harnesses.
      
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      aa9a7b11
  6. Nov 17, 2011
  7. Nov 07, 2011
  8. Nov 06, 2011
  9. Nov 04, 2011
    • PM / Freezer: Revert 27920651 "PM / Freezer: Make fake_signal_wake_up() wake TASK_KILLABLE tasks too" · d6cc7685
      Tejun Heo authored
      
      
      Commit 27920651 "PM / Freezer: Make fake_signal_wake_up() wake
      TASK_KILLABLE tasks too" updated fake_signal_wake_up() used by freezer
      to wake up KILLABLE tasks.  Sending unsolicited wakeups to tasks in
      killable sleep is dangerous as there are code paths which depend on
      tasks not waking up spuriously from KILLABLE sleep.
      
      For example, sys_read() or page can sleep in TASK_KILLABLE assuming
      that wait/down/whatever _killable can only fail if we can not return
      to the usermode.  TASK_TRACED is another obvious example.
      
      The previous patch updated wait_event_freezekillable() such that it
      doesn't depend on the spurious wakeup.  This patch reverts the
      offending commit.
      
      Note that the spurious KILLABLE wakeup had other implicit effects in
      KILLABLE sleeps in nfs and cifs and those will need further updates to
      regain freezekillable behavior.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      d6cc7685
    • PM / QoS: Remove redundant check · 6513fd69
      Guennadi Liakhovetski authored
      
      
      Remove an "if" check, that repeats an equivalent one 6 lines above.
      
      Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      6513fd69
    • PM / Sleep: Fix race between CPU hotplug and freezer · 79cfbdfa
      Srivatsa S. Bhat authored
      
      
      The CPU hotplug notifications sent out by the _cpu_up() and _cpu_down()
      functions depend on the value of the 'tasks_frozen' argument passed to them
      (which indicates whether tasks have been frozen or not).
      (Examples for such CPU hotplug notifications: CPU_ONLINE, CPU_ONLINE_FROZEN,
      CPU_DEAD, CPU_DEAD_FROZEN).
      
      Thus, it is essential that while the callbacks for those notifications are
      running, the state of the system with respect to the tasks being frozen or
      not remains unchanged, *throughout that duration*. Hence there is a need for
      synchronizing the CPU hotplug code with the freezer subsystem.
      
      Since the freezer is involved only in the Suspend/Hibernate call paths, this
      patch hooks the CPU hotplug code to the suspend/hibernate notifiers
      PM_[SUSPEND|HIBERNATE]_PREPARE and PM_POST_[SUSPEND|HIBERNATE] to prevent
      the race between CPU hotplug and freezer, thus ensuring that CPU hotplug
      notifications will always be run with the state of the system really being
      what the notifications indicate, _throughout_ their execution time.
      
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      79cfbdfa
  10. Nov 03, 2011
  11. Nov 02, 2011
    • memcg: replace ss->id_lock with a rwlock · c1e2ee2d
      Andrew Bresticker authored
      
      
      While back-porting Johannes Weiner's patch "mm: memcg-aware global
      reclaim" for an internal effort, we noticed a significant performance
      regression during page-reclaim heavy workloads due to high contention on
      the ss->id_lock.  This lock protects the idr map and serializes calls to
      idr_get_next() in css_get_next() (which is used during the memcg hierarchy
      walk).
      
      Since idr_get_next() is just doing a look up, we need only serialize it
      with respect to idr_remove()/idr_get_new().  By making the ss->id_lock a
      rwlock, contention is greatly reduced and performance improves.
      
      Tested: cat a 256m file from a ramdisk in a 128m container 50 times on
      each core (one file + container per core) in parallel on a NUMA machine.
      Result is the time for the test to complete in 1 of the containers.
      Both kernels included Johannes' memcg-aware global reclaim patches.
      
      Before rwlock patch: 1710.778s
      After rwlock patch: 152.227s
      
      Signed-off-by: Andrew Bresticker <abrestic@google.com>
      Cc: Paul Menage <menage@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c1e2ee2d
    • Lucas De Marchi's avatar
      sysctl: add support for poll() · f1ecf068
      Lucas De Marchi authored
      
      
      Adding support for poll() in sysctl fs allows userspace to receive
      notifications of changes in sysctl entries.  This adds an infrastructure to
      allow files in sysctl fs to be pollable and implements it for hostname and
      domainname.
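
From the userspace side, waiting on such a pollable entry might look like the sketch below. It is only an illustration, not part of the patch: the helper name wait_for_change is hypothetical, and it assumes that a change is reported through the POLLPRI/POLLERR bits rather than as ordinary readability, so only those bits are requested.

```python
import os
import select


def wait_for_change(path="/proc/sys/kernel/hostname", timeout_ms=None):
    """Block until `path` reports a change, or until the timeout expires.

    Hypothetical helper. Returns True if poll() reported an event and
    False on timeout. timeout_ms=None waits indefinitely.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Read the current value once so the caller's view is up to date
        # before waiting for the next change.
        os.read(fd, 4096)

        poller = select.poll()
        # Assumption: changes are signalled as exceptional conditions,
        # so plain readability (POLLIN) is deliberately not requested.
        poller.register(fd, select.POLLPRI | select.POLLERR)
        return bool(poller.poll(timeout_ms))
    finally:
        os.close(fd)
```

A caller would typically loop: call wait_for_change(), then reopen or re-read the file to fetch the new hostname or domainname.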
      
      [akpm@linux-foundation.org: s/declare/define/ for definitions]
      Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
      Cc: Greg KH <gregkh@suse.de>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f1ecf068
    • David Rientjes's avatar
      cpusets: avoid looping when storing to mems_allowed if one node remains set · 89e8a244
      David Rientjes authored
      
      
      {get,put}_mems_allowed() exist so that general kernel code may locklessly
      access a task's set of allowable nodes without having the chance that a
      concurrent write will cause the nodemask to be empty on configurations
      where MAX_NUMNODES > BITS_PER_LONG.
      
      This could incur a significant delay, however, especially in low memory
      conditions because the page allocator is blocking and reclaim requires
      get_mems_allowed() itself.  It is not atypical to see writes to
      cpuset.mems take over 2 seconds to complete, for example.  In low memory
      conditions, this is problematic because it's one of the most important
      times to change cpuset.mems in the first place!
      
      The only way a task's set of allowable nodes may change is through cpusets
      by writing to cpuset.mems and when attaching a task to a different cpuset.
      This is done by setting all new nodes, ensuring generic code is not reading
      the nodemask with get_mems_allowed() at the same time, and then clearing
      all the old nodes.  This prevents the possibility that a reader will see an
      empty nodemask at the same time the writer is storing a new nodemask.
      
      If at least one node remains unchanged, though, it's possible to simply
      set all new nodes and then clear all the old nodes.  Changing a task's
      nodemask is protected by cgroup_mutex so it's guaranteed that two threads
      are not changing the same task's nodemask at the same time, so the
      nodemask is guaranteed to be stored before another thread changes it and
      determines whether a node remains set or not.
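
The update order described above can be modelled in a few lines of userspace Python. This is a simplified sketch, not the kernel code: nodemasks are plain integers, and store_mems_allowed and wait_for_readers are hypothetical names standing in for the real update path and for the step that waits until no reader is inside get_mems_allowed().

```python
from typing import Callable, List


def store_mems_allowed(old: int, new: int,
                       wait_for_readers: Callable[[], None]) -> List[int]:
    """Return the nodemask values stored while moving from `old` to `new`.

    Hypothetical model: bit i of a mask means "node i is allowed".
    """
    if new == 0:
        raise ValueError("the new nodemask must not be empty")

    # If old and new share at least one node, that bit stays set during
    # the whole update, so a lockless reader can never see an empty mask.
    need_wait = (old & new) == 0

    stored = [old | new]        # step 1: set all new nodes
    if need_wait:
        wait_for_readers()      # step 2: only when no node stays set
    stored.append(new)          # step 3: clear all the old nodes
    return stored
```

With old = 0b0011 and new = 0b0110, node 1 stays set throughout and the wait is skipped; with new = 0b1100 no node is shared, so the writer still has to wait for readers.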
      
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Miao Xie <miaox@cn.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Paul Menage <paul@paulmenage.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      89e8a244
    • Ben Blum's avatar
      cgroups: don't attach task to subsystem if migration failed · 77ceab8e
      Ben Blum authored
      
      
      If a task has exited to the point it has called cgroup_exit() already,
      then we can't migrate it to another cgroup anymore.
      
      This can happen when we are attaching a task to a new cgroup between the
      call to ->can_attach_task() on subsystems and the migration that is
      eventually tried in cgroup_task_migrate().
      
      In this case cgroup_task_migrate() returns -ESRCH and we don't want to
      attach the task to the subsystems because the attachment to the new cgroup
      itself failed.
      
      Fix this by only calling ->attach_task() on the subsystems if the cgroup
      migration succeeded.
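
The control flow of the fix can be sketched as a small userspace model. This is not the kernel implementation: the Subsystem class, the dictionary-based task, and the attach_task() wrapper are hypothetical stand-ins. Only the names ->can_attach_task(), ->attach_task(), cgroup_task_migrate() and the -ESRCH result come from the description above.

```python
import errno


class Subsystem:
    """Hypothetical stand-in for a cgroup subsystem with its two hooks."""

    def __init__(self, name):
        self.name = name
        self.attached = []          # pids seen by ->attach_task()

    def can_attach_task(self, task):
        return 0                    # 0 means "no objection"

    def attach_task(self, task):
        self.attached.append(task["pid"])


def cgroup_task_migrate(task, new_cgroup):
    """Model of the migration step: fails once the task has exited."""
    if task["exited"]:
        return -errno.ESRCH
    task["cgroup"] = new_cgroup
    return 0


def attach_task(task, new_cgroup, subsystems):
    for ss in subsystems:
        ret = ss.can_attach_task(task)
        if ret:
            return ret

    ret = cgroup_task_migrate(task, new_cgroup)
    if ret:
        return ret                  # migration failed: skip ->attach_task()

    for ss in subsystems:
        ss.attach_task(task)
    return 0
```

In this model a task that exits between the ->can_attach_task() calls and the migration makes attach_task() return -ESRCH, and no subsystem records it as attached.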
      
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
      Acked-by: Paul Menage <paul@paulmenage.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      77ceab8e