Skip to content
  1. Oct 14, 2014
  2. Sep 09, 2014
  3. Aug 08, 2014
    • Jack Miller's avatar
      shm: allow exit_shm in parallel if only marking orphans · 83293c0f
      Jack Miller authored
      
      
      If shm_rmid_force (the default state) is not set then the shmids are only
      marked as orphaned and does not require any add, delete, or locking of the
      tree structure.
      
      Seperate the sysctl on and off case, and only obtain the read lock.  The
      newly added list head can be deleted under the read lock because we are
      only called with current and will only change the semids allocated by this
      task and not manipulate the list.
      
      This commit assumes that up_read includes a sufficient memory barrier for
      the writes to be seen my others that later obtain a write lock.
      
      Signed-off-by: default avatarMilton Miller <miltonm@bga.com>
      Signed-off-by: default avatarJack Miller <millerjo@us.ibm.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Anton Blanchard <anton@samba.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      83293c0f
    • Jack Miller's avatar
      shm: make exit_shm work proportional to task activity · ab602f79
      Jack Miller authored
      This is small set of patches our team has had kicking around for a few
      versions internally that fixes tasks getting hung on shm_exit when there
      are many threads hammering it at once.
      
      Anton wrote a simple test to cause the issue:
      
        http://ozlabs.org/~anton/junkcode/bust_shm_exit.c
      
      
      
      Before applying this patchset, this test code will cause either hanging
      tracebacks or pthread out of memory errors.
      
      After this patchset, it will still produce output like:
      
        root@somehost:~# ./bust_shm_exit 1024 160
        ...
        INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 116, t=2111 jiffies, g=241, c=240, q=7113)
        INFO: Stall ended before state dump start
        ...
      
      But the task will continue to run along happily, so we consider this an
      improvement over hanging, even if it's a bit noisy.
      
      This patch (of 3):
      
      exit_shm obtains the ipc_ns shm rwsem for write and holds it while it
      walks every shared memory segment in the namespace.  Thus the amount of
      work is related to the number of shm segments in the namespace not the
      number of segments that might need to be cleaned.
      
      In addition, this occurs after the task has been notified the thread has
      exited, so the number of tasks waiting for the ns shm rwsem can grow
      without bound until memory is exausted.
      
      Add a list to the task struct of all shmids allocated by this task.  Init
      the list head in copy_process.  Use the ns->rwsem for locking.  Add
      segments after id is added, remove before removing from id.
      
      On unshare of NEW_IPCNS orphan any ids as if the task had exited, similar
      to handling of semaphore undo.
      
      I chose a define for the init sequence since its a simple list init,
      otherwise it would require a function call to avoid include loops between
      the semaphore code and the task struct.  Converting the list_del to
      list_del_init for the unshare cases would remove the exit followed by
      init, but I left it blow up if not inited.
      
      Signed-off-by: default avatarMilton Miller <miltonm@bga.com>
      Signed-off-by: default avatarJack Miller <millerjo@us.ibm.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Anton Blanchard <anton@samba.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ab602f79
  4. Jul 30, 2014
    • Eric W. Biederman's avatar
      namespaces: Use task_lock and not rcu to protect nsproxy · 728dba3a
      Eric W. Biederman authored
      
      
      The synchronous syncrhonize_rcu in switch_task_namespaces makes setns
      a sufficiently expensive system call that people have complained.
      
      Upon inspect nsproxy no longer needs rcu protection for remote reads.
      remote reads are rare.  So optimize for same process reads and write
      by switching using rask_lock instead.
      
      This yields a simpler to understand lock, and a faster setns system call.
      
      In particular this fixes a performance regression observed
      by Rafael David Tinoco <rafael.tinoco@canonical.com>.
      
      This is effectively a revert of Pavel Emelyanov's commit
      cf7b708c Make access to task's nsproxy lighter
      from 2007.  The race this originialy fixed no longer exists as
      do_notify_parent uses task_active_pid_ns(parent) instead of
      parent->nsproxy.
      
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      728dba3a
  5. Jun 06, 2014
  6. Apr 07, 2014
  7. Mar 16, 2014
    • Michael Kerrisk's avatar
      ipc: Fix 2 bugs in msgrcv() MSG_COPY implementation · 4f87dac3
      Michael Kerrisk authored
      While testing and documenting the msgrcv() MSG_COPY flag that Stanislav
      Kinsbursky added in commit 4a674f34 ("ipc: introduce message queue
      copy feature" => kernel 3.8), I discovered a couple of bugs in the
      implementation.  The two bugs concern MSG_COPY interactions with other
      msgrcv() flags, namely:
      
       (A) MSG_COPY + MSG_EXCEPT
       (B) MSG_COPY + !IPC_NOWAIT
      
      The bugs are distinct (and the fix for the first one is obvious),
      however my fix for both is a single-line patch, which is why I'm
      combining them in a single mail, rather than writing two mails+patches.
      
       ===== (A) MSG_COPY + MSG_EXCEPT =====
      
      With the addition of the MSG_COPY flag, there are now two msgrcv()
      flags--MSG_COPY and MSG_EXCEPT--that modify the meaning of the 'msgtyp'
      argument in unrelated ways.  Specifying both in the same call is a
      logical error that is currently permitted, with the effect that MSG_COPY
      has priority and MSG_EXCEPT is ignored.  The call should give an error
      if both flag...
      4f87dac3
  8. Mar 06, 2014
  9. Feb 25, 2014
    • Davidlohr Bueso's avatar
      ipc,mqueue: remove limits for the amount of system-wide queues · f3713fd9
      Davidlohr Bueso authored
      Commit 93e6f119 ("ipc/mqueue: cleanup definition names and
      locations") added global hardcoded limits to the amount of message
      queues that can be created.  While these limits are per-namespace,
      reality is that it ends up breaking userspace applications.
      Historically users have, at least in theory, been able to create up to
      INT_MAX queues, and limiting it to just 1024 is way too low and dramatic
      for some workloads and use cases.  For instance, Madars reports:
      
       "This update imposes bad limits on our multi-process application.  As
        our app uses approaches that each process opens its own set of queues
        (usually something about 3-5 queues per process).  In some scenarios
        we might run up to 3000 processes or more (which of-course for linux
        is not a problem).  Thus we might need up to 9000 queues or more.  All
        processes run under one user."
      
      Other affected users can be found in launchpad bug #1155695:
        https://bugs.launchpad.net/ubuntu/+source/manpages/+bug/1155695
      
      
      
      Instead of increasing this limit, revert it entirely and fallback to the
      original way of dealing queue limits -- where once a user's resource
      limit is reached, and all memory is used, new queues cannot be created.
      
      Signed-off-by: default avatarDavidlohr Bueso <davidlohr@hp.com>
      Reported-by: default avatarMadars Vitolins <m@silodev.com>
      Acked-by: default avatarDoug Ledford <dledford@redhat.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: <stable@vger.kernel.org>	[3.5+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f3713fd9
  10. Feb 02, 2014
    • H. Peter Anvin's avatar
      compat: Get rid of (get|put)_compat_time(val|spec) · 81993e81
      H. Peter Anvin authored
      
      
      We have two APIs for compatiblity timespec/val, with confusingly
      similar names.  compat_(get|put)_time(val|spec) *do* handle the case
      where COMPAT_USE_64BIT_TIME is set, whereas
      (get|put)_compat_time(val|spec) do not.  This is an accident waiting
      to happen.
      
      Clean it up by favoring the full-service version; the limited version
      is replaced with double-underscore versions static to kernel/compat.c.
      
      A common pattern is to convert a struct timespec to kernel format in
      an allocation on the user stack.  Unfortunately it is open-coded in
      several places.  Since this allocation isn't actually needed if
      COMPAT_USE_64BIT_TIME is true (since user format == kernel format)
      encapsulate that whole pattern into the function
      compat_convert_timespec().  An equivalent function should be written
      for struct timeval if it is needed in the future.
      
      Finally, get rid of compat_(get|put)_timeval_convert(): each was only
      used once, and the latter was not even doing what the function said
      (no conversion actually was being done.)  Moving the conversion into
      compat_sys_settimeofday() itself makes the code much more similar to
      sys_settimeofday() itself.
      
      v3: Remove unused compat_convert_timeval().
      
      v2: Drop bogus "const" in the destination argument for
          compat_convert_time*().
      
      Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Hans Verkuil <hans.verkuil@cisco.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Mateusz Guzik <mguzik@redhat.com>
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Tested-by: default avatarH.J. Lu <hjl.tools@gmail.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      81993e81
  11. Jan 28, 2014
Loading