  1. Jan 17, 2012
    • seccomp: audit abnormal end to a process due to seccomp · 85e7bac3
      Eric Paris authored
      
      The audit system likes to collect information about processes that end
      abnormally (e.g. with SIGSEGV), as this may be useful intrusion detection
      information.  This patch adds audit support to collect the same kind of
      information when seccomp forces a task to exit because of misbehavior.
      
      Signed-off-by: Eric Paris <eparis@redhat.com>
    • audit: check current inode and containing object when filtering on major and minor · 16c174bd
      Eric Paris authored
      
      The audit system has the ability to filter on the major and minor number of
      the device containing the inode being operated upon.  Let's say that
      /dev/sda1 has major,minor 8,1 and that we mount /dev/sda1 on /boot.  Now let's
      say we add a watch with a filter on 8,1.  If we proceed to open an inode
      inside /boot, such as /boot/vmlinuz, we will match the major,minor filter.
      
      Let's instead assume that one were to use a tool like debugfs to open
      /dev/sda1 directly and modify its contents.  We might hope that this would
      also be logged, but it isn't.  The rules will check the major,minor of the
      device containing /dev/sda1.  In other words, the rule would match on the
      major,minor of the tmpfs mounted at /dev.
      
      I believe these rules should trigger on either device.  The man page is
      devoid of useful information about the intended semantics.  It only seems
      logical that if you want to know everything that happened on a major,minor
      that would include things that happened to the device itself...
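
      The intended semantics can be illustrated with a small userspace sketch.
      It is not the kernel code: struct name_rec, its fields, and
      devno_matches() are hypothetical names, and the sketch checks major and
      minor together, which is a simplification.  It only shows a filter that
      triggers on either the containing device or the device node itself.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical, simplified stand-in for one name record collected by audit. */
      struct name_rec {
          unsigned int dev_major, dev_minor;   /* device containing the inode */
          unsigned int rdev_major, rdev_minor; /* device the inode represents */
          bool is_device_node;                 /* true for nodes like /dev/sda1 */
      };

      /* Return true if the record matches the filter on either device. */
      static bool devno_matches(const struct name_rec *n,
                                unsigned int major, unsigned int minor)
      {
          assert(n != NULL);

          /* Case 1: the inode lives on the filtered device (e.g. /boot/vmlinuz). */
          if (n->dev_major == major && n->dev_minor == minor)
              return true;

          /* Case 2: the inode is the device node itself (e.g. /dev/sda1). */
          return n->is_device_node &&
                 n->rdev_major == major && n->rdev_minor == minor;
      }
      ```

      With a filter on 8,1, a record for a file on /boot matches through the
      first check and a record for /dev/sda1 matches through the second.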
      
      Signed-off-by: Eric Paris <eparis@redhat.com>
    • audit: drop the meaningless and format breaking word 'user' · 3035c51e
      Eric Paris authored
      
      Userspace audit messages look like this:
      
      type=USER msg=audit(1271170549.415:24710): user pid=14722 uid=0 auid=500 ses=1 subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 msg=''
      
      That third field just says 'user'.  That's useless and doesn't follow the
      key=value pair we are trying to enforce.  We already know it came from the
      user based on the record type.  Kill that word.  Die.
      
      Signed-off-by: Eric Paris <eparis@redhat.com>
    • audit: dynamically allocate audit_names when not enough space is in the names array · 5195d8e2
      Eric Paris authored
      
      This patch does 2 things.  First, it reduces the number of audit_names
      allocated in every audit context from 20 to 5.  5 should be enough for all
      'normal' syscalls (rename being the worst).  Some syscalls, such as mount,
      can still touch more than 5 inodes.  When an rpc filesystem is mounted it
      will create inodes, and those can exceed 5.  Second, to handle that problem,
      this patch will dynamically allocate audit_names if it needs more than 5.
      This should decrease the typical memory usage while still supporting all the
      possible kernel operations.
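
      The general pattern (a small preallocated array with a heap fallback) can
      be sketched in userspace C as follows.  This is an illustration under
      assumptions, not the patch itself: struct name_rec, struct name_ctx, and
      the ctx_* helpers are hypothetical, and the real audit_names structure
      carries far more state.

      ```c
      #include <assert.h>
      #include <stdlib.h>

      #define PREALLOC_NAMES 5 /* the preallocated count named in the text */

      /* Hypothetical, simplified name record. */
      struct name_rec {
          unsigned long ino;
          struct name_rec *next; /* links records in collection order */
          int dynamic;           /* nonzero if obtained from malloc() */
      };

      struct name_ctx {
          struct name_rec prealloc[PREALLOC_NAMES];
          struct name_rec *head, *tail;
          int count;
      };

      static void ctx_init(struct name_ctx *ctx)
      {
          assert(ctx != NULL);
          ctx->head = ctx->tail = NULL;
          ctx->count = 0;
      }

      /* Use a preallocated slot while any remain, otherwise fall back to the heap. */
      static struct name_rec *ctx_add_name(struct name_ctx *ctx, unsigned long ino)
      {
          struct name_rec *n;

          if (ctx->count < PREALLOC_NAMES) {
              n = &ctx->prealloc[ctx->count];
              n->dynamic = 0;
          } else {
              n = malloc(sizeof(*n));
              if (!n)
                  return NULL;
              n->dynamic = 1;
          }
          n->ino = ino;
          n->next = NULL;
          if (ctx->tail)
              ctx->tail->next = n;
          else
              ctx->head = n;
          ctx->tail = n;
          ctx->count++;
          return n;
      }

      /* Free only heap records; preallocated slots belong to the context. */
      static void ctx_free_names(struct name_ctx *ctx)
      {
          struct name_rec *n = ctx->head;

          while (n) {
              struct name_rec *next = n->next;
              if (n->dynamic)
                  free(n);
              n = next;
          }
          ctx_init(ctx);
      }
      ```

      The common case never touches the allocator, and only the rare syscall
      that collects more than 5 names pays for extra allocations.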
      
      Signed-off-by: Eric Paris <eparis@redhat.com>
    • audit: make filetype matching consistent with other filters · 5ef30ee5
      Eric Paris authored
      
      Every other filter that matches part of the inodes list collected by audit
      will match against any of the inodes on that list.  The filetype matching,
      however, had a strange way of doing things.  It allowed userspace to
      indicate whether it should match on the first or the second name collected
      by the kernel.  Name collection ordering seems like a kernel internal, and
      making userspace rules get that right just seems like a bad idea.  As it
      turns out, the userspace audit writers had no idea it was doing this and
      thus never overloaded the value field.  The kernel always checked the first
      name collected, which for the tested rules was always correct.
      
      This patch just makes the filetype matching like the major, minor, inode,
      and LSM rules in that it will match against any of the names collected.  It
      also changes the rule validation to reject the old unused rule types.
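
      As a rough userspace illustration of "match against any of the names
      collected", assuming each collected name carries a mode_t, the check might
      look like the sketch below.  filetype_matches_any() is a hypothetical
      helper, not a kernel function.

      ```c
      #define _XOPEN_SOURCE 700
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <sys/stat.h>

      /* Hypothetical helper: true if any collected mode has the requested type. */
      static bool filetype_matches_any(const mode_t *modes, size_t count,
                                       mode_t want)
      {
          size_t i;

          assert(modes != NULL || count == 0);

          /* Compare only the file-type bits, ignoring permission bits. */
          for (i = 0; i < count; i++)
              if ((modes[i] & S_IFMT) == (want & S_IFMT))
                  return true;
          return false;
      }
      ```

      The rule no longer depends on the order in which names were collected.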
      
      No one knew it was there.  No one used it.  Why keep around the extra code?
      
      Signed-off-by: Eric Paris <eparis@redhat.com>
  2. Jan 11, 2012
    • user namespace: make signal.c respect user namespaces · 6b550f94
      Serge E. Hallyn authored
      
      ipc/mqueue.c: for __SI_MESGQ, convert the uid being sent to recipient's
      user namespace. (new, thanks Oleg)
      
      __send_signal: convert current's uid to the recipient's user namespace
      for any siginfo which is not SI_FROMKERNEL (patch from Oleg, thanks
      again :)
      
      do_notify_parent and do_notify_parent_cldstop: map task's uid to parent's
      user namespace
      
      ptrace_signal maps parent's uid into current's user namespace before
      including in signal to current.  IIUC Oleg has argued that this shouldn't
      matter as the debugger will play with it, but it seems like not converting
      the value currently being set is misleading.
      
      Changelog:
      Sep 20: Inspired by Oleg's suggestion, define map_cred_ns() helper to
      	simplify callers and help make clear what we are translating
              (which uid into which namespace).  Passing the target task would
      	make callers even easier to read, but we pass in user_ns because
      	current_user_ns() != task_cred_xxx(current, user_ns).
      Sep 20: As recommended by Oleg, also put task_pid_vnr() under rcu_read_lock
      	in ptrace_signal().
      Sep 23: In send_signal(), detect when (user) signal is coming from an
      	ancestor or unrelated user namespace.  Pass that on to __send_signal,
      	which sets si_uid to 0 or overflowuid if needed.
      Oct 12: Base on Oleg's fixup_uid() patch.  On top of that, handle all
      	SI_FROMKERNEL cases at callers, because we can't assume sender is
      	current in those cases.
      Nov 10: (mhelsley) rename fixup_uid to more meaningful userns_fixup_signal_uid
      Nov 10: (akpm) make the !CONFIG_USER_NS case clearer
      
      Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      From: Serge Hallyn <serge.hallyn@canonical.com>
      Subject: __send_signal: pass q->info, not info, to userns_fixup_signal_uid (v2)
      
      Eric Biederman pointed out that passing info is a bug and could lead to a
      NULL pointer deref to boot.
      
      A collection of signal, securebits, filecaps, cap_bounds, and a few other
      ltp tests passed with this kernel.
      
      Changelog:
          Nov 18: previous patch missed a leading '&'
      
      Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      From: Dan Carpenter <dan.carpenter@oracle.com>
      Subject: ipc/mqueue: lock() => unlock() typo
      
      There was a double lock typo introduced in b085f4bd6b21 "user namespace:
      make signal.c respect user namespaces"
      
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Serge Hallyn <serge@hallyn.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • workqueue: make alloc_workqueue() take printf fmt and args for name · b196be89
      Tejun Heo authored
      
      alloc_workqueue() currently expects the passed in @name pointer to remain
      accessible.  This is inconvenient and a bit silly given that the whole wq
      is being dynamically allocated.  This patch updates alloc_workqueue() and
      friends to take a printf format string and matching varargs at the end
      instead of an opaque string.  The name is allocated together with the wq
      and formatted.
      
      alloc_ordered_workqueue() is converted to a macro to unify varargs
      handling with alloc_workqueue(), and, while at it, a comment is added to
      alloc_workqueue().
      
      None of the current in-kernel users pass a constant name containing '%',
      so this change shouldn't cause any problems.
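
      The idea of formatting the name into the same allocation as the object
      can be sketched in userspace C.  This is only an illustration: struct
      fake_wq and fake_alloc_wq() are hypothetical stand-ins, and the kernel
      implementation may size and store the name differently.

      ```c
      #include <assert.h>
      #include <stdarg.h>
      #include <stdio.h>
      #include <stdlib.h>

      /* Hypothetical userspace stand-in for a workqueue. */
      struct fake_wq {
          unsigned int flags;
          char name[]; /* formatted name stored in the same allocation */
      };

      /* Allocate the object and its formatted name in one block. */
      static struct fake_wq *fake_alloc_wq(unsigned int flags, const char *fmt, ...)
      {
          va_list args, args2;
          struct fake_wq *wq;
          int len;

          assert(fmt != NULL);

          va_start(args, fmt);
          va_copy(args2, args);
          len = vsnprintf(NULL, 0, fmt, args); /* first pass: measure */
          va_end(args);
          if (len < 0) {
              va_end(args2);
              return NULL;
          }

          wq = malloc(sizeof(*wq) + (size_t)len + 1);
          if (wq) {
              wq->flags = flags;
              /* second pass: format into the trailing buffer */
              vsnprintf(wq->name, (size_t)len + 1, fmt, args2);
          }
          va_end(args2);
          return wq;
      }
      ```

      Because the name is copied at allocation time, the caller's buffer does
      not have to outlive the call, which is the inconvenience described above.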
      
      [akpm@linux-foundation.org: use __printf]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Suggested-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • signal: add block_sigmask() for adding sigmask to current->blocked · 5e6292c0
      Matt Fleming authored
      
      Abstract the code sequence for adding a signal handler's sa_mask to
      current->blocked because the sequence is identical for all architectures.
      Furthermore, in the past some architectures actually got this code wrong,
      so introduce a wrapper that all architectures can use.
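
      A userspace analogue of the sequence being wrapped might look like the
      sketch below.  It is not the kernel helper: sketch_block_sigmask() and
      SKETCH_MAX_SIG are hypothetical, and also blocking the delivered signal
      unless SA_NODEFER is set follows POSIX handler semantics and is an
      assumption, not something stated above.

      ```c
      #define _POSIX_C_SOURCE 200809L
      #include <assert.h>
      #include <signal.h>

      #define SKETCH_MAX_SIG 64 /* assumption: highest signal number on Linux */

      /* OR a handler's sa_mask into a blocked set, plus the signal itself. */
      static void sketch_block_sigmask(sigset_t *blocked,
                                       const struct sigaction *sa, int signr)
      {
          int sig;

          assert(blocked != NULL && sa != NULL);

          /* Portable set union: copy each member of sa_mask into blocked. */
          for (sig = 1; sig <= SKETCH_MAX_SIG; sig++)
              if (sigismember(&sa->sa_mask, sig) == 1)
                  sigaddset(blocked, sig);

          /* Assumed POSIX behaviour: block signr unless SA_NODEFER is set. */
          if (!(sa->sa_flags & SA_NODEFER))
              sigaddset(blocked, signr);
      }
      ```

      Keeping this in one shared helper means an architecture cannot forget
      either step.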
      
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tracepoint: add tracepoints for debugging oom_score_adj · 43d2b113
      KAMEZAWA Hiroyuki authored
      
      oom_score_adj is used for guarding processes from the OOM killer.  One
      problem is that it is inherited at fork().  When a daemon sets oom_score_adj
      and spawns children, it's hard to know where the value was set.
      
      This patch adds three tracepoints useful for debugging:
        - creating a new task
        - renaming a task (exec)
        - setting oom_score_adj
      
      To debug, users need to enable these tracepoints.  Filtering may be useful, for example:
      
      # EVENT=/sys/kernel/debug/tracing/events/task/
      # echo "oom_score_adj != 0" > $EVENT/task_newtask/filter
      # echo "oom_score_adj != 0" > $EVENT/task_rename/filter
      # echo 1 > $EVENT/enable
      # EVENT=/sys/kernel/debug/tracing/events/oom/
      # echo 1 > $EVENT/enable
      
      The output will look like this:
      # grep oom /sys/kernel/debug/tracing/trace
      bash-7699  [007] d..3  5140.744510: oom_score_adj_update: pid=7699 comm=bash oom_score_adj=-1000
      bash-7699  [007] ...1  5151.818022: task_newtask: pid=7729 comm=bash clone_flags=1200011 oom_score_adj=-1000
      ls-7729  [003] ...2  5151.818504: task_rename: pid=7729 oldcomm=bash newcomm=ls oom_score_adj=-1000
      bash-7699  [002] ...1  5175.701468: task_newtask: pid=7730 comm=bash clone_flags=1200011 oom_score_adj=-1000
      grep-7730  [007] ...2  5175.701993: task_rename: pid=7730 oldcomm=bash newcomm=grep oom_score_adj=-1000
      
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • PM/Hibernate: do not count debug pages as savable · c6968e73
      Stanislaw Gruszka authored
      
      When debugging with CONFIG_DEBUG_PAGEALLOC and debug_guardpage_minorder >
      0, we have a lot of free pages that are not marked as such.  The snapshot
      code accounts them as savable, which causes hibernate memory preallocation
      to fail.
      
      It is pretty hard to make hibernate allocation succeed with
      debug_guardpage_minorder=1.  This change at least makes it possible when
      the system has a relatively large amount of RAM.
      
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. Jan 08, 2012
  4. Jan 07, 2012
  5. Jan 05, 2012
    • cgroup: fix to allow mounting a hierarchy by name · 0d19ea86
      Li Zefan authored
      
      If we mount a hierarchy with a specified name, the name is unique,
      and we can use it to mount the hierarchy without specifying its
      set of subsystem names. This feature is documented in
      Documentation/cgroups/cgroups.txt section 2.3
      
      Here's an example:
      
      	# mount -t cgroup -o cpuset,name=myhier xxx /cgroup1
      	# mount -t cgroup -o name=myhier xxx /cgroup2
      
      But it was broken by commit 32a8cf23
      (cgroup: make the mount options parsing more accurate)
      
      This fixes the regression.
      
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
  6. Jan 04, 2012
  7. Jan 02, 2012
  8. Dec 31, 2011
    • futex: Fix uninterruptible loop due to gate_area · e6780f72
      Hugh Dickins authored
      
      It was found (by Sasha) that if you use a futex located in the gate
      area we get stuck in an uninterruptible infinite loop, much like the
      ZERO_PAGE issue.
      
      While looking at this problem, PeterZ realized you'll get into similar
      trouble when hitting any install_special_pages() mapping.  And are there
      still drivers setting up their own special mmaps without page->mapping,
      and without special VM or pte flags to make get_user_pages fail?
      
      In most cases, if page->mapping is NULL, we do not need to retry at all:
      Linus points out that even /proc/sys/vm/drop_caches poses no problem,
      because it ends up using remove_mapping(), which takes care not to
      interfere when the page reference count is raised.
      
      But there is still one case which does need a retry: if memory pressure
      called shmem_writepage in between get_user_pages_fast dropping page
      table lock and our acquiring page lock, then the page gets switched from
      filecache to swapcache (and ->mapping set to NULL) whatever the refcount.
      Fault it back in to get the page->mapping needed for key->shared.inode.
      
      Reported-by: Sasha Levin <levinsasha928@gmail.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. Dec 30, 2011
  10. Dec 28, 2011
  11. Dec 27, 2011
  12. Dec 23, 2011
  13. Dec 22, 2011