  1. Oct 03, 2014
    • dynamic_debug: change __dynamic_<foo>_dbg return types to void · 906d2015
      Joe Perches authored
      
      
      The return value is not used by the callers of these functions,
      so change the functions to return void.
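      A sketch of the resulting prototypes (argument lists as in
      include/linux/dynamic_debug.h; illustrative, not the full set):
      
        void __dynamic_pr_debug(struct _ddebug *descriptor, const char *fmt, ...);
        void __dynamic_dev_dbg(struct _ddebug *descriptor,
                               const struct device *dev, const char *fmt, ...);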
      
      Signed-off-by: Joe Perches <joe@perches.com>
      Acked-by: Jason Baron <jbaron@akamai.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • locking/lockdep: Revert qrwlock recursive stuff · 8acd91e8
      Peter Zijlstra authored
      Commit f0bab73c ("locking/lockdep: Restrict the use of recursive
      read_lock() with qrwlock") changed lockdep to try and conform to the
      qrwlock semantics which differ from the traditional rwlock semantics.
      
      In particular qrwlock is fair outside of interrupt context, but in
      interrupt context readers will ignore all fairness.
      
      The problem with modeling this is that the read and write sides have
      different lock state (interrupt) semantics but we only have a single
      representation of these. Therefore lockdep will get confused, thinking
      the lock can cause interrupt lock inversions.
      
      So revert it for now; the old rwlock semantics were already imperfectly
      modeled and the qrwlock extra won't fit either.
      
      If we want to properly fix this, I think we need to resurrect the work
      Gautham did a few years ago that split the read and write state of
      locks:
      
         http://lwn.net/Articles/332801/
      
      
      
      FWIW the locking selftest that would've failed (and was reported by
      Borislav earlier) is something like:
      
        RL(X1);	/* IRQ-ON */
        LOCK(A);
        UNLOCK(A);
        RU(X1);
      
        IRQ_ENTER();
        RL(X1);	/* IN-IRQ */
        RU(X1);
        IRQ_EXIT();
      
      At which point it would report that because A is an IRQ-unsafe lock we
      can suffer the following inversion:
      
      	CPU0		CPU1
      
      	lock(A)
      			lock(X1)
      			lock(A)
      	<IRQ>
      	 lock(X1)
      
      And this is 'wrong' because X1 can recurse (assuming the above locks are
      in fact read-locks) but lockdep doesn't know about this.
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Waiman Long <Waiman.Long@hp.com>
      Cc: ego@linux.vnet.ibm.com
      Cc: bp@alien8.de
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20140930132600.GA7444@worktop.programming.kicks-ass.net
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. Sep 28, 2014
    • lzo: check for length overrun in variable length encoding. · 72cf9012
      Willy Tarreau authored
      
      
      This fix ensures that we never hit an integer overflow when adding
      255 while parsing a variable length encoding. It works differently from
      commit 206a81c1 ("lzo: properly check for overruns"): instead of
      ensuring that we don't overrun the input, which is tricky to guarantee
      due to many assumptions in the code, it simply checks that the cumulated
      number of 255s read cannot overflow, by bounding that number.
      
      MAX_255_COUNT is the maximum number of times we can add 255 to a base
      count without overflowing an integer. The multiply will overflow when
      multiplying 255 by more than MAXINT/255. The sum will overflow earlier
      depending on the base count. Since the base count is taken from a u8
      plus a few bits, it is safe to assume that it will always be lower than
      or equal to 2*255; thus we can always prevent any overflow by accepting
      two fewer 255 steps.
      
      This patch also reduces the CPU overhead and actually increases performance
      by 1.1% compared to the initial code, while the previous fix costs 3.1%
      (measured on x86_64).
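      
      A minimal sketch of the bounding idea (hypothetical helper, not the
      kernel's exact code):
      
        /* maximum number of 255 steps that can never overflow, whatever
         * the base count (which is at most 2*255) */
        #define MAX_255_COUNT  ((((size_t)~0) / 255) - 2)
      
        static int parse_count(const unsigned char **inp,
                               const unsigned char *in_end,
                               size_t base, size_t *count)
        {
                const unsigned char *ip = *inp;
                size_t offset = 0;
      
                /* each 0x00 byte stands for +255; a nonzero byte ends the run */
                while (ip < in_end && *ip == 0) {
                        ip++;
                        if (++offset > MAX_255_COUNT)
                                return -1;      /* the sum below would overflow */
                }
                if (ip >= in_end)
                        return -1;              /* truncated input */
                *count = base + offset * 255 + *ip++;
                *inp = ip;
                return 0;
        }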
      
      The fix needs to be backported to all currently supported stable kernels.
      
      Reported-by: Willem Pinckaers <willem@lekkertech.net>
      Cc: "Don A. Bailey" <donb@securitymouse.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Revert "lzo: properly check for overruns" · af958a38
      Willy Tarreau authored
      
      
      This reverts commit 206a81c1 ("lzo: properly check for overruns").
      
      As analysed by Willem Pinckaers, this fix is still incomplete in
      certain rare corner cases, and it is easier to restart from the
      original code.
      
      Reported-by: Willem Pinckaers <willem@lekkertech.net>
      Cc: "Don A. Bailey" <donb@securitymouse.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  3. Sep 26, 2014
    • bpf: mini eBPF library, test stubs and verifier testsuite · 3c731eba
      Alexei Starovoitov authored
      
      
      1.
      the library includes a trivial set of BPF syscall wrappers:
      int bpf_create_map(int key_size, int value_size, int max_entries);
      int bpf_update_elem(int fd, void *key, void *value);
      int bpf_lookup_elem(int fd, void *key, void *value);
      int bpf_delete_elem(int fd, void *key);
      int bpf_get_next_key(int fd, void *key, void *next_key);
      int bpf_prog_load(enum bpf_prog_type prog_type,
      		  const struct sock_filter_int *insns, int insn_len,
      		  const char *license);
      bpf_prog_load() stores verifier log into global bpf_log_buf[] array
      
      and BPF_*() macros to build instructions
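      
      As an illustration, a hypothetical user-space snippet wiring the
      wrappers together (prototypes restated from the list above; error
      handling abbreviated):
      
        #include <stdio.h>
      
        int bpf_create_map(int key_size, int value_size, int max_entries);
        int bpf_update_elem(int fd, void *key, void *value);
        int bpf_lookup_elem(int fd, void *key, void *value);
      
        int main(void)
        {
                long long key = 1, value = 42, out;
                int map_fd = bpf_create_map(sizeof(key), sizeof(value), 16);
      
                if (map_fd < 0)
                        return 1;
                if (bpf_update_elem(map_fd, &key, &value) ||
                    bpf_lookup_elem(map_fd, &key, &out))
                        return 1;
                printf("key %lld -> value %lld\n", key, out);
                return 0;
        }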
      
      2.
      test stubs configure the eBPF infra with 'unspec' map and program types.
      These are fake types used by the user space testsuite only.
      
      3.
      the verifier testsuite runs valid and invalid programs and expects
      predefined error log messages from the kernel.
      40 tests so far.
      
      $ sudo ./test_verifier
       #0 add+sub+mul OK
       #1 unreachable OK
       #2 unreachable2 OK
       #3 out of range jump OK
       #4 out of range jump2 OK
       #5 test1 ld_imm64 OK
       ...
      
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • genalloc: fix device node resource counter · 6f3aabd1
      Vladimir Zapolskiy authored
      
      
      Decrement the np_pool device_node refcount, which was incremented by
      the preceding of_parse_phandle() call.
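      
      The underlying pattern, sketched (property name hypothetical):
      
        struct device_node *np_pool;
      
        np_pool = of_parse_phandle(np, "pool", 0);  /* takes a reference */
        if (!np_pool)
                return NULL;
        /* ... resolve the pool from np_pool ... */
        of_node_put(np_pool);                       /* the fix: drop it again */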
      
      Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. Sep 24, 2014
    • percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky · 1cae13e7
      Tejun Heo authored
      
      
      Currently, a percpu_ref which is initialized with
      PERCPU_REF_INIT_ATOMIC or switched to atomic mode via
      switch_to_atomic() automatically reverts to percpu mode on the first
      percpu_ref_reinit().  This makes the atomic mode difficult to use for
      cases where a percpu_ref is used as a persistent on/off switch which
      may be cycled multiple times.
      
      This patch makes such atomic state sticky so that it survives through
      kill/reinit cycles.  After this patch, atomic state is cleared only by
      an explicit percpu_ref_switch_to_percpu() call.
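      
      In other words (hypothetical sequence; confirm callbacks omitted):
      
        percpu_ref_switch_to_atomic(&ref, NULL); /* explicitly go atomic */
        percpu_ref_kill(&ref);                   /* switch off */
        /* ... wait until the ref drains ... */
        percpu_ref_reinit(&ref);                 /* switch back on; before this
                                                  * patch the ref silently went
                                                  * back to percpu mode here,
                                                  * now it stays atomic */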
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
    • percpu_ref: add PERCPU_REF_INIT_* flags · 2aad2a86
      Tejun Heo authored
      
      
      With the recent addition of percpu_ref_reinit(), percpu_ref can now be
      used as a persistent switch which can be turned on and off repeatedly
      where turning off maps to killing the ref and waiting for it to drain;
      however, there currently isn't a way to initialize a percpu_ref in its
      off (killed and drained) state, which can be inconvenient for certain
      persistent switch use cases.
      
      Similarly, percpu_ref_switch_to_atomic/percpu() allow dynamic
      selection of operation mode; however, currently a newly initialized
      percpu_ref is always in percpu mode, making it impossible to avoid the
      latency overhead of switching to atomic mode.
      
      This patch adds @flags to percpu_ref_init() and implements the
      following flags.
      
      * PERCPU_REF_INIT_ATOMIC	: start ref in atomic mode
      * PERCPU_REF_INIT_DEAD		: start ref killed and drained
      
      These flags should be able to serve the above two use cases.
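      
      A sketch of the off-state initialization this enables (release
      callback assumed, error handling elided):
      
        /* start killed and drained; a dead ref is necessarily atomic */
        int ret = percpu_ref_init(&ref, my_release, PERCPU_REF_INIT_DEAD,
                                  GFP_KERNEL);
        /* ... later, turn the switch on ... */
        percpu_ref_reinit(&ref);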
      
      v2: target_core_tpg.c conversion was missing.  Fixed.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
    • percpu_ref: decouple switching to percpu mode and reinit · f47ad457
      Tejun Heo authored
      
      
      percpu_ref has treated the dropping of the base reference and
      switching to atomic mode as an integral operation; however, there's
      nothing inherent tying the two together.
      
      The use cases for percpu_ref have been expanding continuously.  While
      the current init/kill/reinit/exit model can cover a lot, the coupling
      of kill/reinit with atomic/percpu mode switching is turning out to be
      too restrictive for use cases where many percpu_refs are created and
      destroyed back-to-back with only some of them reaching extended
      operation.  The coupling also makes implementing always-atomic debug
      mode difficult.
      
      This patch separates out percpu mode switching into
      percpu_ref_switch_to_percpu() and reimplements percpu_ref_reinit() on
      top of it.
      
      * DEAD still requires ATOMIC.  A dead ref can't be switched to percpu
        mode w/o going through reinit.
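      
      A sketch of the reimplementation described above (assumed shape; the
      real function may carry more checks):
      
        void percpu_ref_reinit(struct percpu_ref *ref)
        {
                WARN_ON_ONCE(!percpu_ref_is_zero(ref));
      
                ref->percpu_count_ptr &= ~__PERCPU_REF_DEAD; /* clear DEAD */
                percpu_ref_get(ref);                         /* retake base ref */
                __percpu_ref_switch_to_percpu(ref);          /* then go percpu */
        }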
      
      v2: __percpu_ref_switch_to_percpu() was missing static.  Fixed.
          Reported by Fengguang aka kbuild test robot.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: kbuild test robot <fengguang.wu@intel.com>
    • percpu_ref: decouple switching to atomic mode and killing · 490c79a6
      Tejun Heo authored
      
      
      percpu_ref has treated the dropping of the base reference and
      switching to atomic mode as an integral operation; however, there's
      nothing inherent tying the two together.
      
      The use cases for percpu_ref have been expanding continuously.  While
      the current init/kill/reinit/exit model can cover a lot, the coupling
      of kill/reinit with atomic/percpu mode switching is turning out to be
      too restrictive for use cases where many percpu_refs are created and
      destroyed back-to-back with only some of them reaching extended
      operation.  The coupling also makes implementing always-atomic debug
      mode difficult.
      
      This patch separates out atomic mode switching into
      percpu_ref_switch_to_atomic() and reimplements
      percpu_ref_kill_and_confirm() on top of it.
      
      * The handling of __PERCPU_REF_ATOMIC and __PERCPU_REF_DEAD is now
        differentiated.  Among get/put operations, percpu_ref_tryget_live()
        is the only one which cares about DEAD.
      
      * percpu_ref_switch_to_atomic() can be called multiple times on the
        same ref.  This means that multiple @confirm_switch may get queued
        up, which we can't handle reliably without an extra memory area.  This
        is handled by making the later invocation synchronously wait for the
        completion of the previous one.  This isn't particularly desirable,
        but such synchronous waits shouldn't happen in most cases.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
    • percpu_ref: add PCPU_REF_DEAD · 27344a90
      Tejun Heo authored
      
      
      percpu_ref will be restructured so that percpu/atomic mode switching
      and reference killing are decoupled.  In preparation, add
      PCPU_REF_DEAD and PCPU_REF_ATOMIC_DEAD, which is the OR of ATOMIC and
      DEAD.  For now, ATOMIC and DEAD are changed together and all
      PCPU_REF_ATOMIC uses are converted to PCPU_REF_ATOMIC_DEAD without
      causing any behavior changes.
      
      percpu_ref_init() now specifies an explicit alignment when allocating
      the percpu counters so that the pointer has enough unused low bits to
      accommodate the flags.  Note that one flag was fine as the minimum
      alignment for percpu memory is 2 bytes, but two flags are already too
      many for the natural alignment of unsigned longs on archs like cris
      and m68k.
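      
      Roughly, the packing looks like this (a sketch using the message's
      shorthand; in-tree the flags carry a __PERCPU_REF_ prefix after the
      renames below):
      
        #define PCPU_REF_ATOMIC       1UL
        #define PCPU_REF_DEAD         2UL
        #define PCPU_REF_ATOMIC_DEAD  (PCPU_REF_ATOMIC | PCPU_REF_DEAD)
        #define PCPU_REF_FLAG_MASK    3UL
      
        /* the percpu pointer lives in the high bits, the two flags in the
         * low bits, hence the explicit >= 4-byte alignment at allocation */
        unsigned long __percpu *percpu_count = (unsigned long __percpu *)
                (ref->percpu_count_ptr & ~PCPU_REF_FLAG_MASK);
        bool dying = ref->percpu_count_ptr & PCPU_REF_ATOMIC_DEAD;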
      
      v2: The original patch had BUILD_BUG_ON() which triggers if unsigned
          long's alignment isn't enough to accommodate the flags, which
          triggered on cris and m68k.  percpu_ref_init() updated to specify
          the required alignment explicitly.  Reported by Fengguang.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Kent Overstreet <kmo@daterainc.com>
      Cc: kbuild test robot <fengguang.wu@intel.com>
    • percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch · 9e804d1f
      Tejun Heo authored
      
      
      percpu_ref will be restructured so that percpu/atomic mode switching
      and reference killing are decoupled.  In preparation, do the following
      renames.
      
      * percpu_ref->confirm_kill	-> percpu_ref->confirm_switch
      * __PERCPU_REF_DEAD		-> __PERCPU_REF_ATOMIC
      * __percpu_ref_alive()		-> __ref_is_percpu()
      
      This patch is pure rename and doesn't introduce any functional
      changes.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Kent Overstreet <kmo@daterainc.com>
    • percpu_ref: replace pcpu_ prefix with percpu_ · eecc16ba
      Tejun Heo authored
      
      
      percpu_ref uses the pcpu_ prefix for internal stuff and percpu_ for
      externally visible ones.  This is the same convention used in the
      percpu allocator implementation.  It works fine there, but percpu_ref
      doesn't have too much internal-only stuff, and the scattered usages of
      the pcpu_ prefix are more confusing than helpful.
      
      This patch replaces all pcpu_ prefixes with percpu_.  This is pure
      rename and there's no functional change.  Note that PCPU_REF_DEAD is
      renamed to __PERCPU_REF_DEAD to signify that the flag is internal.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Kent Overstreet <kmo@daterainc.com>
    • percpu_ref: minor code and comment updates · 6251f997
      Tejun Heo authored
      
      
      * Some comments became stale.  Updated.
      * percpu_ref_tryget() unnecessarily initializes @ret.  Removed.
      * A blank line removed from percpu_ref_kill_rcu().
      * Explicit function name in a WARN format string replaced with __func__.
      * WARN_ON() in percpu_ref_reinit() converted to WARN_ON_ONCE().
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Kent Overstreet <kmo@daterainc.com>
    • percpu_ref: relocate percpu_ref_reinit() · a2237370
      Tejun Heo authored
      
      
      percpu_ref is going to go through restructuring.  Move
      percpu_ref_reinit() after percpu_ref_kill_and_confirm().  This will
      make later changes easier to follow and result in cleaner
      organization.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Kent Overstreet <kmo@daterainc.com>
    • Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe" · 9eca8046
      Tejun Heo authored
      
      
      This reverts commit 0a30288d, which
      was a temporary fix for the SCSI blk-mq stall issue.  The following
      patches will fix the issue properly by introducing atomic mode to
      percpu_ref.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@lst.de>
    • blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe · 0a30288d
      Tejun Heo authored
      
      
      blk-mq uses percpu_ref for its usage counter, which tracks the number
      of in-flight commands and is used to synchronously drain the queue on
      freeze.  percpu_ref shutdown takes measurable wallclock time as it
      involves a sched RCU grace period.  This means that draining a blk-mq
      queue takes measurable wallclock time.  One would think that this
      shouldn't matter as queue shutdown should be a rare event which takes
      place asynchronously w.r.t. userland.
      
      Unfortunately, SCSI probing involves synchronously setting up and then
      tearing down a lot of request_queues back-to-back for non-existent
      LUNs.  This means that SCSI probing may take more than ten seconds
      when scsi-mq is used.
      
      This will be properly fixed by implementing a mechanism to keep
      q->mq_usage_counter in atomic mode till genhd registration; however,
      that involves rather big updates to percpu_ref which is difficult to
      apply late in the devel cycle (v3.17-rc6 at the moment).  As a
      stop-gap measure till the proper fix can be implemented in the next
      cycle, this patch introduces __percpu_ref_kill_expedited() and makes
      blk_mq_freeze_queue() use it.  This is heavy-handed but should work
      for testing the experimental SCSI blk-mq implementation.
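      
      A sketch of what the stop-gap amounts to (assumed shape, following the
      description above: a kill that uses the expedited grace period):
      
        void __percpu_ref_kill_expedited(struct percpu_ref *ref)
        {
                WARN_ONCE(ref->pcpu_count_ptr & PCPU_REF_DEAD,
                          "percpu_ref_kill() called more than once!\n");
      
                ref->pcpu_count_ptr |= PCPU_REF_DEAD;
                /* expedited variant instead of waiting out a full
                 * sched-RCU grace period */
                synchronize_sched_expedited();
                percpu_ref_kill_rcu(&ref->rcu);
        }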
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Christoph Hellwig <hch@infradead.org>
      Link: http://lkml.kernel.org/g/20140919113815.GA10791@lst.de
      
      
      Fixes: add703fd ("blk-mq: use percpu_ref for mq usage count")
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Tested-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  5. Sep 20, 2014
    • percpu-refcount: make percpu_ref based on longs instead of ints · e625305b
      Tejun Heo authored
      
      
      percpu_ref is currently based on ints and the number of refs it can
      cover is (1 << 31).  This makes it impossible to use a percpu_ref to
      count memory objects or pages on 64bit machines as it may overflow.
      This forces those users to somehow aggregate the references before
      contributing to the percpu_ref, which is often cumbersome, and it can
      be challenging to reach the same level of performance as using the
      percpu_ref directly.
      
      While using ints for the percpu counters makes them pack tighter on
      64bit machines, the possible gain from using ints instead of longs is
      extremely small compared to the overall gain from per-cpu operation.
      This patch makes percpu_ref based on longs so that it can be used to
      directly count memory objects or pages.
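      
      The core of the change, sketched (surrounding fields elided; exact
      layout of this era assumed):
      
        struct percpu_ref {
                atomic_long_t   count;          /* was: atomic_t */
                unsigned long   pcpu_count_ptr; /* the percpu counters are
                                                 * now unsigned longs too */
                percpu_ref_func_t *release;
                /* ... */
        };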
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
    • percpu-refcount: improve WARN messages · 4843c332
      Tejun Heo authored
      
      
      percpu_ref's WARN messages can be a lot more helpful by indicating
      who the culprit is.  Make them report the release function that the
      offending percpu-refcount is associated with.  This should make it a
      lot easier to track down the reported invalid refcounting operations.
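      
      Illustratively, the messages now carry the release callback, printed
      with the %pf format specifier (pattern sketched, not the exact string):
      
        WARN_ONCE(count < 0, "percpu ref (%pf) <= 0 (%ld) after kill",
                  ref->release, count);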
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
  6. Sep 13, 2014
    • Make ARCH_HAS_FAST_MULTIPLIER a real config variable · 72d93104
      Linus Torvalds authored
      
      
      It used to be an ad-hoc hack defined by the x86 version of
      <asm/bitops.h> that enabled a couple of library routines to know whether
      an integer multiply is faster than repeated shifts and additions.
      
      This just makes it use the real Kconfig system instead, and makes x86
      (which was the only architecture that did this) select the option.
      
      NOTE! Even for x86, this really is kind of wrong.  If we cared, we would
      probably not enable this for builds optimized for netburst (P4), where
      shifts-and-adds are generally faster than multiplies.  This patch does
      *not* change that kind of logic, though; it is purely a syntactic change
      with no code changes.
      
      This was triggered by the fact that we have other places that really
      want to know "do I want to expand multiplications by constants by hand
      or not", particularly the hash generation code.
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. Sep 12, 2014
    • KEYS: Fix termination condition in assoc array garbage collection · 95389b08
      David Howells authored
      
      
      This fixes CVE-2014-3631.
      
      It is possible for an associative array to end up with a shortcut node at the
      root of the tree if there are more than fan-out leaves in the tree, but they
      all crowd into the same slot in the lowest level (ie. they all have the same
      first nibble of their index keys).
      
      When assoc_array_gc() returns back up the tree after scanning some leaves,
      it can fall off the root and crash because it assumes that the back pointer
      from a shortcut (after label ascend_old_tree) must point to a normal node -
      which isn't true of a shortcut node at the root.
      
      Should we find we're ascending rootwards over a shortcut, we should check to
      see if the backpointer is zero - and if it is, we have completed the scan.
      
      This particular bug cannot occur if the root node is not a shortcut - ie. if
      you have fewer than 17 keys in a keyring or if you have at least two keys
      that sit in separate slots (eg. a keyring and a non-keyring).
      
      This can be reproduced by:
      
      	ring=`keyctl newring bar @s`
      	for ((i=1; i<=18; i++)); do last_key=`keyctl newring foo$i $ring`; done
      	keyctl timeout $last_key 2
      
      Doing this first will speed things up:
      
      	echo 3 >/proc/sys/kernel/keys/gc_delay
      
      If we do fall off the top of the tree, we get the following oops:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
      IP: [<ffffffff8136cea7>] assoc_array_gc+0x2f7/0x540
      PGD dae15067 PUD cfc24067 PMD 0
      Oops: 0000 [#1] SMP
      Modules linked in: xt_nat xt_mark nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_ni
      CPU: 0 PID: 26011 Comm: kworker/0:1 Not tainted 3.14.9-200.fc20.x86_64 #1
      Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      Workqueue: events key_garbage_collector
      task: ffff8800918bd580 ti: ffff8800aac14000 task.ti: ffff8800aac14000
      RIP: 0010:[<ffffffff8136cea7>] [<ffffffff8136cea7>] assoc_array_gc+0x2f7/0x540
      RSP: 0018:ffff8800aac15d40  EFLAGS: 00010206
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8800aaecacc0
      RDX: ffff8800daecf440 RSI: 0000000000000001 RDI: ffff8800aadc2bc0
      RBP: ffff8800aac15da8 R08: 0000000000000001 R09: 0000000000000003
      R10: ffffffff8136ccc7 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000070 R15: 0000000000000001
      FS:  0000000000000000(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000018 CR3: 00000000db10d000 CR4: 00000000000006f0
      Stack:
       ffff8800aac15d50 0000000000000011 ffff8800aac15db8 ffffffff812e2a70
       ffff880091a00600 0000000000000000 ffff8800aadc2bc3 00000000cd42c987
       ffff88003702df20 ffff88003702dfa0 0000000053b65c09 ffff8800aac15fd8
      Call Trace:
       [<ffffffff812e2a70>] ? keyring_detect_cycle_iterator+0x30/0x30
       [<ffffffff812e3e75>] keyring_gc+0x75/0x80
       [<ffffffff812e1424>] key_garbage_collector+0x154/0x3c0
       [<ffffffff810a67b6>] process_one_work+0x176/0x430
       [<ffffffff810a744b>] worker_thread+0x11b/0x3a0
       [<ffffffff810a7330>] ? rescuer_thread+0x3b0/0x3b0
       [<ffffffff810ae1a8>] kthread+0xd8/0xf0
       [<ffffffff810ae0d0>] ? insert_kthread_work+0x40/0x40
       [<ffffffff816ffb7c>] ret_from_fork+0x7c/0xb0
       [<ffffffff810ae0d0>] ? insert_kthread_work+0x40/0x40
      Code: 08 4c 8b 22 0f 84 bf 00 00 00 41 83 c7 01 49 83 e4 fc 41 83 ff 0f 4c 89 65 c0 0f 8f 5a fe ff ff 48 8b 45 c0 4d 63 cf 49 83 c1 02 <4e> 8b 34 c8 4d 85 f6 0f 84 be 00 00 00 41 f6 c6 01 0f 84 92
      RIP  [<ffffffff8136cea7>] assoc_array_gc+0x2f7/0x540
       RSP <ffff8800aac15d40>
      CR2: 0000000000000018
      ---[ end trace 1129028a088c0cbd ]---
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: James Morris <james.l.morris@oracle.com>
  8. Sep 09, 2014
    • net: filter: add "load 64-bit immediate" eBPF instruction · 02ab695b
      Alexei Starovoitov authored
      
      
      Add the BPF_LD_IMM64 instruction to load a 64-bit immediate value into a
      register.  All previous instructions were 8 bytes; this is the first
      16-byte instruction.  Two consecutive 'struct bpf_insn' blocks are
      interpreted as a single instruction:
      insn[0].code = BPF_LD | BPF_DW | BPF_IMM
      insn[0].dst_reg = destination register
      insn[0].imm = lower 32-bit
      insn[1].code = 0
      insn[1].imm = upper 32-bit
      All unused fields must be zero.
      
      Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM
      which loads 32-bit immediate value into a register.
      
      x64 JITs it as a single 'movabsq %rax, imm64';
      arm64 may JIT it as a sequence of four 'movk x0, #imm16, lsl #shift' insns.
      
      Note that old eBPF programs are binary compatible with new interpreter.
      
      It helps eBPF programs load a 64-bit constant into a register with one
      instruction instead of using two registers and four instructions:
      BPF_MOV32_IMM(R1, imm32)
      BPF_ALU64_IMM(BPF_LSH, R1, 32)
      BPF_MOV32_IMM(R2, imm32)
      BPF_ALU64_REG(BPF_OR, R1, R2)
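      
      Given the encoding above, a helper macro for emitting the pair could
      look like this (a sketch; the kernel's own macro may differ in detail):
      
        #define BPF_LD_IMM64(DST, IMM)                            \
                ((struct bpf_insn) {                              \
                        .code    = BPF_LD | BPF_DW | BPF_IMM,     \
                        .dst_reg = DST,                           \
                        .src_reg = 0,                             \
                        .off     = 0,                             \
                        .imm     = (__u32)(IMM) }),               \
                ((struct bpf_insn) {                              \
                        .code    = 0, /* second half, must be 0 */\
                        .dst_reg = 0,                             \
                        .src_reg = 0,                             \
                        .off     = 0,                             \
                        .imm     = ((__u64)(IMM)) >> 32 })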
      
      User-space-generated programs will use this instruction only to load constants.
      
      To tell the kernel that user space needs a pointer, a _pseudo_ variant of
      this instruction may be added later, which will use extra bits of encoding
      to indicate what type of pointer user space is asking the kernel to provide.
      For example the 'off' or 'src_reg' fields can be used for such a purpose.
      src_reg = 1 could mean that user space is asking the kernel to validate and
      load an in-kernel map pointer.
      src_reg = 2 could mean that user space needs a readonly data section pointer.
      src_reg = 3 could mean that user space needs a pointer to per-cpu local data.
      All such future pseudo instructions will not carry the actual pointer
      as part of the instruction, but rather will be treated as a request to the
      kernel to provide one.  The kernel will verify the request for a pointer,
      then drop the _pseudo_ marking and store the actual internal pointer inside
      the instruction, so the end result is that the interpreter and JITs never
      see pseudo BPF_LD_IMM64 insns and only operate on the generic BPF_LD_IMM64
      that loads a 64-bit immediate into a register.  User space never operates
      on direct pointers, and the verifier can easily distinguish a
      request-for-pointer from other instructions.
      
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Documentation: Docbook: Fix generated DocBook/kernel-api.xml · da3dae54
      Masanari Iida authored
      
      
      This patch fixes spelling typos found in DocBook/kernel-api.xml.
      Because the file is generated from the source comments, the fixes
      have to be made in the comments in the source code.
      
      Signed-off-by: Masanari Iida <standby24x7@gmail.com>
      Acked-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
  9. Sep 08, 2014
    • percpu-refcount: add @gfp to percpu_ref_init() · a34375ef
      Tejun Heo authored
      
      
      Percpu allocator now supports allocation mask.  Add @gfp to
      percpu_ref_init() so that !GFP_KERNEL allocation masks can be used
      with percpu_refs too.
      
      This patch doesn't make any functional difference.
      
      v2: blk-mq conversion was missing.  Updated.
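      
      A converted call site then looks like this (sketch; signature as of
      this patch, before the later @flags addition listed above, and
      my_release is a hypothetical percpu_ref_func_t callback):
      
        ret = percpu_ref_init(&ref, my_release, GFP_ATOMIC);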
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Cc: Jens Axboe <axboe@kernel.dk>
    • proportions: add @gfp to init functions · 20ae0079
      Tejun Heo authored
      
      
      Percpu allocator now supports allocation mask.  Add @gfp to
      [flex_]proportions init functions so that !GFP_KERNEL allocation masks
      can be used with them too.
      
      This patch doesn't make any functional difference.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
    • percpu_counter: add @gfp to percpu_counter_init() · 908c7f19
      Tejun Heo authored
      
      
      Percpu allocator now supports allocation mask.  Add @gfp to
      percpu_counter_init() so that !GFP_KERNEL allocation masks can be used
      with percpu_counters too.
      
      We could have left percpu_counter_init() alone and added
      percpu_counter_init_gfp(); however, the number of users isn't that
      high and introducing _gfp variants to all percpu data structures would
      be quite ugly, so let's just do the conversion.  This is the one with
      the most users.  Other percpu data structures are a lot easier to
      convert.
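      
      A converted call site, sketched (counter name hypothetical):
      
        struct percpu_counter nr_things;
      
        /* @gfp now rides along to the percpu allocator */
        err = percpu_counter_init(&nr_things, 0, GFP_KERNEL);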
      
      This patch doesn't make any functional difference.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Jan Kara <jack@suse.cz>
      Acked-by: "David S. Miller" <davem@davemloft.net>
      Cc: x86@kernel.org
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
    • percpu_counter: make percpu_counters_lock irq-safe · ebd8fef3
      Tejun Heo authored
      
      
      percpu_counter is scheduled to grow @gfp support to allow atomic
      initialization.  This patch makes percpu_counters_lock irq-safe so
      that it can be safely used from atomic contexts.
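      
      The conversion follows the usual irqsave pattern (sketch of the
      list-manipulation sites inside lib/percpu_counter.c):
      
        unsigned long flags;
      
        spin_lock_irqsave(&percpu_counters_lock, flags);   /* was: spin_lock() */
        list_add(&fbc->list, &percpu_counters);
        spin_unlock_irqrestore(&percpu_counters_lock, flags);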
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
  10. Sep 05, 2014
    • net: bpf: make eBPF interpreter images read-only · 60a3b225
      Daniel Borkmann authored
      
      
      With eBPF being extended and its exposure to user space on the way,
      hardening the memory range the interpreter uses to steer its command
      flow seems appropriate.  This patch moves the to-be-interpreted bytecode
      to read-only pages.
      
      In case we execute a corrupted BPF interpreter image for some reason,
      e.g. caused by an attacker who got past a verifier stage, it would not
      only provide arbitrary read/write memory access but arbitrary function
      calls as well.  After setting up the BPF interpreter image, its contents
      do not change until destruction time, thus we can set up the image on
      pages made immutable in order to mitigate modifications to that code.  The
      idea is derived from commit 314beb9b ("x86: bpf_jit_comp: secure bpf jit
      against spraying attacks").
      
      This is possible because bpf_prog is not part of sk_filter anymore.
      After setup, bpf_prog cannot be altered during its lifetime.  This prevents
      any modifications to the entire bpf_prog structure (incl. function/JIT
      image pointer).
      
      Every eBPF program (including migrated classic BPF ones) has to call
      bpf_prog_select_runtime() to select either the interpreter or a JIT image
      as a last setup step, and they are all freed via bpf_prog_free(),
      including non-JIT.  Therefore, we can easily integrate this into the
      eBPF life-time, plus since we directly allocate a bpf_prog, we have no
      performance penalty.
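      
      Conceptually, the hardening boils down to a helper along these lines
      (name and placement assumed):
      
        /* called once bpf_prog_select_runtime() has finished the setup */
        static void bpf_prog_lock_ro(struct bpf_prog *fp)
        {
                set_memory_ro((unsigned long)fp, fp->pages);
        }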
      
      Tested with seccomp and test_bpf testsuite in JIT/non-JIT mode and manual
      inspection of kernel_page_tables.  Brad Spengler proposed the same idea
      via Twitter during development of this patch.
      
      Joint work with Hannes Frederic Sowa.
      
      Suggested-by: Brad Spengler <spender@grsecurity.net>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Kees Cook <keescook@chromium.org>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. Sep 04, 2014
    • lib/rhashtable: allow user to set the minimum shifts of shrinking · 94000176
      Ying Xue authored
      
      
      Although the rhashtable library allows the user to specify quite a big
      size for the created hash table, the table may be shrunk to a very
      small size - HASH_MIN_SIZE(4) - the first time an object is removed
      from it.  Subsequently, even if the total number of objects stored in
      the table stays well below the user's initial setting for a long time,
      the hash table size is still dynamically adjusted by
      rhashtable_shrink() or rhashtable_expand() each time an object is
      inserted or removed.  However, as synchronize_rcu() has to be called
      when the table is shrunk or expanded by those two functions, we should
      permit the user to set a minimum table size, by configuring a minimum
      number of shifts according to their specific requirements, to avoid
      these expensive shrink and expand operations and their
      synchronize_rcu() calls.
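      
      With this change, a user pins the lower bound via the parameters
      structure, along the lines of (struct my_obj and the values are
      illustrative):
      
        struct rhashtable_params params = {
                .nelem_hint  = 1024,
                .head_offset = offsetof(struct my_obj, node),
                .key_offset  = offsetof(struct my_obj, key),
                .key_len     = sizeof(u32),
                .min_shift   = 8,   /* never shrink below 256 buckets */
        };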
      
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. Sep 03, 2014
    • rhashtable: fix lockdep splat in rhashtable_destroy() · ae82ddcf
      Pablo Neira Ayuso authored
      
      
      No need for rht_dereference() from rhashtable_destroy() since the
      existing callers don't hold the mutex when invoking this function
      from:
      
      1) Netlink, this is called in case of memory allocation errors in the
         initialization path, no nl_sk_hash_lock is held.
      2) Netfilter, this is called from the rcu callback, no nfnl_lock is
         held either.
      
      I think it's reasonable to assume that the caller has to make sure
      that no hash resizing may happen before releasing the bucket array.
      Therefore, the caller should be responsible for releasing this in a
      safe way; document this to make people aware of it.
      
      This resolves an RCU lockdep splat in nft_hash:
      
      ===============================
      [ INFO: suspicious RCU usage. ]
      3.16.0+ #178 Not tainted
      -------------------------------
      lib/rhashtable.c:596 suspicious rcu_dereference_protected() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 1, debug_locks = 1
      1 lock held by ksoftirqd/2/18:
       #0:  (rcu_callback){......}, at: [<ffffffff810918fd>] rcu_process_callbacks+0x27e/0x4c7
      
      stack backtrace:
      CPU: 2 PID: 18 Comm: ksoftirqd/2 Not tainted 3.16.0+ #178
      Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012
       0000000000000001 ffff88011706bb68 ffffffff8143debc 0000000000000000
       ffff880117062610 ffff88011706bb98 ffffffff81077515 ffff8800ca041a50
       0000000000000004 ffff8800ca386480 ffff8800ca041a00 ffff88011706bbb8
      Call Trace:
       [<ffffffff8143debc>] dump_stack+0x4e/0x68
       [<ffffffff81077515>] lockdep_rcu_suspicious+0xfa/0x103
       [<ffffffff81228b1b>] rhashtable_destroy+0x46/0x52
       [<ffffffffa06f21a7>] nft_hash_destroy+0x73/0x82 [nft_hash]
      
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: Thomas Graf <tgraf@suug.ch>
    • KEYS: Fix use-after-free in assoc_array_gc() · 27419604
      David Howells authored
      
      
      An edit script should be considered inaccessible by a function once it has
      called assoc_array_apply_edit() or assoc_array_cancel_edit().
      
      However, assoc_array_gc() is accessing the edit script just after the
      gc_complete: label.
      
      Reported-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
      cc: shemming@brocade.com
      cc: paulmck@linux.vnet.ibm.com
      Cc: stable@vger.kernel.org
      Signed-off-by: James Morris <james.l.morris@oracle.com>