  1. Oct 31, 2018
  2. Oct 19, 2018
    • locking/lockdep: Fix debug_locks off performance problem · 9506a742
      Waiman Long authored
      
      
      It was found that when debug_locks was turned off because of a problem
      found by the lockdep code, the system performance could drop quite
      significantly when the lock_stat code was also configured into the
      kernel. For instance, parallel kernel build time on a 4-socket x86-64
      server nearly doubled.
      
      Further analysis into the cause of the slowdown traced back to the
      frequent call to debug_locks_off() from the __lock_acquired() function
      probably due to some inconsistent lockdep states with debug_locks
      off. The debug_locks_off() function did an unconditional atomic xchg
      to write a 0 value into debug_locks which had already been set to 0.
      This led to severe contention on the cacheline that held
      debug_locks.  As debug_locks is referenced in quite a few different
      places in the kernel, this greatly slows down system performance.
      
      To prevent that thrashing of the debug_locks cacheline, lock_acquired()
      and lock_contended() now check the state of debug_locks before
      proceeding. The debug_locks_off() function is also modified to check
      debug_locks before calling __debug_locks_off().
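
The check-before-write idea can be sketched in user space with C11 atomics. This is only an illustration under assumptions, not the kernel code: `debug_locks_flag`, `flag_off_unconditional()` and `flag_off_checked()` are hypothetical names, and `atomic_exchange()` stands in for the kernel's xchg.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the global debug_locks flag (1 = enabled). */
static atomic_int debug_locks_flag = 1;

/*
 * Unconditional variant: every call performs an atomic read-modify-write,
 * which needs exclusive ownership of the cacheline even when the flag is
 * already 0.
 */
bool flag_off_unconditional(void)
{
	return atomic_exchange(&debug_locks_flag, 0) != 0;
}

/*
 * Checked variant: a plain load first. Once the flag is 0, callers only
 * read the cacheline, so it can stay shared between CPUs.
 */
bool flag_off_checked(void)
{
	if (atomic_load_explicit(&debug_locks_flag, memory_order_relaxed) == 0)
		return false;
	return atomic_exchange(&debug_locks_flag, 0) != 0;
}
```

Both functions return true only for the caller that actually switched the flag off; the difference is purely in how much write traffic later calls generate.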
      
      Signed-off-by: Waiman Long <longman@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/1539913518-15598-1-git-send-email-longman@redhat.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9506a742
  3. Oct 17, 2018
    • locking/pvqspinlock: Extend node size when pvqspinlock is configured · 0fa809ca
      Waiman Long authored
      
      
      The qspinlock code supports up to 4 levels of slowpath nesting using
      four per-CPU mcs_spinlock structures. For 64-bit architectures, they
      fit nicely in one 64-byte cacheline.
      
      Para-virtualized (PV) qspinlocks need to store more information in the
      per-CPU node structure than there is space for. The PV code uses a trick:
      it keeps the extra information in a second cacheline.
      So PV qspinlock needs to access two extra cachelines for its information
      whereas the native qspinlock code only needs one extra cacheline.
      
      Freshly added counter profiling of the qspinlock code, however, revealed
      that it was very rare to use more than two levels of slowpath nesting.
      So it doesn't make sense to penalize PV qspinlock code in order to have
      four mcs_spinlock structures in the same cacheline to optimize for a case
      in the native qspinlock code that rarely happens.
      
      Extend the per-CPU node structure by two long words when PV qspinlocks
      are configured, to hold the extra data that the PV code needs.
      
      As a result, the PV qspinlock code will enjoy the same benefit of using
      just one extra cacheline like the native counterpart, for most cases.
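
The size arithmetic behind this change can be sketched as follows. The struct layouts are simplified assumptions for illustration (`mcs_node`, `qnode_native`, `qnode_pv` and `nodes_in_first_cacheline()` are hypothetical names), and a 64-byte, cacheline-aligned per-CPU array is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified MCS queue node: one pointer plus two ints (16 bytes on LP64). */
struct mcs_node {
	struct mcs_node *next;
	int locked;
	int count;
};

/* Native layout: the per-CPU node is just the MCS node. */
struct qnode_native {
	struct mcs_node mcs;
};

/* PV layout after the change: two extra long words for the PV data. */
struct qnode_pv {
	struct mcs_node mcs;
	long reserved[2];
};

enum { MAX_NODES = 4, CACHELINE_BYTES = 64 };

/* Number of nesting levels whose nodes fit entirely in the first cacheline. */
size_t nodes_in_first_cacheline(size_t node_size)
{
	size_t n = CACHELINE_BYTES / node_size;

	return n < (size_t)MAX_NODES ? n : (size_t)MAX_NODES;
}
```

With 16-byte nodes all four nesting levels share one cacheline; with 32-byte PV nodes only the first two do, which covers the nesting depths that the profiling found to be common.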
      
      [ mingo: Minor changelog edits. ]
      
      Signed-off-by: Waiman Long <longman@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/1539697507-28084-2-git-send-email-longman@redhat.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0fa809ca
    • locking/qspinlock_stat: Count instances of nested lock slowpaths · 1222109a
      Waiman Long authored
      
      
      Queued spinlock supports up to 4 levels of lock slowpath nesting -
      user context, soft IRQ, hard IRQ and NMI. However, we are not sure how
      often the nesting happens.
      
      So add 3 more per-CPU stat counters to track the number of instances where
      nesting index goes to 1, 2 and 3 respectively.
      
      On a dual-socket 64-core 128-thread Zen server, the following were the
      new stat counter values under different circumstances:
      
               State                         slowpath   index1   index2   index3
               -----                         --------   ------   ------   -------
        After bootup                         1,012,150    82       0        0
        After parallel build + perf-top    125,195,009    82       0        0
      
      So the chance of having more than 2 levels of nesting is extremely low.
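
A minimal sketch of this kind of bookkeeping is shown below. The names `nest_stats` and `record_slowpath()` are hypothetical, and one such structure per CPU is assumed; the real counters live in the qspinlock stat code.

```c
#include <assert.h>

enum { MAX_NESTING = 4 };

/* Hypothetical per-CPU statistics; index[0] is unused in this sketch. */
struct nest_stats {
	unsigned long slowpath;
	unsigned long index[MAX_NESTING];
};

/*
 * Record one slowpath entry. idx is the nesting index of the node used:
 * 0 for the outermost level, 1..3 for nested entries.
 */
void record_slowpath(struct nest_stats *st, int idx)
{
	st->slowpath++;
	if (idx > 0 && idx < MAX_NESTING)
		st->index[idx]++;
}
```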
      
      [ mingo: Minor changelog edits. ]
      
      Signed-off-by: Waiman Long <longman@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/1539697507-28084-1-git-send-email-longman@redhat.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1222109a
  4. Oct 16, 2018
  5. Oct 09, 2018
    • locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y · 8ca2b56c
      Waiman Long authored
      
      
      A sizable portion of the CPU cycles spent in __lock_acquire() is used
      up by the atomic increment of the class->ops stat counter. By taking it out
      from the lock_class structure and changing it to a per-cpu per-lock-class
      counter, we can reduce the amount of cacheline contention on the class
      structure when multiple CPUs are trying to acquire locks of the same
      class simultaneously.
      
      To limit the increase in memory consumption because of the percpu nature
      of that counter, it is now put back under the CONFIG_DEBUG_LOCKDEP
      config option. So the memory consumption increase will only occur if
      CONFIG_DEBUG_LOCKDEP is defined. The lock_class structure, however,
      is reduced in size by 16 bytes on 64-bit archs after ops removal and
      a minor restructuring of the fields.
      
      This patch also fixes a bug in the increment code as the counter is of
      the 'unsigned long' type, but atomic_inc() was used to increment it.
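
The per-CPU, per-lock-class counter can be pictured as a two-dimensional array. The sketch below is a user-space approximation with hypothetical names and sizes (`class_ops`, `NR_CPUS_SKETCH`, `NR_CLASSES_SKETCH`); the "CPU" is just an index here, whereas the kernel uses real per-CPU variables.

```c
#include <assert.h>

enum { NR_CPUS_SKETCH = 4, NR_CLASSES_SKETCH = 8 };

/*
 * One counter per (CPU, lock class). Each CPU writes only its own row,
 * so increments need no atomic operation and do not bounce a shared
 * cacheline between CPUs.
 */
static unsigned long class_ops[NR_CPUS_SKETCH][NR_CLASSES_SKETCH];

void class_ops_inc(int cpu, int class_idx)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	assert(class_idx >= 0 && class_idx < NR_CLASSES_SKETCH);
	class_ops[cpu][class_idx]++;
}

/* Reading the statistic sums the per-CPU values for one class. */
unsigned long class_ops_sum(int class_idx)
{
	unsigned long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += class_ops[cpu][class_idx];
	return sum;
}
```

The memory cost is one counter per CPU per class, which is why the commit confines it to CONFIG_DEBUG_LOCKDEP.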
      
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/d66681f3-8781-9793-1dcf-2436a284550b@redhat.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8ca2b56c
  6. Oct 03, 2018
  7. Sep 11, 2018
  8. Sep 10, 2018
    • locking/ww_mutex: Fix spelling mistake "cylic" -> "cyclic" · 0b405c65
      Colin Ian King authored
      
      
      Trivial fix to a spelling mistake in a pr_err() error message.
      
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel-janitors@vger.kernel.org
      Link: http://lkml.kernel.org/r/20180824112235.8842-1-colin.king@canonical.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0b405c65
    • locking/lockdep: Delete unnecessary #include · dc5591a0
      Ben Hutchings authored
      
      
      Commit:
      
        c3bc8fd6 ("tracing: Centralize preemptirq tracepoints and unify their usage")
      
      added the inclusion of <trace/events/preemptirq.h>.
      
      liblockdep doesn't have a stub version of that header so now fails to build.
      
      However, commit:
      
        bff1b208 ("tracing: Partial revert of "tracing: Centralize preemptirq tracepoints and unify their usage"")
      
      removed the use of functions declared in that header. So delete the #include.
      
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sasha Levin <alexander.levin@verizon.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Fixes: bff1b208 ("tracing: Partial revert of "tracing: Centralize ...")
      Fixes: c3bc8fd6 ("tracing: Centralize preemptirq tracepoints ...")
      Link: http://lkml.kernel.org/r/20180828203315.GD18030@decadent.org.uk
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      dc5591a0
    • locking/rwsem: Make owner store task pointer of last owning reader · 925b9cd1
      Waiman Long authored
      
      
      Currently, when a reader acquires a lock, it only sets the
      RWSEM_READER_OWNED bit in the owner field. The other bits are simply
      not used. When debugging hanging cases involving rwsems and readers,
      the owner value does not provide much useful information at all.
      
      This patch modifies the current behavior to always store the task_struct
      pointer of the last rwsem-acquiring reader in a reader-owned rwsem. This
      may be useful in debugging rwsem hanging cases, especially if only one
      reader is involved. However, the task in the owner field may not be the
      real owner or one of the real owners at all when the owner value is
      examined, for example, in a crash dump. So it is just an additional
      hint about the past history.
      
      If CONFIG_DEBUG_RWSEMS=y is enabled, the owner field will be checked at
      unlock time too to make sure the task pointer value is valid. That does
      have a slight performance cost and so is only enabled as part of that
      debug option.
      
      From the performance point of view, the changes are not expected to
      have any noticeable impact. A rwsem microbenchmark
      (with 48 worker threads and 1:1 reader/writer ratio) was run on a
      2-socket 24-core 48-thread Haswell system.  The locking rates on a
      4.19-rc1 based kernel were as follows:
      
        1) Unpatched kernel:				543.3 kops/s
        2) Patched kernel:				549.2 kops/s
        3) Patched kernel (CONFIG_DEBUG_RWSEMS on):	546.6 kops/s
      
      There was actually a slight increase in performance (1.1%) in this
      particular case. Maybe it was caused by the elimination of a branch or
      was just testing noise. Turning on the CONFIG_DEBUG_RWSEMS option also
      had less than the expected impact on performance.
      
      The least significant 2 bits of the owner value are now used to indicate
      that the rwsem is reader-owned and that the owners are anonymous.
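
Storing a task pointer together with flag bits in one word relies on pointer alignment leaving the low bits free. The sketch below assumes bit 0 marks a reader-owned lock and bit 1 marks anonymous ownership; the macro names, `struct task` and the helper functions are hypothetical and only illustrate the encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag layout in the two least significant bits of the owner word. */
#define READER_OWNED		((uintptr_t)1 << 0)
#define ANONYMOUSLY_OWNED	((uintptr_t)1 << 1)
#define OWNER_FLAGS		(READER_OWNED | ANONYMOUSLY_OWNED)

/* Stand-in for task_struct; its alignment keeps the low two bits clear. */
struct task {
	long pid;
};

/* A reader records itself as the last acquiring reader. */
uintptr_t make_reader_owner(const struct task *t)
{
	return (uintptr_t)t | READER_OWNED;
}

bool is_reader_owned(uintptr_t owner)
{
	return (owner & READER_OWNED) != 0;
}

/* Strip the flag bits to recover the task pointer; only a debugging hint. */
const struct task *owner_task_hint(uintptr_t owner)
{
	return (const struct task *)(owner & ~OWNER_FLAGS);
}
```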
      
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/1536265114-10842-1-git-send-email-longman@redhat.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      925b9cd1
    • locking/rwsem: Exit read lock slowpath if queue empty & no writer · 4b486b53
      Waiman Long authored
      
      
      It was discovered that a constant stream of readers with occasional
      writers pounding on a rwsem may cause many of the readers to enter the
      slowpath unnecessarily, thus increasing latency and lowering performance.
      
      In the current code, a reader entering the slowpath critical section
      will unconditionally set the WAITING_BIAS, if not set yet, and clear
      its active count even if no one is in the wait queue and no writer
      is present. This causes some incoming readers to observe the presence
      of waiters in the wait queue and hence have to go into the slowpath
      themselves.
      
      With sufficient numbers of readers and a relatively short lock hold time,
      the WAITING_BIAS may be repeatedly turned on and off, and a substantial
      portion of the readers will go into the slowpath, sustaining a rather
      long queue on the wait queue spinlock and repeated WAITING_BIAS on/off
      cycles until the logjam is broken opportunistically.
      
      To prevent this situation, an additional check is added to
      detect the special case that the reader in the critical section is the
      only one in the wait queue and no writer is present. When that happens,
      it can just exit the slowpath and return immediately as its active count
      has already been set in the lock.  Other incoming readers won't observe
      the presence of waiters and so will not be forced into the slowpath.
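
The added check amounts to a small decision made while holding the wait-queue lock. The model below is a deliberately simplified assumption: `toy_rwsem_state` and `reader_slowpath_action()` are hypothetical, and the real code derives both conditions from the wait list and the lock count rather than from separate fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy view of what a reader sees once it holds the wait-queue lock. */
struct toy_rwsem_state {
	int nr_waiters;		/* tasks already queued on the wait list */
	bool writer_present;	/* a writer owns the lock */
};

enum slowpath_action { READER_RETURNS, READER_QUEUES };

/*
 * If nobody is queued and no writer is present, the reader's active count
 * is already in the lock, so it can return without setting the waiting
 * bias. Otherwise it queues itself as before.
 */
enum slowpath_action reader_slowpath_action(const struct toy_rwsem_state *s)
{
	if (s->nr_waiters == 0 && !s->writer_present)
		return READER_RETURNS;
	return READER_QUEUES;
}
```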
      
      The issue was found at a customer site where they had an application
      that pounded on the pread64 syscall heavily on an XFS filesystem. The
      application was run on recent 4-socket boxes with a lot of CPUs. They
      saw significant spinlock contention in the rwsem_down_read_failed() call.
      With this patch applied, the system CPU usage went down from 85% to 57%,
      and the spinlock contention in the pread64 syscalls was gone.
      
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1532459425-19204-1-git-send-email-longman@redhat.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4b486b53
    • locking/mutex: Fix mutex debug call and ww_mutex documentation · e13e2366
      Thomas Hellstrom authored
      
      
      The following commit:
      
        08295b3b ("Implement an algorithm choice for Wound-Wait mutexes")
      
      introduced a reference in the documentation to a function that was
      removed in an earlier commit.
      
      It also forgot to remove a call to debug_mutex_add_waiter() which is now
      unconditionally called by __mutex_add_waiter().
      
      Fix those bugs.
      
      Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dri-devel@lists.freedesktop.org
      Fixes: 08295b3b ("Implement an algorithm choice for Wound-Wait mutexes")
      Link: http://lkml.kernel.org/r/20180903140708.2401-1-thellstrom@vmware.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e13e2366
  9. Aug 10, 2018
  10. Jul 31, 2018
    • tracing: Centralize preemptirq tracepoints and unify their usage · c3bc8fd6
      Joel Fernandes (Google) authored
      This patch detaches the preemptirq tracepoints from the tracers and
      keeps them separate.
      
      Advantages:
      * Lockdep and irqsoff event can now run in parallel since they no longer
      have their own calls.
      
      * This unifies the usecase of adding hooks to an irqsoff and irqson
      event, and a preemptoff and preempton event.
        3 users of the events exist:
        - Lockdep
        - irqsoff and preemptoff tracers
        - irqs and preempt trace events
      
      The unification cleans up several ifdefs and makes the code in preempt
      tracer and irqsoff tracers simpler. It gets rid of all the horrific
      ifdeferry around PROVE_LOCKING and makes configuration of the different
      users of the tracepoints easier and more understandable. It also gets rid
      of the time_* function calls from the lockdep hooks used to call into
      the preemptirq tracer, which are not needed anymore. The negative delta in
      lines of code in this patch is quite large too.
      
      In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS
      as a single point for registering probes onto the tracepoints. With
      this, the web of config options for preempt/irq toggle tracepoints and
      its users becomes:
      
       PREEMPT_TRACER   PREEMPTIRQ_EVENTS  IRQSOFF_TRACER PROVE_LOCKING
             |                 |     \         |           |
             \    (selects)    /      \        \ (selects) /
            TRACE_PREEMPT_TOGGLE       ----> TRACE_IRQFLAGS
                            \                  /
                             \ (depends on)   /
                           PREEMPTIRQ_TRACEPOINTS
      
      Other than the performance tests mentioned in the previous patch, I also
      ran the locking API test suite. I verified that all test cases
      pass.

      I also injected issues by not registering lockdep probes onto the
      tracepoints, and I saw failures, which confirms that the probes are
      indeed working.
      
      This series + lockdep probes not registered (just to inject errors):
      [    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
      [    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
      [    0.000000]        sirq-safe-A => hirqs-on/12:FAILED|FAILED|  ok  |
      [    0.000000]        sirq-safe-A => hirqs-on/21:FAILED|FAILED|  ok  |
      [    0.000000]          hard-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
      [    0.000000]          soft-safe-A + irqs-on/12:FAILED|FAILED|  ok  |
      [    0.000000]          hard-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
      [    0.000000]          soft-safe-A + irqs-on/21:FAILED|FAILED|  ok  |
      [    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
      [    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
      
      With this series + lockdep probes registered, all locking tests pass:
      
      [    0.000000]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
      [    0.000000]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
      [    0.000000]        sirq-safe-A => hirqs-on/12:  ok  |  ok  |  ok  |
      [    0.000000]        sirq-safe-A => hirqs-on/21:  ok  |  ok  |  ok  |
      [    0.000000]          hard-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
      [    0.000000]          soft-safe-A + irqs-on/12:  ok  |  ok  |  ok  |
      [    0.000000]          hard-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
      [    0.000000]          soft-safe-A + irqs-on/21:  ok  |  ok  |  ok  |
      [    0.000000]     hard-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
      [    0.000000]     soft-safe-A + unsafe-B #1/123:  ok  |  ok  |  ok  |
      
      Link: http://lkml.kernel.org/r/20180730222423.196630-4-joel@joelfernandes.org
      
      
      
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      c3bc8fd6
  11. Jul 30, 2018
  12. Jul 25, 2018
  13. Jul 03, 2018
    • locking: Implement an algorithm choice for Wound-Wait mutexes · 08295b3b
      Thomas Hellstrom authored
      The current Wound-Wait mutex algorithm is actually not Wound-Wait but
      Wait-Die. Also implement Wound-Wait as a per-ww-class choice. Wound-Wait
      is, contrary to Wait-Die, a preemptive algorithm and is known to generate
      fewer backoffs. Testing reveals that this is true if the
      number of simultaneous contending transactions is small.
      As the number of simultaneous contending threads increases, Wound-Wait
      becomes inferior to Wait-Die in terms of elapsed time, possibly due to
      the larger number of held locks of sleeping transactions.
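
The difference between the two algorithms comes down to what happens when a transaction requests a lock held by another. The sketch below encodes the textbook rules, assuming a lower stamp means an older transaction; the enum and function names are hypothetical and not taken from the ww_mutex code.

```c
#include <assert.h>
#include <stdbool.h>

enum ww_algo { WAIT_DIE, WOUND_WAIT };

enum ww_action {
	REQUESTER_WAITS,	/* requester sleeps until the lock is free */
	REQUESTER_BACKS_OFF,	/* requester "dies": drops its locks, retries */
	HOLDER_WOUNDED		/* holder is asked to back off (preemption) */
};

/* Decide the outcome of a conflict; a lower stamp is an older transaction. */
enum ww_action ww_resolve(enum ww_algo algo,
			  unsigned long requester_stamp,
			  unsigned long holder_stamp)
{
	bool requester_older = requester_stamp < holder_stamp;

	if (algo == WAIT_DIE)
		return requester_older ? REQUESTER_WAITS : REQUESTER_BACKS_OFF;

	/* Wound-Wait: an older requester preempts a younger holder. */
	return requester_older ? HOLDER_WOUNDED : REQUESTER_WAITS;
}
```

In both schemes the older transaction never backs off, which is what rules out deadlock; they differ in whether the younger one gives up on its own (Wait-Die) or is preempted (Wound-Wait).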
      
      Update documentation and callers.
      
      Timings using git://people.freedesktop.org/~thomash/ww_mutex_test
      tag patch-18-06-15
      
      Each thread runs 100000 batches of lock / unlock 800 ww mutexes randomly
      chosen out of 100000. Four core Intel x86_64:
      
      Algorithm    #threads       Rollbacks  time
      Wound-Wait   4              ~100       ~17s.
      Wait-Die     4              ~150000    ~19s.
      Wound-Wait   16             ~360000    ~109s.
      Wait-Die     16             ~450000    ~82s.
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Gustavo Padovan <gustavo@padovan.org>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Sean Paul <seanpaul@chromium.org>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: linux-doc@vger.kernel.org
      Cc: linux-media@vger.kernel.org
      Cc: linaro-mm-sig@lists.linaro.org
      Co-authored-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      08295b3b
    • locking: WW mutex cleanup · 55f036ca
      Peter Ziljstra authored
      
      
      Make the WW mutex code more readable by adding comments, splitting up
      functions and pointing out that we're actually using the Wait-Die
      algorithm.
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Gustavo Padovan <gustavo@padovan.org>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Sean Paul <seanpaul@chromium.org>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: linux-doc@vger.kernel.org
      Cc: linux-media@vger.kernel.org
      Cc: linaro-mm-sig@lists.linaro.org
      Co-authored-by: Thomas Hellstrom <thellstrom@vmware.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      55f036ca
  14. Jun 25, 2018
    • torture: Keep old-school dmesg format · 60500037
      Paul E. McKenney authored
      
      
      This commit adds "#define pr_fmt(fmt) fmt" to the torture-test files
      in order to keep the current dmesg format.  Once Joe's commits have
      hit mainline, these definitions will be changed in order to automatically
      generate the dmesg line prefix that the scripts expect.  This will have
      the beneficial side-effect of allowing printk() formats to be used more
      widely and of shortening some pr_*() lines.
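
The effect of an identity pr_fmt() versus a prefixing one can be shown with string-literal concatenation in user space. Everything below is a hypothetical illustration: the macro names, the "torture: " prefix and the buffer-based helpers are assumptions, and the helpers require at least one format argument.

```c
#include <assert.h>
#include <stdio.h>

/* Identity definition: the format string is used exactly as written. */
#define pr_fmt_plain(fmt)	fmt

/* A prefixing definition: adjacent string literals are concatenated. */
#define pr_fmt_prefixed(fmt)	"torture: " fmt

/* Toy pr_*()-style helpers that format into a caller-supplied buffer. */
#define pr_plain(buf, len, fmt, ...) \
	snprintf(buf, len, pr_fmt_plain(fmt), __VA_ARGS__)
#define pr_prefixed(buf, len, fmt, ...) \
	snprintf(buf, len, pr_fmt_prefixed(fmt), __VA_ARGS__)
```

For example, `pr_plain(buf, sizeof(buf), "onoff %d\n", 3)` writes the message unchanged, while `pr_prefixed()` with the same arguments writes it with the prefix prepended.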
      
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Joe Perches <joe@perches.com>
      60500037
    • torture: Make online/offline messages appear only for verbose=2 · 90127d60
      Paul E. McKenney authored
      
      
      Some bugs reproduce quickly only at high CPU-hotplug rates, so the
      rcutorture TREE03 scenario now has only 200 milliseconds spacing between
      CPU-hotplug operations.  At this rate, the torture-test pair of console
      messages per operation becomes a bit voluminous.  This commit therefore
      converts the torture-test set of "verbose" kernel-boot arguments from
      bool to int, and prints the extra console messages only when verbose=2.
      The default is still verbose=1.
      
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      90127d60
  15. Jun 21, 2018
  16. Jun 20, 2018
  17. Jun 12, 2018
    • treewide: kzalloc() -> kcalloc() · 6396bb22
      Kees Cook authored
      
      
      The kzalloc() function has a 2-factor argument form, kcalloc(). This
      patch replaces cases of:
      
              kzalloc(a * b, gfp)
      
      with:

              kcalloc(a, b, gfp)
      
      as well as handling cases of:
      
              kzalloc(a * b * c, gfp)
      
      with:
      
              kzalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kcalloc(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kzalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
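
The motivation for keeping the factors separate is that an open-coded multiplication can wrap around silently before the allocator ever sees it. A user-space sketch of the two helpers' behaviour follows; `array3_size_sketch()` and `kcalloc_sketch()` are hypothetical stand-ins, and `__builtin_mul_overflow()` is a GCC/Clang builtin assumed to be available.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Saturating 3-factor size: returns SIZE_MAX on overflow, a size that no
 * allocator can satisfy, so the allocation fails instead of being short.
 */
size_t array3_size_sketch(size_t a, size_t b, size_t c)
{
	size_t ab, abc;

	if (__builtin_mul_overflow(a, b, &ab) ||
	    __builtin_mul_overflow(ab, c, &abc))
		return SIZE_MAX;
	return abc;
}

/* Zeroed array allocation that checks n * size for overflow first. */
void *kcalloc_sketch(size_t n, size_t size)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, size, &bytes))
		return NULL;
	return calloc(1, bytes);
}
```

With a plain `a * b` argument, a wrapped product would yield a buffer smaller than the caller expects; both helpers above turn that case into an allocation failure.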
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kzalloc
      + kcalloc
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(sizeof(THING) * C2, ...)
      |
        kzalloc(sizeof(TYPE) * C2, ...)
      |
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(C1 * C2, ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      
      Signed-off-by: Kees Cook <keescook@chromium.org>
      6396bb22
    • Kees Cook's avatar
      treewide: kmalloc() -> kmalloc_array() · 6da2ec56
      Kees Cook authored
      
      
      The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
      patch replaces cases of:
      
              kmalloc(a * b, gfp)
      
      with:

              kmalloc_array(a, b, gfp)
      
      as well as handling cases of:
      
              kmalloc(a * b * c, gfp)
      
      with:
      
              kmalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kmalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kmalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The tools/ directory was manually excluded, since it has its own
      implementation of kmalloc().
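
The motivation for the 2-factor and 3-factor forms can be illustrated in userspace. The following is a minimal sketch, not the kernel implementation: `sketch_array3_size()` and `sketch_malloc_array()` are hypothetical stand-ins for `array3_size()` and `kmalloc_array()`, and the sketch assumes a GCC or Clang compiler that provides `__builtin_mul_overflow()`. It shows an open-coded product wrapping around while the checked helpers refuse the request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for array3_size(): returns a * b * c,
 * or SIZE_MAX when the product does not fit in size_t. */
static size_t sketch_array3_size(size_t a, size_t b, size_t c)
{
	size_t bytes;

	if (__builtin_mul_overflow(a, b, &bytes) ||
	    __builtin_mul_overflow(bytes, c, &bytes))
		return SIZE_MAX;
	return bytes;
}

/* Hypothetical stand-in for kmalloc_array(): returns NULL instead of
 * allocating a buffer whose size wrapped around. */
static void *sketch_malloc_array(size_t n, size_t size)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, size, &bytes))
		return NULL;
	return malloc(bytes);
}

/* Usage example: the open-coded product wraps, the helpers do not. */
void sketch_demo_overflow(void)
{
	size_t huge = SIZE_MAX / 2 + 1;
	size_t wrapped = huge * 2;	/* unsigned arithmetic wraps to 0 */
	void *p;

	assert(wrapped == 0);
	assert(sketch_malloc_array(huge, 2) == NULL);
	assert(sketch_array3_size(huge, 2, 1) == SIZE_MAX);

	p = sketch_malloc_array(4, sizeof(int));	/* ordinary request */
	free(p);
}
```

Saturating the 3-factor product to SIZE_MAX is the design choice that lets a plain allocator call reject the request, since no allocator can satisfy a SIZE_MAX-byte allocation.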
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kmalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kmalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kmalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kmalloc
      + kmalloc_array
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kmalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kmalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kmalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kmalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kmalloc(C1 * C2 * C3, ...)
      |
        kmalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kmalloc(sizeof(THING) * C2, ...)
      |
        kmalloc(sizeof(TYPE) * C2, ...)
      |
        kmalloc(C1 * C2 * C3, ...)
      |
        kmalloc(C1 * C2, ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      
      Signed-off-by: Kees Cook <keescook@chromium.org>
      6da2ec56
  18. May 25, 2018
  19. May 16, 2018
    • Waiman Long's avatar
      locking/percpu-rwsem: Annotate rwsem ownership transfer by setting RWSEM_OWNER_UNKNOWN · 5a817641
      Waiman Long authored
      
      
      The filesystem freezing code needs to transfer ownership of a rwsem
      embedded in a percpu-rwsem from the task that does the freezing to
      another one that does the thawing by calling percpu_rwsem_release()
      after freezing and percpu_rwsem_acquire() before thawing.
      
      However, the new rwsem debug code runs afoul of this scheme by warning
      that the task that releases the rwsem isn't the one that acquires it,
      as reported by Amir Goldstein:
      
        DEBUG_LOCKS_WARN_ON(sem->owner != get_current())
        WARNING: CPU: 1 PID: 1401 at /home/amir/build/src/linux/kernel/locking/rwsem.c:133 up_write+0x59/0x79
      
        Call Trace:
         percpu_up_write+0x1f/0x28
         thaw_super_locked+0xdf/0x120
         do_vfs_ioctl+0x270/0x5f1
         ksys_ioctl+0x52/0x71
         __x64_sys_ioctl+0x16/0x19
         do_syscall_64+0x5d/0x167
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      To work properly with the rwsem debug code, we need to annotate that the
      rwsem ownership is unknown during the transfer period until a brave soul
      comes forward to acquire the ownership. During that period, optimistic
      spinning will be disabled.
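
The hand-off described above can be modeled in userspace. This is a rough sketch under stated assumptions, not the kernel code: every `sketch_` name is hypothetical, tasks are represented by plain integer ids, and the real locking is left out so that only the owner bookkeeping remains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "write-locked, owner not known". */
#define SKETCH_OWNER_UNKNOWN ((uintptr_t)-1L)

struct sketch_rwsem {
	uintptr_t owner;	/* 0 = unlocked, a task id, or the sentinel */
};

/* The freezing task takes the write lock and becomes the owner. */
static void sketch_down_write(struct sketch_rwsem *sem, uintptr_t task)
{
	sem->owner = task;
}

/* Freezer side: the lock stays held, but nobody owns it for now. */
static void sketch_release(struct sketch_rwsem *sem)
{
	sem->owner = SKETCH_OWNER_UNKNOWN;
}

/* Thawer side: adopt the still-held lock before unlocking it. */
static void sketch_acquire(struct sketch_rwsem *sem, uintptr_t task)
{
	sem->owner = task;
}

/* Unlock; returns false where an owner debug check would warn. */
static bool sketch_up_write(struct sketch_rwsem *sem, uintptr_t task)
{
	bool owner_matches = (sem->owner == task);

	sem->owner = 0;
	return owner_matches;
}

/* Usage example: task 0x1000 freezes, task 0x2000 thaws. */
void sketch_demo_transfer(void)
{
	struct sketch_rwsem sem = { 0 };

	sketch_down_write(&sem, 0x1000);
	sketch_release(&sem);
	assert(sem.owner == SKETCH_OWNER_UNKNOWN);
	sketch_acquire(&sem, 0x2000);
	assert(sketch_up_write(&sem, 0x2000));
}
```

Without the release and acquire steps, the unlock by the second task would see the first task still recorded as owner, which is the situation the warning above reports.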
      
      Reported-by: Amir Goldstein <amir73il@gmail.com>
      Tested-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Theodore Y. Ts'o <tytso@mit.edu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-fsdevel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1526420991-21213-3-git-send-email-longman@redhat.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5a817641
    • Waiman Long's avatar
      locking/rwsem: Add a new RWSEM_ANONYMOUSLY_OWNED flag · d7d760ef
      Waiman Long authored
      
      
      There are use cases where a rwsem can be acquired by one task, but
      released by another task. In these cases, optimistic spinning may need
      to be disabled.  One example is the filesystem freeze/thaw code
      where the task that freezes the filesystem will acquire a write lock
      on a rwsem and then un-owns it before returning to userspace. Later on,
      another task will come along, acquire the ownership, thaw the filesystem
      and release the rwsem.
      
      Bit 0 of the owner field was used to designate that it is a reader
      owned rwsem. It is now repurposed to mean that the owner of the rwsem
      is not known. If only bit 0 is set, the rwsem is reader owned. If bit
      0 and other bits are set, it is writer owned with an unknown owner.
      One such value for the latter case is (-1L). So we can set owner to 1 for
      reader-owned, -1 for writer-owned. The owner is unknown in both cases.
      
      To handle transfer of rwsem ownership, the higher level code should
      set the owner field to -1 to indicate a write-locked rwsem with unknown
      owner.  Optimistic spinning will be disabled in this case.
      
      Once the higher level code figures out who the new owner is, it can then
      set the owner field accordingly.
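
The owner-field encoding described above can be sketched as a few predicates. This is an illustration only: the `SKETCH_` and `sketch_` names are hypothetical, and it assumes that a real owner value (a task pointer) is always aligned so that its bit 0 is clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ANON_BIT		((uintptr_t)1)		/* bit 0: owner not known */
#define SKETCH_READER_OWNED	SKETCH_ANON_BIT		/* only bit 0 set */
#define SKETCH_WRITER_UNKNOWN	((uintptr_t)-1L)	/* bit 0 and other bits set */

/* True when the owner field says "held by readers". */
static bool sketch_owner_is_reader(uintptr_t owner)
{
	return owner == SKETCH_READER_OWNED;
}

/* True for both reader-owned and writer-with-unknown-owner values. */
static bool sketch_owner_is_anonymous(uintptr_t owner)
{
	return (owner & SKETCH_ANON_BIT) != 0;
}

/* Spinning on the owner is only considered when bit 0 is clear. */
static bool sketch_owner_spinnable(uintptr_t owner)
{
	return !sketch_owner_is_anonymous(owner);
}

/* Usage example covering the three kinds of owner value. */
void sketch_demo_owner_bits(void)
{
	uintptr_t known_writer = 0x1000;	/* stands in for an aligned task pointer */

	assert(sketch_owner_is_reader(SKETCH_READER_OWNED));
	assert(!sketch_owner_is_reader(SKETCH_WRITER_UNKNOWN));
	assert(sketch_owner_is_anonymous(SKETCH_WRITER_UNKNOWN));
	assert(!sketch_owner_spinnable(SKETCH_WRITER_UNKNOWN));
	assert(sketch_owner_spinnable(known_writer));
}
```

Testing a single bit keeps the spinning decision cheap: both anonymous cases are rejected with one mask, and only the reader check needs an exact comparison.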
      
      Tested-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Theodore Y. Ts'o <tytso@mit.edu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-fsdevel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1526420991-21213-2-git-send-email-longman@redhat.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d7d760ef
    • Christoph Hellwig's avatar
      proc: introduce proc_create_single{,_data} · 3f3942ac
      Christoph Hellwig authored
      
      
      Variants of proc_create{,_data} that directly take a seq_file show
      callback and drastically reduce the boilerplate code in the callers.
      
      All trivial callers converted over.
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      3f3942ac
    • Christoph Hellwig's avatar
      proc: introduce proc_create_seq{,_data} · fddda2b7
      Christoph Hellwig authored
      
      
      Variants of proc_create{,_data} that directly take a struct seq_operations
      argument and drastically reduce the boilerplate code in the callers.
      
      All trivial callers converted over.
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      fddda2b7
  20. May 14, 2018
  21. May 04, 2018
    • Peter Zijlstra's avatar
      locking/mutex: Optimize __mutex_trylock_fast() · c427f695
      Peter Zijlstra authored
      
      
      Use try_cmpxchg to avoid the pointless TEST instruction.
      And add the (missing) atomic_long_try_cmpxchg*() wrappery.
      
      On x86_64 this gives:
      
      0000000000000710 <mutex_lock>:						0000000000000710 <mutex_lock>:
       710:   65 48 8b 14 25 00 00    mov    %gs:0x0,%rdx                      710:   65 48 8b 14 25 00 00    mov    %gs:0x0,%rdx
       717:   00 00                                                            717:   00 00
                              715: R_X86_64_32S       current_task                                    715: R_X86_64_32S       current_task
       719:   31 c0                   xor    %eax,%eax                         719:   31 c0                   xor    %eax,%eax
       71b:   f0 48 0f b1 17          lock cmpxchg %rdx,(%rdi)                 71b:   f0 48 0f b1 17          lock cmpxchg %rdx,(%rdi)
       720:   48 85 c0                test   %rax,%rax                         720:   75 02                   jne    724 <mutex_lock+0x14>
       723:   75 02                   jne    727 <mutex_lock+0x17>             722:   f3 c3                   repz retq
       725:   f3 c3                   repz retq                                724:   eb da                   jmp    700 <__mutex_lock_slowpath>
       727:   eb d7                   jmp    700 <__mutex_lock_slowpath>       726:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
       729:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)                         72d:   00 00 00
      
      On ARM64 this gives:
      
      0000000000000638 <mutex_lock>:						0000000000000638 <mutex_lock>:
           638:       d5384101        mrs     x1, sp_el0                           638:       d5384101        mrs     x1, sp_el0
           63c:       d2800002        mov     x2, #0x0                             63c:       d2800002        mov     x2, #0x0
           640:       f9800011        prfm    pstl1strm, [x0]                      640:       f9800011        prfm    pstl1strm, [x0]
           644:       c85ffc03        ldaxr   x3, [x0]                             644:       c85ffc03        ldaxr   x3, [x0]
           648:       ca020064        eor     x4, x3, x2                           648:       ca020064        eor     x4, x3, x2
           64c:       b5000064        cbnz    x4, 658 <mutex_lock+0x20>            64c:       b5000064        cbnz    x4, 658 <mutex_lock+0x20>
           650:       c8047c01        stxr    w4, x1, [x0]                         650:       c8047c01        stxr    w4, x1, [x0]
           654:       35ffff84        cbnz    w4, 644 <mutex_lock+0xc>             654:       35ffff84        cbnz    w4, 644 <mutex_lock+0xc>
           658:       b40000c3        cbz     x3, 670 <mutex_lock+0x38>            658:       b5000043        cbnz    x3, 660 <mutex_lock+0x28>
           65c:       a9bf7bfd        stp     x29, x30, [sp,#-16]!                 65c:       d65f03c0        ret
           660:       910003fd        mov     x29, sp                              660:       a9bf7bfd        stp     x29, x30, [sp,#-16]!
           664:       97ffffef        bl      620 <__mutex_lock_slowpath>          664:       910003fd        mov     x29, sp
           668:       a8c17bfd        ldp     x29, x30, [sp],#16                   668:       97ffffee        bl      620 <__mutex_lock_slowpath>
           66c:       d65f03c0        ret                                          66c:       a8c17bfd        ldp     x29, x30, [sp],#16
           670:       d65f03c0        ret                                          670:       d65f03c0        ret
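
The difference between the two call styles can be sketched with C11 atomics, whose compare-exchange already returns a success flag in the way try_cmpxchg does. This is a userspace illustration with hypothetical `sketch_` names, not the kernel's mutex code, and whether a given compiler actually emits the extra comparison for the first variant is not something the sketch demonstrates.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_mutex {
	_Atomic uintptr_t owner;	/* 0 means unlocked */
};

/* cmpxchg() style: obtain the old value, then compare it again. */
static bool sketch_trylock_cmpxchg(struct sketch_mutex *m, uintptr_t curr)
{
	uintptr_t old = 0;

	/* On failure, old is overwritten with the value actually found. */
	atomic_compare_exchange_strong_explicit(&m->owner, &old, curr,
						memory_order_acquire,
						memory_order_relaxed);
	return old == 0;
}

/* try_cmpxchg() style: use the success flag of the operation directly. */
static bool sketch_trylock_try_cmpxchg(struct sketch_mutex *m, uintptr_t curr)
{
	uintptr_t zero = 0;

	return atomic_compare_exchange_strong_explicit(&m->owner, &zero, curr,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/* Usage example: the first caller wins, later callers fail. */
void sketch_demo_trylock(void)
{
	struct sketch_mutex m;

	atomic_init(&m.owner, 0);
	assert(sketch_trylock_try_cmpxchg(&m, 0x1000));
	assert(!sketch_trylock_try_cmpxchg(&m, 0x2000));
	assert(!sketch_trylock_cmpxchg(&m, 0x2000));
	assert(atomic_load(&m.owner) == 0x1000);
}
```

Both variants have the same observable behaviour; the second simply avoids re-deriving the result from the returned value.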
      
      Reported-by: Matthew Wilcox <mawilcox@microsoft.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c427f695
  22. Apr 27, 2018