Commit 1e57930e authored by Linus Torvalds's avatar Linus Torvalds
Browse files
Pull RCU update from Paul McKenney:

 - Documentation updates

 - Miscellaneous fixes

 - Callback-offloading updates, mainly simplifications

 - RCU-tasks updates, including some -rt fixups, handling of systems
   with sparse CPU numbering, and a fix for a boot-time race-condition
   failure

 - Put SRCU on a memory diet in order to reduce the size of the
   srcu_struct structure

 - Torture-test updates fixing some bugs in tests and closing some
   testing holes

 - Torture-test updates for the RCU tasks flavors, most notably ensuring
   that building rcutorture and friends does not change the
   RCU-tasks-related Kconfig options

 - Torture-test scripting updates

 - Expedited grace-period updates, most notably providing
   milliseconds-scale (not all that) soft real-time response from
   synchronize_rcu_expedited().

   This is also the first time in almost 30 years of RCU that someone
   other than me has pushed for a reduction in the RCU CPU stall-warning
   timeout, in this case by more than three orders of magnitude from 21
   seconds to 20 milliseconds. This tighter timeout applies only to
   expedited grace periods

* tag 'rcu.2022.05.19a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (80 commits)
  rcu: Move expedited grace period (GP) work to RT kthread_worker
  rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
  srcu: Drop needless initialization of sdp in srcu_gp_start()
  srcu: Prevent expedited GPs and blocking readers from consuming CPU
  srcu: Add contention check to call_srcu() srcu_data ->lock acquisition
  srcu: Automatically determine size-transition strategy at boot
  rcutorture: Make torture.sh allow for --kasan
  rcutorture: Make torture.sh refscale and rcuscale specify Tasks Trace RCU
  rcutorture: Make kvm.sh allow more memory for --kasan runs
  torture: Save "make allmodconfig" .config file
  scftorture: Remove extraneous "scf" from per_version_boot_params
  rcutorture: Adjust scenarios' Kconfig options for CONFIG_PREEMPT_DYNAMIC
  torture: Enable CSD-lock stall reports for scftorture
  torture: Skip vmlinux check for kvm-again.sh runs
  scftorture: Adjust for TASKS_RCU Kconfig option being selected
  rcuscale: Allow rcuscale without RCU Tasks Rude/Trace
  rcuscale: Allow rcuscale without RCU Tasks
  refscale: Allow refscale without RCU Tasks Rude/Trace
  refscale: Allow refscale without RCU Tasks
  rcutorture: Allow specifying per-scenario stat_interval
  ...
parents b2f02e9c ce133890
Loading
Loading
Loading
Loading
+1 −1
Original line number Diff line number Diff line
@@ -973,7 +973,7 @@ The ``->dynticks`` field counts the corresponding CPU's transitions to
and from either dyntick-idle or user mode, so that this counter has an
even value when the CPU is in dyntick-idle mode or user mode and an odd
value otherwise. The transitions to/from user mode need to be counted
for user mode adaptive-ticks support (see timers/NO_HZ.txt).
for user mode adaptive-ticks support (see Documentation/timers/no_hz.rst).

The ``->rcu_need_heavy_qs`` field is used to record the fact that the
RCU core code would really like to see a quiescent state from the
+1 −1
Original line number Diff line number Diff line
@@ -406,7 +406,7 @@ In earlier implementations, the task requesting the expedited grace
period also drove it to completion. This straightforward approach had
the disadvantage of needing to account for POSIX signals sent to user
tasks, so more recent implemementations use the Linux kernel's
`workqueues <https://www.kernel.org/doc/Documentation/core-api/workqueue.rst>`__.
workqueues (see Documentation/core-api/workqueue.rst).

The requesting task still does counter snapshotting and funnel-lock
processing, but the task reaching the top of the funnel lock does a
+34 −2
Original line number Diff line number Diff line
@@ -370,8 +370,8 @@ pointer fetched by rcu_dereference() may not be used outside of the
outermost RCU read-side critical section containing that
rcu_dereference(), unless protection of the corresponding data
element has been passed from RCU to some other synchronization
mechanism, most commonly locking or `reference
counting <https://www.kernel.org/doc/Documentation/RCU/rcuref.txt>`__.
mechanism, most commonly locking or reference counting
(see ../../rcuref.rst).

.. |high-quality implementation of C11 memory_order_consume [PDF]| replace:: high-quality implementation of C11 ``memory_order_consume`` [PDF]
.. _high-quality implementation of C11 memory_order_consume [PDF]: http://www.rdrop.com/users/paulmck/RCU/consume.2015.07.13a.pdf
@@ -2654,6 +2654,38 @@ synchronize_rcu(), and rcu_barrier(), respectively. In
three APIs are therefore implemented by separate functions that check
for voluntary context switches.

Tasks Rude RCU
~~~~~~~~~~~~~~

Some forms of tracing need to wait for all preemption-disabled regions
of code running on any online CPU, including those executed when RCU is
not watching.  This means that synchronize_rcu() is insufficient, and
Tasks Rude RCU must be used instead.  This flavor of RCU does its work by
forcing a workqueue to be scheduled on each online CPU, hence the "Rude"
moniker.  And this operation is considered to be quite rude by real-time
workloads that don't want their ``nohz_full`` CPUs receiving IPIs and
by battery-powered systems that don't want their idle CPUs to be awakened.

The tasks-rude-RCU API is also reader-marking-free and thus quite compact,
consisting of call_rcu_tasks_rude(), synchronize_rcu_tasks_rude(),
and rcu_barrier_tasks_rude().

Tasks Trace RCU
~~~~~~~~~~~~~~~

Some forms of tracing need to sleep in readers, but cannot tolerate
SRCU's read-side overhead, which includes a full memory barrier in both
srcu_read_lock() and srcu_read_unlock().  This need is handled by a
Tasks Trace RCU that uses scheduler locking and IPIs to synchronize with
readers.  Real-time systems that cannot tolerate IPIs may build their
kernels with ``CONFIG_TASKS_TRACE_RCU_READ_MB=y``, which avoids the IPIs at
the expense of adding full memory barriers to the read-side primitives.

The tasks-trace-RCU API is also reasonably compact,
consisting of rcu_read_lock_trace(), rcu_read_unlock_trace(),
rcu_read_lock_trace_held(), call_rcu_tasks_trace(),
synchronize_rcu_tasks_trace(), and rcu_barrier_tasks_trace().

Possible Future Changes
-----------------------

+2 −2
Original line number Diff line number Diff line
@@ -33,8 +33,8 @@ Situation 1: Hash Tables

Hash tables are often implemented as an array, where each array entry
has a linked-list hash chain.  Each hash chain can be protected by RCU
as described in the listRCU.txt document.  This approach also applies
to other array-of-list situations, such as radix trees.
as described in listRCU.rst.  This approach also applies to other
array-of-list situations, such as radix trees.

.. _static_arrays:

+4 −5
Original line number Diff line number Diff line
@@ -140,8 +140,7 @@ over a rather long period of time, but improvements are always welcome!
		prevents destructive compiler optimizations.  However,
		with a bit of devious creativity, it is possible to
		mishandle the return value from rcu_dereference().
		Please see rcu_dereference.txt in this directory for
		more information.
		Please see rcu_dereference.rst for more information.

		The rcu_dereference() primitive is used by the
		various "_rcu()" list-traversal primitives, such
@@ -151,7 +150,7 @@ over a rather long period of time, but improvements are always welcome!
		primitives.  This is particularly useful in code that
		is common to readers and updaters.  However, lockdep
		will complain if you access rcu_dereference() outside
		of an RCU read-side critical section.  See lockdep.txt
		of an RCU read-side critical section.  See lockdep.rst
		to learn what to do about this.

		Of course, neither rcu_dereference() nor the "_rcu()"
@@ -323,7 +322,7 @@ over a rather long period of time, but improvements are always welcome!
	primitives when the update-side lock is held is that doing so
	can be quite helpful in reducing code bloat when common code is
	shared between readers and updaters.  Additional primitives
	are provided for this case, as discussed in lockdep.txt.
	are provided for this case, as discussed in lockdep.rst.

	One exception to this rule is when data is only ever added to
	the linked data structure, and is never removed during any
@@ -480,4 +479,4 @@ over a rather long period of time, but improvements are always welcome!
	both rcu_barrier() and synchronize_rcu(), if necessary, using
	something like workqueues to to execute them concurrently.

	See rcubarrier.txt for more information.
	See rcubarrier.rst for more information.
Loading