Commit ddfe1294 authored by Boqun Feng, committed by Paul E. McKenney

tools/memory-model: Provide extra ordering for unlock+lock pair on the same CPU

A recent discussion[1] shows that we are in favor of strengthening the
ordering of unlock + lock on the same CPU: an unlock and a po-after lock
should provide the so-called RCtso ordering; that is, a memory access S
po-before the unlock should be ordered against a memory access R
po-after the lock, unless S is a store and R is a load.

The strengthening meets programmers' expectation for a "sequence of two
locked regions to be ordered wrt each other" (from Linus), and can
reduce the mental burden when using locks. Therefore add it to the LKMM.

[1]: https://lore.kernel.org/lkml/20210909185937.GA12379@rowland.harvard.edu/



Co-developed-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> (RISC-V)
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
parent fa55b7dc
+25 −19
@@ -1813,15 +1813,16 @@ spin_trylock() -- we can call these things lock-releases and
 lock-acquires -- have two properties beyond those of ordinary releases
 and acquires.
 
-First, when a lock-acquire reads from a lock-release, the LKMM
-requires that every instruction po-before the lock-release must
-execute before any instruction po-after the lock-acquire.  This would
-naturally hold if the release and acquire operations were on different
-CPUs, but the LKMM says it holds even when they are on the same CPU.
-For example:
+First, when a lock-acquire reads from or is po-after a lock-release,
+the LKMM requires that every instruction po-before the lock-release
+must execute before any instruction po-after the lock-acquire.  This
+would naturally hold if the release and acquire operations were on
+different CPUs and accessed the same lock variable, but the LKMM says
+it also holds when they are on the same CPU, even if they access
+different lock variables.  For example:
 
 	int x, y;
-	spinlock_t s;
+	spinlock_t s, t;
 
 	P0()
 	{
@@ -1830,9 +1831,9 @@ For example:
 		spin_lock(&s);
 		r1 = READ_ONCE(x);
 		spin_unlock(&s);
-		spin_lock(&s);
+		spin_lock(&t);
 		r2 = READ_ONCE(y);
-		spin_unlock(&s);
+		spin_unlock(&t);
 	}
 
 	P1()
@@ -1842,10 +1843,10 @@ For example:
 		WRITE_ONCE(x, 1);
 	}
 
-Here the second spin_lock() reads from the first spin_unlock(), and
-therefore the load of x must execute before the load of y.  Thus we
-cannot have r1 = 1 and r2 = 0 at the end (this is an instance of the
-MP pattern).
+Here the second spin_lock() is po-after the first spin_unlock(), and
+therefore the load of x must execute before the load of y, even though
+the two locking operations use different locks.  Thus we cannot have
+r1 = 1 and r2 = 0 at the end (this is an instance of the MP pattern).
 
 This requirement does not apply to ordinary release and acquire
 fences, only to lock-related operations.  For instance, suppose P0()
@@ -1872,13 +1873,13 @@ instructions in the following order:
 
 and thus it could load y before x, obtaining r2 = 0 and r1 = 1.
 
-Second, when a lock-acquire reads from a lock-release, and some other
-stores W and W' occur po-before the lock-release and po-after the
-lock-acquire respectively, the LKMM requires that W must propagate to
-each CPU before W' does.  For example, consider:
+Second, when a lock-acquire reads from or is po-after a lock-release,
+and some other stores W and W' occur po-before the lock-release and
+po-after the lock-acquire respectively, the LKMM requires that W must
+propagate to each CPU before W' does.  For example, consider:
 
 	int x, y;
-	spinlock_t x;
+	spinlock_t s;
 
 	P0()
 	{
@@ -1908,7 +1909,12 @@ each CPU before W' does. For example, consider:
 
 If r1 = 1 at the end then the spin_lock() in P1 must have read from
 the spin_unlock() in P0.  Hence the store to x must propagate to P2
-before the store to y does, so we cannot have r2 = 1 and r3 = 0.
+before the store to y does, so we cannot have r2 = 1 and r3 = 0.  But
+if P1 had used a lock variable different from s, the writes could have
+propagated in either order.  (On the other hand, if the code in P0 and
+P1 had all executed on a single CPU, as in the example before this
+one, then the writes would have propagated in order even if the two
+critical sections used different lock variables.)
 
 These two special requirements for lock-release and lock-acquire do
 not arise from the operational model.  Nevertheless, kernel developers
+3 −3
@@ -27,7 +27,7 @@ include "lock.cat"
 (* Release Acquire *)
 let acq-po = [Acquire] ; po ; [M]
 let po-rel = [M] ; po ; [Release]
-let po-unlock-rf-lock-po = po ; [UL] ; rf ; [LKR] ; po
+let po-unlock-lock-po = po ; [UL] ; (po|rf) ; [LKR] ; po
 
 (* Fences *)
 let R4rmb = R \ Noreturn	(* Reads for which rmb works *)
@@ -70,12 +70,12 @@ let rwdep = (dep | ctrl) ; [W]
 let overwrite = co | fr
 let to-w = rwdep | (overwrite & int) | (addr ; [Plain] ; wmb)
 let to-r = addr | (dep ; [Marked] ; rfi)
-let ppo = to-r | to-w | fence | (po-unlock-rf-lock-po & int)
+let ppo = to-r | to-w | fence | (po-unlock-lock-po & int)
 
 (* Propagation: Ordering from release operations and strong fences. *)
 let A-cumul(r) = (rfe ; [Marked])? ; r
 let cumul-fence = [Marked] ; (A-cumul(strong-fence | po-rel) | wmb |
-	po-unlock-rf-lock-po) ; [Marked]
+	po-unlock-lock-po) ; [Marked]
 let prop = [Marked] ; (overwrite & ext)? ; cumul-fence* ;
 	[Marked] ; rfe? ; [Marked]