author: Paolo Bonzini, 2018-02-23 13:58:31 +0100
committer: Paolo Bonzini, 2018-03-12 16:12:47 +0100
commit: 729c0ddd3cdf16973d850b1ee7c5234a1e4dddbb
tree: fb2fe59ace4cb30b4d966c119429bd825c65a8c1 /docs
parent: rcutorture: remove synchronize_rcu from readers
docs: document atomic_load_acquire and atomic_store_release

We will use them in the next patch, document what they do.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
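
The pairing that this patch documents can be sketched as a small message-passing example. This is only an illustration under stated assumptions: the `sketch_load_acquire()` and `sketch_store_release()` macros are hypothetical stand-ins built on C11 `<stdatomic.h>`, not QEMU's definitions from atomic.h, and the `payload`, `ready`, `publish()` and `try_consume()` names are invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for atomic_load_acquire()/atomic_store_release(),
 * expressed with C11 atomics. QEMU's real macros may be defined differently.
 */
#define sketch_load_acquire(ptr) \
    atomic_load_explicit((ptr), memory_order_acquire)
#define sketch_store_release(ptr, val) \
    atomic_store_explicit((ptr), (val), memory_order_release)

static int payload;         /* plain data, published through 'ready' */
static atomic_bool ready;   /* flag shared between writer and reader */

/* Writer side: fill in the data first, then publish it. */
static void publish(int value)
{
    payload = value;                     /* (other writes) */
    /* The store to 'ready' appears after every access specified before it. */
    sketch_store_release(&ready, true);
}

/* Reader side: returns true and fills *out once the data is published. */
static bool try_consume(int *out)
{
    assert(out != NULL);
    /* The load of 'ready' appears before every access specified after it. */
    if (!sketch_load_acquire(&ready)) {
        return false;
    }
    *out = payload;                      /* (other reads) */
    return true;
}
```

If one thread calls `publish()` and another polls `try_consume()`, a reader that observes `ready == true` is also guaranteed to observe the value written to `payload`, because the release store pairs with the acquire load. No separate smp_wmb()/smp_rmb() pair is needed in this sketch.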
Diffstat (limited to 'docs')
-rw-r--r-- docs/devel/atomics.txt 57
1 files changed, 30 insertions, 27 deletions
diff --git a/docs/devel/atomics.txt b/docs/devel/atomics.txt
index 10c5fa37e8..a4db3a4aaa 100644
--- a/docs/devel/atomics.txt
+++ b/docs/devel/atomics.txt
@@ -122,20 +122,30 @@ In general, if the algorithm you are writing includes both writes
and reads on the same side, it is generally simpler to use sequentially
consistent primitives.
-When using this model, variables are accessed with atomic_read() and
-atomic_set(), and restrictions to the ordering of accesses is enforced
+When using this model, variables are accessed with:
+
+- atomic_read() and atomic_set(); these prevent the compiler from
+ optimizing accesses out of existence and creating unsolicited
+ accesses, but do not otherwise impose any ordering on loads and
+ stores: both the compiler and the processor are free to reorder
+ them.
+
+- atomic_load_acquire(), which guarantees the LOAD to appear to
+ happen, with respect to the other components of the system,
+ before all the LOAD or STORE operations specified afterwards.
+ Operations coming before atomic_load_acquire() can still be
+ reordered after it.
+
+- atomic_store_release(), which guarantees the STORE to appear to
+ happen, with respect to the other components of the system,
+ after all the LOAD or STORE operations specified before.
+ Operations coming after atomic_store_release() can still be
+ reordered before it.
+
+Restrictions to the ordering of accesses can also be specified
using the memory barrier macros: smp_rmb(), smp_wmb(), smp_mb(),
smp_mb_acquire(), smp_mb_release(), smp_read_barrier_depends().
-atomic_read() and atomic_set() prevents the compiler from using
-optimizations that might otherwise optimize accesses out of existence
-on the one hand, or that might create unsolicited accesses on the other.
-In general this should not have any effect, because the same compiler
-barriers are already implied by memory barriers. However, it is useful
-to do so, because it tells readers which variables are shared with
-other threads, and which are local to the current thread or protected
-by other, more mundane means.
-
Memory barriers control the order of references to shared memory.
They come in six kinds:
@@ -232,7 +242,7 @@ make atomic_mb_set() the more expensive operation.
There are two common cases in which atomic_mb_read and atomic_mb_set
generate too many memory barriers, and thus it can be useful to manually
-place barriers instead:
+place barriers, or use atomic_load_acquire/atomic_store_release instead:
- when a data structure has one thread that is always a writer
and one thread that is always a reader, manual placement of
@@ -243,18 +253,15 @@ place barriers instead:
  thread 1                                thread 1
  -------------------------               ------------------------
  (other writes)
-                                         smp_mb_release()
- atomic_mb_set(&a, x)                    atomic_set(&a, x)
-                                         smp_wmb()
- atomic_mb_set(&b, y)                    atomic_set(&b, y)
+ atomic_mb_set(&a, x)                    atomic_store_release(&a, x)
+ atomic_mb_set(&b, y)                    atomic_store_release(&b, y)
                                =>
  thread 2                                thread 2
  -------------------------               ------------------------
- y = atomic_mb_read(&b)                  y = atomic_read(&b)
-                                         smp_rmb()
- x = atomic_mb_read(&a)                  x = atomic_read(&a)
-                                         smp_mb_acquire()
+ y = atomic_mb_read(&b)                  y = atomic_load_acquire(&b)
+ x = atomic_mb_read(&a)                  x = atomic_load_acquire(&a)
+ (other reads)
Note that the barrier between the stores in thread 1, and between
the loads in thread 2, has been optimized here to a write or a
@@ -276,7 +283,6 @@ place barriers instead:
smp_mb_acquire();
Similarly, atomic_mb_set() can be transformed as follows:
- smp_mb():
smp_mb_release();
for (i = 0; i < 10; i++) => for (i = 0; i < 10; i++)
@@ -284,6 +290,8 @@ place barriers instead:
smp_mb();
+ The other thread can still use atomic_mb_read()/atomic_mb_set().
+
The two tricks can be combined. In this case, splitting a loop in
two lets you hoist the barriers out of the loops _and_ eliminate the
expensive smp_mb():
@@ -296,8 +304,6 @@ expensive smp_mb():
atomic_set(&a[i], false);
smp_mb();
- The other thread can still use atomic_mb_read()/atomic_mb_set()
-
Memory barrier pairing
----------------------
@@ -386,10 +392,7 @@ and memory barriers, and the equivalents in QEMU:
note that smp_store_mb() is a little weaker than atomic_mb_set().
atomic_mb_read() compiles to the same instructions as Linux's
smp_load_acquire(), but this should be treated as an implementation
- detail. QEMU does have atomic_load_acquire() and atomic_store_release()
- macros, but for now they are only used within atomic.h. This may
- change in the future.
-
+ detail.
SOURCES
=======