Oct 22, 2014

[C++][Note] lock free programming in C++

synchronizes-with implies happens-before: when an acquire load reads the value written by a release store, the store synchronizes with the load, and everything sequenced before the store happens before everything sequenced after the load.



Atomic operations are meaningful only for variables shared among threads.
  • RMW
    • std::atomic::fetch_add
      • Not guaranteed to be lock-free; use std::atomic<>::is_lock_free() to check.
    • std::atomic_compare_exchange_weak
      • Compares Destination with Comparand; if they are equal, Destination has not
        changed, so Exchange is assigned to Destination and the original Destination
        value is returned.
      • May fail spuriously, so it is used inside a while/for loop.
    • std::atomic_compare_exchange_strong
      • used outside a loop; does not fail spuriously.
  • Difference between fence and operation
    • Operation
      • only a store can be a release operation.
        An operation with release semantics is one which does not permit preceding memory operations to be delayed past it.
      • only a load can be an acquire operation.
        An operation with acquire semantics is one which does not permit subsequent memory operations to be advanced before it.
    • Fence
      • An acquire fence prevents the reordering of any read which precedes the fence in program order with any read or write which follows the fence in program order.
      • A release fence prevents the reordering of any read or write which precedes the fence in program order with any write which follows the fence in program order.
        i.e. a release fence prevents all preceding memory operations from being reordered past subsequent writes.
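As a sketch of the weak/strong distinction, fetch_add can be hand-rolled with a compare_exchange_weak loop. The function name and memory orders here are illustrative, not from the note:

```cpp
#include <atomic>

// Sketch: fetch_add built from a compare_exchange_weak loop. The weak
// form may fail spuriously, which is why it belongs inside a loop;
// compare_exchange_strong is for one-shot attempts outside a loop.
int fetch_add_cas(std::atomic<int>& counter, int delta)
{
    int expected = counter.load(std::memory_order_relaxed);
    // If counter still equals expected, replace it with expected + delta.
    // On failure, expected is reloaded with the current value and we retry.
    while (!counter.compare_exchange_weak(expected, expected + delta,
                                          std::memory_order_acq_rel,
                                          std::memory_order_relaxed))
    {
        // expected was updated by the failed CAS; just loop again.
    }
    return expected; // previous value, like fetch_add
}
```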
-----------------------------------------------------------------
  • #LoadLoad
if (IsPublished)          // Load and check shared flag
{
    LOADLOAD_FENCE();     // Prevent reordering of loads
    return Value;         // Load published value
}


  • #StoreStore
Value = x;                // Publish some data
STORESTORE_FENCE();
IsPublished = 1;          // Set shared flag to indicate availability of data


  • #LoadStore
    • prevents a load before the barrier from being reordered with a store after it (e.g. if the load misses the cache, the CPU could otherwise let the later store complete first).
  • #StoreLoad
    • A StoreLoad barrier ensures that all stores performed before the barrier are visible to other processors, and that all loads performed after the barrier receive the latest value that is visible at the time of the barrier.
    • In other words, it effectively prevents reordering of all stores before the barrier against all loads after the barrier
    • On most processors, instructions that act as a #StoreLoad barrier tend to be more expensive than instructions acting as the other barrier types.
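In C++11 the only fence that provides a full #StoreLoad barrier is a sequentially consistent fence. A minimal sketch of one side of a Dekker-style handshake; the flag names are assumptions, not from the note:

```cpp
#include <atomic>

// Illustrative: two threads each set their own flag, then check the
// other's. Without a #StoreLoad barrier between the store and the load,
// both could read 0 and enter the critical section together.
std::atomic<int> flag0{0}, flag1{0};

bool thread0_try_enter()
{
    flag0.store(1, std::memory_order_relaxed);           // store
    // A seq_cst fence is the only C++11 fence that gives #StoreLoad.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return flag1.load(std::memory_order_relaxed) == 0;   // load
}
```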
-----------------------------------------------------------------
void SendTestMessage(void* param)
{
    // Copy to shared memory using non-atomic stores.
    g_payload.tick  = clock();
    g_payload.str   = "TestMessage";
    g_payload.param = param;

    // Perform an atomic write-release to indicate that the message is ready.
    g_guard.store(1, std::memory_order_release);
}

//-----

bool TryReceiveMessage(Message& result)
{
    // Perform an atomic read-acquire to check whether the message is ready.
    int ready = g_guard.load(std::memory_order_acquire);
    if (ready != 0)
    {
        // Yes. Copy from shared memory using non-atomic loads.
        result.tick  = g_payload.tick;
        result.str   = g_payload.str;
        result.param = g_payload.param;
        return true;
    }
    // No.
    return false;
}
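The same handshake can also be written with standalone fences instead of release/acquire operations, matching the fence/operation distinction above. A sketch with assumed declarations (the Payload struct and names are stand-ins):

```cpp
#include <atomic>

// Assumed declarations; not shown in the original snippets.
struct Payload { int tick; const char* str; };

Payload          g_payload;
std::atomic<int> g_guard{0};

void SendWithFence()
{
    g_payload.tick = 100;
    g_payload.str  = "TestMessage";

    // Release fence: keeps the payload stores above from being
    // reordered past the relaxed store below.
    std::atomic_thread_fence(std::memory_order_release);
    g_guard.store(1, std::memory_order_relaxed);
}

bool TryReceiveWithFence(Payload& result)
{
    if (g_guard.load(std::memory_order_relaxed) != 0)
    {
        // Acquire fence: keeps the payload loads below from being
        // reordered before the relaxed load above.
        std::atomic_thread_fence(std::memory_order_acquire);
        result.tick = g_payload.tick;
        result.str  = g_payload.str;
        return true;
    }
    return false;
}
```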
-----------------------------------------------------------------
consume operations:

  • A consume operation can always be safely replaced with memory_order_acquire.
  • acquire operations provide all of the guarantees of consume operations, and then some. In other words, acquire is stronger.
  • Both consume and acquire serve the same purpose: To help pass nonatomic information safely between threads. 
  • Like acquire operations, a consume operation must be combined with a release operation in another thread.
  • Weakly ordered CPUs (PowerPC and ARM) in some cases do enforce memory ordering at the machine instruction level without explicit memory barrier instructions: they preserve memory ordering between data-dependent instructions.
    • Definition:
      • Two machine instructions, executed in the same thread, are data dependent whenever the first instruction outputs a value and the second instruction uses that value as input
    • Because there is a data dependency between these two instructions, the loads will be performed in order.
  • Consume semantics essentially ask the compiler to exploit data dependencies on all weakly-ordered processor families.
  • It’s not enough to simply change memory_order_acquire to memory_order_consume; you must also make sure there are data dependency chains at the C++ source code level.
  • At the source code level, a dependency chain is a sequence of expressions whose evaluations all carry a dependency to one another.
  • Carries a dependency:
    • one evaluation carries a dependency to another if the value of the first is used as an operand of the second.
atomic<int> Guard(0);
int Payload = 0;

// Thread 1
Payload = 42;
Guard.store(1, memory_order_release);

// Thread 2
int g = Guard.load(memory_order_acquire);
int p = 0;
if (g != 0)
    p = Payload;
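Note that the acquire example relies on a control dependency (the if), which consume does not order. A consume version needs a data dependency, typically by publishing a pointer and dereferencing it. A sketch with assumed names:

```cpp
#include <atomic>

// Assumed names; not from the original note.
struct Data { int value; };

Data               g_data;
std::atomic<Data*> g_ptr{nullptr};

void producer()
{
    g_data.value = 42;
    g_ptr.store(&g_data, std::memory_order_release);
}

int consumer()
{
    // p->value carries a data dependency from the load of p, which is
    // exactly what memory_order_consume orders (compilers today
    // typically promote consume to acquire anyway).
    Data* p = g_ptr.load(std::memory_order_consume);
    return p ? p->value : -1;
}
```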


-----------------------------------------------------------------
atomic_load specialization for shared_ptr:

std::shared_ptr<T> a;
auto p = std::atomic_load(&a); // We don't have std::atomic<std::shared_ptr<T>>, p.load(), p.compare_exchange_weak(e, d) yet..
std::atomic_compare_exchange_weak(&a, &e, d);
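A runnable sketch of those free functions (C++11/14); the publish/snapshot names are illustrative:

```cpp
#include <atomic>
#include <memory>

std::shared_ptr<int> g_sp;

// Writers swap in a new control block atomically.
void publish(int v)
{
    std::atomic_store(&g_sp, std::make_shared<int>(v));
}

// Readers take an atomic snapshot of the current pointer.
std::shared_ptr<int> snapshot()
{
    return std::atomic_load(&g_sp);
}
```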

-----------------------------------------------------------------
Quoted from Stack Overflow:

[[carries_dependency]] is used to allow dependencies to be carried across function calls.
This potentially allows the compiler to generate better code when used with std::memory_order_consume
for transferring values between threads on platforms with weakly-ordered architectures
such as IBM's POWER architecture.

In particular, if a value read with memory_order_consume is passed in to a function,
then without [[carries_dependency]] the compiler may have to issue a memory fence
instruction to guarantee that the appropriate memory ordering semantics are upheld.
If the parameter is annotated with [[carries_dependency]] then the compiler can assume
that the function body will correctly carry the dependency, and this fence may no longer be necessary.

Similarly, if a function returns a value loaded with memory_order_consume,
or derived from such a value, then without [[carries_dependency]] the compiler may be required
to insert a fence instruction to guarantee that the appropriate memory ordering semantics are upheld.
With the [[carries_dependency]] annotation, this fence may no longer be necessary,
as the caller is now responsible for maintaining the dependency tree.

e.g.

#include <atomic>
#include <iostream>

void print(int* val)
{
    std::cout << *val << std::endl;
}

void print2([[carries_dependency]] int* val)
{
    std::cout << *val << std::endl;
}

std::atomic<int*> p;
int* local = p.load(std::memory_order_consume);

if (local)
    std::cout << *local << std::endl; // 1

if (local)
    print(local); // 2

if (local)
    print2(local); // 3


In line (1), the dependency is explicit, so the compiler knows that local is dereferenced,
and that it must ensure that the dependency chain is preserved in order to avoid a fence on POWER.

In line (2), the definition of print is opaque (assuming it isn't inlined), so the compiler must
issue a fence in order to ensure that the dereference inside print reads the correct value.

On line (3), the compiler can assume that, although print2 is also opaque, the dependency from
the parameter to the dereferenced value is preserved in the instruction stream, and no fence is
necessary on POWER.

Obviously, the definition of print2 must actually preserve this dependency,
so the attribute will also impact the generated code for print2.

Reference:
  • Articles on http://preshing.com/
  • What does the [[carries_dependency]] attribute mean? (Stack Overflow)
  • ABA problem
  • C++11/14/1z Atomic operations library
  • CppCon 2014: Herb Sutter, "Lock-Free Programming (or, Juggling Razor Blades), Part I"
  • CppCon 2014: Herb Sutter, "Lock-Free Programming (or, Juggling Razor Blades), Part II"
