Reference:
In Praise of Idleness [Bruce Dawson]
https://docs.microsoft.com/en-us/windows/desktop/DxTechArts/lockless-programming
http://preshing.com/20120226/roll-your-own-lightweight-mutex/
https://rigtorp.se/spinlock/
They (CPU/GPU) are related:
- Each spinning thread draws power, and the total power is capped for both CPU and GPU.
  i.e. a spinning thread will lower the performance of all the other cores.
- Calling sleep(0) is BAD.
  (It is a system call: a context switch from user space to kernel space, etc.)
- Hyperthreading means that two logical cores residing on one physical core may be sharing resources, such as execution units or L1 caches.
  Even spinning on a single logical core is BAD.
- Compare And Swap is BAD.
  https://en.cppreference.com/w/cpp/atomic/atomic_compare_exchange
  It's a global operation that triggers an RMW (read-modify-write) / WMB (write memory barrier).
  Ref: http://vsdmars.blogspot.com/2015/10/c-concurrent-notenote-ch5-study-note.html
- Be careful of false sharing / the ping-pong effect between cores' cache lines.
- On PC, it's unrealistic to set thread affinity, which hinders performance.
- Don't rely on the thread priority set by the OS.
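To make the false sharing point concrete, here is a minimal sketch of padding per-thread data onto separate cache lines. It assumes a 64-byte cache line, which is common on x86-64 but not guaranteed; `kCacheLine`, `PaddedCounter` and `g_counters` are illustrative names and do not come from the referenced articles.

```cpp
#include <atomic>
#include <cstddef>

// Assumption: 64-byte cache lines. Where implemented, C++17 offers
// std::hardware_destructive_interference_size in <new> as an alternative.
constexpr std::size_t kCacheLine = 64;

// Each counter is aligned (and therefore padded) to a full cache line, so two
// threads updating neighbouring counters do not invalidate each other's line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<int> value{0};
};

static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");
static_assert(alignof(PaddedCounter) == kCacheLine, "aligned to a cache line");

// g_counters[0] and g_counters[1] never share a cache line.
PaddedCounter g_counters[2];
```

Without the `alignas`, both counters would likely sit in the same line and every write from one thread would bounce that line away from the other.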
Spinlock implementation:
The CMPXCHG (compare-exchange) instruction on x86/x64
always writes to the target memory location, even if the comparison fails.
i.e. it sends a read-invalidate while doing a WMB.
void Lock(AtomicI32& _spinLock)
{
    for (;;)
    {
        i32 expected = 0;
        i32 store = 1;
        // Spinning cores invalidate each other's cache line on every CAS,
        // even while a third core holds the lock...
        if (_spinLock.compareExchange_Acquire(expected, store))
            break;
    }
}
vs.
(Much Better)
void Lock(AtomicI32& _spinLock)
{
    for (;;)
    {
        // Only attempt the CAS when the lock appears to be free.
        if (_spinLock.load_Acquire() == 0)
        {
            i32 expected = 0;
            i32 store = 1;
            if (_spinLock.compareExchange_Acquire(expected, store))
                break;
        }
        // Pause Intrinsic https://software.intel.com/en-us/node/524249
        // Save ENERGY by using 'pause', which hints to the CPU that this is a
        // spin-wait loop so the core can back off between polls.
        EMIT_PAUSE_INSTRUCTION();
    }
}
As for 'pause':
The pause intrinsic keeps a busy wait from overwhelming the processor by inserting short pauses into the instruction stream of the spin loop.
This is particularly important on hyperthreaded systems since it gives the other logical core time to run.
If you must busy wait, be sure to use pause.
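As a rough illustration of the advice above, the sketch below wraps the pause hint in a small helper and uses it in a polling loop. `cpu_relax` and `spin_until_set` are hypothetical names chosen for this example; the helper assumes `_mm_pause()` from `<immintrin.h>` on x86 and an inline-assembly `yield` on AArch64 with GCC or Clang, and it degrades to a plain spin on any other target.

```cpp
#include <atomic>
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>  // _mm_pause
#endif

// Hypothetical helper: emit the architecture's spin-wait hint, if one is known.
inline void cpu_relax() noexcept {
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
    _mm_pause();                    // x86 PAUSE
#elif defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ __volatile__("yield");  // ARM YIELD hint
#else
    // No known hint on this target: the caller simply keeps spinning.
#endif
}

// Poll `flag` until it becomes true, relaxing the core between polls.
inline void spin_until_set(const std::atomic<bool>& flag) noexcept {
    while (!flag.load(std::memory_order_acquire)) {
        cpu_relax();
    }
}
```

The hint does not put the thread to sleep or yield to the OS scheduler; it only eases pressure on the core shared with the sibling hyperthread.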
Check out folly's implementation:
https://github.com/facebook/folly/blob/master/folly/synchronization/MicroSpinLock.h
Use folly::MicroLock instead; its header file is nicely documented:
https://github.com/facebook/folly/blob/master/folly/MicroLock.h
Use acquire and release ordering; do NOT use sequential consistency, which is unnecessary and slow.
#include <atomic>

struct spinlock {
    std::atomic<bool> lock_ = {0};

    void lock() noexcept {
        for (;;) {
            // Optimistically assume the lock is free on the first try
            if (!lock_.exchange(true, std::memory_order_acquire)) {
                return;
            }
            // Wait for lock to be released without generating cache misses
            while (lock_.load(std::memory_order_relaxed)) {
                // Issue X86 PAUSE or ARM YIELD instruction to reduce contention
                // between hyper-threads (this builtin is the GCC/Clang x86 form)
                __builtin_ia32_pause();
            }
        }
    }

    bool try_lock() noexcept {
        // First do a relaxed load to check if lock is free in order to prevent
        // unnecessary cache misses if someone does while(!try_lock())
        return !lock_.load(std::memory_order_relaxed) &&
               !lock_.exchange(true, std::memory_order_acquire);
    }

    void unlock() noexcept {
        lock_.store(false, std::memory_order_release);
    }
};
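Because the struct above exposes lock() and unlock(), it can be used with std::lock_guard. The following is a minimal usage sketch under that assumption: `demo_spinlock` is a compact restatement of the lock (with the pause hint left out for portability) so that the example compiles on its own, and `run_counter_demo` is a hypothetical helper written for this note.

```cpp
#include <atomic>
#include <mutex>   // std::lock_guard
#include <thread>
#include <vector>

// Compact restatement of the spinlock so this example is self-contained.
// The pause hint is omitted here; a real lock should keep it.
struct demo_spinlock {
    std::atomic<bool> lock_{false};

    void lock() noexcept {
        for (;;) {
            // Try to take the lock; acquire pairs with the release in unlock().
            if (!lock_.exchange(true, std::memory_order_acquire)) return;
            // Spin on a plain load until the lock looks free again.
            while (lock_.load(std::memory_order_relaxed)) { /* spin */ }
        }
    }

    void unlock() noexcept { lock_.store(false, std::memory_order_release); }
};

// Hypothetical demo: `threads` workers each increment a plain int `iters` times.
int run_counter_demo(int threads, int iters) {
    demo_spinlock lock;
    int counter = 0;  // deliberately non-atomic; protected only by the lock
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                std::lock_guard<demo_spinlock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

If the acquire/release pairing holds, no increment is lost, so calling `run_counter_demo(4, 10000)` should return 40000. Depending on the toolchain, building this may require linking the thread library (for example `-pthread` with GCC or Clang).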