Mar 16, 2022

[C++] Memory model and high-performance concerns

Tips

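The downside of sequential consistency (the default ordering for std::atomic operations) is that it can hurt performance, because it restricts how the compiler and CPU may reorder memory operations.
Where the algorithm does not depend on a single global order of operations, use atomics with a weaker memory order (std::memory_order_relaxed, or acquire/release) instead.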
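
As a minimal sketch of the tip above, a shared counter can be incremented with relaxed ordering when only the final total matters. The function name count_relaxed and its parameters are hypothetical, chosen for illustration only.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: several threads bump one shared counter.
// Relaxed ordering is enough because the increments are still atomic and
// nothing else is published through the counter; std::thread::join()
// makes all increments visible before the final load.
inline long count_relaxed(int num_threads, int increments_per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> threads;
    threads.reserve(static_cast<std::size_t>(num_threads));
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return counter.load(std::memory_order_relaxed);
}
```

Relaxed ordering would not be safe if the counter were used as a flag to signal that other, non-atomic data is ready; that case needs release/acquire.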




std::shared_ptr
To satisfy thread safety requirements, the reference counters are typically incremented using an equivalent of std::atomic::fetch_add with std::memory_order_relaxed (decrementing requires stronger ordering to safely destroy the control block).
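
The ordering asymmetry described above can be sketched with a simplified reference counter. This is an illustration under stated assumptions, not how any particular standard library implements std::shared_ptr; the class name RefCount and its member functions are hypothetical.

```cpp
#include <atomic>

// Hypothetical, simplified reference counter (not a real control block).
class RefCount {
public:
    // Taking a new reference: the caller already holds one, so no ordering
    // with other memory operations is needed, only atomicity.
    void acquire() noexcept {
        count_.fetch_add(1, std::memory_order_relaxed);
    }

    // Dropping a reference. Returns true if this was the last one, in which
    // case the caller is responsible for destroying the shared state.
    bool release() noexcept {
        // Release: writes made through this reference happen before the decrement.
        if (count_.fetch_sub(1, std::memory_order_release) == 1) {
            // Acquire: the destroying thread sees writes from all other owners.
            std::atomic_thread_fence(std::memory_order_acquire);
            return true;
        }
        return false;
    }

    long use_count() const noexcept {
        return count_.load(std::memory_order_relaxed);
    }

private:
    std::atomic<long> count_{1};  // starts owned by the creator
};
```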



Performance guidelines

  • Correctness over performance
  • Avoid contention
  • Minimize the time spent in critical sections.
  • Avoid blocking operations (POSIX system calls, synchronous calls)
  • Be aware of the number of threads relative to the number of CPU cores
  • Thread priorities
    • Important for lowering the latency of tasks
  • Avoid priority inversion (ref: Golang's goroutine model, where a goroutine that has been starving for 1 ms is given high priority)
    i.e., a thread with high priority is waiting to acquire a lock that is currently held by a low-priority thread.
  • For real-time applications, we cannot use locks to protect any shared resources that need to be accessed by real-time threads.
    A thread that produces real-time audio, for example, runs with the highest possible priority. To avoid priority inversion, the audio thread must not call any function (including std::malloc()) that might block and cause a context switch.
  • Thread affinity: a request to the scheduler that some threads should be executed on a particular core if possible, to minimize cache misses.
  • False sharing
    Pad each element in the array so that two adjacent elements cannot reside on the same cache line.
    Since C++17, there is a portable way of doing this using the std::hardware_destructive_interference_size constant defined in <new> in combination with the alignas specifier.

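// Squeezing data into the same cache line gives true sharing;
// separating data into different cache lines avoids false sharing.
// Here each vector element owns a whole cache line.
#include <new>     // std::hardware_destructive_interference_size
#include <vector>

struct alignas(std::hardware_destructive_interference_size) Element {
	int counter_{};
};
// num_threads is assumed to be defined elsewhere (e.g. the number of worker threads).
auto elements = std::vector<Element>(num_threads);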
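
A rough sketch of how such padded elements might be used: each thread updates only its own slot, so no two threads write to the same cache line. The names kCacheLine, PaddedCounter and count_per_thread are hypothetical, and the 64-byte fallback is an assumption for standard libraries that do not provide the C++17 constant.

```cpp
#include <cstddef>
#include <new>
#include <thread>
#include <vector>

#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t kCacheLine = std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t kCacheLine = 64;  // assumption: common cache-line size
#endif

// Each counter is aligned (and therefore padded) to a full cache line.
struct alignas(kCacheLine) PaddedCounter {
    long value{};
};

// Hypothetical helper: every thread increments only its own padded slot,
// then the slots are summed after all threads have been joined.
inline long count_per_thread(int num_threads, int increments_per_thread) {
    std::vector<PaddedCounter> counters(static_cast<std::size_t>(num_threads));
    std::vector<std::thread> threads;
    threads.reserve(static_cast<std::size_t>(num_threads));
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&counters, t, increments_per_thread] {
            auto& slot = counters[static_cast<std::size_t>(t)];
            for (int i = 0; i < increments_per_thread; ++i) {
                ++slot.value;  // no atomics needed: the slot is thread-private
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    long total = 0;
    for (const auto& c : counters) {
        total += c.value;
    }
    return total;
}
```

The padding trades memory for speed: with a 64-byte line, each 8-byte counter occupies a full 64 bytes, so this layout suits a small, fixed number of hot per-thread slots.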
