Dec 8, 2022

[C++/Rust] use of thread_local in code.

References:
  1. All about thread-local storage by MaskRay
  2. A Deep dive into (implicit) Thread Local Storage (covers use cases for thread_local in detail)
  3. ELF Handling For Thread-Local Storage by Ulrich Drepper
  4. clang attribute 'tls-model'
  5. How fast is thread local variable access on Linux
  6. Mastering x86 Memory Segmentation
  7. x86 and amd64 instruction reference

This note focuses on C++ coding practice with thread_local; the material is collected from daily engineering work and the references above.


C++ Language definitions:
  1. Zero-initialization
    https://en.cppreference.com/w/cpp/language/zero_initialization
    https://vsdmars.blogspot.com/2014/04/c11-zero-initialisation-for-classes.html
  2. Constant initialization
    https://en.cppreference.com/w/cpp/language/constant_initialization
    Initialization rule to memorize: C.Z, i.e. constant initialization first if possible, otherwise zero initialization.
  3. constinit specifier
    https://en.cppreference.com/w/cpp/language/constinit
  4. Potentially-evaluated expressions
    https://en.cppreference.com/w/cpp/language/expressions#Potentially-evaluated_expressions
  5. [C++20] consteval / constexpr
    https://vsdmars.blogspot.com/2022/06/cc20-consteval-constexpr.html


  6. The thread_local keyword is only allowed for objects declared at namespace scope, objects declared at block scope, and static data members.
    It indicates that the object has thread storage duration.
    If thread_local is the only storage class specifier applied to a block scope variable, static is also implied.
    It can be combined with static or extern to specify internal or external linkage (except for static data members which always have external linkage) respectively.
    It can be combined with constinit to reduce overhead that would otherwise be incurred by a hidden guard variable.
  7. thread storage duration. The storage for the object is allocated when the thread begins and deallocated when the thread ends. Each thread has its own instance of the object (threads being created via, e.g., clone()). Only objects declared thread_local have this storage duration. thread_local can appear together with static or extern to adjust linkage.
  8. thread_local is initialized in C.Z order, i.e. constant initialization first; if that is not possible, zero initialization.
A variable declared static or thread_local is constant-initialized if possible; otherwise it is zero-initialized, and its dynamic initialization runs later at run time. [basic.start.static]
For example (requires C++20):
#include <type_traits> // std::is_constant_evaluated
using namespace std;

bool runtimeFunc() {
  return std::is_constant_evaluated(); // always false
}

constexpr bool constexprFunc() {
  return std::is_constant_evaluated(); // may be false or true
}

consteval bool constevalFunc() {
  return std::is_constant_evaluated(); // always true
}

void foo() {
  static bool v1 = constexprFunc();       // T

  // implicit static
  thread_local bool v2 = constexprFunc(); // T
  thread_local bool v3 = constevalFunc(); // T

  int y = 42;
  static int v4 = y + runtimeFunc();         // 42
  static int v5 = y + constexprFunc();       // 42
  static int v6 = y + constevalFunc();       // 43

  // implicit static
  thread_local int v7 = y + runtimeFunc();   // 42
  thread_local int v8 = y + constexprFunc(); // 42
  thread_local int v9 = y + constevalFunc(); // 43
}

int main() { foo(); }

Usage
  • thread_local should not be used in a signal handler: the handler can be invoked on different threads, and because thread_local data is per-thread and not synchronized between threads, logic that relies on it can break.
  • thread_local is relatively slow in DSO use cases; use local caching instead.
        Access cost, in instructions:
        1 on Windows and Linux
        3-4 on OSX
  • With dlopen(), a DSO interacts with thread_local as follows:
    • When a thread starts (i.e. clone()), thread_local objects with thread storage duration at namespace scope are initialized.
      When a thread exits, objects with thread storage duration are destroyed.
  • What happens if the library is unloaded before all threads exit?
    • In glibc, pass RTLD_NODELETE to dlopen() (the counterpart of the DF_1_NODELETE flag in the ELF, set by linking with -z nodelete); the shared object is then not unloaded during dlclose().
    • Consequently, the object's static and global variables are not reinitialized if the object is reloaded with dlopen() at a later time.
    • Also, dlclose() in the middle of destructing thread_local objects is a no-op when RTLD_NODELETE is used.
    • Use cases for thread_local in a DSO can be slow because the address of the thread_local variable is obtained through __tls_get_addr@plt.
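  • Thread-local variables should not be used in coroutines: a coroutine may be resumed on a different thread than the one that suspended it, so it can end up reading another thread's instance.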
    https://rules.sonarsource.com/cpp/RSPEC-6367
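
The "local caching" advice above can be sketched as follows. This is a minimal illustration under assumptions: tls_total, sum_direct, and sum_cached are hypothetical names, and whether the direct version really performs one lookup per iteration depends on the compiler, the TLS model, and whether the variable lives in a DSO.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-thread accumulator. When defined in a DSO, each access
// may go through __tls_get_addr to resolve the address.
thread_local long tls_total = 0;

// Direct version: names the thread_local inside the loop, so the address
// may be resolved on every iteration.
long sum_direct(const int* data, std::size_t n) {
  assert(data != nullptr || n == 0);
  tls_total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    tls_total += data[i];
  }
  return tls_total;
}

// Cached version: resolve the address once, then work through the local
// reference for the rest of the call.
long sum_cached(const int* data, std::size_t n) {
  assert(data != nullptr || n == 0);
  long& total = tls_total;  // single TLS lookup for this call
  total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    total += data[i];
  }
  return total;
}
```

Both functions compute the same result. The cached reference is only valid on the thread that created it, so it should stay in a local and never be handed to another thread or kept across a coroutine suspension.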

