Jun 13, 2022

[C++][CPPCON] tips from 'High Performance Trading Systems'

Reference:
https://youtu.be/NH1Tta7purM

If you're not at all interested in performance, shouldn't you be in the Python room down the hall? - Scott Meyers
  • The hot path is exercised only about 0.01% of the time; the rest of the time the system is idle or doing administrative work
  • Operating systems, networks and hardware are geared towards throughput and fairness
  • Jitter is unacceptable



What matters

  • compiler (and its version)
  • machine architecture
  • 3rd-party libraries
  • build and link flags


Template-based configuration

Use templates to remove branches and eliminate code that won't be executed, so that the decision is made at compile time rather than on the hot path.
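
A minimal sketch of the idea, assuming a hypothetical `Side` enum and `limitPrice` function (neither is from the talk notes). The side is a template parameter, so the condition is a compile-time constant and an optimizing compiler can drop the dead branch from each instantiation. With C++17, `if constexpr` makes this explicit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical order side, used for illustration only.
enum class Side { Buy, Sell };

// S is known at compile time, so each instantiation needs only one of the
// two branches; the optimizer can discard the other.
template <Side S>
std::int64_t limitPrice(std::int64_t fairValue, std::int64_t credit) {
    if (S == Side::Buy) {
        return fairValue - credit;  // buy below fair value
    }
    return fairValue + credit;      // sell above fair value
}
```

A caller would pick the instantiation once, outside the hot path, for example `limitPrice<Side::Buy>(fairValue, credit)`.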


Lambda functions are fast and convenient.

template<typename T>
void sendMsg(T&& lambda) {
    lambda();
}
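
One possible way to use this pattern is sketched below. `Msg`, `buildMsg` and `exampleOrder` are hypothetical names for illustration. Because the lambda's type is a template parameter, the call is resolved at compile time and can be inlined, with no `std::function` or virtual dispatch involved.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical message type, for illustration only.
struct Msg {
    std::uint32_t instrument = 0;
    std::int64_t price = 0;
    std::uint32_t quantity = 0;
};

// The callable's concrete type is deduced, so the compiler sees its body.
template <typename T>
Msg buildMsg(T&& populate) {
    Msg msg;        // stand-in for the common preparation step
    populate(msg);  // the caller fills in only the fields it cares about
    return msg;     // a real system would send the message here
}

// Example caller: the lambda supplies the message-specific part.
Msg exampleOrder() {
    return buildMsg([](Msg& m) {
        m.instrument = 42;
        m.price = 1005;
        m.quantity = 10;
    });
}
```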


Memory allocation

  • Allocations are costly
    Use a pool of preallocated objects
  • Reuse objects instead of deallocating them
    Consider intrusive containers
  • Delete large objects on another thread
    Beware if the allocator is shared between threads
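
A minimal sketch of a preallocated pool, assuming a hypothetical `ObjectPool` class (not something shown in the talk notes). All storage is reserved up front, and released objects go back on a free list instead of to the heap. It is not thread-safe and does not reset objects on release.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size pool; T must be default-constructible.
template <typename T, std::size_t N>
class ObjectPool {
public:
    ObjectPool() {
        // Every slot starts out free.
        for (std::size_t i = 0; i < N; ++i) free_[i] = &slots_[i];
        freeCount_ = N;
    }

    // Returns nullptr when the pool is exhausted; never calls new.
    T* acquire() { return freeCount_ ? free_[--freeCount_] : nullptr; }

    // Hands the object back for reuse; never calls delete.
    void release(T* obj) { free_[freeCount_++] = obj; }

    std::size_t available() const { return freeCount_; }

private:
    std::array<T, N> slots_{};   // storage allocated once, with the pool
    std::array<T*, N> free_{};   // stack of pointers to free slots
    std::size_t freeCount_ = 0;
};
```

The most recently released object is handed out first, which also tends to keep it warm in cache.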


Exceptions are OK (gcc, clang, msvc)

  • Zero cost if they don't throw
  • Don't use exceptions for control flow; throwing is slow.


Multi-threading

Multi-threading is best avoided for latency-sensitive code
  • Synchronizing data via locking is expensive
  • Lock-free code may still require locks at the hardware level
  • It is mind-bendingly complex
  • It is easy for the producer to accidentally saturate the consumer
If multi-threading is a must
  • Keep shared data to an absolute minimum
  • Multiple threads writing to the same cache line will get expensive
  • Consider passing copies of data rather than sharing, e.g. single writer, single reader
  • Use a lock-free queue
  • If you have to share data, consider not using synchronization,
    e.g. maybe live with out-of-sequence updates.
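
The single-writer, single-reader idea can be sketched as a small lock-free ring buffer. `SpscQueue` is a hypothetical name, and the 64-byte alignment is an assumed cache-line size used to keep the two indices on separate cache lines. This is an illustration only, not a tuned implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Single-producer, single-consumer queue; N must be a power of two.
template <typename T, std::size_t N>
class SpscQueue {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Call from the producer thread only. Copies the value in.
    bool push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == N) return false;  // full
        buf_[head & (N - 1)] = value;
        head_.store(head + 1, std::memory_order_release);  // publish
        return true;
    }

    // Call from the consumer thread only. Copies the value out.
    bool pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;  // empty
        out = buf_[tail & (N - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // free the slot
        return true;
    }

private:
    // Assumed 64-byte cache lines: each index gets its own line so the
    // producer and consumer do not write to the same one.
    alignas(64) std::atomic<std::size_t> head_{0};  // written by producer
    alignas(64) std::atomic<std::size_t> tail_{0};  // written by consumer
    alignas(64) T buf_[N];
};
```

A full queue makes `push` return false rather than block, so the producer has to decide what to do when the consumer falls behind.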


When you need a map,
consider one that uses open addressing,
e.g. Google's dense_hash_map
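
To show what open addressing means, here is a minimal sketch with linear probing. `FlatMap` is a hypothetical class and is not how dense_hash_map is implemented. All entries live in one contiguous array, so a lookup that collides walks neighbouring slots instead of chasing pointers. Erase is left out, since it would need tombstones or re-insertion.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Fixed-capacity map from 64-bit keys to V; N must be a power of two.
template <typename V, std::size_t N>
class FlatMap {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Inserts or overwrites; returns false only if the table is full.
    bool insert(std::uint64_t key, const V& value) {
        std::size_t i = std::hash<std::uint64_t>{}(key) & (N - 1);
        for (std::size_t probes = 0; probes < N; ++probes) {
            Slot& s = slots_[i];
            if (!s.used || s.key == key) {
                s.used = true;
                s.key = key;
                s.value = value;
                return true;
            }
            i = (i + 1) & (N - 1);  // linear probe to the next slot
        }
        return false;
    }

    // Returns a pointer to the value, or nullptr if the key is absent.
    const V* find(std::uint64_t key) const {
        std::size_t i = std::hash<std::uint64_t>{}(key) & (N - 1);
        for (std::size_t probes = 0; probes < N; ++probes) {
            const Slot& s = slots_[i];
            if (!s.used) return nullptr;  // an empty slot ends the probe
            if (s.key == key) return &s.value;
            i = (i + 1) & (N - 1);
        }
        return nullptr;
    }

private:
    struct Slot {
        std::uint64_t key = 0;
        V value{};
        bool used = false;
    };
    std::array<Slot, N> slots_{};
};
```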

A hybrid approach:



Something about 'inline'

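  • the inline keyword is mainly about linkage: it permits the same definition to appear in multiple translation units; it does not force the call to be inlined
  • the always_inline and noinline attributes are stronger hints to the compiler; measure before use.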
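
A small sketch of how these attributes are typically spelled on GCC and Clang. The macro names `HOT_INLINE` and `COLD_NOINLINE` and both functions are hypothetical. The intent is to keep rarely-run code out of line so it does not bloat the hot path; whether that helps should be measured.

```cpp
#include <cassert>

// GCC/Clang attribute spelling; other compilers fall back to plain inline.
#if defined(__GNUC__)
#define HOT_INLINE inline __attribute__((always_inline))
#define COLD_NOINLINE __attribute__((noinline))
#else
#define HOT_INLINE inline
#define COLD_NOINLINE
#endif

// Rarely executed: ask the compiler not to inline it into callers.
COLD_NOINLINE int handleError(int code) {
    return -code;
}

// Frequently executed: ask the compiler to inline it at every call site.
HOT_INLINE int checkAndSend(int qty) {
    if (qty <= 0) return handleError(1);
    return qty * 2;  // stand-in for the real work
}
```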


Keeping the cache hot:




Don't share L3 cache
    disable all but 1 core (or lock the cache)

If you do have multiple cores enabled, choose your neighbours carefully:
    - Noisy neighbours should probably be moved to a different physical CPU

std::pow can be slow.
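
One possible workaround, assuming the exponent is a non-negative integer, is to replace the call with plain multiplication. `ipow` below is a hypothetical helper using exponentiation by squaring. Its results can differ from `std::pow` in the last bits, so it is worth measuring and checking accuracy before adopting it.

```cpp
#include <cassert>

// Raises base to a non-negative integer power using only multiplications.
inline double ipow(double base, unsigned exp) {
    double result = 1.0;
    while (exp != 0) {
        if (exp & 1u) result *= base;  // fold in the current bit
        base *= base;                  // square for the next bit
        exp >>= 1;
    }
    return result;
}
```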


Don't use system calls.


Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is. - Rob Pike


A language that doesn't affect the way you think about programming is not worth knowing. - Alan Perlis
