Ataraxia through Epoché: [C++][CPPCON] tips from 'High Performance Trading Systems'

Reference:
https://youtu.be/NH1Tta7purM

If you're not at all interested in performance, shouldn't you be in the Python room down the hall? - Scott Meyers

The hot path is exercised only about 0.01% of the time; the rest of the time the system is idle or doing administrative work.
Operating systems, networks, and hardware are tuned for throughput and fairness, not for low latency.
Jitter is unacceptable.

What matters

Compiler (and its version)
Machine architecture
Third-party libraries
Build and link flags

Template-based configuration

Using templates removes branches, eliminates code that will never be executed, and so on.
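A minimal sketch of the idea, assuming a hypothetical order sender whose buy/sell side is known when the strategy is wired up (Side, OrderSender, and adjustedPrice are illustrative names, not from the talk). The side is a template parameter, so the branch is resolved at compile time and each instantiation contains only its own path. It assumes C++17 for if constexpr.

```cpp
// Hypothetical configuration knob, fixed at compile time.
enum class Side { Buy, Sell };

// Hypothetical sender: the side is a template parameter rather than a
// runtime flag, so each instantiation contains only its own branch.
template <Side S>
struct OrderSender {
    // Moves the price by one tick: up for a buy, down for a sell.
    static constexpr long adjustedPrice(long price, long tick) {
        if constexpr (S == Side::Buy) {
            return price + tick;  // the Sell branch is discarded here
        } else {
            return price - tick;  // the Buy branch is discarded here
        }
    }
};

// The choice is made once, at compile time, not per message.
static_assert(OrderSender<Side::Buy>::adjustedPrice(100, 1) == 101);
static_assert(OrderSender<Side::Sell>::adjustedPrice(100, 1) == 99);
```

The trade-off is one instantiation per configuration, so this fits settings with a small, fixed set of values.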

Lambda functions are fast and convenient.

// T deduces to the lambda's closure type, so the call can be inlined.
template<typename T>
void sendMsg(T&& lambda) {
    lambda();
}
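As a usage sketch, assuming the sendMsg template above is the whole function (sendThree and its counter are made up for illustration): the lambda's type is a template parameter, so the compiler sees its body at the call site and can inline it. That is usually harder through a std::function or a function pointer.

```cpp
#include <cassert>

// Same shape as sendMsg above, repeated so this example is self-contained.
template <typename T>
void sendMsg(T&& lambda) {
    lambda();
}

// Hypothetical caller: counts how many messages were "sent".
inline int sendThree() {
    int sent = 0;
    auto bump = [&sent] { ++sent; };  // closure type is known at compile time
    sendMsg(bump);                    // lvalue: T deduces to a reference
    sendMsg(bump);
    sendMsg([&sent] { ++sent; });     // temporaries bind to T&& as well
    assert(sent == 3);                // sanity check for the sketch
    return sent;
}
```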

Memory allocation

Allocations are costly.
Use a pool of preallocated objects.
Reuse objects instead of deallocating them.
Consider intrusive containers.
Delete large objects on another thread.
Beware of shared allocators.
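The first few points can be sketched as a fixed-capacity object pool. This is an illustrative sketch, not code from the talk: Order and ObjectPool are hypothetical names, and the pool is not thread-safe. All storage is allocated once, and released objects go onto a free list instead of back to the heap.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical message type used only for this sketch.
struct Order {
    long id = 0;
    long price = 0;
};

// Fixed-capacity pool: every object is allocated up front, and released
// objects are kept on a free list for reuse.
template <typename T, std::size_t N>
class ObjectPool {
public:
    ObjectPool() {
        // Initially every slot is free.
        for (std::size_t i = 0; i < N; ++i) free_[i] = &slots_[i];
        freeCount_ = N;
    }

    // Returns a free object, or nullptr if the pool is exhausted.
    T* acquire() {
        if (freeCount_ == 0) return nullptr;
        return free_[--freeCount_];
    }

    // Hands an object back for reuse; no destructor call, no deallocation.
    void release(T* obj) {
        assert(obj >= slots_.data() && obj < slots_.data() + N);
        assert(freeCount_ < N);
        free_[freeCount_++] = obj;
    }

    std::size_t available() const { return freeCount_; }

private:
    std::array<T, N> slots_{};   // storage allocated once
    std::array<T*, N> free_{};   // stack of pointers to free slots
    std::size_t freeCount_ = 0;
};
```

A released object keeps its old field values, so the caller is expected to reinitialize it after acquire().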

Exceptions are OK (in gcc, clang, msvc)

Zero cost if they don't throw.
Don't use exceptions for control flow; throwing is slow.
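One way to read this advice, sketched with hypothetical functions (parseQuantity and requireQuantity are not from the talk): an outcome that is expected to happen, such as malformed input, is reported through the return value, and throwing is reserved for cases that should essentially never occur on the hot path. The sketch assumes C++17 for std::optional and std::string_view.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string_view>

// Expected failure ("not a valid quantity") is reported through the return
// value, so rejecting bad input never pays for stack unwinding.
inline std::optional<int> parseQuantity(std::string_view text) {
    if (text.empty() || text.size() > 9) return std::nullopt;
    int value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return std::nullopt;
        value = value * 10 + (c - '0');
    }
    assert(value >= 0);  // cannot overflow: at most 9 decimal digits
    return value;
}

// Throwing variant for callers that treat malformed input as a genuine
// error rather than as a normal outcome.
inline int requireQuantity(std::string_view text) {
    const std::optional<int> q = parseQuantity(text);
    if (!q) throw std::invalid_argument("malformed quantity");
    return *q;
}
```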

Multi-threading

Multi-threading is best avoided for latency-sensitive code

Synchronizing data via locking is expensive
Lock-free code may still require locks at the hardware level
It is mind-bendingly complex
It is easy for the producer to accidentally saturate the consumer

If multi-threading is a must

Keep shared data to an absolute minimum
Multiple threads writing to the same cache line will get expensive
Consider passing copies of data rather than sharing it, e.g. through a single-writer, single-reader lock-free queue
If you have to share data, consider not using synchronization,
e.g. maybe you can live with out-of-sequence updates.
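A minimal sketch of a single-writer, single-reader lock-free queue, under stated assumptions: SpscQueue and kCacheLine are hypothetical names, and the 64-byte cache-line size is typical for x86-64 but not guaranteed. Values are copied in and out, and the two indices sit on separate cache lines so the producer and the consumer do not keep invalidating each other's line.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Assumed cache-line size; 64 bytes is typical for x86-64 but not guaranteed.
constexpr std::size_t kCacheLine = 64;

// Single-producer / single-consumer ring buffer. Exactly one thread may call
// push() and exactly one other thread may call pop().
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2, "one slot stays empty to tell full from empty");

public:
    // Producer side: copies the value in; returns false if the queue is full.
    bool push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % Capacity;
        if (next == tail_.load(std::memory_order_acquire)) return false;
        buffer_[head] = value;
        head_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side: returns a copy of the oldest value, or nullopt if empty.
    std::optional<T> pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return std::nullopt;
        T value = buffer_[tail];
        tail_.store((tail + 1) % Capacity, std::memory_order_release);
        return value;
    }

private:
    // Each index lives on its own cache line to avoid false sharing.
    alignas(kCacheLine) std::atomic<std::size_t> head_{0};  // written by producer
    alignas(kCacheLine) std::atomic<std::size_t> tail_{0};  // written by consumer
    alignas(kCacheLine) T buffer_[Capacity]{};
};
```

A queue declared with Capacity 4 holds at most 3 elements, because one slot is kept empty. When push() returns false the producer has outrun the consumer, which is the saturation risk noted above.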

When using a map, consider an open-addressing map, e.g. Google's dense_hash_map.
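To illustrate what open addressing means, here is a small fixed-capacity sketch with linear probing. It is not dense_hash_map's implementation or API: FlatMap, its hash function, and the lack of erase and resize are simplifications for illustration. All entries live in one contiguous array, so a lookup usually touches a few neighbouring slots instead of chasing node pointers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Fixed-capacity open-addressing map from uint64_t keys to int values,
// using linear probing. No erase, no resize: a sketch only.
template <std::size_t Capacity>
class FlatMap {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Inserts or overwrites; returns false if the table is full.
    bool put(std::uint64_t key, int value) {
        for (std::size_t i = 0; i < Capacity; ++i) {
            Slot& s = slots_[(hash(key) + i) & (Capacity - 1)];
            if (!s.used || s.key == key) {
                s = Slot{key, value, true};
                return true;
            }
        }
        return false;
    }

    // Probes consecutive slots until the key or an empty slot is found.
    std::optional<int> get(std::uint64_t key) const {
        for (std::size_t i = 0; i < Capacity; ++i) {
            const Slot& s = slots_[(hash(key) + i) & (Capacity - 1)];
            if (!s.used) return std::nullopt;
            if (s.key == key) return s.value;
        }
        return std::nullopt;
    }

private:
    struct Slot {
        std::uint64_t key = 0;
        int value = 0;
        bool used = false;
    };

    // Cheap multiplicative mixing; an arbitrary choice for the sketch.
    static std::size_t hash(std::uint64_t key) {
        return static_cast<std::size_t>((key * 0x9E3779B97F4A7C15ull) >> 32);
    }

    std::array<Slot, Capacity> slots_{};
};
```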

A hybrid approach:

Something about 'inline'

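The inline keyword mainly means that multiple definitions are permitted (one per translation unit); it does not force inlining.
The GCC and Clang attributes always_inline and noinline are stronger hints to the compiler; measure before using them.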
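A sketch of how the two attributes might be applied, assuming GCC or Clang (MSVC spells them __forceinline and __declspec(noinline)). The macro and function names are hypothetical. The rarely taken error path is kept out of line so it does not bloat the hot caller, while the hot check is forced inline.

```cpp
#include <cassert>

// Assumes GCC or Clang; on other compilers the macros fall back to plain
// inline and no attribute, and the code still compiles.
#if defined(__GNUC__) || defined(__clang__)
#define HOT_INLINE inline __attribute__((always_inline))
#define COLD_NOINLINE __attribute__((noinline))
#else
#define HOT_INLINE inline
#define COLD_NOINLINE
#endif

// Rarely taken error path: kept out of line so it does not bloat the caller
// or take up instruction-cache space on the hot path.
COLD_NOINLINE int handleReject(int qty) {
    return -qty;  // placeholder for logging / recovery work
}

// Hot path: forced inline into its callers.
HOT_INLINE int checkQuantity(int qty, int limit) {
    if (qty > limit) return handleReject(qty);
    return qty;
}
```

Whether either attribute actually helps depends on the compiler and the surrounding code, so this is worth benchmarking both ways.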

Keeping the cache hot:

Don't share L3 cache

Disable all but one core (or lock the cache)

If you do have multiple cores enabled, choose your neighbours carefully:

- Noisy neighbours should probably be moved to a different physical CPU

std::pow can be slow.
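When the exponent is a small integer, one possible alternative is plain multiplication or exponentiation by squaring. The helpers below (cube and powInt) are hypothetical names for illustration. Their results can differ from std::pow in the last bits for some inputs, so treat this as a sketch to measure against the real workload rather than a drop-in replacement.

```cpp
// For a small, fixed integer exponent, plain multiplication avoids the
// general-purpose std::pow call entirely.
constexpr double cube(double x) { return x * x * x; }

// Non-negative integer exponent known only at run time: exponentiation by
// squaring, using O(log exp) multiplications.
constexpr double powInt(double base, unsigned exp) {
    double result = 1.0;
    while (exp != 0) {
        if (exp & 1u) result *= base;  // fold in the current bit
        base *= base;                  // square for the next bit
        exp >>= 1u;
    }
    return result;
}

static_assert(cube(3.0) == 27.0);
static_assert(powInt(2.0, 10) == 1024.0);
```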

Don't use system calls.

Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is. - Rob Pike

A language that doesn't affect the way you think about programming is not worth knowing. - Alan Perlis

Ataraxia through Epoché

Jun 13, 2022
