https://youtu.be/NH1Tta7purM
If you're not at all interested in performance, shouldn't you be in the Python room down the hall? - Scott Meyers
- The hot path is exercised only about 0.01% of the time; the rest of the time the system is idle or doing administrative work
- OS, networks and hardware are focused on throughput and fairness
- Jitter is unacceptable
What matters
- compiler (and its version)
- machine architecture
- 3rd-party libraries
- build and link flags
Template-based configuration
Use templates to remove branches and eliminate code that will never be executed.
Lambda functions are fast and convenient.
template<typename T>
void sendMsg(T&& lambda) {
    // The callable's type is known at compile time, so the call can be inlined.
    lambda();
}
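The branch-removal idea can be sketched as follows. This is only an illustration, assuming C++17 for `if constexpr`; the `Side` enum and the `limitPrice` function are hypothetical names, not taken from the talk.

```cpp
#include <cstdint>

// Hypothetical order side, fixed at compile time.
enum class Side { Buy, Sell };

// Hypothetical pricing helper. Because the side is a template parameter,
// `if constexpr` discards the branch that does not apply, so each
// instantiation contains no runtime test on the side.
template <Side S>
constexpr std::int64_t limitPrice(std::int64_t fairValue, std::int64_t margin) {
    if constexpr (S == Side::Buy) {
        return fairValue - margin;  // buy below fair value
    } else {
        return fairValue + margin;  // sell above fair value
    }
}
```

A caller picks the instantiation once, e.g. `limitPrice<Side::Buy>(100, 5)`, instead of testing a runtime flag on every call.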
Memory allocation
- Allocations are costly
- Use a pool of preallocated objects
- Reuse objects instead of deallocating
- Intrusive containers
- Delete large objects on another thread
- Beware of shared allocators
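A minimal sketch of the preallocated pool idea, assuming a fixed capacity known at compile time and single-threaded use. `ObjectPool` and `Order` are hypothetical names made up for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical message type used only for this illustration.
struct Order {
    long id = 0;
    long price = 0;
};

// Fixed-capacity pool: every object is created up front and recycled
// through a stack of free indices, so acquire/release never touch the heap.
template <typename T, std::size_t N>
class ObjectPool {
public:
    ObjectPool() {
        for (std::size_t i = 0; i < N; ++i) freeList_[i] = i;
    }

    // Returns a recycled object, or nullptr when the pool is exhausted.
    T* acquire() {
        if (freeCount_ == 0) return nullptr;
        return &storage_[freeList_[--freeCount_]];
    }

    // Hands an object back for reuse; it must have come from this pool.
    void release(T* obj) {
        const std::size_t index = static_cast<std::size_t>(obj - storage_.data());
        assert(index < N && freeCount_ < N);
        freeList_[freeCount_++] = index;
    }

    std::size_t available() const { return freeCount_; }

private:
    std::array<T, N> storage_{};
    std::array<std::size_t, N> freeList_{};
    std::size_t freeCount_ = N;
};
```

Objects come back in whatever state they were released in, so the caller is expected to reset the fields it cares about after `acquire()`.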
Exceptions are OK (gcc, clang, msvc)
- Zero cost if they don't throw
- Don't use exceptions for control flow; throwing is slow.
Multi-threading
Multi-threading is best avoided for latency-sensitive code
- sync of data via locking is expensive
- lock-free code may still require locks at the hardware level
- mind-bendingly complex
- Easy for the producer to accidentally saturate the consumer
If multi-threading is a must
- Keep shared data to an absolute minimum
- Multiple threads writing to the same cache line will get expensive
- Consider passing copies of data rather than sharing, e.g. single writer, single reader
- lock-free queue
- If you have to share data, consider not using synchronization
e.g. maybe live with out-of-sequence updates.
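The single-writer, single-reader hand-off of copies can be sketched as a small ring buffer. This is an illustration only: `SpscQueue` is a hypothetical name, 64 bytes is an assumed cache-line size, and the queue is correct only with exactly one producer thread and one consumer thread.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Single-producer/single-consumer ring buffer. One slot is always left
// empty to tell "full" from "empty", so it holds at most N - 1 items.
template <typename T, std::size_t N>
class SpscQueue {
    static_assert(N >= 2, "need at least two slots");
public:
    // Producer side. Returns false when full so the producer can back off
    // instead of saturating the consumer.
    bool push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % N;
        if (next == head_.load(std::memory_order_acquire)) return false;
        buffer_[tail] = value;  // hand over a copy, not shared state
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side. Returns false when empty.
    bool pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;
        out = buffer_[head];
        head_.store((head + 1) % N, std::memory_order_release);
        return true;
    }

private:
    // Assumed 64-byte cache lines: keeping the two indices on separate
    // lines stops producer and consumer writes from invalidating each other.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
    std::array<T, N> buffer_{};
};
```

Each index is written by only one thread, which is what lets the queue get by with acquire/release atomics and no mutex.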
When using a map,
consider an open-addressing hash map,
e.g. Google's dense_hash_map
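To show what open addressing means in practice, here is a minimal linear-probing sketch, assuming C++17 for `std::optional`. It is not dense_hash_map's API or implementation; `FlatMap`, its fixed capacity, and the hash constant are choices made for this example, and erase and resize are left out.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

// Fixed-capacity open-addressing map with linear probing. Keys and values
// live in one contiguous array, so a colliding lookup walks to adjacent
// slots instead of chasing node pointers.
template <std::size_t Capacity>
class FlatMap {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
public:
    // Inserts or overwrites. Returns false if the table is full.
    bool put(std::uint64_t key, std::int64_t value) {
        for (std::size_t probe = 0; probe < Capacity; ++probe) {
            Slot& slot = slots_[(hash(key) + probe) & (Capacity - 1)];
            if (!slot.used || slot.key == key) {
                slot = Slot{key, value, true};
                return true;
            }
        }
        return false;
    }

    std::optional<std::int64_t> get(std::uint64_t key) const {
        for (std::size_t probe = 0; probe < Capacity; ++probe) {
            const Slot& slot = slots_[(hash(key) + probe) & (Capacity - 1)];
            if (!slot.used) return std::nullopt;  // empty slot ends the probe chain
            if (slot.key == key) return slot.value;
        }
        return std::nullopt;
    }

private:
    struct Slot {
        std::uint64_t key = 0;
        std::int64_t value = 0;
        bool used = false;
    };

    // Multiplicative hash; the constant is an arbitrary odd 64-bit value.
    static std::size_t hash(std::uint64_t key) {
        return static_cast<std::size_t>((key * 0x9E3779B97F4A7C15ULL) >> 32);
    }

    std::array<Slot, Capacity> slots_{};
};
```

Whether this layout beats a node-based map depends on load factor and key distribution, so it is worth measuring on real data.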
A hybrid approach:
Something about 'inline'
- The inline keyword mainly means that multiple definitions are permitted; it does not force the compiler to inline the call
- The always_inline and noinline attributes are stronger hints to the compiler; measure before use.
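A small sketch of how these attributes might be applied, using the GCC/Clang spelling with the MSVC equivalents behind a macro. The macro names and all three functions are hypothetical, and the bodies are stand-ins for real work.

```cpp
#include <cstdint>

// Hypothetical portability macros for the compiler-specific spellings.
#if defined(__GNUC__) || defined(__clang__)
#define HOT_INLINE inline __attribute__((always_inline))
#define COLD_NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#define HOT_INLINE __forceinline
#define COLD_NOINLINE __declspec(noinline)
#else
#define HOT_INLINE inline
#define COLD_NOINLINE
#endif

// Rarely taken path: keep it out of line so its body does not bloat the
// caller and crowd the hot path out of the instruction cache.
COLD_NOINLINE std::int64_t handleRejectedOrder(std::int64_t quantity) {
    return -quantity;  // stand-in for logging and clean-up
}

// Tiny hot-path check: ask for it to be inlined at every call site.
HOT_INLINE bool withinRiskLimit(std::int64_t quantity) {
    return quantity > 0 && quantity <= 1000;
}

std::int64_t processOrder(std::int64_t quantity) {
    if (!withinRiskLimit(quantity)) {
        return handleRejectedOrder(quantity);
    }
    return quantity;  // stand-in for sending the order
}
```

The compiler may already make the same choices on its own, which is why the notes say to measure before using either attribute.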
Keeping the cache hot:
Don't share the L3 cache
Disable all but one core (or lock the cache)
If you do have multiple cores enabled, choose your neighbours carefully:
- Noisy neighbours should probably be moved to a different physical CPU
std::pow can be slow.
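When the exponent is a small non-negative integer, one possible alternative is plain multiplication. The sketch below assumes C++14 or later; `ipow` is a hypothetical helper, not a standard function, and whether it actually beats std::pow depends on the compiler and flags, so measure first.

```cpp
// Raises base to a non-negative integer exponent by repeated squaring,
// using only multiplications (no call into the math library).
constexpr double ipow(double base, unsigned exponent) {
    double result = 1.0;
    while (exponent != 0) {
        if (exponent & 1u) result *= base;  // this bit of the exponent is set
        base *= base;                       // square for the next bit
        exponent >>= 1u;
    }
    return result;
}
```

For a fixed square or cube, writing `x * x` or `x * x * x` directly is simpler still.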
Don't use system calls.
Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is. - Rob Pike
A language that doesn't affect the way you think about programming is not worth knowing. - Alan Perlis