Ataraxia through Epoché
Symphilosophein 狂者進取,狷者有所不為也。
Mar 30, 2025
Mar 28, 2025
Mar 25, 2025
[Bits manipulation]
align:
return (value + align_size - 1) & (~(align_size - 1);
has_single_bit:
return x && !(x & (x - 1));
midpoint:
return (a & b) + (a ^ b) / 2;
Mar 14, 2025
[cppcon-2024] Ultrafast Trading Systems in C++
Reference:
When Nanoseconds Matter: Ultrafast Trading Systems in C++ - David Gross - CppCon 2024
Latency constraint for algorithmic trading
It's not just about being fast but being fast to be accurate
Latency constraint to not drop packets on the NIC
Limited number of buffers
Not reading fast enough will cause the application to drop packets, causing an outage.
Up to hundreds of thousands of price updates per second
Principle 1: Most of time no need for node containers(associated container)
Paper: Optimizing Open Addressing
Principle 2: Understanding the problem(by looking at the data)
Principle 3: hand tailored(specialized) algorithms are key to achieve performance.
Running perf on a benchmark
Branchless binary search
Access HW counts with libpapi
Binary search - memory access
Principle 4: Simplicity is the ultimate sophistication.
Principle 5: Mechanical sympathy. Harmony with hardware.
- Solarflare - industry standard low-latency NIC (https://www.amd.com/en/products/ethernet-adapters.html)
- OpenOnLoad (https://github.com/majek/openonload)
- Easy to use: compatible with BSD sockets (no code changes needed!)
- $ onload --profile=latency ./your_binary
- TCPDirect (https://stacresearch.com/system/files/resource/files/STAC-Summit-7-November-2016-Solarflare.pdf)
- Custom TCP/UDP stack
- Reduced set of features
- EF_VI (https://docs.amd.com/r/en-US/ug1586-onload-user/ef_vi)
- Layer 2 API: interface, buffers and memory management (similar to DPDK)
- Lowest latency
Principle 6: True efficiency is found not in the layers of complexity we add, but in the unnecessary layers we remove.
- Why shared memory
- if you don't need sockets, no need to pay for their complexity
- As fast as it gets, kernel isn't involved in any operations(only if you mmap it)
- multi processes requires it - which is good for minimizing operational risk.
- What works well in shared memory
- Contiguous blocks of data: arrays.
- One writer, one or multiple readers, stay away from multiple writers.
- shm_open, mmap, munmap, shm_unlink, ftruncate, flock...
Feb 12, 2025
[Algorithm] Fowler–Noll–Vo (FNV) hash
static uint64_t FNVhash64(uint64_t val) {
const uint64_t FNV_offset_basis_64 = 0xCBF29CE484222325L;
const uint64_t FNV_prime_64 = 1099511628211L;
uint64_t hashval = FNV_offset_basis_64;
for (int i = 0; i < 8; i++) {
hashval = hashval ^ (val & 0xff);
hashval = hashval * FNV_prime_64;
val >>= 8;
}
return (hashval & 0x8000000000000000) ? -hashval : hashval;
}
Feb 6, 2025
Jan 31, 2025
[C++] transparently replaceable
https://eel.is/c++draft/basic.life#9
An object o1 is transparently replaceable by an object o2 if
(9.1) the storage that o2 occupies exactly overlays the storage that o1 occupied,
(9.2) o1 and o2 are of the same type (ignoring the top-level cv-qualifiers)
(9.3) o1 is not a const, complete object
(9.4) neither o1 nor o2 is a potentially-overlapping subobject ([intro.object])
(9.5) either o1 and o2 are both complete objects, or o1 and o2 are direct subobjects of objects p1 and p2 , respectively, and p1 is transparently replaceable by p2 .
Thus, corner case of std::optional<T>:
If optional<T> holds its T subobject using a [[no_unique_address]] member (in order to pack its bool into the tail padding of T), then you can't use a pointer to the old object to transparently point to the new one.