Nov 3, 2024

[C++] Next-Gen C++ Optimization Techniques

Reference:
Unlocking Modern CPU Power - Next-Gen C++ Optimization Techniques - Fedor G Pikus - C++Now 2024
https://vsdmars.blogspot.com/2016/01/likely-or-unlikely-easy-misleading.html
https://vsdmars.blogspot.com/2022/10/book-art-of-writing-efficient-programs.html

RCU:
https://vsdmars.blogspot.com/2024/07/c-rcu.html

TLB:
https://vsdmars.blogspot.com/2020/07/virtual-memory-refresh.html
https://vsdmars.blogspot.com/2020/07/pacific-2018re-read-designing-for.html
https://vsdmars.blogspot.com/2018/11/pacific-2018-designing-for-efficient.html


Modern CPUs rely on caches and pipelining to a much greater degree than older ones.
 The penalty for not using the caches and for disrupting the pipelines is correspondingly far greater.

Memory access is characterized by bandwidth and latency.
 Bandwidth is much higher than 'one word per latency' (i.e., 1 / latency)
 Random access speed is limited by latency
 Sequential access speed is limited by bandwidth
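A small sketch that makes the bandwidth/latency split visible: the same array is summed once in sequential order and once in a random permutation. Only the order of the data accesses differs, so any time difference comes from the memory access pattern. The function names and the fixed seed are illustrative assumptions, not code from the talk.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Sums data[idx[0]], data[idx[1]], ...; the order in idx decides the access pattern.
inline std::uint64_t sum_indexed(const std::vector<std::uint64_t>& data,
                                 const std::vector<std::uint32_t>& idx) {
    std::uint64_t s = 0;
    for (std::uint32_t i : idx) s += data[i];
    return s;
}

// 0, 1, 2, ... : consecutive addresses, limited by bandwidth.
inline std::vector<std::uint32_t> sequential_order(std::size_t n) {
    std::vector<std::uint32_t> idx(n);
    std::iota(idx.begin(), idx.end(), 0u);
    return idx;
}

// Random permutation of the same indices: every access is a potential cache miss,
// so the pass is limited by latency.
inline std::vector<std::uint32_t> random_order(std::size_t n, unsigned seed = 42) {
    auto idx = sequential_order(n);
    std::shuffle(idx.begin(), idx.end(), std::mt19937{seed});
    return idx;
}

// Runs one pass, stores the sum in `result`, returns elapsed milliseconds.
inline double time_pass_ms(const std::vector<std::uint64_t>& data,
                           const std::vector<std::uint32_t>& idx,
                           std::uint64_t& result) {
    auto t0 = std::chrono::steady_clock::now();
    result = sum_indexed(data, idx);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

With an array much larger than the last-level cache (e.g. 1 << 24 elements), the random pass should be noticeably slower; the exact ratio depends on the hardware.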

Hardware prefetching attempts to predict future memory accesses and loads that memory into the cache in advance.
 Random access defeats prediction.
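When the random addresses are known ahead of time (e.g. an index array), a software prefetch can partly stand in for the hardware prefetcher. This is a sketch assuming GCC or Clang, since `__builtin_prefetch` is a compiler builtin; the prefetch distance of 16 is an untuned assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Indirect gather: the hardware prefetcher cannot predict data[idx[i]],
// but idx is known, so we hint the element needed `distance` iterations ahead.
// __builtin_prefetch is only a hint; it does not change program results.
inline std::uint64_t sum_gather_prefetch(const std::vector<std::uint64_t>& data,
                                         const std::vector<std::uint32_t>& idx,
                                         std::size_t distance = 16) {
    std::uint64_t s = 0;
    const std::size_t n = idx.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (i + distance < n)
            __builtin_prefetch(&data[idx[i + distance]], /*rw=*/0, /*locality=*/1);
        s += data[idx[i]];
    }
    return s;
}
```

Whether this helps, and which distance is best, has to be measured on the target machine.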

Key


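In a NUMA system, the basic unit is the NUMA node: a group of CPU cores plus the memory attached directly to them. Accessing memory attached to another node is slower than accessing local memory.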

Solutions for latency-bound programs that access memory across NUMA nodes


Trick: task_count_ as in main thread.


Even better: batch processing.
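A sketch of the batching idea applied to a shared task counter: every atomic update moves the counter's cache line between cores, and across nodes on a NUMA machine, so claiming tasks in batches cuts that traffic by roughly the batch size. The names `run_batched` and `batch` are hypothetical, and this is not the exact `task_count_` trick from the talk. `batch` is assumed to be greater than 0.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Distributes task_count tasks over num_threads threads via one shared counter.
// Each fetch_add claims `batch` consecutive tasks instead of one.
inline void run_batched(std::size_t task_count, std::size_t num_threads,
                        std::size_t batch,
                        const std::function<void(std::size_t)>& do_task) {
    std::atomic<std::size_t> next{0};
    auto worker = [&] {
        for (;;) {
            std::size_t begin = next.fetch_add(batch, std::memory_order_relaxed);
            if (begin >= task_count) return;             // all tasks claimed
            std::size_t end = std::min(begin + batch, task_count);
            for (std::size_t t = begin; t < end; ++t) do_task(t);
        }
    };
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < num_threads; ++i) threads.emplace_back(worker);
    for (auto& th : threads) th.join();                  // join makes results visible
}
```

Larger batches mean less contention but coarser load balancing at the end of the run.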

Redesigning data structures for NUMA is intrusive.
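One common form of such a redesign is sharding a hot structure per node. The sketch below splits a counter into one cache-line-aligned shard per node; the caller supplies the node id (with libnuma it could come from `numa_node_of_cpu()`, which is outside this sketch). It only separates cache lines: placing each shard's memory on its own node would additionally need per-node allocation, e.g. libnuma's `numa_alloc_onnode()`. The 64-byte line size is an assumption.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <memory>

// Counter with one shard per NUMA node: writers touch only their node's shard,
// readers sum all shards (a read is not an atomic snapshot of the total).
class NodeShardedCounter {
    struct alignas(64) Shard {              // 64 bytes: assumed cache-line size
        std::atomic<std::uint64_t> value{0};
    };
    std::size_t nodes_;
    std::unique_ptr<Shard[]> shards_;
public:
    explicit NodeShardedCounter(std::size_t nodes)
        : nodes_(nodes), shards_(new Shard[nodes]) {}

    void add(std::size_t node, std::uint64_t delta) {
        shards_[node % nodes_].value.fetch_add(delta, std::memory_order_relaxed);
    }

    std::uint64_t total() const {
        std::uint64_t sum = 0;
        for (std::size_t i = 0; i < nodes_; ++i)
            sum += shards_[i].value.load(std::memory_order_relaxed);
        return sum;
    }
};
```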


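CMD:
$ /sbin/lspci                               # list PCI devices and their bus addresses
$ cat /sys/bus/pci/devices/xxx/numa_node    # NUMA node the device is attached to (-1 = unknown)
$ numactl                                   # run a program with a chosen NUMA CPU/memory policy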
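The same sysfs lookup can be done from inside a program, e.g. to pin threads that talk to a NIC or GPU onto that device's node. A sketch: `pci_address` is the `xxx` part of the path, a full bus address such as `0000:3b:00.0` (an example value; `lspci -D` prints addresses with the domain prefix). The `root` parameter is a hypothetical convenience for pointing the function at another directory.

```cpp
#include <fstream>
#include <optional>
#include <string>

// Returns the NUMA node of a PCI device, -1 if the kernel reports none,
// or std::nullopt if the file is missing or cannot be parsed.
inline std::optional<int> pci_numa_node(const std::string& pci_address,
                                        const std::string& root = "/sys/bus/pci/devices/") {
    std::ifstream in(root + pci_address + "/numa_node");
    int node = 0;
    if (!(in >> node)) return std::nullopt;
    return node;
}
```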



GPU

I/O bound program



Real world cases
1) Old code runs slower on faster hardware



NUMA comes into play


When a page mapping changes and cached TLB entries become stale, the kernel flushes them on the other CPUs through a 'TLB shootdown', which is an inter-processor interrupt (IPI). The shootdown handler is kernel code running on those CPUs, so its cost is counted as 'system time' in the profiler.


NUMA migrations

Debugging TLB shootdown
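On x86 Linux, `/proc/interrupts` has a `TLB:` row with per-CPU "TLB shootdowns" counts, so sampling it before and after a run shows how many shootdowns occurred. A small parser sketch; it returns 0 when no such row exists (e.g. on other architectures).

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Sums the per-CPU counts on the "TLB:" row of /proc/interrupts-formatted text,
// e.g. " TLB:   1520   1733   TLB shootdowns".
inline std::uint64_t tlb_shootdowns(std::istream& interrupts) {
    std::string line;
    while (std::getline(interrupts, line)) {
        std::istringstream row(line);
        std::string label;
        if (!(row >> label) || label != "TLB:") continue;
        std::uint64_t total = 0, count = 0;
        while (row >> count) total += count;   // stops at the trailing description text
        return total;
    }
    return 0;
}
```

Usage: `std::ifstream f("/proc/interrupts"); auto n = tlb_shootdowns(f);`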

Disable automatic NUMA balancing (page migration between nodes), as root:
$ echo 0 > /proc/sys/kernel/numa_balancing

Reduce TLB shootdown impact:
Increase the page size (the default is usually 4 KB, https://stackoverflow.com/a/11543988 ); larger pages need fewer TLB entries for the same memory.

$ echo madvise > /sys/kernel/mm/transparent_hugepage/enabled


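madvise API: with transparent huge pages in 'madvise' mode, only memory regions the program marks with madvise(addr, len, MADV_HUGEPAGE) are backed by huge pages.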
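A Linux-only sketch of opting a buffer into transparent huge pages. `madvise` is advice: if it fails (e.g. a kernel built without THP), the memory still works with normal pages, so the error is deliberately ignored. The helper name is hypothetical; sizes that are multiples of the huge page size (commonly 2 MiB on x86-64) give the kernel the best chance to use huge pages.

```cpp
#include <sys/mman.h>
#include <cstddef>

// Maps anonymous memory and hints the kernel to back it with huge pages.
// Returns nullptr if mmap fails. Release with munmap(ptr, bytes).
inline void* alloc_huge_hint(std::size_t bytes) {
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return nullptr;
#ifdef MADV_HUGEPAGE
    (void)madvise(p, bytes, MADV_HUGEPAGE);   // advisory; failure is non-fatal
#endif
    return p;
}
```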

2) Kernel tuning

Monitor everything (collect metrics) inside the code.
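A minimal sketch of in-code metrics: an RAII timer that counts calls and accumulates wall time for a code section with relaxed atomics. The names are hypothetical; in a real service the values would be read periodically and exported. A single metric shared by many threads is itself a contended cache line, so per-thread or per-node copies may be preferable on NUMA machines.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

// Call count and total elapsed time for one instrumented section.
struct SectionMetric {
    std::atomic<std::uint64_t> calls{0};
    std::atomic<std::uint64_t> total_ns{0};
};

// Measures from construction to destruction and records into a SectionMetric.
class ScopedTimer {
    SectionMetric& m_;
    std::chrono::steady_clock::time_point start_;
public:
    explicit ScopedTimer(SectionMetric& m)
        : m_(m), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - start_).count();
        m_.calls.fetch_add(1, std::memory_order_relaxed);
        m_.total_ns.fetch_add(static_cast<std::uint64_t>(ns), std::memory_order_relaxed);
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;
};
```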

Pay attention to the hardware specs.


Wrap-up