Reference:
What do you mean by "Cache Friendly"? - Björn Fahller
What Every Programmer Should Know About Memory
Simplistic model of cache behaviour
Includes
- The cache is small
- and consists of fixed size lines
- and is very very fast when hit
- and missing is very slow
Excludes
- Multiple levels of caches
- Associativity
- Threading
All models are wrong, but some are useful
$ valgrind --tool=callgrind --cache-sim=yes --dump-instr=yes --branch-sim=yes
$ perf stat -e cycles,instructions,L1-dcache-loads,L1-dcache-Load-misses --call-graph=dwarf
$ perf record -e cycles,instructions,L1-dcache-loads,L1-dcache-Load-misses --call-graph=dwarf
$ pert report
$ hotspot perf.data
$ perf stat -e cycles,instructions,L1-dcache-loads,L1-dcache-Load-misses --call-graph=dwarf
$ perf record -e cycles,instructions,L1-dcache-loads,L1-dcache-Load-misses --call-graph=dwarf
$ pert report
$ hotspot perf.data
macos:
$ samply
Rule of thumb
follow/chasing pointer -> cache miss.
So,
Get rid of pointers.
Use std::vector (contiguous memory) vs. prev/next pointers.
std::vector be aware of memmove/memcpy.
memmove cause cache miss; if possible, memcpy.
Thus try to move less data.
binary tree/search still chasing pointer,
can we get Log(n) lookup/insert without chasing pointers?
Heap. Use contiguous memory(e.g. std::vector) to implement.
But the heap is not searchable. Need extra contiguous memory to store the elements in order;
and the element holds the index to the contigous memory. Thus when the smallest heap
is processed, use the index to the contiguous memory and process that chunk.
Trade off: extra memory usage for speed and less cache miss.
Can we make it better?
B-Heap(binary heap implemented to keep subtrees in a single page); smaller heap fit into cache-line.
(not really better, depends on use case. i.e. if within a loop, the cache is hot, good.
otherwise, not worth the extra instructions due to cache is flushed anyways.)









No comments:
Post a Comment
Note: Only a member of this blog may post a comment.