These notes are organized into six sections:
- Cache Lines
- Hardware Prefetch
- Access Locality
- Multiple CPU Core consideration
- Write Combined Memory
- Address Translation
This talk can serve as a supplement to Stoyan Nikolov's talk 'OOP Is Dead, Long Live Data-oriented Design'.
Cache Lines
Transfers between memory and cache occur in units of cache lines (typically 64 bytes).
(Think about Data Oriented Design)
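A minimal sketch of the data-oriented-design point, using hypothetical `ParticleAoS`/`ParticlesSoA` types (not from the talk): because a whole cache line is fetched at once, packing the hot field contiguously means every fetched line carries useful data.

```cpp
#include <vector>

// Array-of-structs: scanning only x still drags the cold fields
// into cache, because they share x's cache line.
struct ParticleAoS {
    float x, y, z;
    float cold[13]; // cold data occupying the rest of the 64-byte line
};

// Struct-of-arrays: all x values are packed; one 64-byte line
// holds 16 consecutive floats, all of them useful to the scan.
struct ParticlesSoA {
    std::vector<float> x, y, z;
};

float sum_x_soa(const ParticlesSoA& p) {
    float s = 0.0f;
    for (float v : p.x) s += v; // streams through dense cache lines
    return s;
}
```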
Hardware Prefetch
Predictable access patterns are faster: the hardware prefetcher detects sequential or constant-stride streams and fetches lines ahead of the load.
Aim for sequential locality.
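A small illustration of why sequential locality matters (a sketch, not code from the talk): a stride-1 scan is the pattern the prefetcher handles best, while pointer-chasing gives it nothing to predict.

```cpp
#include <vector>

// Stride-1 access: the hardware prefetcher can run several cache
// lines ahead of the loads, hiding memory latency.
long sum_sequential(const std::vector<int>& v) {
    long s = 0;
    for (int x : v) s += x; // predictable, prefetcher-friendly
    return s;
}

// Contrast: a linked list visits nodes at unpredictable addresses,
// so every hop can be a full cache miss the prefetcher cannot hide.
struct Node { int value; Node* next; };
long sum_list(const Node* n) {
    long s = 0;
    for (; n; n = n->next) s += n->value; // serialized, cache-hostile
    return s;
}
```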
Access Locality
Cache locality
- spatial
- temporal
Prefer contiguous containers such as std::vector.
Prefer hash maps with a flat (contiguous) key layout over node-based ones.
Multiple CPU Core consideration
Cache coherence (MESI protocol): a write on one core invalidates that cache line in every other core's cache, so beware of false sharing.
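The standard mitigation can be sketched as follows (an illustration, not code from the talk): pad per-thread data to its own cache line so MESI invalidations from one thread never evict another thread's line.

```cpp
#include <atomic>

// Two counters packed next to each other would share a 64-byte cache
// line; under MESI every increment by one core invalidates the line
// in the other core's cache ("false sharing"), even though the
// threads never read each other's counter. alignas(64) gives each
// counter a line of its own.
struct alignas(64) PaddedCounter {
    std::atomic<long> value{0};
};

static_assert(sizeof(PaddedCounter) == 64,
              "each counter occupies exactly one cache line");

PaddedCounter counters[2]; // one per thread; no shared line
```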
Write Combined Memory
Use compiler intrinsics:
- SSE2
  - _mm_stream_si32: store 4 bytes
  - _mm_stream_si128: store 16 bytes
- AVX
  - _mm256_stream_si256: store 32 bytes
- AVX-512
  - _mm512_stream_si512: store 64 bytes
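A minimal usage sketch of the SSE2 variant (assumes an x86 target): non-temporal stores go through write-combining buffers and bypass the cache, which suits large buffers written once and not read back soon.

```cpp
#include <cstddef>
#include <cstdint>
#include <emmintrin.h> // SSE2 intrinsics

// Fill a buffer with non-temporal (streaming) stores.
// _mm_stream_si32 writes 4 bytes without polluting the cache.
void fill_streaming(int32_t* dst, int32_t value, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        _mm_stream_si32(dst + i, value);
    _mm_sfence(); // ensure the streamed stores are globally visible
}
```

The `_mm_sfence()` at the end matters: streaming stores are weakly ordered, so a fence is needed before other threads read the buffer.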
Address Translation
TLB size (with 4 KiB pages)
- Address translation can be a significant overhead.
- Large pages can help.
Linux
- Explicit huge pages (hugetlbfs)
  - Allocate on hugetlbfs
  - Access via mmap or shared memory
- Transparent Huge Pages (THP)
  - Beware of latency spikes (e.g. from page compaction)
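A sketch of requesting explicit huge pages via mmap on Linux (assumes huge pages have been reserved, e.g. through /proc/sys/vm/nr_hugepages; falls back to normal 4 KiB pages otherwise): each 2 MiB page covers 512 times more address space per TLB entry than a 4 KiB page.

```cpp
#include <cstddef>
#include <sys/mman.h>

// Try to map `bytes` backed by explicit huge pages; fall back to
// normal pages if no huge pages are reserved. Minimal error handling.
void* alloc_maybe_huge(std::size_t bytes) {
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) // no huge pages available: plain anonymous map
        p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p;
}
```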