- https://v8.dev/blog/oilpan-pointer-compression
- https://v8.dev/blog/oilpan-library
- oilpan code [google git]
- https://en.wikipedia.org/wiki/64-bit_computing#Limits_of_processors
- https://vsdmars.blogspot.com/2020/07/virtual-memory-refresh.html
- Pointer Compression in Oilpan design doc
- https://v8.dev/blog/pointer-compression
Pointer compression has been used in many open-source projects (e.g. CacheLib, Chrome);
the idea is to use fewer bits on a 64-bit architecture (where a pointer normally takes 1 word / 8 bytes, and a pointer to member function 2 words) to represent a virtual memory address.
Thus a pointer compression scheme can be designed as follows:
- 'Cage' (or slab) a contiguous range of heap memory.
- The size of a heap cage is limited by the available bits for the offset; e.g., a 4 GB heap cage requires 32-bit offsets.
- The compressed pointer contains only the offset from the base address of the 'cage' heap virtual memory.
- The cage base address could be kept per thread, in which case a thread_local base pointer would be used here. However, thread-local storage (TLS) access is slow; thus Oilpan uses a single caged heap per process.
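A minimal sketch of this basic cage/offset idea, assuming a single process-wide cage and a 4 GB cage size; the names are placeholders, not Oilpan's actual API:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cage: one contiguous heap reservation per process. A 4 GB cage
// means any object inside it is addressable by a 32-bit offset.
constexpr uint64_t kCageSize = uint64_t{4} * 1024 * 1024 * 1024;

// Set when the cage is reserved; a process-global, so no TLS lookup is needed.
inline uintptr_t g_cage_base = 0;

// Compression: store only the offset from the cage base.
inline uint32_t CompressOffset(void* ptr) {
  uintptr_t value = reinterpret_cast<uintptr_t>(ptr);
  assert(value - g_cage_base < kCageSize);  // the object must live inside the cage
  return static_cast<uint32_t>(value - g_cage_base);
}

// Decompression: add the offset back onto the cage base.
inline void* DecompressOffset(uint32_t offset) {
  return reinterpret_cast<void*>(g_cage_base + offset);
}
```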
Oilpan design requirements:
A 'Member' type instance (a smart pointer to an object on Oilpan's garbage-collected heap) can take:
- A valid heap pointer to an object;
- The C++ nullptr (or similar);
- A sentinel value which must be known at compile time. The sentinel value can e.g. be used to signal deleted values in hash tables that also support nullptr as entries.
nullptr has its own type (std::nullptr_t); what is the value of compress(nullptr)?
And does a given compressed value mean a deleted entry or just a null pointer? The two must stay distinguishable.
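To make the question concrete, here is a hedged sketch of why the two special values must stay distinct even in compressed form; the constants and helpers are illustrative only (Oilpan's actual sentinel is exposed in cppgc as kSentinelPointer, if memory serves):

```cpp
#include <cstdint>

// Illustrative values matching the scheme described later in this post:
// compress(nullptr) == 0 and compress(sentinel) == 1. An open-addressing hash
// table with Member-like slots needs both: one marks an empty bucket, the
// other a deleted one.
constexpr uint32_t kCompressedNullptr = 0;   // empty slot
constexpr uint32_t kCompressedSentinel = 1;  // deleted slot ("tombstone")

// If the two collapsed to the same compressed value, the table could no
// longer tell empty buckets from deleted ones while probing.
inline bool IsEmptySlot(uint32_t compressed) { return compressed == kCompressedNullptr; }
inline bool IsDeletedSlot(uint32_t compressed) { return compressed == kCompressedSentinel; }
```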
Extra requirements:
- Compress/decompress should be inlined at call site. (i.e. __attribute__((always_inline)) )
- Fast and compact instruction sequence to minimize i-cache misses.
- Branchless instruction sequence to avoid using up branch predictors.
- Consider reads and writes separately. Reads outnumber writes, thus:
fast decompression is preferred over fast compression.
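As a rough illustration of the inlining requirement, the compress/decompress entry points could be declared along these lines (attribute spelling per clang/GCC; the function names are placeholders):

```cpp
#include <cstdint>

// Force inlining at every call site so the compressed-pointer accessors stay a
// short, branchless instruction sequence instead of an out-of-line call.
__attribute__((always_inline)) inline uint32_t Compress(const void* ptr);
__attribute__((always_inline)) inline void* Decompress(uint32_t compressed);
```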
The main idea for the scheme that is implemented as of today is to separate regular heap pointers from nullptr and sentinel by relying on alignment of the heap cage.
Thus the cage heap memory is allocated with an alignment such that the least significant bit
of the upper half-word of its base address is always set.
cage heap memory allocated with alignment as base address:
0x00 00 00 01 | 00 00 00 00
nullptr:
0x00 00 00 00 | 00 00 00 00
sentinel:
0x00 00 00 00 | 00 00 00 02
Compression generates a compressed value by merely right-shifting by one and truncating away the upper half of the value. In this way, the alignment bit (which now becomes the most significant bit of the compressed value) signals a valid heap pointer.
e.g.
original heap memory:
00000000 00000000 00000000 00000001 00000000 11111111 00000000 00000000
compressed:
00000000 00000000 00000000 00000000 10000000 01111111 10000000 00000000
and truncating away the upper half:
10000000 01111111 10000000 00000000 (half word, the msb 1 indicates a valid heap memory address)
With this implementation, the compressed nullptr becomes:
00000000 00000000 00000000 00000000
With this implementation, the compressed sentinel becomes:
00000000 00000000 00000000 00000001
Note that this allows for figuring out whether a compressed value represents a heap pointer, nullptr, or the sentinel value, which is important to avoid useless decompressions in user code.
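A sketch of the compression step described above, written as a free function rather than Oilpan's actual Member internals; the assertion encodes the cage-alignment assumption (bit 32 of every in-cage address is set):

```cpp
#include <cassert>
#include <cstdint>

inline uint32_t Compress(const void* ptr) {
  uintptr_t value = reinterpret_cast<uintptr_t>(ptr);
  // Right-shift by one and truncate to 32 bits. Heap objects are at least
  // 2-byte aligned, so the dropped low bit is always zero; nullptr (0) and
  // the sentinel (2) become 0 and 1 respectively.
  uint32_t compressed = static_cast<uint32_t>(value >> 1);
  // For a real in-cage pointer, bit 32 of the address (the cage alignment bit)
  // ends up as the most significant bit of the compressed value.
  assert(value == 0 || value == 2 || (compressed & 0x80000000u) != 0);
  return compressed;
}

// Classification works directly on the compressed value, no decompression:
inline bool IsCompressedNullptr(uint32_t c) { return c == 0; }
inline bool IsCompressedSentinel(uint32_t c) { return c == 1; }
inline bool IsCompressedHeapPointer(uint32_t c) { return (c & 0x80000000u) != 0; }
```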
Decompression relies on a specifically crafted base pointer, in which the least significant 32 bits are set to 1.
Base:
0x00 00 00 01 | FF FF FF FF
The decompression operation first sign-extends the compressed 32-bit value to 64 bits and then left-shifts by one to undo the compression shift.
And the decompressed pointer is just the result of a bitwise and between this intermediate value and the base pointer.
Heap pointer:
10000000 01111111 10000000 00000000 (compressed)
11111111 11111111 11111111 11111111 10000000 01111111 10000000 00000000 (sign-extended)
11111111 11111111 11111111 11111111 00000000 11111111 00000000 00000000 (left-shifted by one)
00000000 00000000 00000000 00000001 11111111 11111111 11111111 11111111 & (base)
-----------------------------------------------------------------------
00000000 00000000 00000000 00000001 00000000 11111111 00000000 00000000 (decompressed)
nullptr:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 (sign-extended and left-shifted)
00000000 00000000 00000000 00000001 11111111 11111111 11111111 11111111 & (base)
-----------------------------------------------------------------------
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 (decompressed)
sentinel:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000010 (sign-extended and left-shifted)
00000000 00000000 00000000 00000001 11111111 11111111 11111111 11111111 & (base)
-----------------------------------------------------------------------
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000010 (decompressed)
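And a matching decompression sketch that follows the sign-extend, shift, and mask steps traced above (again a simplified stand-in for the real Member code; the base value is the example one from this post):

```cpp
#include <cstdint>

// Decompression base: the cage base with its low 32 bits set to 1,
// e.g. 0x00000001'FFFFFFFF for a cage starting at 0x00000001'00000000.
inline uint64_t g_decompression_base = 0x00000001'FFFFFFFFull;

inline void* Decompress(uint32_t compressed) {
  // Sign-extend: heap pointers have the MSB set, so the upper 32 bits become
  // all ones; nullptr (0) and the sentinel (1) keep them zero.
  int64_t sign_extended = static_cast<int32_t>(compressed);
  // Left-shift by one to undo the compression shift.
  uint64_t intermediate = static_cast<uint64_t>(sign_extended) << 1;
  // The bitwise AND pulls in the cage's upper bits for heap pointers and
  // leaves nullptr (0) and the sentinel (2) unchanged.
  return reinterpret_cast<void*>(intermediate & g_decompression_base);
}
```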
Several gotchas mentioned in the article are worth noting here:
- Optimizing the cage base load: the cage base pointer is not a compile-time constant, which keeps the compiler from reasoning about it and generating faster code. The Oilpan team tackles this with a clang attribute, i.e. using
__attribute__((require_constant_initialization));
(https://chromium-review.googlesource.com/c/v8/v8/+/2739979/17/include/cppgc/member.h#38
https://chromium.googlesource.com/chromium/src/+/f47da96363899cbe1b3b851119bb3409eac253e1/base/allocator/partition_allocator/pcscan.h#17
https://clang.llvm.org/docs/AttributeReference.html#require-constant-initialization-constinit-c-20)
- Avoiding decompression altogether:
- checking a Member against nullptr needs no decompression; the compressed value can be compared against compress(nullptr) directly
- constructing or assigning a Member from another Member needs no decompression/compression
- Comparison of pointers is preserved by compression, so we can avoid transformations for them as well.
- Hashing can be sped up with compressed pointers. Decompression for hash calculation is redundant, because the fixed base does not increase the hash entropy. Instead, a simpler hashing function for 32-bit integers can be used.
Blink has many hash tables that use Member as a key; the 32-bit hashing resulted in faster collections (see the hashing sketch after this list).
- Helping clang where it fails to optimize: removing unnecessary decompression in memory barrier blocks.
- Now that the pointer is compressed, beware of padding: a compressed pointer is only the size of a 32-bit integer, so field ordering and alignment should be considered when placing compressed pointers inside a structure.
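A small layout sketch of the padding concern; sizes assume a typical LP64 ABI, and CompressedMember is just a 4-byte stand-in for a compressed Member:

```cpp
#include <cstdint>

struct CompressedMember { uint32_t value; };  // 4 bytes, stand-in for Member<T>

// Careless field ordering wastes the space compression just saved:
struct Padded {
  CompressedMember a;  // 4 bytes
  void* raw;           // 8 bytes, forces 4 bytes of padding before it
  CompressedMember b;  // 4 bytes, plus 4 bytes of tail padding
};  // typically sizeof(Padded) == 24

// Grouping the 4-byte members together avoids the holes:
struct Packed {
  void* raw;           // 8 bytes
  CompressedMember a;  // 4 bytes
  CompressedMember b;  // 4 bytes
};  // typically sizeof(Packed) == 16
```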
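And a hedged sketch of the hashing gotcha above: hash the 32-bit compressed value directly instead of decompressing first. The mixer below is an ordinary 32-bit integer finalizer (Murmur3-style), not Blink's actual hash function:

```cpp
#include <cstdint>

// The fixed cage base adds no entropy, so hashing the compressed value is as
// good as hashing the decompressed pointer and skips the decompression.
inline uint32_t HashCompressed(uint32_t compressed) {
  uint32_t h = compressed;
  h ^= h >> 16;
  h *= 0x85ebca6bu;
  h ^= h >> 13;
  h *= 0xc2b2ae35u;
  h ^= h >> 16;
  return h;
}
```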
TBD:
oilpan-library code dig.