Reference:
A Deep Dive Into C++ Object Lifetimes - Jonathan Müller - C++Now 2024
[C++] null pointer and memory laundering.
[C++] transparently replaceable
[Book] Inside the C++ Object Model
nifty counter
Category
Storage (i.e. either in memory or embedded in the instruction)
- Measured in bytes. Every byte has a unique address.
- What's in the storage can be anything.
- When storage for an object with automatic or dynamic storage duration is obtained,
the object has an indeterminate value, and if no initialization is performed for the object,
that object retains an indeterminate value until that value is replaced. If an indeterminate
value is produced by an evaluation, the behavior is undefined.
- In C++26, reading the value of an uninitialized variable with automatic storage duration is erroneous behavior, not undefined behavior. Ref: P2795
Duration
- minimum potential lifetime of the storage containing the object.
- Static, thread, and automatic storage durations are associated with objects introduced by declarations.
automatic storage duration
- Lasts until the block in which the object is created exits.
static storage duration
- Variables declared at namespace scope, or first declared with the static or extern keyword. Lasts for the duration of the program.
- function-local static vs. global scope
- constinit vs. dynamic initialization
- nifty counters, module dependency graph, inline variables.
thread storage duration
- thread_local keyword. Lasts for the duration of the thread in which the object is created.
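A minimal sketch contrasting automatic and static storage duration (the function name `count_calls` is hypothetical): the automatic variable is re-created on every call, while the function-local static is initialized once and keeps its value between calls. thread_local is left out because showing the per-thread copy needs a second thread.

```cpp
#include <cassert>

// Automatic vs. static: `local` is re-created on every call,
// `total` is initialized once and lives until program exit.
int count_calls() {
    int local = 0;        // automatic storage duration
    static int total = 0; // static storage duration (function-local static)
    ++local;
    ++total;
    return total * 10 + local; // local is always 1 here
}
```

The first call returns 11 and the second 21: the tens digit counts calls, while the ones digit never grows.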
Value (i.e. what the object is initialized with)
Type (determines the size of the storage to allocate)
- Maps the bits to their interpretation.
Object
- Has a particular type and occupies a region of storage at a particular
address where its value is stored.
- A function is not an object (it has an address, but it does not occupy storage holding a value).
- A reference is not an object. However, a pointer is an object.
Lifetime
- Lifetime of an object is a runtime property of the object.
- Before the lifetime of an object starts and after its lifetime ends
there are significant restrictions on the use of the object.
Object lifetime spans
- storage is allocated
- object is initialized, the lifetime starts
- object is used, its value changed or read.
- object is destroyed, the lifetime ends.
- storage is deallocated.
Object can be created: This does not necessarily start the lifetime yet.
Object can be destroyed: This ends the lifetime.
The lifetime of an object of type T begins when
- storage with the proper alignment and size for type T is obtained, and
- its initialization (if any) is complete.
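#include <print>
int main() {
    int* i = new int;       // default-initialized: indeterminate value (`new int()` would value-initialize it to 0)
    std::print("{}\n", *i); // UB: reads an indeterminate value
    delete i;
}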
Whenever a prvalue is converted to an xvalue (temporary materialization), a temporary object is created, e.g. when
- binding a reference to a prvalue
- accessing a member of a class prvalue
- using an array prvalue (subscripting it, or converting it to a pointer)
- discarding the result of a function call that returns a prvalue.
Temporary objects are destroyed as the last step in evaluating the full-expression that contains the point where they were created.
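To make the full-expression rule concrete, here is a sketch with a hypothetical logging type `Tracer` (not from the talk). The first temporary dies at the end of its full-expression; the second is bound to a const reference, so its lifetime is extended to the end of the enclosing block.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

std::vector<std::string> trace_log; // hypothetical event log for the demo

struct Tracer {
    std::string name;
    explicit Tracer(std::string n) : name(std::move(n)) { trace_log.push_back("ctor " + name); }
    ~Tracer() { trace_log.push_back("dtor " + name); }
    const std::string& get() const { return name; }
};

void run_temporary_demo() {
    trace_log.clear();
    // Member access on a prvalue materializes a temporary; it is destroyed
    // at the end of this full-expression (the semicolon).
    std::size_t len = Tracer("tmp").get().size();
    trace_log.push_back("after full-expression, len=" + std::to_string(len));

    // Binding a prvalue to a const reference extends the temporary's
    // lifetime to that of the reference.
    const Tracer& bound = Tracer("bound");
    trace_log.push_back("using " + bound.get());
} // the "bound" temporary is destroyed here
```

After a call, `trace_log` shows "dtor tmp" before the next statement runs, while "dtor bound" comes last.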
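#include <print>
#include <string>
#include <vector>
std::vector<std::string> get_strings();
int main() {
    for (auto&& str : get_strings()) {
        std::print("{}\n", str);
    } // temporary vector destroyed here.
    // Lifetime extended: the temporary lives as long as the reference str_vec.
    auto&& str_vec = get_strings();
    // Since C++23 (P2718), temporaries in the range-for initializer live until the end of the loop.
    // https://en.cppreference.com/w/cpp/language/lifetime
    for (auto&& c : get_strings()[0]) {
        std::print("{}\n", c);
    } // temporary vector destroyed here (before C++23 it was already gone inside the loop).
    // This is dangling: the extension applies only to range-for.
    // auto&& str = get_strings()[0];
}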
void* memory = ::operator new(sizeof(int));
int* ptr = ::new(memory) int(11);
std::destroy_at(ptr);
::operator delete(memory);
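The four lines above map onto the lifetime steps listed earlier. The sketch below, assuming a hypothetical `Widget` type that logs from its constructor and destructor, makes each step observable.

```cpp
#include <cassert>
#include <memory>  // std::destroy_at
#include <new>     // ::operator new, ::operator delete, placement new
#include <string>
#include <vector>

std::vector<std::string> lifecycle_log; // hypothetical event log for the demo

struct Widget {
    int value;
    explicit Widget(int v) : value(v) { lifecycle_log.push_back("constructed"); }
    ~Widget() { lifecycle_log.push_back("destroyed"); }
};

int run_lifecycle() {
    lifecycle_log.clear();
    void* memory = ::operator new(sizeof(Widget)); // storage allocated, no Widget yet
    lifecycle_log.push_back("storage allocated");
    Widget* w = ::new (memory) Widget(11);          // initialization complete: lifetime starts
    int observed = w->value;                        // object used
    std::destroy_at(w);                             // destructor call: lifetime ends
    ::operator delete(memory);                      // storage deallocated
    lifecycle_log.push_back("storage released");
    return observed;
}
```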
alignas
alignas(int) unsigned char buffer[sizeof(int)];
int* ptr = ::new(static_cast<void*>(buffer)) int(11);
std::destroy_at(ptr);
int x = 11;
std::destroy_at(&x); // end lifetime
int* ptr = ::new(static_cast<void*>(&x)) int(42);
You cannot legally reuse the storage of a const complete object with static, thread, or automatic storage duration to construct a new object. The const promise extends to the storage in this scenario.
- The C++ standard ([basic.life]) states that creating a new object within the storage that such a const object occupies, or used to occupy before its lifetime ended, results in undefined behavior. Separately, [dcl.type.cv] makes any attempt to modify a const object during its lifetime undefined behavior.
- Ending the lifetime of the original const int with std::destroy_at does not help: placement new still creates a new object (and writes 42) in storage that belonged to an object declared const.
- The "const-ness" is associated not just with the object's lifetime but also with the storage it occupied.
- Writing 42 into memory that the compiler may have placed in a read-only segment (because x was const) could lead to a hardware exception (such as a segmentation fault).
- Even if x is not in read-only memory, the compiler's optimizations may rely on that location never changing from 11 (e.g. by replacing reads of x with the constant 11). Overwriting it violates that assumption.
- This restriction does not apply to const heap objects: their storage may be reused, but pointers to the old object then need std::launder.
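const int x = 11;
std::destroy_at(&x); // end lifetime
// UB: x is a const object with automatic storage duration.
// (const_cast is only needed to obtain a non-const pointer for placement new.)
::new(const_cast<int*>(&x)) int(42);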
const int* ptr = new const int(11);
std::destroy_at(ptr); // end the lifetime of the pointee (not of ptr itself)
::new(const_cast<int*>(ptr)) int(42); // allowed for a heap object, but ptr now needs std::launder
transparently replaceable object
T is transparently replaceable by U if
- T and U use the same storage, and
- T and U have the same type (ignoring top-level cv-qualifiers)
T is not transparently replaceable if it is
- a const complete object (for const heap objects, this can be fixed through std::launder),
- a base class subobject, or
- a [[no_unique_address]] member.
When replacing sub-objects, (member variables or array elements), the rules apply
recursively to the parent object.
// x must live in memory here, since its address is taken.
int x = 11;
std::destroy_at(&x);
::new(static_cast<void*>(&x)) int(42); // transparent replacement.
std::print("{}\n", x); // ok, prints 42
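foo& foo::operator=(const foo& other) {
    if (this == &other) return *this; // self-assignment: destroying *this would also destroy other
    std::destroy_at(this);
    ::new(static_cast<void*>(this)) foo(other); // transparent replacement (if *this is not a base subobject).
    return *this; // ok
}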
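A self-contained variant of the transparent replacement above, using a hypothetical aggregate `Point`. Because the new object has the same type and overlays exactly the same storage, and the old one is neither const nor a base or [[no_unique_address]] subobject (the caller must guarantee this), the original name keeps working without std::launder.

```cpp
#include <cassert>
#include <memory> // std::destroy_at
#include <new>    // placement new

struct Point { int x; int y; }; // hypothetical aggregate

// Precondition (caller's responsibility): *p is not const, not a base class
// subobject, and not a [[no_unique_address]] member. Under that precondition
// the new Point transparently replaces the old one.
void reset_point(Point* p, int x, int y) {
    std::destroy_at(p);                        // end the old object's lifetime
    ::new (static_cast<void*>(p)) Point{x, y}; // same type, same storage
}
```

After `Point pt{1, 2}; reset_point(&pt, 3, 4);`, reading `pt.x` directly yields the new value.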
non-transparent
const int* ptr = new const int(11);
std::destroy_at(ptr);
const int* new_ptr = ::new(const_cast<int*>(ptr)) const int(42); // non-transparent
std::print("{}\n", *new_ptr); // ok
std::print("{}\n", *ptr); // UB
std::launder
- std::launder is applied to pointers to the /previous/ object, not to the new one: the pointer returned by placement new already refers to the new object.
- std::launder updates the provenance of a pointer. (See below about provenance, a compiler optimization term.)
const int* ptr = new const int(11);
std::destroy_at(ptr);
const int* new_ptr = ::new(const_cast<int*>(ptr)) const int(42); // non-transparent
std::print("{}\n", *new_ptr); // ok
std::print("{}\n", *std::launder(ptr)); // ok
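Implicit object creation (the object is created, but you still have to initialize it)
1) Calls to allocation functions such as std::malloc or ::operator new.
int* ptr = static_cast<int*>(std::malloc(sizeof(int))); // implicitly creates an int, not initialized.
*ptr = 11;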
2) Anything that starts the lifetime of an unsigned char/std::byte array.
alignas(int) unsigned char buffer[sizeof(int)]; // implicitly creates an int, not initialized.
int* ptr = std::launder(reinterpret_cast<int*>(buffer)); // with P3006, std::launder is no longer needed.
*ptr = 11;
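3) std::memcpy and std::memmove.
// The array itself creates nothing: it is a plain char array, not an unsigned char/std::byte array.
alignas(int) char buffer[sizeof(int)];
std::memcpy(buffer, &some_int, sizeof(int)); // implicitly creates an int (some_int: an existing int)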
int* ptr = std::launder(reinterpret_cast<int*>(buffer));
std::print("{}\n", *ptr);
int* ptr = static_cast<int*>(mmap(...));
std::print("{}\n", *ptr);
// Implicitly creates an int or a float, whichever makes the code below well-defined.
// The choice is made retroactively, as if the compiler time-travels back to this line.
alignas(int) unsigned char buffer[sizeof(int)];
if(...)
    *std::launder(reinterpret_cast<int*>(buffer)) = 11;
else
    *std::launder(reinterpret_cast<float*>(buffer)) = 11.1f;
// Still UB
int i = 11;
float f = *std::launder(reinterpret_cast<float*>(&i)); // UB: no float object exists at that address.
struct data {
std::uint8_t op;
std::uint32_t a, b, c;
};
void process(unsigned char* buffer, std::size_t size) {
data* ptr = std::launder(reinterpret_cast<data*>(buffer));
    std::print("{} {} {} {}\n", ptr->op, ptr->a, ptr->b, ptr->c); // might be UB, depending on how the buffer was created.
}
struct data {
std::uint8_t op;
std::uint32_t a, b, c;
};
void process(unsigned char* buffer, std::size_t size) {
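    data* ptr = ::new(static_cast<void*>(buffer)) data;
    // Creating the object is fine, but placement new starts the lifetime of a *new* object:
    // its members are default-initialized (indeterminate) and do not hold the previous buffer bytes,
    // so reading them is UB.
    std::print("{} {} {} {}\n", ptr->op, ptr->a, ptr->b, ptr->c);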
}
// Fix, C++23,
// std::start_lifetime_as https://en.cppreference.com/w/cpp/memory/start_lifetime_as
// std::start_lifetime_as_array<data>(ptr, count);
struct data {
std::uint8_t op;
std::uint32_t a, b, c;
};
void process(unsigned char* buffer, std::size_t size) {
data* ptr = std::start_lifetime_as<data>(buffer);
    std::print("{} {} {} {}\n", ptr->op, ptr->a, ptr->b, ptr->c); // ok.
}
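// An approximation of std::start_lifetime_as for trivially copyable T:
// std::memmove implicitly creates objects and leaves the bytes unchanged.
template<typename T>
T* start_lifetime_as(void* ptr) {
    std::memmove(ptr, ptr, sizeof(T));
    return std::launder(static_cast<T*>(ptr));
}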
Implicit destruction of objects
The lifetime of an object o of type T ends when
- if T is a non-class type, the object is destroyed, or
- if T is a class type, the destructor call starts, or
- the storage which the object occupies is released, or is reused
by an object that is not nested within o.
int x = 11;
::new(static_cast<void*>(&x)) int(42); // end + start new lifetime.
std::print("{}\n", x);
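alignas(int) unsigned char buffer[sizeof(int)]; // buffer's lifetime starts (it may implicitly create objects)
// Ends the lifetime of any implicitly created object and starts the new int's lifetime.
// buffer itself stays alive, because the int is nested within it.
int* ptr = ::new(static_cast<void*>(buffer)) int(11);
std::print("{}\n", *ptr);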
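Memory leaks are not UB, just leaks.
// Long enough to (most likely) exceed the small-string buffer, so the characters are heap-allocated.
std::string str = "this string is long enough to be heap allocated";
::new(static_cast<void*>(&str)) std::string("new str"); // the old buffer leaks: its destructor never ran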
Provenance
- Each object has a unique provenance.
- All objects in an array have the same provenance.
- Re-using the memory of an object changes the provenance unless
the object is transparently replaced. (std::launder)
A pointer T* is logically a pair (address, provenance).
- The address is the only thing that is physically observable.
- The provenance identifies the object or allocation the pointer was derived from.
A pointer dereference is only valid if
- The address is in the range of allowed addresses for the provenance.
- The current provenance of that address is the same as the provenance of the pointer.
The pointer provenance cannot be changed using pointer arithmetic.
Thus e.g.
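void do_sth(int* ptr);
int foo() {
    int x, y;
    y = 11;
    if(&x + 1 == &y) { // the addresses may compare equal...
        do_sth(&x);
    }
    return y; // ...but the compiler may still assume y is 11.
}
void do_sth(int* ptr) {
    *(ptr + 1) = 42; // UB: the address is outside the range allowed by x's provenance.
}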
const int* ptr = new const int(11); // provenance A
std::destroy_at(ptr);
const int* new_ptr = ::new(const_cast<int*>(ptr)) const int(42); // non-transparent, provenance B
std::print("{}\n", *new_ptr); // ok
std::print("{}\n", *ptr); // UB: ptr still carries provenance A; this is where std::launder comes in.
// fix
std::print("{}\n", *std::launder(ptr)); // std::launder returns a pointer with the new object's provenance.
A reference has provenance as well.
const int* ptr = new const int(11); // provenance A
const int& ref = *ptr; // ref also carries provenance A
std::destroy_at(ptr);
::new(const_cast<int*>(ptr)) const int(42); // non-transparent, provenance B
std::print("{}\n", ref); // UB, provenance A != provenance B
// fix
std::print("{}\n", *std::launder(&ref)); // std::launder returns a pointer with the new object's provenance.
Type punning
reinterpret_cast between unrelated types is allowed, but dereferencing the resulting pointer is UB.
int i = 11;
float* f_ptr = ::new(static_cast<void*>(&i)) float(3.14);
std::print("{}\n", *f_ptr); // ok
std::print("{}\n", i); // UB
int i = 11;
float* f_ptr = std::start_lifetime_as<float>(&i);
std::print("{}\n", *f_ptr); // ok
std::print("{}\n", i); // UB
Be careful about how you obtain the pointer:
int i = 11;
float* f_ptr = reinterpret_cast<float*>(&i);
::new(static_cast<void*>(&i)) float(3.14);
std::print("{}\n", *f_ptr); // UB
int i = 11;
::new(static_cast<void*>(&i)) float(3.14);
float* f_ptr = reinterpret_cast<float*>(&i);
std::print("{}\n", *f_ptr); // UB
int i = 11;
float* f_ptr = ::new(static_cast<void*>(&i)) float(3.14);
std::print("{}\n", *f_ptr); // ok
int i = 11;
float* f_ptr = reinterpret_cast<float*>(&i);
::new(static_cast<void*>(&i)) float(3.14);
std::print("{}\n", *std::launder(f_ptr)); // ok
alignas(int) unsigned char buffer[sizeof(int)];
int* ptr = reinterpret_cast<int*>(buffer);
*ptr = 11; // currently needs to call std::launder but fixed in P3006
When to use std::launder?
When want to re-use the storage of
- const heap objects (a const object with static, thread, or automatic storage duration cannot be fixed: once it's const, it's const for life)
- base classes
- [[no_unique_address]] members
- Or when re-using memory as storage for a different type.
There are exceptions that allow dereferencing a pointer obtained by reinterpret_cast to a different type, namely:
If a program attempts to address the stored value of an object through a glvalue whose type is not similar to one of the following types the behavior is undefined:
- the dynamic type of the object,
- a type that is the signed or unsigned type corresponding to the dynamic type of the object, or
- a char, unsigned char, or std::byte type.
int i = 11;
std::print("{}\n", *reinterpret_cast<unsigned*>(&i)); // ok: the unsigned type corresponding to int
std::print("{}\n", std::to_integer<int>(*reinterpret_cast<std::byte*>(&i))); // ok: reads the first byte
Object representation
Allows access to the object representation, i.e. the sequence of bytes the object occupies in memory.
The code below is technically not valid under the current wording (the bytes are not formally an array object, so stepping through them is not covered), but it is fixed by P1839.
int object = 11;
std::byte* ptr = reinterpret_cast<std::byte*>(&object);
for (auto i = 0uz; i != sizeof(object); ++i) { // 0uz: std::size_t literal (C++23)
    std::print("{:02x} ", static_cast<int>(*ptr++));
}
Type punning via std::memcpy
int i = 11;
float f;
std::memcpy(&f, &i, sizeof(f));
std::print("{}\n", f); // ok
std::print("{}\n", i); // ok
// C++20, std::bit_cast, doing same as std::memcpy, but constexpr
int i = 11;
float f = std::bit_cast<float>(i);
std::print("{}\n", f); // ok
std::print("{}\n", i); // ok
Another exception: pointer-interconvertibility
If two objects are pointer-interconvertible, then they have the same address,
and it is possible to obtain a pointer to one from a pointer to the other via a
reinterpret_cast.
Two objects a and b are pointer-interconvertible if
- they are the same object, or
- one is a union object and the other is a non-static data member of that object ([class.union]), or
- one is a standard-layout class object and the other is the first non-static data member of that object or any base class sub-object of that object ([class.mem]), or
- there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.
struct A {
int member;
};
A a{.member = 11};
int* i_ptr = reinterpret_cast<int*>(&a);
std::print("{}\n", *i_ptr); // ok
std::print("{}\n", reinterpret_cast<A*>(i_ptr)->member); // ok
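A sketch of the standard-layout case with a hypothetical `Packet` type: the static_assert documents the precondition, and the two casts go in both directions between the object and its first member.

```cpp
#include <cassert>
#include <type_traits>

struct Packet {      // hypothetical standard-layout type
    int id;          // first non-static data member
    double payload;
};
static_assert(std::is_standard_layout_v<Packet>);

// A Packet and its first member are pointer-interconvertible, so these
// casts yield pointers to the actual objects.
int* id_of(Packet& p) { return reinterpret_cast<int*>(&p); }

// Only valid if `id` really points to the `id` member of a Packet.
Packet* packet_of(int* id) { return reinterpret_cast<Packet*>(id); }
```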
Union
union U {
int i;
float f;
};
U u{.i = 11};
u.f = 3.14f; // now f is the active member of the union.
std::print("{}\n", u.f); // ok
std::print("{}\n", u.i); // UB
union U {
struct A {
int prefix;
int i;
} a;
struct B {
int prefix2;
float f;
} b;
};
U u{.a = {.prefix = 0, .i = 11}};
std::print("{}\n", u.a.prefix); // ok
std::print("{}\n", u.b.prefix2); // ok: prefix/prefix2 belong to the common initial sequence of A and B.
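The common initial sequence rule is what makes the classic "tag first" union pattern work. A minimal sketch with hypothetical message types: `kind` is read through `i` even when `f` is the active member, which is allowed because both standard-layout structs start with an int `kind`.

```cpp
#include <cassert>

struct IntMsg   { int kind; int value; };   // hypothetical
struct FloatMsg { int kind; float value; }; // hypothetical

union Message {
    IntMsg   i;
    FloatMsg f;
};

constexpr int kInt = 0, kFloat = 1;

double read_value(const Message& m) {
    // Reading m.i.kind is fine even when m.f is active (common initial sequence).
    if (m.i.kind == kInt) return m.i.value;
    return m.f.value; // only the active member's non-common part is read
}
```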
Take away
Don't rely on implicit object creation
- Use placement new to explicitly create a new object, thus new provenance.
- Use std::start_lifetime_as to re-interpret raw bytes as an object, thus new provenance.
- Whenever possible, use the pointer from placement new and std::start_lifetime_as directly, thus new provenance.
- Use union { char empty; T t; } instead of alignas(T) unsigned char buffer[sizeof(T)];
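A minimal sketch of the last takeaway, using a hypothetical optional-like `Slot<T>`: the union member names the storage, placement new creates the object explicitly, and the member name refers to the new object directly, so neither reinterpret_cast nor std::launder is needed. This is a simplified illustration (no copy or move support), not a replacement for std::optional.

```cpp
#include <cassert>
#include <memory>   // std::destroy_at
#include <new>      // placement new
#include <string>
#include <utility>  // std::forward

template <typename T>
class Slot {
    union {
        char empty;  // active member while no T exists
        T value;
    };
    bool engaged = false;

public:
    Slot() noexcept : empty{} {}
    Slot(const Slot&) = delete;
    Slot& operator=(const Slot&) = delete;
    ~Slot() { reset(); }

    template <typename... Args>
    T& emplace(Args&&... args) {
        reset();
        // Explicit creation in the member's storage; `value` then names the new object.
        ::new (static_cast<void*>(&value)) T(std::forward<Args>(args)...);
        engaged = true;
        return value;
    }

    void reset() noexcept {
        if (engaged) {
            std::destroy_at(&value);
            engaged = false;
        }
    }

    bool has_value() const noexcept { return engaged; }
    T& get() { return value; }
};
```

For example, `Slot<std::string> s; s.emplace(40, 'x');` constructs a 40-character string in place, and a later `s.emplace("hi")` destroys it before creating the replacement.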