Dec 27, 2025

[C++] coroutine cheat sheet - 3

Reference:
[C++] Object Lifetimes reading minute



When HALO (Heap Allocation eLision Optimization) can happen:

  1. The Lifetime is Strictly Nested
    The compiler must be able to prove that the coroutine's lifetime ends before the caller's execution finishes.
    If the coroutine object (the "handle" or "task") is returned and its destruction point cannot be determined at compile-time, the compiler must play it safe and use the heap.
  2. The Coroutine State Size is Known
    The compiler needs to know exactly how much space the coroutine requires (including captured variables and promise objects) at the call site.
    This usually requires the coroutine body to be visible to the compiler
    (i.e., in the same translation unit or available via Link Time Optimization).
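  3. Use of get_return_object_on_allocation_failure (Optional but relevant)
    This is a static member function of our promise_type, not something in namespace std. If the promise_type declares it, the compiler must use a non-throwing allocation function and, when that function returns nullptr, hand the result of get_return_object_on_allocation_failure() back to the caller instead of starting the coroutine.
    This neither forces nor prevents elision; it only changes how an allocation failure is reported (a null check instead of a std::bad_alloc exception).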

The HALO Optimization Process:

The optimization works roughly like this:

  • Analysis: The compiler looks at the co_await and destruction points of the coroutine object.
  • Inlining: It attempts to inline the coroutine logic into the caller.
  • Elision: If the compiler sees that the coroutine state does not "escape" the function, it replaces operator new with a local stack allocation.

How to Encourage the Compiler to Opt Out of Heap Allocation

Since C++ has no explicit keyword such as noheap, we have to "help" the compiler's optimizer:
  • Keep the coroutine local: Avoid passing the coroutine handle to other threads or storing it in global containers.
  • Enable High Optimization: HALO typically requires -O2 or -O3 (GCC/Clang) or /O2 (MSVC).
  • Inline the Coroutine: Define the coroutine in a header or the same file where it is called so the compiler can see the full lifecycle.
Task my_coroutine() {
    co_return 42;
}

void caller() {
    auto t = my_coroutine(); // Compiler can see 't' lives only here
    // ... do something ...
} // 't' is destroyed here; Heap allocation likely elided. 


Limitations

  • Dynamic Dispatch: If we call a coroutine through a virtual function or a function pointer, the compiler usually cannot perform HALO.
  • Tail Calls: Complex chains of coroutines can sometimes make it difficult for the compiler to prove nested lifetimes.

The "Manual" Opt-Out (Custom Allocators)

If we cannot rely on the compiler's optimization (e.g., in embedded systems),
we can manually opt out of the default heap by overloading operator new and operator delete in our promise_type.

Using a Static Buffer or Arena:

We can provide a custom operator new that pulls from a pre-allocated memory pool or a stack-based arena, effectively bypassing the system heap.
struct promise_type {
    // Overloading new allows us to use a custom allocator
    void* operator new(std::size_t size) {
        return my_custom_arena.allocate(size);
    }
    void operator delete(void* ptr) {
        my_custom_arena.deallocate(ptr);
    }
    // ... other promise members
};


How to Verify if Elision Happened

Since HALO is an optimization, it can be fragile. We can verify it using these tricks:
  • Print in Custom New: Add a printf inside our promise_type::operator new. If it doesn't print during execution at -O3, the compiler successfully elided the call.
  • Compiler Explorer (Assembly): Check for the absence of call operator new or malloc in the generated assembly.
  • Clang-Specific Attributes: Clang provides attributes that make this elision more deterministic. The RFC proposed the name [[clang::coro_inplace_task]]; the attribute that shipped in Clang is [[clang::coro_await_elidable]]. Neither is standard C++20. (reference: Language Extension for better, more deterministic HALO for C++ Coroutines)
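#include <array>
#include <coroutine>
#include <cstddef>    // std::size_t, std::byte
#include <exception>  // std::terminate
#include <iostream>
#include <new>        // std::bad_alloc
#include <span>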

struct StaticTask {
    struct promise_type {
        // 1. Intercept the arguments of the coroutine function
        // This allows 'operator new' to see the buffer passed to the coroutine
        void* operator new(std::size_t size, std::span<std::byte> buffer) {
            if (size > buffer.size()) {
                throw std::bad_alloc();
            }
            std::cout << "Allocating " << size << " bytes from stack buffer\n";
            return buffer.data();
        }

        // Must provide a matching delete (even if it does nothing)
        void operator delete(void*, std::size_t) {}

        StaticTask get_return_object() { return {std::coroutine_handle<promise_type>::from_promise(*this)}; }
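        // The standard awaitables are std::suspend_never / std::suspend_always.
        // suspend_never runs the body right away and frees the frame at the end.
        std::suspend_never initial_suspend() { return {}; }
        std::suspend_never final_suspend() noexcept { return {}; }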
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
    };

    std::coroutine_handle<promise_type> handle;
};

// Usage
StaticTask my_coro(std::span<std::byte> storage) {
    std::cout << "Coroutine running!\n";
    co_return;
}

int main() {
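    // Align the buffer like memory returned by the default operator new.
    alignas(__STDCPP_DEFAULT_NEW_ALIGNMENT__) std::array<std::byte, 1024> stack_space;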
    auto task = my_coro(stack_space); // Passes the buffer to 'operator new'
}

In C++20, the operator new for a coroutine is uniquely powerful because the compiler performs a special lookup.
It doesn't just look for a standard void* operator new(size_t); it looks for an overload that matches the entire signature of our coroutine function.

Why it takes the buffer as an argument
When we call a coroutine like my_coro(some_buffer), the compiler needs to allocate space for the "coroutine frame" 
(which holds local variables and the promise).

To give us total control, the C++ standard says:

The compiler will first try to find an operator new in our promise_type that can be called with (std::size_t, Args...),
where Args... are the coroutine function's parameters, passed as lvalues.

If it finds this "matching" version, it calls it and passes the arguments we provided in the function call.

This is the "magic hook" that allows us to pass a specific memory source (like a stack-based span or a custom Arena&) directly into the allocation logic.

The Lookup Mechanics

The compiler follows this priority list when it sees a coroutine call:

Priority    Signature the compiler looks for    Description
1. (Best)   operator new(size_t, P1, P2...)     Takes the size plus all coroutine arguments (P1, P2).
2.          operator new(size_t)                The standard class-specific allocator.
3.          ::operator new(size_t)              Global heap allocation. (Fallback)
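Note: For non-static member functions, the first argument after size_t is an lvalue referring to the object itself (*this, so the parameter type is a reference such as MyClass&, not the this pointer), followed by the function arguments.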



struct promise_type {
    std::span<std::byte> m_buffer;

    // 1. Used to POSITION the memory (Allocation)
    void* operator new(std::size_t size, std::span<std::byte> buffer) {
        if (size > buffer.size()) throw std::bad_alloc();
        return buffer.data();
    }

    // 2. Used to INITIALIZE the promise (Construction)
    // The compiler sees that the coroutine was called with a span,
    // so it looks for a constructor that accepts it.
    promise_type(std::span<std::byte> buffer) : m_buffer(buffer) {
        std::cout << "Promise initialized with buffer of size " << m_buffer.size() << "\n";
    }

    // ... rest of promise_type members ...
};

[[clang::coro_await_elidable_argument]]

The [[clang::coro_await_elidable_argument]] attribute is a Clang-specific optimization hint designed to reduce the overhead of asynchronous programming. It works together with [[clang::coro_await_elidable]], which marks a task type, to enable HALO (Heap Allocation eLision Optimization) for coroutines.

This attribute is applied to a parameter (or parameter pack) of a function, usually a utility such as when_all that composes tasks. When a call to that function is itself immediately co_awaited, the attribute tells the compiler that a task passed as that argument is a temporary that will not outlive the awaiting coroutine. This gives the compiler a "green light" to:
  • Elide the heap allocation: Instead of putting the callee's coroutine frame on the heap, it embeds it in the caller's coroutine frame.
  • Inline the coroutine: It allows for better devirtualization and inlining of the coroutine's lifecycle.
