Ataraxia through Epoché: [C++][C++20] coroutine minute

Reference:
https://en.cppreference.com/w/cpp/language/coroutines
https://www.packtpub.com/product/c-high-performance-second-edition/9781839216541
https://clang.llvm.org/docs/DebuggingCoroutines.html (debugging practice / worth another post with live experience)
https://www.reddit.com/r/cpp/comments/vwt6xl/debugging_c_coroutines/

Restrictions

Coroutines cannot use variadic arguments, plain return statements, or placeholder return types (auto or Concept).
Constexpr functions, constructors, destructors, and the main function cannot be coroutines.

Stackless

Stackless coroutines store the coroutine frame somewhere else (typically on the heap) and use the stack of the currently executing thread for nested call frames. In other words, the heap holds the coroutine frame, while the current thread's stack holds the frames of the functions called from inside the coroutine.

The effect of this is that a stackless coroutine can never suspend from a nested call frame.

Memory footprint: Coroutine frame

Remember

The coroutine's return type (in the example further down, a struct derived from "std::coroutine_handle<promise>") has
"using promise_type = struct promise;" defined.
"struct promise" provides the contract member functions that the compiler calls, and some of them work with
"std::coroutine_handle<promise>".
The type returned by the promise's get_return_object() should have promise_type defined.

coroutine state

an internal, heap-allocated (unless the allocation is optimized out), object that contains

the promise object
the parameters (all copied by value)
some representation of the current suspension point, so that resume knows where to continue and destroy knows what local variables were in scope
local variables and temporaries whose lifetime spans the current suspension point

Switching between coroutines is substantially faster than switching between processes and OS threads, partly because it doesn't involve any system calls that require the CPU to run in kernel mode.
In general, a stackful coroutine has a more expensive context switch operation since it has more information to save and restore during suspend and resume compared to a stackless coroutine. Resuming a stackless coroutine is comparable to a normal function call.

When a coroutine begins execution, it performs the following (Important concept flow)

allocates the coroutine state object using operator new (see example code below)
copies all function parameters to the coroutine state:

by-value parameters are moved or copied,
by-reference parameters remain references (and so may become dangling if the coroutine is resumed after the lifetime of referred object ends)
the this pointer is also copied into the coroutine state; beware that if the object is destroyed before the coroutine is resumed, the stored this pointer dangles.

calls the constructor for the promise object. If the promise type has a constructor that takes all coroutine parameters, that constructor is called, with post-copy coroutine arguments. Otherwise the default constructor is called.
calls promise.get_return_object() and keeps the result in a local variable. The result can be of any type that converts to the return type in the coroutine's function signature; it is the object that the caller of the coroutine receives.
The result of that call will be returned to the caller when the coroutine first suspends.
Any exceptions thrown up to and including this step propagate back to the caller, not placed in the promise.
calls promise.initial_suspend() and co_awaits its result. Typical Promise types either return a std::suspend_always, for lazily-started coroutines; or std::suspend_never, for eagerly-started coroutines.
when co_await promise.initial_suspend() resumes, starts executing the body of the coroutine

When a coroutine reaches a suspension point (i.e. a co_await or co_yield inside the coroutine)

the return object obtained earlier(i.e. from promise.get_return_object()) is returned to the caller/resumer, after implicit conversion to the return type of the coroutine, if necessary.

When a coroutine reaches the co_return statement

calls promise.return_void() for

co_return;
co_return expr where expr has type void
falling off the end of a void-returning coroutine. The behavior is undefined if the Promise type has no Promise::return_void() member function in this case.

or calls promise.return_value(expr) for co_return expr where expr has non-void type
destroys all variables with automatic storage duration in reverse order they were created.
calls promise.final_suspend() and co_awaits the result.

When a coroutine ends with an uncaught exception

catches the exception and calls promise.unhandled_exception() from within the catch-block
calls promise.final_suspend() and co_awaits the result (e.g. to resume a continuation or publish a result). It's undefined behavior to resume a coroutine from this point.

When the coroutine state is destroyed either because it terminated via co_return or uncaught exception, or because it was destroyed via its handle

calls the destructor of the promise object.
calls the destructors of the function parameter copies.
calls operator delete to free the memory used by the coroutine state
transfers execution back to the caller/resumer.

Heap allocation

coroutine state is allocated on the heap via non-array operator new.
If the Promise type defines a class-level replacement, it will be used, otherwise global operator new will be used.
If the Promise type defines a placement form of operator new that takes additional parameters, and they match an argument list where the first argument is the size requested (of type std::size_t) and the rest are the coroutine function arguments, those arguments will be passed to operator new (this makes it possible to use leading-allocator-convention for coroutines)
The call to operator new can be optimized out (even if custom allocator is used) if

The lifetime of the coroutine state is strictly nested within the lifetime of the caller, and
the size of the coroutine frame is known at the call site.

That is to say, the coroutine does not escape. In that case, the coroutine state is embedded in the caller's stack frame (if the caller is an ordinary function) or coroutine state (if the caller is a coroutine).

If allocation fails, the coroutine throws std::bad_alloc, unless the Promise type defines the member function Promise::get_return_object_on_allocation_failure()
If that member function is defined, allocation uses the nothrow form of operator new and on allocation failure, the coroutine immediately returns the object obtained from Promise::get_return_object_on_allocation_failure() to the caller.

Promise

The Promise type is determined by the compiler from the return type of the coroutine using std::coroutine_traits.

#include <coroutine>
#include <iostream>

// Think of the promise as the configuration of the coroutine.
struct promise;
struct coroutine : std::coroutine_handle<promise>
{ using promise_type = struct promise; };

struct promise {
  coroutine get_return_object()
  { return {coroutine::from_promise(*this)}; }
  std::suspend_always initial_suspend() noexcept { return {}; }
  std::suspend_always final_suspend() noexcept { return {}; }
  void return_void() {}
  void unhandled_exception() {}
};

struct S {
  int i;
  coroutine f() {
    std::cout << i;
    co_return;
  }
};

void bad1() {
  coroutine h = S{0}.f();
  // The temporary S{0} is destroyed at the end of the line above.
  // Because promise::initial_suspend() returns std::suspend_always,
  // the coroutine suspended before running its body and only continues at h.resume().
  h.resume(); // resumed coroutine executes std::cout << i, uses S::i after free
  h.destroy();
}

coroutine bad2() {
  S s{0};
  // s is destroyed when bad2() returns (RAII); the coroutine holds a copy of the pointer to s (i.e. this), which then dangles.
  return s.f(); 
}

void bad3() {
  coroutine h = [i = 0]() -> coroutine { // a lambda that's also a coroutine
    std::cout << i;
    co_return;
  }(); // immediately invoked
  // lambda destroyed
  h.resume(); // uses (anonymous lambda type)::i after free
  h.destroy();
}

void good() {
  coroutine h = [](int i) -> coroutine { // make i a coroutine parameter
    std::cout << i;
    co_return;
  }(0);
  // lambda destroyed
  h.resume(); // no problem, i has been copied to the coroutine frame as a by-value parameter
  h.destroy();
}

Stackful

Stackful coroutines have a separate side stack (similar to a thread) that contains the coroutine frame and the nested call frames.

Stackful coroutines are sometimes called fibers, and in the programming language Go, they are called goroutines. (more details in blog's goroutine notes)

Stackful coroutines remind us of threads, where each thread manages its own stack.

There are two big differences between stackful coroutines (or fibers) and OS threads:

OS threads are scheduled by the kernel and switching between two threads is a kernel mode operation.
Most OSes switch OS threads preemptively (the thread is interrupted by the scheduler), whereas a switch between two fibers happens cooperatively. A running fiber keeps running until it passes control over to some manager that can then schedule another fiber.

Memory footprint: Coroutine frame + call stack

Suspend point

co_await

co_yield

In detail:

co_await: An operator that suspends the current coroutine

co_yield: Returns a value to the caller and suspends the coroutine

co_return: Completes the execution of a coroutine and can, optionally, return a value

The <coroutine> header includes the following:

std::coroutine_handle: A template class that refers to the coroutine state, enabling the suspending and resuming of the coroutine
std::suspend_never: A trivial awaitable type that never suspends
std::suspend_always: A trivial awaitable type that always suspends
std::coroutine_traits: Used to define the promise type of a coroutine

coroutine has the following restrictions:

A coroutine cannot use variadic arguments like f(const char*...)
A coroutine cannot return auto or a concept type: auto f()
A coroutine cannot be declared constexpr
Constructors and destructors cannot be coroutines
The main() function cannot be a coroutine

Coroutine state == coroutine frame; it is created on the heap (if needed).

A bit more about co_await, which enables coroutines as an alternative to the inelegant thread/future/promise API.

co_await

The unary operator co_await suspends a coroutine and returns control to the caller.
At this point, the type of the operand is an awaitable.

Its operand is an expression that either

is of a class type that defines a member operator co_await or may be passed to a non-member operator co_await, or
is convertible to such a class type by means of the current coroutine's Promise::await_transform.

co_await expr

expr is converted to an awaitable as follows

if expr is produced by an initial suspend point, a final suspend point, or a yield expression, the awaitable is expr, as-is

otherwise, if the current coroutine's Promise type has the member function await_transform, then the awaitable is promise.await_transform(expr)
otherwise, the awaitable is expr, as-is.

Then, the awaiter object is obtained, as follows

if overload resolution for operator co_await gives a single best overload, the awaiter is the result of that call (awaitable.operator co_await() for member overload, operator co_await(static_cast<Awaitable&&>(awaitable)) for the non-member overload)
otherwise, if overload resolution finds no operator co_await, the awaiter is awaitable, as-is
otherwise, if overload resolution is ambiguous, the program is ill-formed

If the expression above is a prvalue, the awaiter object is a temporary materialized from it.

Otherwise, if the expression above is a glvalue, the awaiter object is the object to which it refers.

Then, awaiter.await_ready() is called.
(this is a short-cut to avoid the cost of suspension if it's known that the result is ready or can be completed synchronously; that is, false means the coroutine suspends and control goes back to the caller, while true means the coroutine continues executing immediately).

If its result, contextually-converted to bool is false then

The coroutine is suspended (its coroutine state is populated with local variables and current suspension point).
awaiter.await_suspend(handle) is called, where handle is the coroutine handle representing the current coroutine.
Inside that function, the suspended coroutine state is observable via that handle, and it's this function's responsibility to schedule it to resume on some executor, or to be destroyed (returning false counts as scheduling)

if await_suspend returns void, control is immediately returned to the caller/resumer of the current coroutine (this coroutine remains suspended), otherwise
if await_suspend returns bool,

the value true returns control to the caller/resumer of the current coroutine
the value false resumes the current coroutine.

if await_suspend returns a coroutine handle for some other coroutine, that handle is resumed (by a call to handle.resume())
(note this may chain to eventually cause the current coroutine to resume)
if await_suspend throws an exception, the exception is caught, the coroutine is resumed, and the exception is immediately re-thrown

Finally, awaiter.await_resume() is called (whether the coroutine was suspended or not), and its result is the result of the whole co_await expr expression.
The value returned by await_resume() is what the code after the suspension point receives when the coroutine continues.
The execution pattern is the same as Python's yield.
If the coroutine was suspended in the co_await expression, and is later resumed, the resume point is immediately before the call to awaiter.await_resume().
Note that because the coroutine is fully suspended before entering awaiter.await_suspend(), that function is free to transfer the coroutine handle across threads, with no additional synchronization.
For example, it can put it inside a callback, scheduled to run on a threadpool when async I/O operation completes.
In that case, since the current coroutine may have been resumed and thus executed the awaiter object's destructor, all concurrently as await_suspend() continues its execution on the current thread, await_suspend() should treat *this as destroyed and not access it after the handle was published to other threads.

e.g.

#include <coroutine>
#include <iostream>
#include <stdexcept>
#include <thread>
 
auto switch_to_new_thread(std::jthread& out) {
  struct awaitable {
    std::jthread* p_out;
    bool await_ready() { return false; } // false, so await_suspend will be called

    // The coroutine is fully suspended before this function is called,
    // so it is OK to pass the handle of the suspended coroutine to another thread.
    void await_suspend(std::coroutine_handle<> h /* controls the coroutine */) {
      std::jthread& out = *p_out;
      if (out.joinable()) // a default-constructed jthread is not joinable; anything else is already in use
        throw std::runtime_error("Output jthread parameter not empty");
    
      // Capture the coroutine handle; beware that *this may be destroyed once h.resume() runs,
      // so do not touch *this afterwards (copying what is needed beforehand is fine).
      out = std::jthread([h] {
          std::cout << "call h.resume" << std::endl << std::flush;
          h.resume(); 
          std::cout << "end h.resume" << std::endl << std::flush;
      });
      // Potential undefined behavior: accessing potentially destroyed *this
      // std::cout << "New thread ID: " << p_out->get_id() << '\n';
      std::cout << "New thread ID: " << out.get_id() << '\n'; // #3
    }
    
    // Called as the final step, when the coroutine continues.
    // It can return an object, which is then available inside the coroutine,
    // e.g.
    // int return_value = co_await switch_to_new_thread(out);
    int await_resume() {return 42;} 

    ~awaitable() {
      std::cout << "awaitable destructor called" << std::endl << std::flush;
    }
  };
  return awaitable{&out};
}
 
struct task{
  struct promise_type {
    task get_return_object() { return {}; }
    std::suspend_never initial_suspend() { return {}; }
    std::suspend_never final_suspend() noexcept { return {}; }
    void return_void() {}
    void unhandled_exception() {}
  };
};
 
task resuming_on_new_thread(std::jthread& out) {
  std::cout << "Coroutine started on thread: " << std::this_thread::get_id() << '\n'; // #2
  co_await switch_to_new_thread(out);
  // awaiter destroyed here
  std::cout << "Coroutine resumed on thread: " << std::this_thread::get_id() << '\n';
}
 
int main() {
  std::jthread out; // default-constructed: not associated with any thread yet
  std::cout << "call resuming_on_new_thread" << std::endl << std::flush; // #1
  resuming_on_new_thread(out);
  std::cout << "end resuming_on_new_thread: " << std::this_thread::get_id() <<
   std::endl << std::flush; // #4
  // main() blocks here before exiting, because jthread's destructor joins the thread.
}

result:

call resuming_on_new_thread

Coroutine started on thread: 139837705672512

New thread ID: 139837685438208

end resuming_on_new_thread: 139837705672512

call h.resume

awaitable destructor called

Coroutine resumed on thread: 139837685438208

end h.resume

co_yield

Yield-expression returns a value to the caller and suspends the current coroutine:

it is the common building block of resumable generator functions

co_yield expr

co_yield braced-init-list

The above is equivalent to:

co_await promise.yield_value(expr)

A typical generator's yield_value would store its argument into the generator object (copying/moving it, or just storing its address, since the argument's lifetime crosses the suspension point inside the co_await) and return std::suspend_always, transferring control to the caller/resumer.

#include <concepts>
#include <coroutine>
#include <cstdint>
#include <exception>
#include <iostream>
#include <stdexcept>
#include <utility>
 
template<typename T>
struct Generator {
   // The class name 'Generator' is our choice and 
   // it is not required for coroutine magic. 
   // Compiler recognizes coroutine by the presence of 'co_yield' keyword.
   // You can use name 'MyGenerator' (or any other name) instead
   // as long as you include nested struct promise_type 
   // with 'MyGenerator get_return_object()' method .
   //(Note:You need to adjust class constructor/destructor names too when choosing to rename class)
 
  struct promise_type;
  using handle_type = std::coroutine_handle<promise_type>;
 
  struct promise_type {// required 
    T value_;
    std::exception_ptr exception_;
 
    Generator get_return_object() {
      return Generator(handle_type::from_promise(*this));
    }
    std::suspend_always initial_suspend() { return {}; }
    std::suspend_always final_suspend() noexcept { return {}; }
    void unhandled_exception() { exception_ = std::current_exception(); }//saving exception
    template<std::convertible_to<T> From> // C++20 concept
    std::suspend_always yield_value(From &&from) {
      value_ = std::forward<From>(from);//caching the result in promise
      return {};
    }
    void return_void() {}
  };
 
  handle_type h_;
 
  Generator(handle_type h) : h_(h) {}
  ~Generator() { h_.destroy(); }
  explicit operator bool() {
    fill();// The only way to reliably find out whether or not we finished coroutine, 
           // whether or not there is going to be a next value generated (co_yield) in coroutine
           // via C++ getter (operator () below) 
           // is to execute/resume coroutine until the next co_yield point (or let it fall off end).
           // Then we store/cache result in promise to allow getter (operator() below to grab it 
           // without executing coroutine)
    return !h_.done();
  }
  T operator()() {
    fill();
    full_ = false;//we are going to move out previously cached result to make promise empty again
    return std::move(h_.promise().value_);
  }
 
private:
  bool full_ = false;
 
  void fill() {
    if (!full_) {
      h_();
      if (h_.promise().exception_)
        std::rethrow_exception(h_.promise().exception_);
        //propagate coroutine exception in called context
 
      full_ = true;
    }
  }
};
 
Generator<uint64_t>
fibonacci_sequence(unsigned n)
{
 
  if (n==0)
    co_return;
 
  if (n>94)
    throw std::runtime_error("Too big Fibonacci sequence. Elements would overflow.");
 
  co_yield 0;
 
  if (n==1)
    co_return;
 
  co_yield 1;
 
  if (n==2)
    co_return;
 
  uint64_t a=0;
  uint64_t b=1;
 
  for (unsigned i = 2; i < n;i++)
  {
    uint64_t s=a+b;
    co_yield s;
    a=b;
    b=s;
  }
}
 
int main()
{
  try {
 
    auto gen = fibonacci_sequence(10); //max 94 before uint64_t overflows
 
    for (int j=0;gen;j++)
      std::cout << "fib("<<j <<")=" << gen() << '\n';
 
  }
  catch (const std::exception& ex)
  {
    std::cerr << "Exception: " << ex.what() << '\n';
  }
  catch (...)
  {
    std::cerr << "Unknown exception.\n";
  }
}

Ataraxia through Epoché

Apr 21, 2022
