Jun 21, 2018

[Go] The Go scheduler

Reference:
Morsing's The Go Scheduler article
Dmitry Vyukov's Scalable Go Scheduler Design doc
MIT Paper: Scheduling Multithreaded Computations by Work Stealing
Intel TBB Task Scheduler
Go's work-stealing scheduler
Command trace

Why a userspace scheduler for go?
Because of garbage collector.

The Go garbage collector requires that all threads are stopped when running a collection and that memory must be in a consistent state.
This involves waiting for running threads to reach a point where we know that the memory is consistent.

Golang is using M:N thread model, i.e M as gorotines, N as OS threads.

layout:

G: goroutine (aka. Task, can be stolen)
M: OS thread (machine, aka. OS thread)
P: processor, NOT CPU processor, but a logical one (aka. context, can steal, own 1 M)
The number of P is set to GOMAXPROCS/runtime function GOMAXPROCS()
Each P has it's own local runqueue, which contains Gs. 
This design relieve mutex contention on single thread. 
Why? Because there's no mutex variable crossing CPU cores , which there's no WB/RB barriers involves.

Why P(context)? Because P can be moving around the threads.
If 1 P is blocked due to it's goroutine called a syscall, we could have that goroutin + M stay and moving the P with it's runqueue to other thread to run, consider P as a package can be move around to other threads.

The scheduler makes sure there are enough threads to run all contexts.

When the syscall returns, the (blocking) thread must try and get a context(P) in order to run the returning goroutine.
The normal mode of operation is to steal a context(P) from one of the other threads. If it can't steal one, it will put the goroutine on a global runqueue, put itself(i.e the 'was' blocking thread) on the thread cache and go to sleep.

Global runqueue:

  • Global queue: G's can be put into here if original P is blocked and resumed(awake) but can't grab a M.
  • The global runqueue is a runqueue that contexts(P) pull from when they run out of their local runqueue.
  • Contexts(P) also periodically check the global runqueue for goroutines. Otherwise the goroutines on global runqueue could end up never running because of starvation.
Logic sequence:
runtime.schedule() {
    // only 1/61 of the time, check the global runnable queue for a G.
    // if not found, check the local queue.
    // if not found,
    //     try to steal from other Ps.
    //     if not, check the global runnable queue.
    //     if not found, poll network.

}


Task Stealing:

  • If local runqueue is drained, look into global runqueue.
  • If the global runqueue is empty, the P will steal from other P's local runquene, half of them.

Spinning threads(M):

A thread is spinning if:
  • An M with a P assignment is looking for a runnable goroutine.
  • An M without a P assignment is looking for available Ps.
  • Scheduler also unparks an additional thread and spins it when it is readying a goroutine if there is an idle P and there are no other spinning threads.

Logical Processor: (This is the essential of Go)


The purpose of a P is to limit the amount of total concurrency running Go code. 

By default set the number of P's to the number of CPU cores on the system (including hyperthreading).  

The user can control it by setting the GOMAXPROCS environment variable, and the program can control it by calling runtime.GOMAXPROCS

M's, on the other hand, which are operating system threads, are started as needed, so the user and program have no control over them.  

G's, which are goroutines, are started by the program but there is no way to limit the total number of goroutines. 

So using P's is how the Go runtime tries to keep the program running efficiently without thrashing.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.