Ataraxia through Epoché: golang

Showing posts with label golang_thread. Show all posts

Aug 3, 2021

[C++] note about std::shared_mutex and pthread_rwlock_t

Reference:
std::shared_mutex
pthread_rwlock_init
stackoverflow response:
https://stackoverflow.com/a/57709957
https://stackoverflow.com/a/2190271

Take away:

C++17's std::shared_mutex on linux might using pthread_rwlock_t underneath, thus in order to tweak

the behavior of write starvation, set PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP in the pthread_rwlock_init call's pthread_rwlockattr_t is necessary.

For those not using pthread_rwlock_t, std::shared_mutex should rely on linux kernel's scheduler, which is fair, avoid either write/read starvation.

Here's how Go handles stavation:

http://vsdmars.blogspot.com/2021/03/go-methods-for-lock-starvation-barging.html

Mar 7, 2021

[Go] methods for lock starvation: Barging / Handoff / Spinning

Before Go v1.9.0, methods for thread starvation:

Barging
Spinning

After Go v.1.9.0 includes:

Handoff

Goroutines wait for the lock for more than one millisecond, aka. bounded waiting, will be flagged as starving.

If there are Goroutines flagged as starving, the unlock method will hand off the lock to the first waiter directly.

[reference: Bounded waiting]

If there are Goroutines flagged as starving, the spinning method is deactivated.

Jul 1, 2020

[Go] g0's roles

g0 traits (for every M has g0)

Fix and larger stack size(ordinary goroutin occupies 2kb at start).
g0's stack usually doesn't grow.
responsible for goroutine creation.
responsible for defer function allocation.
GC operations. Including STW, mark and sweep operations.
Stack growth for running goroutins.
Schedule goroutine to run. (each goroutine end with calling runtime.goexit() which notify g0 it's finished) Or goroutine can call runtime.Goexit() manually)
Maintain 2 goroutine queues, one with recycled goroutines with an allocated stack(max 2kb size), one with recycled goroutines with empty stack)
There are 2 global goroutine queues as well protected by locks.

Each local goroutine queue for P has a maximum capacity of 256, and any new incoming goroutine is pushed to the global queue after local queue is full.

Each P also maintains a queue(64 size) for freed goroutines.(goroutines are recycleable )

Work-stealing

When a processor does not have any work, it applies the following rules until one can be satisfied:

pull work from the local queue
pull work from the global queue
pull work from network poller
steal work from the other P’s local queues

Threads can be blocked on system calls and the number of blocked threads is not limited in Go runtime.

Affinity limitation

P's Local goroutin queue will be used for all operations expect system calls

such as blocking operations on channels and selects, waiting on timers and locks.

Two features could restrict the affinity between a goroutine and a thread:

Work-stealing.
When a processor P does not have enough work in its local queue, it will steal goroutines from others P if the global queue and the network poller are empty. When stolen, the goroutines will then run on another thread.
System calls.
When a syscall occurs (e.g. files operations, http calls, database operations, etc.), Go moves the running OS thread in a blocking mode, letting a new thread processing the local queue on the current P.

However, with goroutins using the same channel will be grouped into same P during scheduling. And for those waiting for channel will be scheduled with high priority then other goroutin to run next.

Dec 8, 2019

[Go] LockOsThread

Reference:
https://github.com/golang/go/wiki/LockOSThread

package sdl

// Arrange that main.main runs on main thread.
func init() {
 runtime.LockOSThread()
}

// Main runs the main SDL service loop.
// The binary's main.main must call sdl.Main() to run this loop.
// Main does not return. If the binary needs to do other work, it
// must do it in separate goroutines.
func Main() {
 for f := range mainfunc {
  f()
 }
}

// queue of work to run in main thread.
var mainfunc = make(chan func())

// do runs f on the main thread.
func do(f func()) {
 done := make(chan bool, 1)
 mainfunc <- func() {
  f()
  done <- true
 }
 <-done
}

func Beep() {
 do(func() {
  // whatever must run in main thread
 })
}

Jun 21, 2018

[Go] The Go scheduler

Reference:
Morsing's The Go Scheduler article
Dmitry Vyukov's Scalable Go Scheduler Design doc
MIT Paper: Scheduling Multithreaded Computations by Work Stealing
Intel TBB Task Scheduler
Go's work-stealing scheduler
Command trace

Why a userspace scheduler for go?
Because of garbage collector.

The Go garbage collector requires that all threads are stopped when running a collection and that memory must be in a consistent state.
This involves waiting for running threads to reach a point where we know that the memory is consistent.

Golang is using M:N thread model, i.e M as gorotines, N as OS threads.

layout:

G: goroutine (aka. Task, can be stolen)
M: OS thread (machine, aka. OS thread)
P: processor, NOT CPU processor, but a logical one (aka. context, can steal, own 1 M)
The number of P is set to GOMAXPROCS/runtime function GOMAXPROCS()
Each P has it's own local runqueue, which contains Gs.
This design relieve mutex contention on single thread.
Why? Because there's no mutex variable crossing CPU cores , which there's no WB/RB barriers involves.

Why P(context)? Because P can be moving around the threads.
If 1 P is blocked due to it's goroutine called a syscall, we could have that goroutin + M stay and moving the P with it's runqueue to other thread to run, consider P as a package can be move around to other threads.

The scheduler makes sure there are enough threads to run all contexts.

When the syscall returns, the (blocking) thread must try and get a context(P) in order to run the returning goroutine.
The normal mode of operation is to steal a context(P) from one of the other threads. If it can't steal one, it will put the goroutine on a global runqueue, put itself(i.e the 'was' blocking thread) on the thread cache and go to sleep.

Global runqueue:

Global queue: G's can be put into here if original P is blocked and resumed(awake) but can't grab a M.
The global runqueue is a runqueue that contexts(P) pull from when they run out of their local runqueue.
Contexts(P) also periodically check the global runqueue for goroutines. Otherwise the goroutines on global runqueue could end up never running because of starvation.

Logic sequence:
runtime.schedule() {
// only 1/61 of the time, check the global runnable queue for a G.
// if not found, check the local queue.
// if not found,
// try to steal from other Ps.
// if not, check the global runnable queue.
// if not found, poll network.
}

Task Stealing:

If local runqueue is drained, look into global runqueue.
If the global runqueue is empty, the P will steal from other P's local runquene, half of them.

Spinning threads(M):

A thread is spinning if:

An M with a P assignment is looking for a runnable goroutine.
An M without a P assignment is looking for available Ps.
Scheduler also unparks an additional thread and spins it when it is readying a goroutine if there is an idle P and there are no other spinning threads.

Logical Processor: (This is the essential of Go)

https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/golang-nuts/5Zo403s3LC0/P4JwsHsKAQAJ

The purpose of a P is to limit the amount of total concurrency running Go code.

By default set the number of P's to the number of CPU cores on the system (including hyperthreading).

The user can control it by setting the GOMAXPROCS environment variable, and the program can control it by calling runtime.GOMAXPROCS

M's, on the other hand, which are operating system threads, are started as needed, so the user and program have no control over them.

G's, which are goroutines, are started by the program but there is no way to limit the total number of goroutines.

So using P's is how the Go runtime tries to keep the program running efficiently without thrashing.

Design Doc:

https://docs.google.com/document/d/1TTj4T2JO42uD5ID9e89oa0sLKhJYD0Y_kqxDv3I3XMw/edit