Notes from reading Eli Bendersky's blog post:
Basics of Futexes
Context switch happens between userspace and kernel space.
Thus, we have VDSO ( http://vsdmars.blogspot.com/search/label/linux_vdso ) to relax some system calls' burden, as for locks, we have futex.
The difference here is that futex doesn't involve any business logic but simply to acquire the lock to access memory concurrently safe.
It is likely that when a thread acquires a lock, the lock hasn't been locked yet.
In this case, no system call involved, a compare_and_swap atomic instruction would be enough. ( cmpxhg , which is cheaper than a system call )
Reference:
https://stackoverflow.com/a/27856649
The high-level locks that lock-free algorithms try to avoid can guard arbitrary code fragments whose execution may take arbitrary time and thus, these locks will have to put threads into wait state until the lock is available which is a costly operation, e.g. implies maintaining a queue of waiting threads.
This is an entirely different thing than the CPU LOCK prefix feature which guards a single instruction only and thus might hold other threads for the duration of that single instruction only. Since this is implemented by the CPU itself, it doesn’t require additional software efforts.
Therefore the challenge of developing lock-free algorithms is not the removal of synchronization entirely, it boils down to reduce the critical section of the code to a single atomic operation which will be provided by the CPU itself.
However; if there's lock needed, the atomic CAS would fail.
Reference:
http://man7.org/linux/man-pages/man2/futex.2.html
The futex() system call provides a method for waiting until a certain condition becomes true. It is typically used as a blocking construct in the context of shared-memory synchronization. When using futexes, the majority of the synchronization operations are performed in user space. A user-space program employs the futex() system call only when it is likely that the program has to block for a longer time until the condition becomes true. Other futex() operations can be used to wake any processes or threads waiting for a particular condition.
Go by example code from Eli Bendersky
Child process:
Parent process:
wait_on_futex_value:
loop that waits
pause if the val is the expected value, but not yet being waked yet.
If val is not the expected value, continue looping.
Then another process sent out wake event, stop pausing, check if the val is the expected value, i.e not a spurious wake up call, if is, returns.
FUTEX_WAIT (since Linux 2.6.0)
This operation tests that the value at the futex word pointed to by the address uaddr still contains the expected value val,and if so, then sleeps waiting for a FUTEX_WAKE operation on the futex word. The load of the value of the futex word is an atomic memory access (i.e., using atomic machine instructions of the respective architecture). This load, the comparison with the expected value, and starting to sleep are performed atomically and totally ordered with respect to other futex operations on the same futex word. If the thread starts to sleep, it is considered a waiter on this futex word. If the futex value does not match val, then the call fails immediately with the error EAGAIN.
The purpose of the comparison with the expected value is to prevent lost wake-ups. If another thread changed the value of the futex word after the calling thread decided to block based on the prior value, and if the other thread executed a FUTEX_WAKE operation (or similar wake-up) after the value change and before this FUTEX_WAIT operation, then the calling thread will observe the value change and will not start to sleep.
If the timeout is not NULL, the structure it points to specifies a timeout for the wait. (This interval will be rounded up to the system clock granularity, and is guaranteed not to expire early.) The timeout is by default measured according to the CLOCK_MONOTONIC clock, but, since Linux 4.5, the CLOCK_REALTIME clock can be selected by specifying FUTEX_CLOCK_REALTIME in futex_op. If timeout is NULL, the call blocks indefinitely.
wake_futex_blocking:
send wake up event.
FUTEX_WAKE (since Linux 2.6.0)
This operation wakes at most val of the waiters that are waiting (e.g., inside FUTEX_WAIT) on the futex word at the address uaddr. Most commonly, val is specified as either 1 (wake up a single waiter) or INT_MAX (wake up all waiters). No guarantee is provided about which waiters are awoken (e.g., a waiter with a higher scheduling priority is not guaranteed to be awoken in preference to a waiter with a lower priority).
figure: https://lwn.net/Articles/360699/
Timed blocking with FUTEX_WAIT
Ok, isn't this familiar?
Golang's channel + context timeout.
And yet we could use Golang channel to implement a mutex lock.
Reference:
https://eli.thegreenplace.net/2018/basics-of-futexes/
http://man7.org/linux/man-pages/man2/futex.2.html
A futex overview and update
[C++] something about spinlock
Futexes are tricky [PDF]
Basics of Futexes
Background:
System calls are expensive.Context switch happens between userspace and kernel space.
Thus, we have VDSO ( http://vsdmars.blogspot.com/search/label/linux_vdso ) to relax some system calls' burden, as for locks, we have futex.
The difference here is that futex doesn't involve any business logic but simply to acquire the lock to access memory concurrently safe.
It is likely that when a thread acquires a lock, the lock hasn't been locked yet.
In this case, no system call involved, a compare_and_swap atomic instruction would be enough. ( cmpxhg , which is cheaper than a system call )
Reference:
https://stackoverflow.com/a/27856649
The high-level locks that lock-free algorithms try to avoid can guard arbitrary code fragments whose execution may take arbitrary time and thus, these locks will have to put threads into wait state until the lock is available which is a costly operation, e.g. implies maintaining a queue of waiting threads.
This is an entirely different thing than the CPU LOCK prefix feature which guards a single instruction only and thus might hold other threads for the duration of that single instruction only. Since this is implemented by the CPU itself, it doesn’t require additional software efforts.
Therefore the challenge of developing lock-free algorithms is not the removal of synchronization entirely, it boils down to reduce the critical section of the code to a single atomic operation which will be provided by the CPU itself.
However; if there's lock needed, the atomic CAS would fail.
2 choices here:
- busy 'for loop CAS' (spinlock), which consumes CPU core power. Although it's in userspace, still a very bad idea.
Reference:
https://vsdmars.blogspot.com/2018/09/c-something-about-spinlock.html - "sleep" (i.e pause) until the lock free.
Reference:https://vsdmars.blogspot.com/2018/09/c-something-about-spinlock.html
As for 'pause':
Pause Intrinsic can help prevent a busy wait from completely overwhelming the system, by inserting pauses in the instruction stream that prevent the busy loop from overwhelming the processor.
This is particularly important on hyperthreaded systems since it gives the other logical core time to run.
If must busy wait then be sure to use pause.
Reference:
http://man7.org/linux/man-pages/man2/futex.2.html
The futex() system call provides a method for waiting until a certain condition becomes true. It is typically used as a blocking construct in the context of shared-memory synchronization. When using futexes, the majority of the synchronization operations are performed in user space. A user-space program employs the futex() system call only when it is likely that the program has to block for a longer time until the condition becomes true. Other futex() operations can be used to wake any processes or threads waiting for a particular condition.
Focus on 2 futex system calls:
- FUTEX_WAIT (mutex.Lock)
waits on an event.
Caller is suspended by the kernel and will only be scheduled awake when there's a wake-up signal. - FUTEX_WAKE (mutex.Unlock)
signals an event.
Go by example code from Eli Bendersky
Child process:
- Waits for 0xA to be written into a shared memory slot.
- Writes 0xB into the same memory slot.
Parent process:
- Writes 0xA into the shared memory slot.
- Waits for 0xB to be written into the slot.
wait_on_futex_value:
loop that waits
pause if the val is the expected value, but not yet being waked yet.
If val is not the expected value, continue looping.
Then another process sent out wake event, stop pausing, check if the val is the expected value, i.e not a spurious wake up call, if is, returns.
FUTEX_WAIT (since Linux 2.6.0)
This operation tests that the value at the futex word pointed to by the address uaddr still contains the expected value val,and if so, then sleeps waiting for a FUTEX_WAKE operation on the futex word. The load of the value of the futex word is an atomic memory access (i.e., using atomic machine instructions of the respective architecture). This load, the comparison with the expected value, and starting to sleep are performed atomically and totally ordered with respect to other futex operations on the same futex word. If the thread starts to sleep, it is considered a waiter on this futex word. If the futex value does not match val, then the call fails immediately with the error EAGAIN.
The purpose of the comparison with the expected value is to prevent lost wake-ups. If another thread changed the value of the futex word after the calling thread decided to block based on the prior value, and if the other thread executed a FUTEX_WAKE operation (or similar wake-up) after the value change and before this FUTEX_WAIT operation, then the calling thread will observe the value change and will not start to sleep.
If the timeout is not NULL, the structure it points to specifies a timeout for the wait. (This interval will be rounded up to the system clock granularity, and is guaranteed not to expire early.) The timeout is by default measured according to the CLOCK_MONOTONIC clock, but, since Linux 4.5, the CLOCK_REALTIME clock can be selected by specifying FUTEX_CLOCK_REALTIME in futex_op. If timeout is NULL, the call blocks indefinitely.
wake_futex_blocking:
send wake up event.
FUTEX_WAKE (since Linux 2.6.0)
This operation wakes at most val of the waiters that are waiting (e.g., inside FUTEX_WAIT) on the futex word at the address uaddr. Most commonly, val is specified as either 1 (wake up a single waiter) or INT_MAX (wake up all waiters). No guarantee is provided about which waiters are awoken (e.g., a waiter with a higher scheduling priority is not guaranteed to be awoken in preference to a waiter with a lower priority).
- Futexes are kernel queues for userspace code.
- A futex is a queue the kernel manages for userspace convenience.
- Futex allows userspace code asking the kernel to suspend until a certain condition is meet, and allows other userspace code signal that condition and wake up the waiting processes.
- Futexes are implemented in kernel/futex.c
- Kernel keeps a hash table keyed by the address to quickly find the proper queue data structure and adds the calling process to the wait queue.
figure: https://lwn.net/Articles/360699/
Timed blocking with FUTEX_WAIT
Ok, isn't this familiar?
Golang's channel + context timeout.
And yet we could use Golang channel to implement a mutex lock.
Reference:
https://eli.thegreenplace.net/2018/basics-of-futexes/
http://man7.org/linux/man-pages/man2/futex.2.html
A futex overview and update
[C++] something about spinlock
Futexes are tricky [PDF]
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.