Intro to Lock Free

Atomics

Atomic types

If one thread writes to an atomic object while another thread reads from it, the behavior is well-defined (see memory model for details on data races). In addition, accesses to atomic objects may establish inter-thread synchronization and order non-atomic memory accesses as specified by std::memory_order.

Atomic operations

These components are provided for fine-grained atomic operations allowing for lockless concurrent programming. Each atomic operation is indivisible with regards to any other atomic operation that involves the same object. Atomic objects are free of data races.

Compare-And-Swap

Compare-And-Swap (CAS) is a basic read-modify-write (RMW) atomic operation. It means that it can't be divided - it is either done or not, and there is no intermediate execution state. It has the following pseudocode:

bool CAS( Node* addr, Node expected, Node new_val ) {
    if ( *addr == expected ) {
       *addr = new_val;
       return true;
    }
return false;
}

There are two implementations of that function in C++: compare_exchange_strong and compare_exchange_weak. The weaker version may result in spurious failures when the comparison fails even with the same content of the nodes.

ABA problem

Consider the situation with 2 threads. The first thread T1 reads the value of a node. Then, another thread T2 interrupts the execution and pops that same node from the queue, deallocates and reallocates it, and pushes it back to the queue. Only after that, the execution switches again to the T1, and because the value of the node remains the same, the next Compare-And-Swap could succeed using the old pointer to the deallocated memory, even though the state has changed. This problem is known as an ABA problem.

Tagged pointers

Tagged pointers are used to deal with the ABA problem. To remain the nodes unique, each of them stores a version tag. With each CAS instruction, the tag is being incremented by 1. Thus, in the previously described ABA problem, the nodes will be distinguished by the different tag values as different nodes, and the CAS will fail as expected.

To be able to use a 64-bit CAS on the node, we can use the aligned and not used values of a 64-bit pointer to store the tag value. Thus, the first 12 and the last 2 bits of the pointer are used to store the tag, and the rest 50 bits are used for the memory address itself.

Considering the fact that the maximal possible value of the tag is equal to (2^16 - 1) = 65535, there is a possibility for the tag to be overflowed. To deal with that, the tag is cyclic; it restores the version counter to 0 as soon as it reaches the upper limit.

Memory orders

Enumerator std::memory_order is used for setting the rules for memory access to be ordered around atomic operations. Absent any constraints on a multi-core system, when multiple threads simultaneously read and write to several variables, one thread can observe the values change in an order different from the order another thread wrote them. Indeed, the apparent order of changes can even differ among multiple reader threads. Some similar effects can occur even on uniprocessor systems due to compiler transformations allowed by the memory model.

The default behavior of all atomic operations in the library provides for sequentially consistent ordering (see discussion below). That default can hurt performance, but the library's atomic operations can be given an additional std::memory_order argument to specify the exact constraints, beyond atomicity, that the compiler and processor must enforce for that operation.

For every read or write operation, it is required to choose such memory order that is strict enough, but it is preferable for it to be not too slow.

Back-off

Back-off strategies are used to increase container performance.

template<typename U> void enqueue(U&& value) {
    Backoff backoff;
    while ( true ) { 
    ... all the boring stuff ...
    ... CAS unsuccessful, the loop continues ...
    backoff(); 
    }
}

There are 2 strategies implemented as functors:

Yield back-off There is a set lower bound for the number of unsuccessful CAS instructions. This back-off will be executed only if the number of unsuccessful CAS's reaches that bound and will not be executed otherwise. The yield strategy is to perform std::this_thread::yield(), which switches the execution onto other threads that modify the container structure and are the reasons for the unsuccessful CAS operations in the current thread.
Exponential back-off After every unsuccessful CAS instruction, we perform a certain number of nop instructions, starting with a small initial value. With every unsuccessful CAS, that value increases exponentially on a given step until it reaches the upper bound.

Related sources:

Atomics:
- https://en.cppreference.com/w/cpp/atomic/atomic
- https://youtu.be/ZQFzMfHIxng
Intro to the lock-free programming: https://habr.com/ru/company/wunderfund/blog/322094/
Safe Memory Reclamation (tagged pointers, hazard pointers, etc.): https://habr.com/en/post/202190/
Memory ordering:
Back-off strategies: https://habr.com/ru/post/216013/

Libraries:

Libcdc:
- Source code: https://github.com/khizmax/libcds
- Doc: http://libcds.sourceforge.net/doc/cds-api/index.html
- Intro guide: https://habr.com/ru/post/196834/
Boost:
- Source code: https://www.boost.org/doc/libs/1_60_0/doc/html/lockfree.html

Provide feedback

Saved searches

Use saved searches to filter your results more quickly