I’ve implemented the first comparative benchmark. It’s for the freelist. The locking based freelist simply using a mutex. (I can in future also do further locking variants, such as a spinlock).
These are the results, and they’re an eye-opener!
Release 7 Lock-Free Freelist Benchmark #1 CPUs,total ops,mean ops/sec per CPU,standard deviation,scalability 1,302325258,30232526,0,1.00 2,297807034,14890352,27333,0.49 3,110733252,3691108,760105,0.12 4,107517282,2687932,143228,0.09 Release 7 Mutex Freelist Benchmark #1 CPUs,total ops,mean ops/sec per CPU,standard deviation,scalability 1,23123970,2312397,0,1.00 2,1501904,75095,335,0.03 3,1683186,56106,17,0.02 4,1662666,41567,37,0.02
We see lock-free is 10x faster with one core; and is 100x faster with two cores! (the three and four core rows are to be ignored; they’re hyperthreading cores, not real cores).
However, we still see that scalability for lock-free with two cores is < 0.5, so you still get more work done with one core than two. This might be due to live-lock. Backoff is on the to-do list.
A note on the mutex freelist. It -is- the lock-free freelist; the source code was copied, the type names are changed (prefixed with an ‘m’) and that where the lock-free freelist uses lock-free instructions, the mutex freelist obtains a mutex, does the necessary pointer manipulation, and then releases the mutex.
The benchmark is identical in both cases, because it is the same benchmark; it is now given pointers to functions. For the lock-free benchmark, those functions point to the lock-free freelist functions (new, delete, push and pop) and for the mutex benchmark, to the matching mutex freelist functions.