freelist scaling update #2

The single-threaded freelist with an elimination layer is faster than spinlocks because it gets to pop/push with a single exchange, whereas spinlocks have to do two single-word CASes.

The best solution so far (for the current implementation) seems to be incrementing the push/pop EA array index on every push/pop, but when a pop is successful, setting the push EA array index to the index at which that pop succeeded (i.e. we know we've just popped/pushed successfully on a given EA array index, so we can know, for a bit, that the next push/pop will be okay on that index).
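
A minimal sketch of that index rule, in C11, might look like the following. It is not the actual freelist code: the names (`ea_state`, `ea_push`, `ea_pop`, `EA_SIZE`) are made up for illustration, the array size is arbitrary, and it assumes an empty slot is NULL, that a push is one exchange of the element into a slot, and that a pop is one exchange of NULL into a slot. The indices are assumed to be per-thread, so they are plain integers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define EA_SIZE 8 /* hypothetical elimination array size */

struct ea_state {
    _Atomic(void *) slots[EA_SIZE]; /* NULL means the slot is empty */
    unsigned push_index;            /* assumed per-thread, so not atomic */
    unsigned pop_index;             /* assumed per-thread, so not atomic */
};

static void ea_init(struct ea_state *s)
{
    for (size_t i = 0; i < EA_SIZE; i++)
        atomic_init(&s->slots[i], NULL);
    s->push_index = 0;
    s->pop_index = 0;
}

/* Push with a single exchange. Returns NULL if the slot was empty (the
 * push is complete), otherwise the displaced element, which the caller
 * still has to place somewhere (e.g. on the freelist proper). */
static void *ea_push(struct ea_state *s, void *elem)
{
    assert(elem != NULL); /* NULL is reserved for "empty slot" */
    unsigned i = s->push_index % EA_SIZE;
    s->push_index = i + 1; /* advance on every push */
    return atomic_exchange(&s->slots[i], elem);
}

/* Pop with a single exchange. Returns the element, or NULL if the slot
 * was empty and the caller has to fall back to the freelist proper. */
static void *ea_pop(struct ea_state *s)
{
    unsigned i = s->pop_index % EA_SIZE;
    s->pop_index = i + 1; /* advance on every pop */
    void *elem = atomic_exchange(&s->slots[i], NULL);
    if (elem != NULL)
        s->push_index = i; /* slot i was just emptied, so push there next */
    return elem;
}
```

The reasoning behind the reset is that a successful pop leaves its slot empty, so, until another thread gets there first, a push on that same slot should complete with its one exchange.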