API design experimentation

I’ve been toying with the freelist header file, seeing how to present two ideas I’ve had for a long time.

First is to add support for single-threaded use. This means you can flip the freelist into singlethreaded mode, from which point onwards only the thread which made the flip can use the freelist, until that thread flips it back to multithreaded mode, and then everyone else can join in again.

Singlethreaded mode is at least an order of magnitude faster than multithreaded mode used by a single thread; if you’re doing performance work, you simply cannot use a lock-free freelist in a single thread – it’s insane – but currently you’d have to, because if you did build up a single-threaded freelist, how would you get those elements into the lock-free freelist? (think also about stacks, queues, etc)

It also means the freelist can *in general* be used as a single-threaded freelist, which is often required. It also then provides support for scenarios like one thread building up a freelist and then throwing it over to a set of other threads (okay – not something you’d do so much with a freelist, but imagine a stack, or queue, etc, all of which should get the same treatment).

Providing a fully single-threaded mode provides the necessary functionality for the benchmark programme’s benchmark for the unbounded single/single queue.

Second is being able to form freelist elements up into a chain of freelist elements, and then pushing the chain to the freelist, where that push regardless of the chain length occurs as a single atomic operation. So we’re offering write combining in the API.

I’ve done some actual like work!

Shock, horror, I’ve done some real work.

For two months now I’ve been working mainly on the re-write of a back-end system at work. It’s taken forever because it’s an office job, rather than working from home. It *should* have taken a week.

So this weekend I’ve been working on getting the benchmark back into operation. I’ve needed to modify the code so that the thread functions are stored on a per-thread basis, rather than on a per-benchmark basis, because I’ve introduced a benchmark for a single/single data structure – so I have now an enqueue thread and a dequeue thread, i.e. different thread functions – whereas before the threads all ran the same function.

I’ve done the code for this now (I can make it compile now, but I haven’t just yet) but it’s become clear I now need a single threaded freelist, so the enqueue thread in the queue callback function where it liberates emptied queue elements from the queue can return those elements to the freelist for the main queue thread to use.

This lines up with my long, long time wish to add single threaded functionality *in general* to the data structures. The reason for this is that it permits *amalgamation of operations* by a single thread – e.e. if a thread has a number of elements to push, it can combine them into a single local chain, and then push that – only one atomic operation.

In general, *elimination* (e.g. a push and a pop cancel each other out) *removes* load from the lock-free data structure, and *combination* acts to *reduce* load. Both are desireable.

So when this is done I’ll have a benchmark and so a gunplot for the unbounded, single reader/single writer queue. THis is an important data structure, because its performance is so very great.