Ringbuffer disappointment

The ringbuffer uses the M&S queue and so needs a dummy element.

This disfigures the API.

I implemented Dmitry Vyukov’s bounded MPMC queue, to use that instead.

However, having implemented and now coming to adjust the ringbuffer, I realise it’s no use.

The problem is that it’s a soft queue and also that it runs from an array. By soft I mean that although all enqueue and dequeue operations occur, they are not necessarily visible to all other cores by the time the enqueue/dequeue function returns.

If we imagine a queue as a ringbuffer, then what it means is that if we try to enqueue and we cannot, because the queue is full, we dequeue, and then enqueue – throwing away one element.

The problem is that when we come to dequeue, someone else may dequeue just before us; our dequeue will succeed but if the other thread is first, then until that other thread’s dequeue becomes visible to us, we can’t continue, because we need the *next element in the array*, the one just dequeued, to become visible to us as dequeued – we can’t enqueue until *it* is available, regardless of the success of our own dequeue.

What’s more, we can’t just say “well, we did a dequeue, so we’ll get something sooner or later” because other threads could be genuinely enqueuing more stuff. So we can’t *know* whether we need to re-issue the dequeue.

The basic issue is that the queue is blocked by any one thread performing a dequeue where that thread is slow (to be clear, though, the queue was never meant to be otherwise – it’s just I’m really appreciating the impact of this now).

It can be solved by forcing a store at the end of the dequeue, so everyone has visibility. This halves the speed of the data structure (currently one CAS for enqueue and one for dequeue). Also, we still are exposed to the problem of a thread performing a dequeue being idled by the OS when it’s halfway – it’s done the first CAS, so the dequeue has occurred, bu it’s not yet made the element usable (adjusting the sequence number). During this period, an idled thread blocks the data structure.