SMR design

So, I actually did a bit of Googling to see the literature for SMR. I wanted to see if I was missing something obvious. Probably I still am 🙂

However, it looks like I might actually have something minutely novel, so I figured I’d write it up a bit here.

As an aside, I’ve also now make the generation-advancing function safely multi-threaded, so callers don’t need to worry about that anymore, and you’ll still get forward progress, etc.

So, SMR design.

It’s epoch based.

There’s a main SMR state, and there’s one state per thread.

A thread registers its state with the main state, which has an atomic add-only list of thread states (which can be in the states active, available, retired, etc – it’s add-only, so callers can try to re-use an available state in the list; each thread state knows its NUMA node number, since we care a lot on the per-thread level about NUMA appropriateness).

Each thread state has a single state variable, which is used as a bitmask, so we can atomically (single CAS) modify all the state information in one operation.

The main state has a generation counter which begins at 0.

So, we begin operations – make a main state, each thread makes a per-thread state and registers with the main state.

All the thread begin idle.

Now a thread comes to perform an operation using lock-free operations; it calls a macro, “BEGIN_LOCK_FREE”, which non-atomically sets a bit in the per-thread state which indicates a read section is in progress, and issues a store barrier.

The thread then finishes the lock-free operations, and so calls a second macro, “END_LOCK_FREE”. This non-atomically clears the “read section in progress” bit, and sets a second bit, “exited read section”.

When a thread has removed an allocation from the (say) lock-free data structure and wants to reuse it, it places the allocation in a single-threaded list which is in the per-thread state, noting the generation count in the main state.

So, now we’re humming along – threads are entering and exiting read sections, threads are submitting elements for reuse.

Now what?

Now we come to the point where the user calls the function to try to advance the generation counter in the main state.

We iterate over the list of thread states in the main state. Now, each thread has two status bits we care about – one is a flag, which is raised when the thread enters a lockfree section and lowered when it leaves, the other is a flag which is raised when a thread exit a lockfree section; the threads themselves never lower this flag – it is lowered, and in every thread state, by this function we’re in now, which is trying to advance the generation counter, if and only if the generation counter is advanced.

What we need, to advance the generation counter, is that every thread has exited a lock-free section (which means all elements queued for reuse are safe) or has been idle (hasn’t entered a lock-free section at all, since the last scan, and so all elements queued for reuse are safe).

So; if we see the exited flag is set, we know a thread has been in and has exited a read section – we’re good.

If we see the active flag is set, and the exited flag is set, we’re still good.

However, if we see the active flag is set, and the exited flag is *not* set, then we’re screwed – we can’t advance the counter.

So, if we can’t advance the counter, we don’t, that’s that. We return.

If we can advance the counter, we do so – but now comes one final vital point.

So – we have threads queuing up elements for reuse. The generation counter begins at 0, so these elements have a generation count of 0. When we check to advance the generation counter, we need to know that every thread has been completely idle, or has exited a read section *after the element was submitted for reuse*. However, when we do get round to scanning the thread states, all we can see is that the exit bit has been set… so we know all the threads HAVE exited a read section, but we don’t know WHEN. It could have been (say) right after only the very first element was submitted – and then it might be one of the threads has been stuck in a long running read section all the time since then – which would mean only the first element was actually safe for reuse.

How do we handle this?

The answer is that when a thread comes to scan its release candidate list, comparing each elements generation counter with the current main state generation counter, elements are only released when the difference is greater than TWO, not ONE.

In other words – having done this first scan (and finding all threads have been idle or have exited), we do advance the generation counter *and we set the exited bit in each thread state to lowered* (this is vital, remember it) – but that does not mean the previous generation (0 in this case) can now be released. It cannot – for we do not know when each thread exited, so we cannot know which elements are safe to release. However, having lowered the exit bit, when we come to scan again, if we see *AGAIN* all threads have been idle or have exited a read section, THEN ALL THREADS MUST HAVE EXITED **AFTER** THE FINAL GENERATION 0 ELEMENT WAS SUBMITTED – which means generaton zero is now safe to release.

So we’re always a generation behind on releasing.

I’m thinking I must have missed an obvious way to simplify this – but I can’t see it. We have to know if a thread had entered a lockfree section (to detect idle threads). We have to know when all the threads have exited (so we can know to advance the generation counter). I mean, basically, the design is that threads indicate they’ve exited a lockfree section, and we notice this only whenever the user calls the generation counter advance function, so we can’t know when they exited, only that they have, so we have to have two rounds of every thread having exited to know the generation before last is clear and safe to release.