So, yeah, thought about it a bit.
What stops the generation counter from advancing past a given generation? A thread which is in a read section. So when a thread enters a read section, it posts the current main state generation value in its per-thread state, and clears that value when it exits the read section. When we come to release reuse candidates, we scan the per-thread states and pick up the main state generation value posted by each busy thread; the lowest of those values is how far we can go in releasing reuse candidates.
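To make that concrete, here is a rough sketch in C11 of the posting and the scan. All of the names (`smr_state`, `read_enter`, `read_exit`, `release_limit`, `NOT_READING`) are made up for illustration, the fixed-size thread array is an assumption, and whether the returned limit is inclusive depends on how reuse candidates are stamped, which I am not pinning down here. It does not show how the main counter itself advances.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_THREADS 4          /* assumption: fixed number of per-thread slots */
#define NOT_READING UINT64_MAX /* posted by a thread outside any read section */

struct smr_state {
    _Atomic uint64_t generation;          /* main state generation counter */
    _Atomic uint64_t posted[MAX_THREADS]; /* per-thread posted generation */
};

static void smr_init(struct smr_state *s)
{
    atomic_init(&s->generation, 1);
    for (size_t i = 0; i < MAX_THREADS; i++)
        atomic_init(&s->posted[i], NOT_READING);
}

/* Enter a read section: post the main generation value we observed. */
static void read_enter(struct smr_state *s, size_t thread)
{
    assert(thread < MAX_THREADS);
    atomic_store(&s->posted[thread], atomic_load(&s->generation));
}

/* Exit a read section: clear the posted value. */
static void read_exit(struct smr_state *s, size_t thread)
{
    assert(thread < MAX_THREADS);
    atomic_store(&s->posted[thread], NOT_READING);
}

/* Scan the per-thread states and return the lowest generation posted by a
   busy thread, or the current main generation if every thread is idle. */
static uint64_t release_limit(struct smr_state *s)
{
    uint64_t limit = atomic_load(&s->generation);
    for (size_t i = 0; i < MAX_THREADS; i++) {
        uint64_t g = atomic_load(&s->posted[i]);
        if (g < limit)
            limit = g; /* NOT_READING never passes this test */
    }
    return limit;
}
```

Because an idle thread posts the largest possible value, it can never lower the limit, so the scan needs no special case for idle threads.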
Idle threads are always permissive – no need for any extra house-keeping to detect them – and what’s nice is that because we’re now reversed, it doesn’t matter if a thread reads an older version of the main state counter when posting – it just makes us less efficient, rather than breaking the system.
However, it still means the main state counter has to increment every time a thread enters a read section. Incrementing only on reuse is no good: I think you end up needing a period where no threads are in read sections to be able to release reuse candidates, and on a busy system you always see some threads in read sections. These increments also need to be atomic – if we lost a write, a thread would think it is in an earlier generation than it really is, so we could reuse elements which are not yet safe to reuse.
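For what it's worth, the increment-on-entry would look something like the sketch below. `enter_and_advance` is a made-up name, and this is only my guess at the shape of it, not code from the library. The point is that it has to be an atomic read-modify-write, so every read section entry becomes a write to memory shared by all threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: bump the main state counter on read section entry and
   return the generation this thread should post. */
static uint64_t enter_and_advance(_Atomic uint64_t *generation)
{
    assert(generation != NULL);
    /* atomic_fetch_add returns the value before the add, so adding one
       gives the value this thread installed. A plain load followed by a
       plain store could lose a concurrent increment. */
    return atomic_fetch_add(generation, 1) + 1;
}
```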
Basically, I’m barking up the wrong tree – for performance, all the information which lets a scan advance the generation counter has to be stored and maintained in the per-thread state, with only read access to the main state. Only the scan which advances the generation counter can write to the main state.
This is how the current mechanism works.