Just uncovered a stonking design flaw in the freelist elimination layer.
The idea is to have one cache line for every logical core.
What I was actually doing was having one atomic_isolation per logical core.
On CAS platforms, this is nominally one cache line (except of course Intel are now bringing over two cache lines at once, so even there it’s wrong), but on ARM, where the max ERG is 2048 bytes, instead of having one cache line I had a huge 2 KB.
So what I actually need is one cache line, with atomic_isolation separation.
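
A rough sketch of the difference, assuming a 64-byte cache line and a 2048-byte atomic isolation (the ARM worst case). The macro, struct and function names here are hypothetical and exist only for illustration; they are not the real liblfds definitions, and the padding approach is just one way to get the separation.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only; both are assumptions, not the library's real constants. */
#define CACHE_LINE_BYTES        64u
#define ATOMIC_ISOLATION_BYTES  2048u   /* worst-case ERG on ARM */

#define PTRS_PER_CACHE_LINE  (CACHE_LINE_BYTES / sizeof(void *))
#define PTRS_PER_ISOLATION   (ATOMIC_ISOLATION_BYTES / sizeof(void *))

/* The flawed layout: the per-core payload fills the whole isolation region,
   so on ARM each logical core gets 2 KB of pointers rather than one cache line. */
struct per_core_old
{
  void *ptrs[PTRS_PER_ISOLATION];
};

/* The intended layout: one cache line of payload, then padding, so the next
   core's payload begins one full isolation region later. */
struct per_core_new
{
  void *ptrs[PTRS_PER_CACHE_LINE];
  unsigned char pad[ATOMIC_ISOLATION_BYTES - CACHE_LINE_BYTES];
};

/* The stride between cores must still be the isolation size. */
static_assert( sizeof(struct per_core_new) == ATOMIC_ISOLATION_BYTES,
               "per-core stride must equal the atomic isolation size" );

/* Bytes of usable payload per logical core in each layout. */
size_t payload_bytes_old( void ) { return sizeof( ((struct per_core_old *) 0)->ptrs ); }
size_t payload_bytes_new( void ) { return sizeof( ((struct per_core_new *) 0)->ptrs ); }

/* Distance in bytes between consecutive per-core entries in the new layout. */
size_t stride_bytes_new( void ) { return sizeof( struct per_core_new ); }
```

With these assumed sizes, both layouts place cores 2048 bytes apart, but only the second keeps the per-core payload down to a single cache line.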