On ARM, then, we likely don’t need to be cache-line aligned, but we would *like* to be, for that bit of extra performance. This touches on another issue, though: ordinary misaligned access. I’m laying out the state by hand, so the compiler isn’t silently inserting padding for me. Some CPUs will abort on a misaligned access; others are merely slow.
So… I have to pad all entries to atom_t boundaries.
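
As a rough sketch of what that padding could look like: the code below assumes `atom_t` is a pointer-sized unsigned integer whose size is a power of two, and that the state block itself starts on an `atom_t` boundary. The helper names `align_up` and `place_entry` are hypothetical, made up for illustration; they are not part of any existing API.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: atom_t is a pointer-sized unsigned integer. */
typedef uintptr_t atom_t;

/* Round offset up to the next multiple of align.
   align must be a power of two for the mask trick to work. */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Hypothetical helper: reserve `size` bytes for one entry in a
   manually laid-out state block. Returns the entry's offset, padded
   up to an atom_t boundary, and advances *cursor past the entry. */
static size_t place_entry(size_t *cursor, size_t size)
{
    size_t start = align_up(*cursor, sizeof(atom_t));
    *cursor = start + size;
    return start;
}
```

Calling `place_entry` once per entry, with a cursor starting at zero, yields offsets that are all multiples of `sizeof(atom_t)`. The padding bytes are simply the gap between the end of one entry and the start of the next.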
I have to say the mounting complexity of run-time alignment is concerning me.