LL/SC – so close and yet so far

LL/SC as LL/SC looks like a bust.

You can only use it to emulate CAS.
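
For reference, "emulating CAS" looks roughly like this from C. This is only a sketch: `fetch_add_via_cas` is a name made up for illustration. On LL/SC machines the compiler typically lowers the weak compare-exchange to a short LL/SC pair, which is why the weak form is allowed to fail spuriously. Note that nothing else is stored between the load and the conditional store.

```c
#include <stdatomic.h>

/* Hypothetical helper: add `delta` to `*target` using only a CAS retry loop.
   Returns the value that was in `*target` just before the add. */
static int fetch_add_via_cas(atomic_int *target, int delta)
{
    int seen = atomic_load_explicit(target, memory_order_relaxed);

    /* On failure (real or spurious) `seen` is refreshed with the current
       value of `*target`, so we simply try again with the new value. */
    while (!atomic_compare_exchange_weak_explicit(
               target, &seen, seen + delta,
               memory_order_acq_rel, memory_order_relaxed)) {
        /* retry */
    }
    return seen;
}
```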

To use LL/SC as LL/SC you have to be able to do a store to memory between the LL and SC.

I’ve been looking at ARM, and it turns out that the exclusive access “bit” used by LL/SC is actually a region of memory, the size of which varies by platform and ranges from 8 to 2048 bytes.

So if the STR target is in the same region as the LL/SC target, the STR in between the LL and SC breaks the LL/SC, so the SC always fails and you loop forever.
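
Checking whether two addresses would collide is just address arithmetic. A minimal sketch, assuming the region size is a power of two and regions are naturally aligned (`same_granule` is a made-up name, not any platform API):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if addresses `a` and `b` fall inside the same
   naturally aligned region of `granule` bytes.
   Assumption: `granule` is a power of two (e.g. 8 ... 2048). */
static bool same_granule(const void *a, const void *b, size_t granule)
{
    uintptr_t mask = ~((uintptr_t)granule - 1); /* clears the offset bits */
    return ((uintptr_t)a & mask) == ((uintptr_t)b & mask);
}
```

If this returns true for the LL/SC target and the STR target, the store lands in the protected region and the SC is in trouble.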

And in fact the same basic problem happens on MIPS, with its LL/SC – its “exclusive bit” is a cache line. Now, you can easily ensure your STR target isn’t in the same cache line as the LL/SC target – but you *can’t* so easily ensure your STR target isn’t false sharing the cache line!

You’d have to get internal cache topology from somewhere and fiddle your allocs to ensure you never cache line share with the LL/SC target for that data structure instance.

In fact, that false sharing problem affects Intel with its CAS, but false sharing with CAS is just a performance hit, not an endless loop.

So, ARM can’t use LL/SC, MIPS has false sharing to worry about, and SPARC has only single-word CAS. That leaves Alpha (about which I don’t care) and PowerPC. I think PowerPC is cache line based as well and will also fail from false sharing.

Though… thinking about it, the ARM approach to this is in fact genius. All you have to do for your given data structure instance state is align it and pad it to the worst-case 2048 bytes!
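
In C11 that could look something like the sketch below. The names (`ERG_MAX_BYTES`, `padded_state`, `llsc_target`, `padded_state_new`) are all invented for illustration, and 2048 is simply the largest region size mentioned above, assumed as the worst case.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 2048 bytes is the largest exclusive region we must allow for. */
#define ERG_MAX_BYTES 2048

/* Hypothetical per-instance state. Aligning the first member to 2048 bytes
   gives the whole struct that alignment, and the compiler pads sizeof up to
   a multiple of it, so nothing else can share the region. */
struct padded_state {
    alignas(ERG_MAX_BYTES) atomic_uintptr_t llsc_target;
};

static_assert(alignof(struct padded_state) == ERG_MAX_BYTES,
              "state must start on a region boundary");
static_assert(sizeof(struct padded_state) % ERG_MAX_BYTES == 0,
              "state must fill whole regions");

/* Heap allocation has to ask for the alignment explicitly; plain malloc
   gives no such guarantee. Returns NULL on failure; release with free(). */
static struct padded_state *padded_state_new(void)
{
    struct padded_state *s = aligned_alloc(ERG_MAX_BYTES, sizeof *s);
    if (s != NULL)
        atomic_init(&s->llsc_target, 0);
    return s;
}
```

The cost is memory: every instance burns at least 2048 bytes, however small the real state is.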

ARM, I kiss you!

It’s the other CPUs which fail on false sharing which are the bastards!
