So, what I’m playing with now is timing a NOP and a CAS, so I can see how many NOPs I need to get the same delay as a single CAS.
So let’s say I time one million NOPs and one million CASes, which lets me figure out the ratio of NOP time to CAS time.
But the time for a single instruction is very short, so I time the whole loop.
Problem is, what happens if the thread is pre-empted during the loop? That’ll screw the timing up something rotten!
One idea is to time a million operations but also ten thousand operations. The ratio of the times taken should be about 100:1. If one of the two loops is pre-empted, the ratio will be crazy. If both are pre-empted, the ratio remains crazy: the hundred-to-one ratio of the work done is overwhelmed by the time spent ready-to-run, so the ratio looks more like 1:1.
I’m curious to see what the ratio is (I’m guessing about 300 NOPs to a CAS, although of course CAS is quicker than DCAS, so the figure will vary with the operation as well).
I then want to see how much difference it makes when you have optimal (or at least approximately optimal) values for delay granularity.