On what is I think now the fifth attempt (over the course of about two years) I finally have the design of the benchmark programme correct.
It’s a complex beasty – you have the data structures, the benchmarks, the many different types of locks (and the lock-free as well of course), where the locks vary by platform, CAS and DWCAS backoffs need to vary, where each benchmark runs over a range of logical processor sets (e.g. say on every logical core, then on one logical core per NUMA node, etc, etc – the full meainngful set of sets), and where you want to be able to produce arbitrary gnuplots afterwards.
It used to be more complex – there were NUMA and non-NUMA variants to consider, and also before the SMR variants of data structures became a separate data structure, you also had to consider the SMR and non-SMR variants.
I remember spending two weeks in particular Tunis working on this – it’s a deadly trap. The problem is you can just about fully optimize the code layout; abstract the lock types, loop over the benchmarks, loop over the data structures, etc, etc, etc – you end up with a perfect factorization. The problem is, it’s at the limit of what you can manage… so when you come back to it, you can’t remember how it works, and, more painfully, if *ANY* new requirement comes along, it can’t fit – but you’re already at your limit *even when you were fresh to the code and had it all in your mind*.
A simpler approach was needed.
What I’ve done is basically two things – first, I’ve removed the per-lock abstraction. Each data structure now has an implementation for each lock. This was actually needed because having one template data structure which used function pointers for locks added the overhead to the locking benchmark runs of following that function pointer. Probably the cost is very small – even negligible – but it’s unwise to assume, so it would have to be investigated to be sure, and of course adding overhead to the locking benchmarks favoured liblfds. Second, I’ve used an abstraction layer, as with liblfds; the abstraction layer implements the locks – all of themm, for all platforms, dummying them up if they’re not actually available – and all the code above that is always fully compiled on all platforms.
I’m still thinking about what to do with the output though, since there can be many thousands of output records, and I want to access them arbitrarily to make gnuplots – I could run into a performance problem!
Anyways, I’ve just made the full basic structure of the benchmark programme compile. There’s a lot of code thrown in there from the previous attempts – it all needs a lot of work on quality – but it compiles and the fundamental design is correct, so we’re on the right track.
Hopefully will tomorrow have some gnuplots for freelist.