I’ve been thinking quite a bit.
I’m currently (re-re-re-re-)writing the benchmark app.
That’s fine. I’ve gone back to a using a library for as much stuff as possible, because I find it easier to think and lay out code properly. When writing libraries these days, I look to avoid memory allocation – I have the user pass in allocations. Problem is when you come to something where the size of the allocation cannot reasonably be known – it’s the result of some complex work inside the library, so the user can’t know beforehand how much store to allocate.
One simple, inefficient solution is to ask the user to pass in worst-case allocations. For smaller allocations, where the worst-case can easily be calculated, this can be okay, and I’m pretty sure I can use this approach for the benchmark app, in its current form – although this is because the benchmark app makes no large per-NUMA node allocations; all the per-NUMA stuff can be allocated on the stack, in each benchmark thread, and so naturally (normally – OS decides but this is how Windows and Linux work) goes on the NUMA node for that thread.
If the benchmark did make large per-NUMA node allocations, the user would have to make them and hand them over…
…which is exactly how things are with the test app. This *does* make large per-NUMA node allocations.
In fact, what ends up emerging from all this is that the benchmark and test both become libaries, and the apps are just stdlib main()-type binaries which call the very high level benchmark/test call (singular), handing over the necessary memory, and getting results back.
This way embedded users who cannot malloc simply reserve heap space and pass it in.
(Abstraction layer for I/O, where the output is going into a line buffer, which is passed to a user provided callback – both the normal output and the gnuplots going to this interface, which on the Windows/Linux stuff means to stdout – the user pipes to disk).
This almost suggests test and benchmark become integral in the liblfds library – but this isn’t so, for two reasons; firstly, it makes the library bigger, where test and benchmark only need to run for particular reasons rather than being generally available and secondly, test requires a thread-starter/waiter abstraction and benchmark requires a thread-starter/waiter and topology abstraction.
The test app generally right now needs to be rewritten from scratch to be something really decent.
The benchmark app is now in its design really decent, but I’m at the point where I need to choose between a malloc or non-malloc approach.