Still working away on the benchmark app.

Added NUMA support.

This means if you have no NUMA, or one NUMA node, you get an SMP benchmark.

If you have two or more NUMA nodes, the benchmark allocates in a NUMA-aware way. For example, the btree benchmark, which has 1024 elements per thread, allocates each thread's elements from that thread's NUMA node.

There is, however, a second NUMA mode, for comparison purposes, where NUMA is not used. This is different from SMP or a single NUMA node system. Benchmarks themselves sometimes have significant admin work to do in the background, so for any given iteration of the benchmark there might be, say, 20 memory accesses, of which only 5 are actually by liblfds. If we were to compare NUMA with non-NUMA simply by putting all allocs on the same NUMA node in the latter case, we would not be comparing apples with apples. We would be seeing a MUCH larger decrease in performance, most of which would be due to the memory accesses made by the benchmark itself.

So when testing non-NUMA, ONLY the memory accesses performed by liblfds should be non-NUMA, and all the admin work needs to go to the local NUMA node. That mode exists as well.

Once this is done, I’m thinking of quickly knocking out a bounded MCMP queue, so I can simplify the ringbuffer API and improve its performance.