The current idea then is that SMR maintains a list of thread states per NUMA node.
If the user starts a thread and pins it to a core, the core number is known; from that we know the NUMA node, so the user can allocate from the correct NUMA node and we can put that allocation into the correct list.
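As a sketch of the arrangement described above — one list of thread states per NUMA node, with registration into the list for the node the thread runs on — something like the following. All names here (`smr_state`, `smr_register_thread`, `MAX_NUMA_NODES`) are invented for illustration and are not liblfds' actual identifiers.

```c
#include <stddef.h>

#define MAX_NUMA_NODES 64  /* arbitrary illustrative bound */

struct smr_thread_state {
  struct smr_thread_state *next;  /* links states within one node's list */
  unsigned long epoch;            /* per-thread SMR bookkeeping would go here */
};

struct smr_state {
  /* heads of the per-node lists; a thread's state is allocated on its
     own NUMA node and linked into that node's list */
  struct smr_thread_state *thread_list_per_node[MAX_NUMA_NODES];
};

/* Link a thread's state into the list for the node it runs on. */
static void smr_register_thread( struct smr_state *s,
                                 unsigned numa_node,
                                 struct smr_thread_state *ts )
{
  ts->next = s->thread_list_per_node[numa_node];
  s->thread_list_per_node[numa_node] = ts;
}
```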
But what if the user doesn’t pin a thread to a core? What if they don’t mind which NUMA node the thread runs on?
Then we need to be able to query the OS to find out which core the thread is running on.
Turns out this functionality, on Windows, turned up only in Vista! And even then, only for systems with fewer than 64 logical cores. For systems with more, you need Windows 7!
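A sketch of what that query looks like. The Windows names (`GetCurrentProcessorNumberEx`, `GetNumaProcessorNodeEx`) are the real Windows 7 APIs that handle more than 64 logical cores via processor groups; the Linux branch uses the glibc-only `sched_getcpu()`. The function name and the node-0 hedge on the Linux side are mine, and the Windows branch is untested.

```c
#ifdef _WIN32
#include <windows.h>

/* Which core (and NUMA node) is the calling thread on right now? */
int query_core_and_node( unsigned *core, unsigned *node )
{
  PROCESSOR_NUMBER pn;
  USHORT n;

  GetCurrentProcessorNumberEx( &pn );       /* Win7+: handles > 64 logical cores */
  if( !GetNumaProcessorNodeEx(&pn, &n) )
    return -1;

  *core = (unsigned) pn.Group * 64 + pn.Number;
  *node = n;
  return 0;
}

#else
#define _GNU_SOURCE   /* for sched_getcpu(); must precede any include */
#include <sched.h>

int query_core_and_node( unsigned *core, unsigned *node )
{
  int c = sched_getcpu();                   /* glibc 2.6+, kernel 2.6.19+ */
  if( c < 0 )
    return -1;

  *core = (unsigned) c;
  *node = 0;  /* mapping core -> node needs separate topology info; hedge with 0 */
  return 0;
}

#endif
```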
Another big drawback is that it requires liblfds proper, rather than just the benchmark library, to have knowledge of CPU topology. That’s fine if you don’t have to port it, but it’s a PITA if you do.
Also, though, imagine porting to a new platform. A system requirement is knowing which core a thread is running upon. I bet that’s not commonly supported.
But if you want to be NUMA aware, hell, you gotta know which damn NUMA node you’re using!
I think though it’s reasonable to assume threads are pinned to cores – no high performance design will have floating threads, or more than one thread per logical core.
Still means I need to know CPU topology. That’s not good. I think though a simplified version could be implemented, where the data is only cores and NUMA nodes – no info on caches. This I can query from the OS for Windows and Linux. For ports to non-NUMA-aware platforms, they can lie and state everything belongs to the first NUMA node.
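The simplified core-to-node lookup could look something like this on Linux: sysfs exposes a `cpu<C>` entry under each `node<N>` directory, and if no NUMA information is exposed at all, the fallback is exactly the lie described above – everything is on node 0. The sysfs path layout is the standard Linux one; the function name and the node count bound are mine.

```c
#include <stdio.h>
#include <unistd.h>

/* Returns the NUMA node that logical core `core` belongs to, or 0 if
   the platform exposes no NUMA topology (the "lie" fallback). */
unsigned core_to_numa_node( unsigned core )
{
  char path[128];
  unsigned node;

  /* /sys/devices/system/node/node<N>/cpu<C> exists iff core C is on node N */
  for( node = 0; node < 1024; node++ )
  {
    snprintf( path, sizeof path,
              "/sys/devices/system/node/node%u/cpu%u", node, core );
    if( access(path, F_OK) == 0 )
      return node;
  }

  return 0;  /* no NUMA info found: lie and report the first node */
}
```

On a non-NUMA port there is nothing to probe, so the same function would just be the final `return 0;` – every allocation and every thread state lands in the first (and only) node’s list.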