
2018-08-05

Update

Made the test and benchmark library compile.

Both liblfds and libtest_and_benchmark (libtab for short) contain subsets of libstds (the library of single-threaded data structures).

Originally, and wrongly, I was expanding the subset in liblfds, and those data structures were then being used by libtab. Now liblfds contains only the data structures it needs, and libtab contains only the data structures it needs.

I now need to move libtab fully away from using liblfds to using libstds.

It was a blunder to have used liblfds, because liblfds provides data structures only to the extent that you have atomic support, which means you might not have a list, for example - but libtab uses the list everywhere.

Maintaining this portability behaviour in the code is a lot of work. If I just assumed x64-level atomics, the portability code would go away. This matters, in a sense, because the portability code right now is untested: I do not currently build variants which pretend to have less atomic support, and I will need to. With software, if it's not tested, it doesn't work.

2018-08-06

Update

Just finished moving the test and benchmark library over to the single threaded data structures.

2018-08-12

Shared memory and NUMA

I’ve been thinking about shared memory and NUMA.

Windows always does things differently to Linux, which is usually bad, because Linux usually gets it right or pretty much right.

I think Linux made a bad job of NUMA. Linux tries to make NUMA go away, in the sense of making it so the developer doesn’t need to think about it. This is done by the OS offering NUMA policies, which control how memory allocations are handled with regard to NUMA - local node, striping across all nodes, etc. Critically, when a page has been paged out and then is paged back in, the page is normally expected to be able to change which NUMA node it is in (although it might well not do so).

Windows, which went for a more “here are the controls, do the right thing” approach, is more like C. The developer has to handle the matter.

The library supports bare metal platforms so it does not perform memory allocation; rather, the user passes memory in. The same has to be true for the test and benchmark application, so it can be run on bare metal platforms.

So the user allocates memory and passes it in.

But what happens about shared memory, for the position independent data structures?

The user allocates shared memory rather than normal memory and passes it in, and when the child test processes run, they open the shared memory and use it.

So that’s okay.

What happens with NUMA?

The user allocates equal memory on each NUMA node and passes it all in.

There's a function for this on both Windows and Linux, so Windows is fine - but what about Linux moving pages between NUMA nodes when they are paged back in? The only way to stop this is to pin the memory pages, so they cannot be paged out.

So, okay, I can do this for the tests and benchmarks.
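Something like this, say, using libnuma on Linux (link with -lnuma) - allocate an equal block on each node and pin it. The helper name is made up and error handling is mostly omitted; on Windows the corresponding allocation call is VirtualAllocExNuma.

    #include <stddef.h>
    #include <numa.h>
    #include <sys/mman.h>

    /* allocate an equal block on each NUMA node and pin it, so its pages
       cannot be paged out and later faulted back in somewhere else */

    int allocate_per_node( void **blocks, size_t size_per_node )
    {
      int
        node,
        number_nodes;

      if( numa_available() == -1 )
        return -1;

      number_nodes = numa_max_node() + 1;

      for( node = 0 ; node < number_nodes ; node++ )
      {
        blocks[node] = numa_alloc_onnode( size_per_node, node );

        if( blocks[node] == NULL )
          return -1;

        mlock( blocks[node], size_per_node );
      }

      return number_nodes;
    }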

What about shared memory with NUMA?

Well, obviously now I would need to allocate equal blocks of shared memory on each NUMA node and pass them in.

Oh. Problems.

On Windows it’s fine - there’s a function to allocate shared memory on a specific NUMA node.
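Something like this, say - CreateFileMappingNuma creates a pagefile-backed shared memory section whose pages come preferentially from the given NUMA node (the helper and segment name here are made up, and the size is assumed to fit in 32 bits):

    #include <windows.h>

    void *create_shared_memory_on_node( SIZE_T size, DWORD numa_node )
    {
      HANDLE
        h;

      /* pagefile-backed section, pages preferentially from numa_node */
      h = CreateFileMappingNuma( INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
                                 0, (DWORD) size, TEXT("Local\\test_segment"),
                                 numa_node );

      if( h == NULL )
        return NULL;

      return MapViewOfFile( h, FILE_MAP_ALL_ACCESS, 0, 0, size );
    }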

On Linux, there is no such function. Shared memory is placed on NUMA nodes just as non-shared memory, according to the NUMA policy.

I think I might be able to change the NUMA policy just before creation of the shared memory, so that it uses one and only one NUMA node, the one I want; but shared memory, like all allocations, is really allocated on faulting, so changing the policy at creation time alone doesn't do anything.

I suspect what I need to do is change NUMA policy, create shared memory, pin the memory, then fault every page, then revert NUMA policy.

(Another way, says SO, is to create, then move the pages to the desired NUMA node.)
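A minimal sketch of the first approach - change the membind policy, create and map the shared memory, pin it, fault every page, then revert the policy - using libnuma and POSIX shared memory (link with -lnuma -lrt). The helper name and segment name are made up and error handling is omitted.

    #include <fcntl.h>
    #include <numa.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    void *create_shm_on_node( char const *name, size_t size, int node )
    {
      int
        fd;

      void
        *base;

      struct bitmask
        *old_nodes,
        *new_nodes;

      /* remember the current policy, then bind allocations to the target node */
      old_nodes = numa_get_membind();
      new_nodes = numa_allocate_nodemask();
      numa_bitmask_setbit( new_nodes, node );
      numa_set_membind( new_nodes );

      /* create and map the shared memory segment */
      fd = shm_open( name, O_CREAT | O_RDWR, 0600 );
      ftruncate( fd, size );
      base = mmap( NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0 );
      close( fd );

      /* pin, so the pages cannot later be paged out and faulted back in on
         another node, then touch every page so it faults in now, while the
         restrictive policy is in force */
      mlock( base, size );
      memset( base, 0, size );

      /* revert to the original policy */
      numa_set_membind( old_nodes );
      numa_free_nodemask( new_nodes );
      numa_free_nodemask( old_nodes );

      return base;
    }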

Obviously, this all feels wrong.

Am I doing the wrong thing?

Should I just suck it up and let Linux do what it wants to do?

One issue here is comparing like with like.

Actually it raises the question of what is like with like?

If I run the benchmarks on Windows, with low-level NUMA control, and then I run them on Linux, with the same low-level NUMA control, I have like with like.

But if on Linux users are simply using NUMA policy, then I'm comparing apples and oranges… …except that if Linux is normally like this, then it really is what you normally get, and so that is what you have to compare against.

Shared memory

Position independent data structures support shared memory (i.e. differing virtual address ranges) by using offsets from a known base rather than full virtual addresses.
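As a sketch of the idea (this isn't the actual liblfds code) - offsets are computed relative to a known base, here the address of the data structure state, so the same offset value is valid in every process, wherever the segment happens to be mapped:

    #include <stddef.h>

    struct queue_state
    {
      ptrdiff_t
        head_offset,   /* offset of head element from the state itself */
        tail_offset;
    };

    static inline ptrdiff_t pointer_to_offset( struct queue_state *qs, void *element )
    {
      return (char *) element - (char *) qs;
    }

    static inline void *offset_to_pointer( struct queue_state *qs, ptrdiff_t offset )
    {
      return (char *) qs + offset;
    }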

So far I've only supported a single shared memory segment, so all data used has to be in that one segment. The offset is from the data structure state.

This is obviously a problem with NUMA.

With NUMA, you might well want to have a shared memory segment in every NUMA node.

This means, in general, multiple shared memory segments, which means multiple offsets - which means that when you are manipulating elements in the data structure, and so working with offsets, you need to know which shared memory segment a given offset is from, so you can know its base.

Central to almost all data structures is the atomic compare-and-swap (CAS).

If we have one segment only, we can compare the offsets across all the different virtual memory ranges and we will know we’re comparing the same data structure element.

If we have multiple segments, we can have the same offset but in different segments. Somehow we have to know, in the CAS, which segment an offset belongs to.

The only way I can see to do this is to borrow some most significant bits.

On 64-bit platforms this should be fine.

If we borrow say 8 bits, we can have 256 shared memory segments, and we have 56 bits remaining for the offset.

On 32-bit platforms it barely works.

If we borrow just 4 bits, and so can have 16 shared memory segments, we have 28 bits left over for the offset - which is 256mb.

It also means we sometimes have to do a lookup in the data structure; we keep an array storing the base addresses of the different segments, and we look a base up when we need to convert an offset to a full virtual address (which we do when we pass elements back to the user, i.e. after a dequeue() or pop()).
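Putting that together, a sketch (again, not the liblfds implementation) of borrowing 8 most significant bits for the segment index, with a per-process table of segment base addresses for converting a packed offset back to a full virtual address; the names and the choice of 8 bits are illustrative:

    #include <stdint.h>

    #define SEGMENT_BITS  8
    #define OFFSET_BITS   (64 - SEGMENT_BITS)
    #define OFFSET_MASK   ((UINT64_C(1) << OFFSET_BITS) - 1)
    #define MAX_SEGMENTS  (1 << SEGMENT_BITS)

    /* per-process: each process registers the base address at which it has
       mapped each segment */
    static void *segment_bases[MAX_SEGMENTS];

    static inline uint64_t make_packed_offset( unsigned int segment, uint64_t offset )
    {
      return ( (uint64_t) segment << OFFSET_BITS ) | ( offset & OFFSET_MASK );
    }

    static inline void *packed_offset_to_pointer( uint64_t packed )
    {
      unsigned int segment = (unsigned int) ( packed >> OFFSET_BITS );
      uint64_t offset = packed & OFFSET_MASK;

      return (char *) segment_bases[segment] + offset;
    }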

Position independence without NUMA is basically a fail, so I think this has to happen.

2018-08-13

Multiple shared memory segments, NUMA, Linux and Windows

I bin learning fings, Oi have.

With position independent (i.e. or maybe e.g. shared memory) data structures:

On Linux you do not need support for multiple shared memory segments as far as NUMA is concerned.

This is obvious really - you just turn on striping.
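For example, with libnuma the whole process can be switched to interleaving before the shared memory is created and faulted in (helper name made up):

    #include <numa.h>

    void enable_striping( void )
    {
      /* interleave subsequent allocations across all NUMA nodes, page by page */
      if( numa_available() != -1 )
        numa_set_interleave_mask( numa_all_nodes_ptr );
    }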

You do need support for multiple shared memory segments in its own right, though, i.e. the user may want this for whatever reason.

On Windows, you do need support for multiple shared memory segments as far as NUMA is concerned, to perform striping manually, which is how you have to do it under Windows.

You also need it in its own right, just as on Linux.

More NUMA / shared memory thoughts

Spent the day thinking over shared memory and NUMA.

Supporting a single segment of shared memory is smooth and graceful. It looks good in the API, is simple and easy to understand for the user.

Multiple segments is messy. The user needs to provide per-process state, and to register each segment in each process, before it can be used. Most significant bits have to be taken from the offset value, to indicate which segment the offset is from. When the user passes in a pointer, a lookup has to occur to figure out which segment that pointer is from.

There is a reason to use multiple segments in Linux.

This is that memory policy is on a per-process basis, not per-data structure.

So if I go striped, fine, I can allocate one shared memory block and it’ll be striped on a page basis.

But what if I want striped for one data structure, but something else for another?

There is only one policy, and it is enforced when pages are swapped back in, so you can't set it, do stuff, and then change it: whatever you have set now is what gradually comes to be applied, as pages swap in and out.

In fact this is a problem anyway: if I have multiple shared memory segments, one per NUMA node, and so am controlling my NUMA directly, striping on a per-entity basis - memory policy will mess it up for me by applying itself to my allocations.

So there is only one memory policy and it applies to everything in your process, like it or not. You’re fucked anyway. Multiple segments will not save you, unless you pin the pages so they can’t swap, which isn’t a reasonable thing to ask.

So on Linux, multiple shared memory segments are not useful, because memory policy stops you from controlling your own NUMA anyway.

On Windows, you do need multiple shared memory segments because the OS does not control NUMA. You do it yourself. So if you want to spread an allocation over multiple NUMA nodes, you need to manually allocate on each of them and then put those elements into the data structure.


