More NUMA / shared memory thoughts

Spent the day thinking over shared memory and NUMA.

Supporting a single segment of shared memory is smooth and graceful. It looks good in the API, and is simple and easy for the user to understand.

Multiple segments are messy. The user needs to provide per-process state, and to register each segment in each process, before it can be used. The most significant bits of the offset value have to be borrowed, to indicate which segment the offset is from. When the user passes in a pointer, a lookup has to occur to figure out which segment that pointer is from.

There is a reason to use multiple segments in Linux.

This is that memory policy is on a per-process basis, not per-data structure.

So if I go striped, fine, I can allocate one shared memory block and it’ll be striped on a page basis.

But what if I want striped for one data structure, but something else for another?

There is only one policy, and it is enforced when pages are swapped back in, so you can’t set it, do stuff, and then change it: whatever you have set *now* is what gradually comes to be applied, as pages swap in and out.

In fact this is a problem anyway: if I do have multiple shared memory segments, one per NUMA node, and so am controlling my NUMA directly, striping on a per-entity basis – memory policy will mess it up for me by applying itself to my allocations.

So there is only one memory policy and it applies to everything in your process, like it or not. You’re fucked anyway. Multiple segments will not save you, unless you pin the pages so they can’t swap, which isn’t a reasonable thing to ask.

So on Linux, multiple shared memory segments are not useful, because memory policy stops you from controlling your own NUMA anyway.

On Windows, you do need multiple shared memory segments because the OS does not control NUMA. You do it yourself. So if you want to spread an allocation over multiple NUMA nodes, you need to manually allocate on each of them and then put those elements into the data structure.

Multiple shared memory segments, NUMA, Linux and Windows

I bin learning fings, Oi have.

With position independent (e.g. shared memory) data structures:

On Linux you do not need support for multiple shared memory segments *as far as NUMA is concerned*.

This is obvious really – you just turn on striping.

You do need support for multiple shared memory segments *just because*, i.e. the user may want this for whatever reason.

On Windows, you *do* need support for multiple shared memory segments *as far as NUMA is concerned*, to perform striping manually, which is how you have to do it under Windows.

You also need it for itself, as on Linux.

Shared memory

Position independent data structures support shared memory (i.e. differing virtual address ranges) by using offsets from a known base rather than full virtual addresses.

So far I’ve only supported a single shared memory segment, so all data used has to be in that one segment. The offset is from the data structure state.

This is obviously a problem with NUMA.

With NUMA, you might well want to have a shared memory segment in every NUMA node.

This means in general multiple shared memory segments, which means multiple offsets, which means that when you are manipulating elements in the data structure and so working with offsets, you need to know which shared memory segment a given offset is from, so you can know its base.

Central to almost all data structures is the atomic compare-and-swap (CAS).

If we have one segment only, we can compare the offsets across all the different virtual memory ranges and we will know we’re comparing the same data structure element.

If we have multiple segments, we can have the same offset but in different segments. Somehow we have to know, in the CAS, which segment an offset belongs to.

The only way I can see to do this is to borrow some most significant bits.

On 64-bit platforms this should be fine.

If we borrow say 8 bits, we can have 256 shared memory segments, and we have 56 bits remaining for the offset.

On 32-bit platforms it barely works.

If we borrow just 4 bits, and so can have 16 shared memory segments, we have 28 bits left over for the offset – which is 256mb.

It also means we at times have to do a lookup in the data structure; we have an array where we store the base addresses of the different segments, and we look them up when we need to convert an offset to a full virtual address (which we do when we pass elements back to the user, i.e. after a dequeue() or pop()).

Position independence without NUMA is basically a fail, so I think this has to happen.

Shared memory and NUMA

I’ve been thinking about shared memory and NUMA.

Windows always does things differently to Linux, which is usually bad, because Linux usually gets it right or pretty much right.

I think Linux made a bad job of NUMA. Linux tries to make NUMA go away, in the sense of making it so the developer doesn’t need to think about it. This is done by the OS offering NUMA policies, which control how memory allocations are handled with regard to NUMA – local node, striping across all nodes, etc. Critically, when a page has been paged out and then is paged back in, the page is normally expected to be able to change which NUMA node it is in (although it might well not do so).

Windows, which went for a more “here are the controls, do the right thing” approach, is more like C. The developer has to handle the matter.

The library supports bare metal platforms so it does not perform memory allocation; rather, the user passes memory in. The same has to be true for the test and benchmark application, so it can be run on bare metal platforms.

So the user allocates memory and passes it in.

But what happens about shared memory, for the position independent data structures?

The user allocates shared memory, rather than normal memory, and passes it in, and the child test processes when they run open the shared memory and use it.

So that’s okay.

What happens with NUMA?

The user allocates equal memory on each NUMA node and passes it all in.

There’s a function for this on both Windows and Linux, so that’s okay for Windows – but what about Linux moving pages between NUMA nodes on paging-in? The only way to stop this is to pin a memory page, so it cannot be paged out.

So, okay, I can do this for the tests and benchmarks.

What about shared memory with NUMA?

Well, obviously now I would need to allocate equal blocks of shared memory on each NUMA node and pass them in.

Oh. Problems.

On Windows it’s fine – there’s a function to allocate shared memory on a specific NUMA node.

On Linux, there is no such function. Shared memory is placed on NUMA nodes just as non-shared memory, according to the NUMA policy.

I think I might be able to change the NUMA policy just before creation of the shared memory, to use and only use a single NUMA node, the one I want; but shared memory, like all allocations, is really allocated on faulting, so doing this doesn’t by itself *do* anything.

I suspect what I need to do is change NUMA policy, create shared memory, pin the memory, then fault every page, then revert NUMA policy.

(Another way, says SO, is to create, then move the pages to the desired NUMA node.)

Obviously, this all feels wrong.

Am I doing the wrong thing?

Should I just suck it up and let Linux do what it wants to do?

One issue here is comparing like with like.

Actually it raises the question of what is like with like?

If I run the benchmarks on Windows, with low-level NUMA control, and then I run them on Linux, with the same low-level NUMA control, I have like with like.

But if on Linux users are simply using NUMA policy, then I’m comparing apples and oranges… …except if Linux *is* normally like this, then it really is what you normally get, and so that *is* what you have to compare against.

Update

Just finished moving the test and benchmark library over to the single threaded data structures.

Brexit and the UK press

There’s quite a lot of rabid insanity in the UK press about Brexit.

It’s the kind of debate where holding opposing views means you attack the *person*, rather than debate the views themselves.

One particularly rabid newspaper has been The Express, a mass-market, populist tabloid which is one step above the newspapers which print topless women.

A headline today was – “UK weather: Heatwave to last until OCTOBER – Portugal hot temperatures spread to BRITAIN”

And I thought, mockingly, “ah, another reason to leave Europe!” 🙂

Update

Made the test and benchmark library compile.

Both liblfds and libtest_and_benchmark (libtab for short) have subsets of libstds (library single threaded data structures) in.

Originally and wrongly I was expanding the subset in liblfds, where those data structures were being used by libtab. Now I have only the data structures needed by liblfds in liblfds, and only those needed by libtab in libtab.

I now need to move libtab fully away from using liblfds to using libstds.

It was a blunder to have used liblfds, because liblfds provides data structures to the extent you have atomic support, which means you might not have a list, for example – but libtab uses the list everywhere.

Actually maintaining this portability behaviour in the code is a lot of work. If I just assumed x64-level atomics, the portability code would go away. In a sense this matters, because the portability code right now is untested. I do not – though I will need to – build variants which pretend to have less support. With software, if it’s not tested, it doesn’t work.

Calibre is brain damaged

I added a few PDFs to Calibre and put them in my e-reader to see if they rendered okay.

They did not.

I put the e-reader back in and in Calibre selected the PDFs and selected “Remove matching books from Device”.

This command permanently and without warning deletes the PDFs from your hard disk.

But it’s worse than that.

What I have now also found is that it does NOT remove them from your e-reader, and you now *cannot* remove them from your e-reader because Calibre will not show the ebooks because they are not in the Calibre library.

I mean, on one hand, I’ve known and always known Calibre was insane, because of its UI. This behaviour is *not* out of keeping with how the program presents itself.

Light’s Hope suck too

I’ve played for a long time on private vanilla WoW servers.

I was playing on Lightshope, but the server – as I think happens with vanilla as it becomes older – had more and more of a problem with griefing, and it became impractical to do anything except instances and walking around in non-PvP zones. Then a forum mod was an idiot (they all are in the end – they’re just normal people, and normal people have no clue how to manage responsibility) and that put me off more, so I ended up finding I just wasn’t playing.

But now I’ve finished my contract work and I felt like ambling around a bit on the new server, since griefing won’t be such a problem there.

I go to log in, but my account has been suspended.

That’s happened before – LH do it now and then for mysterious internal reasons.

Last time I posted on the forum, but the forum is down, because it was hacked some weeks ago.

I go to the web-site, and look for what to do.

Page says: read this page (helpfully, it links to itself) and check the ban status page on your account (a link is provided).

That link 404s.

Okay, so I try to log into my account and then look for a status page.

Username, fine. Password, hmm. Not sure if it’s the same as my WoW login password, or a longer version of it… but the problem is there’s a Google captcha.

Google captchas are fairly broken. If you really need to complete one, you can, but not for something like multiple password attempts.

Okay, so I need to contact LH.

What’s possible?

Github, Reddit, Discord.

Github requires an account, and they silently don’t email you the confirmation email if you use mailinator.

Reddit registration doesn’t work in the Pale Moon browser. This will be because the site doesn’t target HTML, but instead targets browsers, which means Chrome.

Discord is the worst of them all. Part of what it is, is IRC. You can log in with just a username, and you’re presented with the most complex graphical UI I’ve ever seen. It’s like 1980 has returned. Now that’s one problem, but here’s the real problem – without an account, you can’t write anything BUT IT DOESN’T TELL YOU THIS. You just sit there, looking at this bizarrely complex UI, trying to figure out which part of it is for typing, until eventually you realise *IT’S NOT THERE*.

I then tried to direct message a mod, to be told the message was blocked because of one of a list of possible reasons.

So…

No go.

LH, you’re idiots.

Try using your own systems.

Amazon suck too

Amazon normally are okay, but occasionally they fuck it up.

I added a new address.

I entered the name, and phone number – they come first – then changed the country to Spain.

This wipes and reloads the form.

Fucking BRAIN SURGEONS.

So I do it again, I hit submit, was asked to enter my password.

I did, which took me back to a now empty new address form.

Checked, and indeed, the address had not been added.

Then spent 10 minutes trying to find a way – ANY FUCKING WAY – to let Amazon know.

Eventually gave up and used the “suspicious email” option to email them.

Srsly, Amazon.

Web-site bugs fine, they happen.

Making it fucking impossible for people to contact you?

All large corporations do whatever they can to keep end-users as far away as possible.

Finally sorted out all this shit – why does every fucking order online take at least 30 minutes of frustration – and lo and behold, half the order can’t be sent to Spain.