C versions

So, I was working on the position independence stuff and thinking about how I’ve currently implemented multi-segment support – I’m comparing pointers in different allocations. This is not legit, but normally it works… which doesn’t fill me with joy.

While looking over how this can fail (I was thinking about memory range address extensions on 32-bit platforms) I came across some posts about the type of the result of subtracting a void pointer from a void pointer.

It’s a ptrdiff_t, of course…

…except it’s not.

Standard C does not support subtracting void pointers from void pointers, because the type is unknown.

GCC lets you do this as an extension – it treats the pointers as pointers to unsigned char.

After a bit, I realised I was getting a -lot- of extras from using -std=gnu89 rather than -std=c89, which I’d chosen just to get C++ style comments.

So I wrote a script, converted all C++ style comments to C comments, and changed to -std=c99 with -pedantic.


“Type long long not supported”.

Bloody right too – “long long” didn’t turn up in the Standard until C99.

It’s a GCC extension prior to C99.

My problem is I’ve had to support Windows, and MS stopped supporting C decades ago, so they’re still C89 – and in fact I’ve been using a platform extension there too. Until now I’ve not worried very much about the later C versions – I’ve got the specs, I need to sit down and read them through. I’d rather have a good book than the spec though – I originally learned C from “The C Book”, which is excellent. (But now I’ve got some time, yes, I’ll finally read through them all. It’s a shame in a way – I’d like to only have to know one version of C, but because of Windows, I need to hang on to remembering C89 – this is actually why I’ve not moved on yet.)

So anyway, embarrassment aside, C99 is only really properly supported from GCC 4.5.0 onwards.

This is a bit of an interesting problem.

GCC doesn’t really start to compile for non x86/x86_64 platforms until about 4.5.4 (and 4.7.3 for ARM), so that’s okay.

However, I have GCC 4.2.0, 4.2.2 and 4.2.4 compiled for x86_64 – so if I wanted to support them, I’d need to use gnu89.


All in all, looks like I need to use -std=c99.

This is fine for non-Windows platforms – it will enable long long, and I won’t use any other functionality, so I can still compile on Windows (since the porting layer defines will mean I use the platform extension on Windows for 64-bit types).

Ironically, of course, this means the work I did to replace C++ style comments with C style comments is now not necessary =-)


Been implementing multi-segment position independence.

Originally, I was thinking each data structure instance would have its own list of segment base addresses. While implementing, I realised the list of segments will often be the same for every data structure instance in a process – normally there will be only a few shared memory segments, and they will contain all the shared data, which means the data for many data structures and their elements. As such, the multi-segment code needed to be in a little API of its own, so the same instance of segment base address information can be used with all data structure instances.

Multi-segment position independent data structures

After a month of being a temporary full-time house-husband, I’m actually doing a bit of coding.

Writing up the first multi-segment position-independent data structure – just a freelist, the simplest thing there is, so I can go through the multi-segment stuff for the first time.

In the end there will be a single-segment and multi-segment position independent variation of every data structure.


Just finished a grand renaming.

Not happy with it really.

Have a better idea now.

One thing from this though – I did the work manually.

This does not work – it does not scale.

When your code base is large enough, you must have tools to make changes across it.

So I’ve now written a little script which lets me rename APIs.

How not to validate data

Trying to make an account on a Russian site, to order a book.

Need to enter name, email address, password.

I enter my name – in Latin.

“Enter your real name”.

I translate to Cyrillic, now we’re okay.

I guess they just don’t want all the extra business that comes from overseas. Damn all that extra profit!

I enter my email address – in Latin – this is fine.

I enter a password.

“Your password contains a link or email address.”

Actually, what it contains which they don’t like are spaces.

I remove the spaces, and get a green tick – password okay.

I hit submit.

“Your password is more than 20 characters.”

The only reason I’m even trying with these jokers is that this is an out-of-print book for a birthday present. I already obtained it once – but it’s no longer available from that seller – and the Royal Mail in the UK lost it the instant I posted it with them; it didn’t even make it out of the post office.

Turns out when I first tried to submit, and was told the password was too long, they sent me a registration email – but with a blank password. When I tried again with a password shorter than 20 characters, they again sent me a registration email, *with the password in the clear*. At this point, I am bailing – I mean, I would make a burner credit card for them, with a balance in roubles equal to the bill, so I’m safe enough, but this is beyond the pale.

As an aside, yesterday, I was configuring the router in my friend’s place (where I am now) and I turned off WPS.

This made the 5 GHz network disappear.

Really disappear, not just no SSID.

No software works.

Software is the single most unreliable device that exists in the world, except for the Royal Mail.

More NUMA / shared memory thoughts

Spent the day thinking over shared memory and NUMA.

Supporting a single segment of shared memory is smooth and graceful. It looks good in the API, is simple and easy to understand for the user.

Multiple segments is messy. The user needs to provide per-process state, and to register each segment in each process, before it can be used. Most significant bits have to be taken from the offset value, to indicate which segment the offset is from. When the user passes in a pointer, a lookup has to occur to figure out which segment that pointer is from.

There is a reason to use multiple segments in Linux.

This is that memory policy is on a per-process basis, not per-data structure.

So if I go striped, fine, I can allocate one shared memory block and it’ll be striped on a page basis.

But what if I want striped for one data structure, but something else for another?

There is only one policy, and it is enforced when pages are swapped back in, so you can’t set it, do stuff, and then change it: whatever you have set *now* is what gradually comes to be applied, as pages swap in and out.

In fact this is a problem anyway: if I do have multiple shared memory segments, one per NUMA node, so I’m controlling my NUMA directly and striping on a per-entity basis – memory policy will mess it up for me by applying itself to my allocations.

So there is only one memory policy and it applies to everything in your process, like it or not. You’re fucked anyway. Multiple segments will not save you, unless you pin the pages so they can’t swap, which isn’t a reasonable thing to ask.

So on Linux, multiple shared memory segments are not useful, because memory policy stops you from controlling your own NUMA anyway.

On Windows, you do need multiple shared memory segments because the OS does not control NUMA. You do it yourself. So if you want to spread an allocation over multiple NUMA nodes, you need to manually allocate on each of them and then put those elements into the data structure.

Multiple shared memory segments, NUMA, Linux and Windows

I bin learning fings, Oi have.

With position independent (i.e. or maybe e.g. shared memory) data structures:

On Linux you do not need support for multiple shared memory segments *as far as NUMA is concerned*.

This is obvious really – you just turn on striping.

You do need support for multiple shared memory segments *just because*, i.e. the user may want this for whatever reason.

On Windows, you *do* need support for multiple shared memory segments *as far as NUMA is concerned*, to perform striping manually, which is how you have to do it under Windows.

You also need it for itself, as on Linux.

Shared memory

Position independent data structures support shared memory (i.e. differing virtual address ranges) by using offsets from a known base rather than full virtual addresses.

So far I’ve only supported a single shared memory segment, so all data used has to be in that one segment. The offset is from the data structure state.

This is obviously a problem with NUMA.

With NUMA, you might well want to have a shared memory segment in every NUMA node.

This means, in general, multiple shared memory segments, which means multiple offsets, which means that when you are manipulating elements in the data structure and so working with offsets, you need to know which shared memory segment a given offset is from, so you can know its base.

Central to almost all data structures is the atomic compare-and-swap (CAS).

If we have one segment only, we can compare the offsets across all the different virtual memory ranges and we will know we’re comparing the same data structure element.

If we have multiple segments, we can have the same offset but in different segments. Somehow we have to know, in the CAS, which segment an offset belongs to.

The only way I can see to do this is to borrow some most significant bits.

On 64-bit platforms this should be fine.

If we borrow say 8 bits, we can have 256 shared memory segments, and we have 56 bits remaining for the offset.

On 32-bit platforms it barely works.

If we borrow just 4 bits, and so can have 16 shared memory segments, we have 28 bits left over for the offset – which is 256 MB.

It also means we at times have to do a lookup in the data structure; we have an array, where we store the base addresses of the different segments, and we look them up when we need to convert an offset to a full virtual address (which we do when we pass elements back to the user, i.e. after a dequeue() or pop()).

Position independence without NUMA is basically a fail, so I think this has to happen.

Shared memory and NUMA

I’ve been thinking about shared memory and NUMA.

Windows always does things differently to Linux, which is usually bad, because Linux usually gets it right or pretty much right.

I think Linux made a bad job of NUMA. Linux tries to make NUMA go away, in the sense of making it so the developer doesn’t need to think about it. This is done by the OS offering NUMA policies, which control how memory allocations are handled with regard to NUMA – local node, striping across all nodes, etc. Critically, when a page has been paged out and then is paged back in, the page is normally expected to be able to change which NUMA node it is in (although it might well not do so).

Windows, which went for a more “here are the controls, do the right thing” approach, is more like C. The developer has to handle the matter.

The library supports bare metal platforms so it does not perform memory allocation; rather, the user passes memory in. The same has to be true for the test and benchmark application, so it can be run on bare metal platforms.

So the user allocates memory and passes it in.

But what happens about shared memory, for the position independent data structures?

The user allocates shared memory, rather than normal memory, and passes it in, and the child test processes when they run open the shared memory and use it.

So that’s okay.

What happens with NUMA?

The user allocates equal memory on each NUMA node and passes it all in.

There’s a function for this in Windows and Linux, so that’s okay for Windows – but what about Linux moving pages between NUMA nodes on paging-in? The only way to stop this is to pin a memory page, so it cannot be paged out.

So, okay, I can do this for the tests and benchmarks.

What about shared memory with NUMA?

Well, obviously now I would need to allocate equal blocks of shared memory on each NUMA node and pass them in.

Oh. Problems.

On Windows it’s fine – there’s a function to allocate shared memory on a specific NUMA node.

On Linux, there is no such function. Shared memory is placed on NUMA nodes just as non-shared memory, according to the NUMA policy.

I think I might be able to change the NUMA policy just before creation of the shared memory, to use one and only one NUMA node – the one I want; but shared memory, like all allocations, is really allocated on faulting, so doing this by itself doesn’t *do* anything.

I suspect what I need to do is change NUMA policy, create shared memory, pin the memory, then fault every page, then revert NUMA policy.

(Another way, says Stack Overflow, is to create the shared memory, then move its pages to the desired NUMA node.)

Obviously, this all feels wrong.

Am I doing the wrong thing?

Should I just suck it up and let Linux do what it wants to do?

One issue here is comparing like with like.

Actually it raises the question of what is like with like?

If I run the benchmarks on Windows, with low-level NUMA control, and then I run them on Linux, with the same low-level NUMA control, I have like with like.

But if on Linux users are simply using NUMA policy, then I’m comparing apples and oranges… …except if Linux *is* normally like this, then it really is what you normally get, and so that *is* what you have to compare against.


Just finished moving the test and benchmark library over to the single threaded data structures.