Cute little problem

Running into a cute little problem, to do with initializing the test and benchmark programme.

The test and benchmark code is in a library – there’s a command line wrapper provided for convenience. The code is in a library so users on embedded platforms and the like can use the test and benchmark functionality.

The code in the library performs no allocations – the user passes in memory. The user could after all be an embedded platform with no malloc and just global arrays.

The library code is complex enough there needs to be some flexibility in memory allocation, so the user provided memory is the store which is placed into an API and that API offers malloc() like behaviour.

The test and benchmark code, being NUMA aware, needs an allocation on each NUMA node.

Asking the user to do the work to figure out his NUMA arrangement is quite onerous, though – and in fact the porting layer already has this code in it.

So what we really want it to get the topology info from the porting layer.

To do this though we need… some store from the user.

So it kinda looks like first we make a topology instance, and then the user uses this to find out about his NUMA layout and make allocs.

To make a topology instance though the user needs to know how much store to allocate – and that’s the cute little problem.

How do you write code which can either work and do its job, *or*, tell you how many bytes of store it will need to do its job?

Now if the function is “flat”, in that it needs no allocations to *find out* how much store it needs, then it’s straightfoward enough.

However, if the function is “deep”, and it needs allocations, as it goes along, to find out how much store it needs, then life is more complicated – in fact, what must happen is that the user calls the function repeatedly, until he gets the same result twice, and the user needs to pass in each time as much store as the function asked for the time before.

There are Windows functions like this.

Problem is… now it’s quite a bit of extra work, and I’m not sure I’m *getting* very much for all this.

Update

I’ve had some time to do a bit of work.

The library now compiles again.

I want to move over from using an array for pointer-and-counter to using a struct, but that’s a fundamental change, so I need to get the new test suite up and running.

So now I’m working to get the new test and benchmark programme working.

For now I think I’m only going to release single-segment position independence.

There’s a single multi-segment data structure, for the nodeallocate freelist; I’ll keep that, so I can benchmark it, to get a first sense of how much overhead multi-segment brings.

A thought

With position independent data structures, I have to use an offset.

Maximum convenience is when the offset is from the data structure state structure – but this implies negative offsets can occur, which means I have a problem becasuse I can’t address the entire address range.

Well – if we say all pointers used will be to 4 or 8 bytes offsets, then we have two or three bits free at the LS bit end of things.

When we do pointer math we get a ptrdiff_t, like it or not.

What we could do is cast the pointers to int long long unsigned (or anyway the unsigned int same length as a pointer), figure out the difference and if it’s negative or positive and use one of those free LS bits as the sign bit.

That way the *normal* data structure can support the full address range, using offsets, and so is used in the normal case and is also sound for use with single segment relocations.

(Multi-segment still needs all the extra work with storing multiple base addresses.)

This is all fly-by-night, of course. Totally outside of the Standard.

The problem with this – user data is a void pointer for key and a void pointer for value.

These are both converted to an offset.

They could point to chars.

They could also be set directly. Being converted to an offset doesn’t hurt, you’re converted back afterwards, but having the LS bit used for something else will.

Ack. Just realised, direct setting is a problem for the normal signed offset variant. If the user sets a value directly, and the value is far enough away from the address of the data structure state, signed overflow will occur. Implementation defined is not a place I want to be. Need a different macro for setting direct values.

Update

I was wrong about using unsigned offsets.

The result of pointer math will always be a ptrdiff_t, which is to say signed.

I can’t *get* an unsigned diff… well, mmm, unless I cast the pointers to unsigned ints first…?

I’m starting to worry that I’m pushing the limits here though.

In other news, where the offset/version combo for an offset has to be signed for the offset and unsigned for the version, an array of lfds720_pal_uint_t doesn’t work. I can take the address of the first element, cast it to a pointer to ptrdiff_t and then deref, but actually, having gone over the struct packing stuff again to be sure, I think I will actually be fine using a struct. I didn’t want to go that way, I felt it was opening a window for compiles to get things wrong, but I read a very nice article about it and now I feel happy.

This is the article;

http://www.catb.org/esr/structure-packing/

“The interesting news is that NTP has apparently being getting away with this for decades across a very wide span of hardware, operating systems, and compilers, including not just Unixes but under Windows variants as well. This suggests that platforms with padding rules other than self-alignment are either nonexistent or confined to such specialized niches that they’re never either NTP servers or clients.”

Anyways, the structures are nicer than the two element array – it’s clearer what’s going on.

Update / I’m sad

I am sad.

This is because I think I need three variants of every data structure.

The first is the normal variant – pointers are what they are, absolute values (virtual or physical as they may be).

The second is the single segment variant – pointers are offsets, and all data must be within a contiguous range which is half the address range; the only way the user can guarantee this is to make a single contiguous allocation. So here we’re looking at shared memory, or buffers passed down into or up from the kernel, that sort of thing.

The third is the multi segment variant – pointers are offsets, and we keep track of the base address of each segment, and that has to be done on a per-address range basis (so each process needs to register each new block of shared memory, to indicate what the base address is in its virtual address range). However, since all offsets are positive, we can use unsigned (and we have to anyway, since we borrow some MS bits for a segment number), and so we support the entire address range. Downside is the per-address range state, which the user has to set up and add segments to when they are made, and the overheads (array scan) in the data structure to figure out which segment a given pointer belongs to. Oh also actually since we borrow MS bits for the segment number, the more segments you want to support, the smaller the maximum address range for each segment. Again, this means users have to allocate blocks.

The third variant supports what the second variant does, but the third variant I think it primarily for NUMA on Windows – not for everyone and their dog – and the second variant is *so* much easier to use and the single segment use case is going to be common, shared memory between processes, so it’s worth having that variant.

Update

I finally had two free days.

I’ve been working on position independence.

Been backwards and forwards a bit.

Currently thinking to make all normal data structures work using offsets, which means they support a single address range regardless of absolute address values, as long as the relative positions of all data structure states, elements and user data are the same.

There can then be an extra variant for each data structure which supports multiple segments.

The one (and big) problem with this is that if I make the API nice, and use offsets relative to the data struture state, I have to support negative offsets, which means on 32 bit machines all data needs to be within 2 GB of the data structure state. That is by no means guaranteed, and is outside of the control of the user. Actually the same problem exists on 64 bit machines – if we imagine a 64 bit virtual adress range (it’s not 64 bit in practise, but this doesn’t change things) we still only have 63 bits, so all data still needs to be within half the address range.

If the API is not quite so nice, and you specify a base address and so all offsets are positive, this problem goes away. Note though reading/writing data structure element user data is in the nice API relative to the data structure element, so in the not quite so nice API, we have somehow to know the base pointer, which means each element has to point to the state structure (which has a copy), and this is a significant extra cost in terms of memory access.

C versions

So, I was working on the position independence stuff and thinking about how I’ve currently implemented multi-segment support – I’m comparing pointers in different allocations. This is not legit, but normally it works… which doesn’t fill me with joy.

While looking over how this can fail (I was thinking about memory range address extensions on 32-bit platforms) I came across some posts about the type of the result of subtracting a void pointer from a void pointer.

It’s a ptrdiff_t, of course…

…except it’s not.

Standard C does not support subtracting void pointers from void pointers, because the type is unknown.

GCC lets you do this as an extention – treats the pointers as pointers to unsigned char.

I then after a bit realised I was getting a -lot- of extras from using -std=gnu89 rather than -std=c89, which I’d done just to get C++ style comments.

So wrote a script, converted all C++ style comments to C comments, and changed to c99 with -pedantic-warnings.

Keeeeeeeeeerunch.

“Type long long not supported”.

Bloody right too – “long long” didn’t turn up in the Standard until C99.

It’s a GCC extension prior to C99.

My problem is I’ve had to support Windows, and MS stopped supporting C decades ago, so they’re still C89 – and in fact I’ve been using a platform extension there too. Until now I’ve not worried very much about the later C versions – I’ve got the specs, I need to sit down and read them through. I’d rather have a good book than the spec though – I originally learned C from “The C Book”, which is excellent. (But now I’ve got some time, yes, I’ll finally read through them all. It’s a shame in a way – I’d like to only have to know one version of C, but because of Windows, I need to hang on to remembering C89 – this is actually why I’ve not moved on yet.)

So anyways embarrassement aside, C99 is only really properly supported until GCC 4.5.0.

This is a bit of an interesting problem.

GCC doesn’t really start to compile for non x86/x86_64 platforms until about 4.5.4 (and 4.7.3 for ARM), so that’s okay.

However, I have GCC 4.20, 4.2.2 and 4.2.4 compiled for x86_64 – so if I wanted to support them, I’d need to use gnu89.

So…

All in all, looks like I need to use -std=c99.

This is fine for non-Windows platforms – it will enable long long, and I won’t use any other functionality, so I can still compile on Windows (since the porting layer defines will mean I use the platform extention on Windows for 64 bit types).

Ironically, of course, this means the work I did to replace C++ style comments with C style comments is now not necessary =-)

Update

Been implementing multi-segment position indepndence.

Originally, I was thinking for each data structure instance would have its own list of segment base addresses. I realised while implementing in fact the list of segments could often for a process be the same across all data structure instances – I’m thinking there will normally be only a few shared memory segments, and they will contain all shared data, which means all the data for many data structures and their elements. As such then the multi-segment code needed to be in a little API of its own, so the same instance of segment base address information can be used with all data structure instances.

Multi-segment position independent data structures

After a month of being a temporary full-time house-husband, I’m actually doing a bit of coding.

Writing up the first multi-segment position-independent data structure, just a freelist, simplest thing there is so I can go through the multi-segment stuff for the first time.

In the end there will be a single-segment and multi-segment position independent variation of every data structure.

Update

Just finished a grand renaming.

Not happy with it really.

Have a better idea now.

One thing from this though – I did the work manually.

This does not work – it does not scale.

When you code base is large enough, you must have tools to make changes across the code-base.

So now I write a little script which will let me rename APIs.