freelist elimination array design flaw

So!

I’ve been working on the build system.

I uncovered one subtle bug (wrong type used) in the test library, fixed that.

Then I found that the benchmark for the freelist with elimination layer was crashing consistently with two threads on aarch64 for liblfds 7.1.2 (the in-development version of 7.1.1, where the liblfds core library is present in the benchmark library of 7.2.0, so it can be benchmarked and compared against other versions of liblfds).

I’ve been looking into this for two days.

I think I’ve found a design flaw in the elimination array. I’m still thinking it over, but I think so.

In the current design, you specify when initializing the freelist how many cache lines worth of elimination array you are providing – recommendation is on the order of one per thread you expect to be using the freelist (where the cache lines are used to store freelist element pointers).

You will then push elements onto the freelist. I think often it’s necessary to know how many elements are available – so you might push 32 elements, and you *know* you have 32 elements, and it might be for your use case you know you’ll *never* use more than 32, so you can then know the freelist will never be empty.

However, the elimination array is present.

The way it works (in 7.1.x – it’s a bit different in 7.2.0, I’ll come to that) is that a cache line is randomly chose (using a decent PRNG) and its scanned – if you want to push, you’re looking for a NULL, if you want to pop, you’re looking for a pointer.

Now, the first problem is the use of the PRNG (rather than round robin, which is how it works in 7.2.0). The PRNG produces a random value, where the number of elimination array lines is always a power of two, so we subtract one from the number of array lines and binary and that with the random value, to select the array line to use.

Of course, being a random number generator, there’s always a bit of difference in the number of each number produced – the percentage is constant, but the absolute value represented by that percentage rises as the number of operations increases. So if we have say 100 pops and 2 cache lines, we might find we have 49 on cache line 0 and 51 on cache line 1. After 100,000 pops we might find it’s 49,800 on cache line 0 and 51,200 on cache line 1.

The problem is that the number of elements in the cache lines is very small – usually eight.

This means that fairly quickly one cache line is full and the other is empty.

That breaks the elimination layer function *anyway*, but it actually then – after much crossing of eyes and scratching of head – led on to reveal a second issue.

So, consider the user wants to put 32 elements in the freelist – and rely on there being 32 elements in there. However, we have the elimination array. The freelist selects one line, scans it, if no luck it pops from the freelist proper.

Imagine we have 32 elements, but we have say oh 20 cache lines, each with eight pointers. We could imagine all the elements being popped, then pushed back to the array, but leaving some array lines empty – and then we try to pop, we pick an empty array line, then we go to the freelist proper – and it’s empty. Embarassed looks all round.

To cope with this, the user is expected to push in an extra number of elements – enough to fill the elimination array completely *but minus one cache line*, the reasoning being the worst possible arrangment of these extra elements leave all caches lines full except for one, we pick that one, miss, and then go to the freelist proper – so it all works.

It doesn’t work though.

Imagine the above scenary. Say there’s four cache lines, three are full, one is empty. There are say four elements on the freelist proper. We want to pop, we select the empty cache line, we begin to scan it – nothing so far… we’re halfway down the scan *and the other threads pop the elements from the freelist, and push them into the array line *BEHIND* where we are now*.

We think the array is empty, we go to pop from the freelist proper, embarrassed looks all round.

I half-think the only solution to this is that elimination array begins fully populated.