liblfds.org blog

2015-12-01

Pulling on a thread (thread - see? geddit? see what I did there?)

Sometimes, the only thing to say is this : gaaaaaaaaaaaaaaaaaaaaaahhhhhhhhhhh!!!

So, I had an innocent little thing on my small to-do list; “lfds700_list_asu_get_start_and_then_next() should be a macro, not a function”.

See, the lists have this convenience function, for iterating over the list. It takes the list state and a pointer to a list element, which is initialized before calling the function to NULL. If the pointer to the list element is NULL, then it is set to point to the first element in the list, otherwise, it is moved to point to the element after itself.

So you use it like this;

struct lfds700_asu_element *le = NULL;

while( lfds700_list_asu_get_start_and_then_next(ls, &le) )
{
  // TRD : do work
}

The problem of course is you’re making a function call per iteration, which is totally not okay.

Now, there exist already a pair of macros, LFDS700_LIST_ASU_GET_START and LFDS700_LIST_ASU_GET_NEXT. That function uses them.

Now, given the expected usage, inside the condition of a while loop, the macro version of the function cannot use curly braces.

Problem is, those two macros, they both issue a load barrier - and that load barrier, on some platforms, is an atomic exchange - and that means we need some store, two temporary variables, and to get them… we’re using curly braces.

In fact this work touches upon this larger issue - I would like to be able to use all these macros inside while() brackets.

Exchange is not the only atomic operation which currently needs store - given the normalized form I’ve selected for CAS and DCAS, some platforms need store for those ops as well.

The only way I can see to get round this is to have these macros as inline functions.

That means changing the liblfds header files to be C files, and #including them rather than the header files.
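A rough sketch of what the inline-function form might look like. The types and names here are hypothetical stand-ins, not the liblfds API, and the load barriers a real version would need are only mentioned in a comment.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the list state and element; not the liblfds types. */
struct demo_list_element {
    struct demo_list_element *next;
    int value;
};

struct demo_list_state {
    struct demo_list_element *start;
};

/* If *le is NULL, move it to the first element, otherwise to the element
   after it. Returns 1 while *le points at an element, 0 at the end.
   A real lock-free version would issue a load barrier before each read;
   as a function it can have local variables for that, which a
   brace-less macro cannot. */
static inline int demo_list_get_start_and_then_next(struct demo_list_state *ls,
                                                    struct demo_list_element **le)
{
    *le = (*le == NULL) ? ls->start : (*le)->next;
    return *le != NULL;
}

/* Usage: the call sits directly in the while() condition. */
static int demo_list_sum(struct demo_list_state *ls)
{
    struct demo_list_element *le = NULL;
    int sum = 0;

    while (demo_list_get_start_and_then_next(ls, &le))
        sum += le->value;

    return sum;
}
```

Whether the call overhead actually disappears is up to the compiler; `static inline` is a request, not a guarantee.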

2015-12-02

C ternary operator ‘strangeness’

The code below does not compile; “the left side of the assignment operator must be an l-value” (this for the right-most assignment operator).

int cr = 1; cr == 5 ? cr = 2 : cr = 3;

The code below does compile;

int cr = 1; cr == 5 ? (cr = 2) : (cr = 3);

This is true for both MSVC and GCC.

Turns out to be an apparently common compiler issue.

http://en.cppreference.com/w/c/language/operator_precedence

However, many C compilers use non-standard expression grammar where ?: is designated higher precedence than =, which parses that expression as e = ( ((a < d) ? (a++) : a) = d ), which then fails to compile due to semantic constraints: ?: is never lvalue and = requires a modifiable lvalue on the left. This is the table presented on this page.
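For illustration, here are two ways of writing the assignment that compile in C. The function names are made up for this sketch.

```c
/* Returns 2 when cr is 5, otherwise 3, using the parenthesised form. */
static int demo_ternary_assign(int cr)
{
    /* Without the parentheses, compilers typically group this as
       (cr == 5 ? cr = 2 : cr) = 3, and the result of ?: is not an
       lvalue in C, so the right-most assignment is rejected. */
    (void) (cr == 5 ? (cr = 2) : (cr = 3));
    return cr;
}

/* The same result with the assignment hoisted out of the conditional. */
static int demo_ternary_value(int cr)
{
    cr = (cr == 5) ? 2 : 3;
    return cr;
}
```

The second form sidesteps the precedence question entirely, since only one assignment is involved.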

API changes

One of the to-do’s was to reorder the hash API arguments.

This has pulled on a mental thread and led me to remove all the key and value arguments, where-ever they’re present.

So, for example, before we might have had;

enum lfds700_btree_au_link_result lfds700_btree_au_link_element( struct lfds700_btree_au_state *baus,
                                                                 struct lfds700_btree_au_element *baue,
                                                                 void *key,
                                                                 void *value,
                                                                 int (*key_compare_function)(void *new_key, void *existing_key, void *user_state),
                                                                 struct lfds700_btree_au_element **existing_baue,
                                                                 struct lfds700_liblfds_prng_state *ps );

The key and value arguments are convenience arguments - the function sets them into *baue. However, the user can do this - and there are prototypes where the presence of those key and value arguments makes the prototypes ungainly and indeed more complicated.

So now this has been removed, and all the functions now simply deal with data structure elements, pointers to them, and the user uses macros to get/set keys and values.

Update

Removed key/value from prototypes.

Since most of the data structures are now key/value, users must possess the capability to set value - and in fact, where key is fixed but value is not, value changes must be atomically written (i.e. atomic exchange) or they cannot be guaranteed to be visible to other threads.

This means the SET macros must call atomic exchange, and atomic exchange right now, as with all the other atomic macros, uses curly braces… i.e. there’s a compile error.

In fact, if we think about porting, and the use of asm with GCC, we see that asm is block-level entity; it cannot be used inside the conditional part of say a while.

So in fact those lovely atomic macros all have to go back to being inline functions, which means depending on the compiler to have an inline keyword.

I’m not very happy about this.

2015-12-04

Update

I’ve done all but one of the to-do list of code changes. There are some build config changes to do, but I can’t do them until the code changes are done.

Of the data structures which allow random access to elements (the lists, the btree) they now support the user setting value at any time. The others only support setting value at the time the data structure element enters the data structure.

As such, the former sets atomically (and issues a load barrier when getting a value) and the latter sets non-atomically (but with a store barrier), where the act of linking atomically to the data structure publishes value before the data structure element enters the data structure.
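A minimal sketch of the two kinds of set, assuming the GCC `__atomic` builtins. The element type and function names are hypothetical, and mapping the barriers described above onto these particular memory orders is my assumption, not something taken from the library.

```c
#include <stddef.h>

/* Hypothetical element; not the liblfds layout. */
struct demo_element {
    void *key;
    void *value;
};

/* Lists/btree: the element is already visible to other threads, so the
   new value is written with an atomic exchange. Returns the old value. */
static inline void *demo_set_value_atomic(struct demo_element *e, void *new_value)
{
    return __atomic_exchange_n(&e->value, new_value, __ATOMIC_SEQ_CST);
}

/* Matching read; the acquire load stands in for "load barrier, then read". */
static inline void *demo_get_value(struct demo_element *e)
{
    return __atomic_load_n(&e->value, __ATOMIC_ACQUIRE);
}

/* Other structures: value is set once, before the element is linked in,
   with a plain store followed by a store barrier. The atomic operation
   that later links the element is what publishes the value. */
static inline void demo_set_value_before_link(struct demo_element *e, void *value)
{
    e->value = value;
    __atomic_thread_fence(__ATOMIC_RELEASE);
}
```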

So that’s all good and well and makes sense and is done.

I’ve modified the macros so that they generally take one argument and ‘return’ a value - it seems more intuitive to me, this way, for new readers.

I’ve updated the test programme, it again now compiles.

The atomic operations are all still macros, and with curly braces. What I’ve done for now (to compile) is accepted that LFDS700_PAL_ATOMIC_EXCHANGE uses curly braces and so other macros which use it (which is to say, SET_VALUE for the lists or btree) cannot be used inline.

I’m still thinking what to do about this - because there is another issue which is tied up with this, and that is the btree code for navigating a tree.

In the lists (the other random(ish) access data structure) the code for navigating the data structure (GET_NEXT, etc) exists as macros. After all, all we’re doing is accessing a pointer in a structure. The idea of making a function call to do this is nutso.

The btree code though for navigation is much more complex - it has to deal with a wide range of cases and often contains while() loops (“get smallest element”, etc). I want to be able to put this code in the conditional clause of a while() loop (i.e. get_by_position_and_then_direction()) but you can’t put a while() inside the conditional clause of a while() - which means you cannot use macros - which means you MUST write functions and you MUST use inlining.

There’s no getting away from it.

Musing

So, I’ve kept the btree navigation code as functions. They actually usually do quite a lot of work, so it’s not so bad.

What I’m thinking about now is, well, so, the lists and btree store keys and values. You can’t change the key, but you can change the value. That’s fine.

What about the non-random(ish) access data structures? the ringbuffer, the queue, the stack, etc - should they store only values, or keys as well? they won’t use the keys, but what I’m thinking is that in an application, a user may have a structure which is being passed about through many data structures and it would be very convenient to be able to preserve the key even when it passes through queues and ringbuffers and the like.

It’s fairly unobtrusive, because in almost all cases users are passing in to data structure functions only a pointer to a data structure element, which they prepped beforehand - so they can set a key if they want, but they don’t have to - so not setting a key simply means… nothing at all.

This is not the case for the ringbuffer though - users do not see ringbuffer elements, they’re always internal to the data structure; they pass in arguments (value, and then would also be key, which would always be NULL when not used).

There’s also the overhead of copying the key around, but it’ll be on the same cache line as the value, so it’ll basically come for free.

99.9% code complete

Wheush!

I am just about code complete.

There’s one bit of code rearrangement I need to do, dependency stuff with structures, I’ve a hack in place right now - and that’s it. It!

Now I have to bring all the build configuration up to date and make test pass in debug and release on all platforms. After that, the docs need to be done. Then it’s release time.

Oh. I’d like also to add a mailing list to the site, but they’re impossible to set up.

tests pass on debug and release on 32 bit ARM

pi@raspberrypi /tmp/temp/liblfds/liblfds7.0.0/test/build/linux_usermode_gcc_and_gnumake $ ../../bin/test -v
test 7.0.0 (Release) (Dec  5 2015 01:33:39)
liblfds 7.0.0 (Release, Linux (user-mode), ARM (32-bit), GCC >= 4.7.3) (Dec  5 2015 01:32:48)
pi@raspberrypi /tmp/temp/liblfds/liblfds7.0.0/test/build/linux_usermode_gcc_and_gnumake $ ../../bin/test -r

Test Iteration 01
=================

Abstraction Atomic Tests
========================
Atomic add...passed
Atomic CAS...passed
Atomic DCAS...passed
Atomic exchange...passed

Binary Tree (add-only, unbalanced) Tests
========================================
Alignment...passed
Fail and overwrite on existing key...passed
Random adds and walking (fail on existing key)...passed
Random adds and walking (overwrite on existing key)...passed

Freelist Tests
==============
Alignment...passed
Popping...passed
Pushing...passed
Pushing array...passed
Popping and pushing (5 seconds)...passed
Rapid popping and pushing (10 seconds)...passed

Hash (add-only) Tests
=====================
Fail and overwrite on existing key...passed
Random adds and get (fail on existing key)...passed
Random adds, get and iterate (overwrite on existing key)...passed
Iterate...passed

List (add-only, singly-linked) Tests
====================================
Alignment...passed
New ordered...passed
New ordered with cursor (5 seconds)...passed

List (add-only, singly-linked) Tests
====================================
Alignment...passed
New start...passed
New end...passed
New after...passed

Queue Tests
===========
Alignment...passed
Enqueuing...passed
Dequeuing...passed
Enqueuing and dequeuing (5 seconds)...passed
Rapid enqueuing and dequeuing (5 seconds)...passed

Queue (bounded, single consumer, single producer) Tests
=======================================================
Enqueuing...passed
Dequeuing...passed
Enqueuing and dequeuing (8 seconds)...passed

Ringbuffer Tests
================
Reading and writing (5 seconds)...passed

Stack Tests
===========
Alignment...passed
Popping...passed
Pushing...passed
Pushing array...passed
Popping and pushing (5 seconds)...passed
Rapid popping and pushing (5 seconds)...passed

2015-12-05

Factorized the porting abstraction layer

I realised last night that the porting abstraction layer would only work on x86, x64 and ARM.

All the other processor types which in principle work (when using GCC this is) simply were not present in the porting layer code.

So I need to implement them - but this led to an issue; the porting abstraction layer code was unravelled, to make it easy to understand. With another four or five processors, though, where every processor needs two versions (GCC < 4.7.3 and GCC >= 4.7.3) and most of the processors have 32 and 64 bit versions - it wasn’t going to fly.

The unravelled layout dates from 6/6.1.1. A long time ago (after they were released) I factorized the layer, and it turned out to come down to processor, compiler and operating system.

So I’ve gone back to this.

So, that’s fine, I need to add some boilerplate code to perform automated checking of what the user has or has not implemented, but that’s routine. The porting guide needs to be rewritten.

I’m going to do a bunch of work on the data structure docs now though, they’re stable now and they need a bunch of work done to them.

2015-12-06

Update

Have finished the second pass of the liblfds porting guide.

Now the big work; all the API pages.

2015-12-07

Update

Writing quality docs takes time.

I think I can do at most two APIs per day. There’s about ten to do…

Then there’s the test porting guide.

Then I need to update all build config.

Then I can release.

Need more get/set macros

Writing the btree docs.

Something I’ve had in the middle of my mind for a while - I’m going to need to write two different get-value and set-value macro pairs, one set which guarantees by the time it returns from the set all readers will be able to see the new value (i.e. set uses an atomic set, get issues a load barrier), and another which does not offer this guarantee (set issues a store barrier, get issues a load barrier).

Update

Let’s see…

Removed the key hash/compare function over-rides from the non-init() functions. Their use is going to be so rare they don’t carry their own weight, and people can add them very easily.

The extra set macro didn’t happen. Think about it - you can atomically store (exchange), use a store barrier, or do nothing. What does a store barrier get you? It gets you ordering… BUT ONLY ON A PER THREAD BASIS. Writes, if/when they do go out, are in order - sure - but only with respect to writes from the same thread. Fat lot of use that is! However, I now understand the GCC >= 4.7.3 docs on atomic intrinsics - the stuff they’re alluding to is the difference between no barriers, barriers or atomic. They’re TOTALLY TOTALLY UTTERLY TOTALLY UTTERLY HUMUNGOUSLY UNCLEAR - unless you already know what they’re talking about, in which case you can figure out what they’re getting at.

Spent five hours so far today on docs, feel like I’ve got bugger all done - some new pages for btree enums, the btree query page, that bit of experimentation with a new set. Ten more APIs to go…

Update

Huh, the mediawiki is now at 499 articles :-)

So, been a full complete long day of doc writing.

Second pass done for btree, freelist, liblfds, hash and stack.

Tomoz it’ll be the lists, the queues and the ringbuffer.

Then I guess the test porting guide.

Then third pass for everything… uuhhhhh!

2015-12-10

Update

Getting on.

The liblfds API has been renamed misc, it’s more self-explanatory.

All the init() functions are back to being init_valid_on_current_logical_core().

User callbacks no longer receive a user state argument, rather, there is a macro for getting the user state value from the data structure state.

Key compare functions and hash functions no longer receive a user state argument - what they do is too small and atomic for this.

The key compare function (and the hash function) has the const qualifier now, so it has the same prototype as the qsort() callback.
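To show what sharing the qsort() callback prototype buys you, here is a hypothetical key compare function for int keys (the names are made up for this sketch). Because the prototype matches, the very same function can be handed straight to qsort().

```c
#include <stdlib.h>

/* Hypothetical key compare for int keys; same prototype as a qsort() callback.
   Returns <0, 0 or >0 as new_key is less than, equal to or greater than existing_key. */
static int demo_key_compare(void const *new_key, void const *existing_key)
{
    int const a = *(int const *) new_key;
    int const b = *(int const *) existing_key;

    return (a > b) - (a < b);
}

/* The same function, used unchanged as a qsort() comparator. */
static void demo_sort_keys(int *keys, size_t count)
{
    qsort(keys, count, sizeof *keys, demo_key_compare);
}
```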

The queue cleanup function has an extra argument, a flag, which is raised when the dummy element is given (there’s no other way for the user to know the key and value in that element are invalid).

The ringbuffer cleanup function has an extra argument, which indicates if the element is unread.

Docs are coming along nicely.

I expect to release inside a week now.

2015-12-11

Update

Gettin’ on.

Currently focusing on the stack docs. Making them perfect, or as perfect as they can reasonably be. They’re going to be the template for the other pages. So I’m working to get them right, really right (and so also to have made all the code base changes which come from docs) and then to iterate over all the other doc pages and produce them correctly in basically one more pass.

A lot of the time editing is actually spent waiting for the mediawiki to load and save pages - to be more efficient, you really need to cut down the number of passes.

2015-12-12

Update

Amazin’ really how wrong things can be and you don’t notice until almost the last moment.

The way I had the build directories named, and the detection of OS platforms, was wrong.

I realised because I came to make a freestanding build with GCC. The build directory really is about toolchains only - it’s about getting a build going, even if the build fails (because the porting layer is missing).

The porting layer is separate and different from the build tools, i.e. GCC and gnumake will give you build on any platforms, say an RTOS with no hosted implementation - but there’s no porting layer for that platform.

Of course I’ve always known this, but it turns out it’s not actually really what had been done.

So now the build dirs are named after the toolchain/system header requirements, and there’s a new build, “gcc_gnumake”, which uses “-ffreestanding -nodefaultlibs -nostdinc -nostdlib”, i.e. the real McCoy. The hosted implementation is almost the same, in fact - all it takes is <assert.h>. I thought seriously about dropping asserts, to simplify build, but they’re just too useful for debugging.

2015-12-14

Update

As ever, working on the docs.

Freelist, Misc, Ringbuffer and Stack are now good enough to release.

That leaves both lists, both queues and the hash.

The porting and usage guides are good enough to release, although the porting doc for test revealed the need for some work on the test abstraction layer.

The list, queue and hash docs will be completed tomorrow.

Once the final code work is done, I can then update the build configurations. There’s this… dual convergence, going on - the docs and the code - change one and you have to change the other, so doing work on either means you need to change the other. The goal is to get to the point where you need to change neither, then you can release.

new intro paragraph

I’ve written a new intro paragraph, for readers who do not know what lock-free data structures are and do. Love it :-)

Lock-free data structures are thread-safe and interrupt-safe (i.e. the same data structure instance can be safely used both inside and outside of an interrupt handler), never sleep (and so are safe for kernel use where sleeping is not permitted), operate without context switches, cannot fail (no need to handle error cases, as there are none), perform and scale literally orders of magnitude better than locking data structures, and ‘’liblfds’’ itself (as of release 7.0.0) is implemented such that it performs no allocations and compiles not just on a freestanding C implementation, but on a bare C implementation.

2015-12-16

Update

Well, yesterday was a blow-out - I’ve ODed on docs.

Today was good though, haven’t touched the docs, been working on code.

It’s been a day’s worth of tidying up - getting enum value names right, checking all the structure aligns, realised I’d blundered with atomic isolation for I still need a “double” atomic isolation on CAS architectures, normalized the API names for adding data elements to a structure (it’s now “insert”, where-ever it makes sense - still push/pop enqueue/dequeue for freelist/stack and the queues), and so much other stuff I can’t even remember - oh, changed backoff config so the number of NOPS per timeslot is hardcoded in a #define for CAS and DWCAS. Having them configurable just didn’t make sense - you can hardcode it if you’re on one platform, but chances are your software will run on a variety of different versions of an architecture, and with different clock speeds. There’s no real use in run-time configurability. Also, by being variables in memory, it’s possible that the backoff code might need to do a TLB lookup and memory access! which blows that whole thing out of the water.

I feel I’m very close now to actually really being code complete, with nothing else in my mind that needs to be done.

2015-12-17

Ci20 / MIPS32

Just ordered a Ci20, a dual-core MIPS32 dev board, runs Linux.

I think I can boot it to Android too, which would give me some capability to compile and test for that platform.

btree API renames

Changed from the whole position/direction nomenclature to absolute position/relative position nomenclature.

2015-12-20

Update

So, well, time to ’fess up.

Few days ago I was looking at the queue code, and the way the ABA counters were being handled looked like it needed to be looked over. Then I realised there was something very odd going on - I was seemingly setting (and by expensive atomic adds!) ABA counters which were later overwritten with new values before being added to the queue.

Then after some time thinking about this (and getting a new laptop, which I’ve spent a day or so, so far, configuring) I realised how I was handling the ABA counters in the freelist and stack was completely bonkers - I was using an atomic add on a per-state variable, but in fact no atomic add is needed at all, as you can use the counter in the top pointer, and increment it in the DWCAS. In fact, I’d really got the wrong end of the stick…

So now the freelist and stack no longer use atomic add, and in fact their elements are a bunch smaller - their next pointer is just a pointer, no counter, and none of the variables in it are cache line/ERG aligned/padded.
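A sketch of the idea of keeping the ABA counter in the top pointer and bumping it in the same compare-and-swap that moves top. To keep it runnable without DWCAS, I’ve swapped in a different technique: elements are addressed by 32 bit index, and {index, counter} is packed into one 64 bit word which is then the subject of an ordinary 64 bit CAS (GCC `__atomic` builtins). All names are hypothetical; this is not the liblfds implementation.

```c
#include <stdint.h>

#define DEMO_STACK_NIL  UINT32_MAX   /* "no element" index */
#define DEMO_STACK_SIZE 8

/* Elements carry only a next link and the payload; no counter, no padding. */
struct demo_stack_element {
    uint32_t next;
    int value;
};

struct demo_stack_state {
    uint64_t top;   /* low 32 bits: index of top element, high 32 bits: ABA counter */
    struct demo_stack_element elements[DEMO_STACK_SIZE];
};

static uint64_t demo_pack(uint32_t index, uint32_t counter)
{
    return ((uint64_t) counter << 32) | index;
}

static void demo_stack_init(struct demo_stack_state *ss)
{
    ss->top = demo_pack(DEMO_STACK_NIL, 0);
}

static void demo_stack_push(struct demo_stack_state *ss, uint32_t index)
{
    uint64_t old_top = __atomic_load_n(&ss->top, __ATOMIC_ACQUIRE);
    uint64_t new_top;

    do {
        ss->elements[index].next = (uint32_t) old_top;
        /* The counter is incremented by the same CAS that swings top,
           so no separate atomic add is needed. */
        new_top = demo_pack(index, (uint32_t) (old_top >> 32) + 1);
    } while (!__atomic_compare_exchange_n(&ss->top, &old_top, new_top, 0,
                                          __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE));
}

/* Returns the popped index, or DEMO_STACK_NIL if the stack is empty. */
static uint32_t demo_stack_pop(struct demo_stack_state *ss)
{
    uint64_t old_top = __atomic_load_n(&ss->top, __ATOMIC_ACQUIRE);
    uint64_t new_top;
    uint32_t index;

    do {
        index = (uint32_t) old_top;
        if (index == DEMO_STACK_NIL)
            return DEMO_STACK_NIL;
        new_top = demo_pack(ss->elements[index].next, (uint32_t) (old_top >> 32) + 1);
    } while (!__atomic_compare_exchange_n(&ss->top, &old_top, new_top, 0,
                                          __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE));

    return index;
}
```

On a failed CAS, `__atomic_compare_exchange_n` writes the current value of top back into `old_top`, so each retry starts from fresh data.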

I’m now looking over the queue code to make it right too.

I’ve also spent some time now reviewing all the code for proper barrier use - and here again, I had blundered, and was thinking about barrier use in not quite the right way; now I think I am getting it right, really right, and that’s led to a bunch more load barriers in the add-only code, and also shifted around where the store barriers are.

Once the queue is sorted out, I’ve one or two more minor bits of code checking to do, then it’s back to docs.

I’ve passed the “mid-December” estimate, but it’ll be out before the end of the month.

2015-12-21

Updatez

Finally have my head around the M&S queue ABA stuff.
Atomic add is now no longer needed - all the data structures get ABA from their CAS operations - so it’s been removed. Performance should improve by maybe a third…!
Reversed the arg order for CAS to GCC style (destination, compare, new_destination). Used to be MS style (destination, new_destination, compare) which is nuts.
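To make the argument order concrete, here is a hypothetical wrapper (the name is mine, not the library’s) over the GCC builtin `__sync_bool_compare_and_swap`, which takes destination, compare, new value in that order.

```c
/* GCC-style ordering: destination, compare, new_destination.
   Returns 1 if *destination equalled compare and was replaced, 0 otherwise.
   The MS-style InterlockedCompareExchange( Destination, Exchange, Comparand )
   puts the new value second and the comparand last. */
static inline int demo_cas(int *destination, int compare, int new_destination)
{
    return __sync_bool_compare_and_swap(destination, compare, new_destination);
}
```

Reading left to right, the call says “at destination, if it is compare, make it new_destination”, which is the order the operation happens in.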

Now I just need to think about this…

http://polyglotplayground.com/2015/04/30/Problem-with-Michael-Scott-Lock-Free-Queue/

Whoaz…

the M&S design flaw

So, for now at any rate, I’m using the PRNG to generate an initial value for a queue element next pointer counter.

The basic problem is that in the queue, elements have a next pointer, which has a counter - because it is the subject of DWCAS - unlike the stack/freelist, where the elements only have next pointers.

Because it has a counter, when an element leaves the queue, it takes that counter with it - but it is actually only valid for the queue the element just came from - but it CAN end up being mistaken for a valid value in a new queue the element is enqueued to.

2015-12-22

Possible typo/bug in the Michael and Scott queue white paper pseudo-code

Okay, so, I’m about to make quite a claim - and I do not make it lightly, and I make it in full knowledge that I am extremely likely to be completely wrong. M&S are experts in the field, and I am not.

This is the dequeue pseudo-code from their white paper.

dequeue(Q: pointer to queue_t, pvalue: pointer to data type): boolean
 D1:   loop              // Keep trying until Dequeue is done
 D2:      head = Q->Head         // Read Head
 D3:      tail = Q->Tail         // Read Tail
 D4:      next = head.ptr->next      // Read Head.ptr->next
 D5:      if head == Q->Head         // Are head, tail, and next consistent?
 D6:         if head.ptr == tail.ptr // Is queue empty or Tail falling behind?
 D7:            if next.ptr == NULL  // Is queue empty?
 D8:               return FALSE      // Queue is empty, couldn't dequeue
 D9:            endif
D10:            CAS(&Q->Tail, tail, <next.ptr, tail.count+1>)     // Tail is falling behind.  Try to advance it
D11:         else                 // No need to deal with Tail
D12:            *pvalue = next.ptr->value // Read value before CAS. Otherwise, another dequeue might free the next node
D13:            if CAS(&Q->Head, head, <next.ptr, head.count+1>)  // Try to swing Head to the next node
D14:               break             // Dequeue is done.  Exit loop
D15:            endif
D16:         endif
D17:      endif
D18:   endloop
D19:   free(head.ptr)            // It is safe now to free the old node
D20:   return TRUE                   // Queue was not empty, dequeue succeeded

Now, note the free on D19. The authors are making the point, as they do in the white paper, that once you’ve dequeued, you’re in the clear - you can free the node.

The typo or bug which I think I see is on line D4.

D4:      next = head.ptr->next      // Read Head.ptr->next

The code says “next = head.ptr->next”, which is using “head”, lower-case “h”. The comment says “Head.ptr->next”, which is using “Head”, upper-case “H”, where as we see on D2, “Head” (upper-case) means Q->Head.

I may be utterly wrong, but I think if the code is used, the free on D19 is broken, because it could lead the code (not the comment - the code) on D4 to access a freed node.

The problem is that “head” (lower-case) is a copy of Q->Head. The white paper states that Q->Head will always point to a valid node (as there is a dummy node in the queue) and that’s fine - but we have taken at line D2 a copy of Q->Head, and we can imagine it points to a node, where that node could be dequeued and freed at D19 by another thread, before our thread gets to D4 and tries to access that node’s next pointer.

The comment however looks right to me - if we read Q->Head.ptr->next, we’d still be fine, as the if() on D5 still works - but it would mean we could now safely call free on D19.

There is a matching issue in the enqueue, on E5, E6 and E7 - but here the comment and code match up.

Linux

Linux -> no docs, no error messages. Now make it work

I’ve spent the last couple of hours failing to configure Dovecot TLS.

I’ve just set up Thunderbird on my new laptop, Debian, and I’ve disabled the insecure cipher suites.

Now Thunderbird and Dovecot won’t talk, ’cause there’s no shared suites.

I mean, there ARE shared suites. There are tons of them.

Oh yeah - and Thunderbird doesn’t show an error. It just leaves the “Connected” message showing. It’s only when you poke around, find the error console and look at it, that you discover what’s going on. Dovecot logs one line telling you no shared suites - doesn’t list anything else, like what it thinks it has and what the client thinks it has.

I’ve configured everything to be on, and it still isn’t working. I’m starting to call bullshit on this, and just say that it’s broken and doesn’t work, flat out.

Next I’m going to have to install wireshark, and inspect the POP3 packets, to see what Thunderbird is sending, because there’s no other bloody way to find out.

SERIOUSLY. NO SOFTWARE SHOULD REQUIRE PACKET SNIFFERS TO MAKE IT WORK. WHAT THE HELL ARE YOU GUYS SMOKING?

Ci20 Creator is here!

The Ci20 has arrived - i.e. a genuine MIPS32 platform for test!

It’s plugged in and ready to roll.

I’m just getting the test clean under valgrind for the new queue free test, so I can run that a hundred times or so and see if it throws an error.

Ci20… earlier GCC and barriers

The Ci20, thank God, came with an earlier version of GCC - 4.6.2 to be precise. This provides the old style atomics, and so no compiler barriers - you have to use asm to get a compiler barrier.

Problem is, asm is a block-level element, i.e. it has to end in a semi-colon. You can’t have them separated by comma operators… …and the entire code base currently is written using inline-type elements for barriers, so they can be used in macros in while() loops and so on.

AFAICT, there’s no native GCC < 4.7.3 inline-style compiler barrier. I guess I’m going to have to implement asm compiler barriers as inline functions… gaaaak. Feels risky. Wonder if the compiler is going to get it right?
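A sketch of the asm-compiler-barrier-as-inline-function idea, assuming GCC’s extended asm. The function names are hypothetical; the empty asm with a "memory" clobber is the usual GCC idiom for a compiler-only barrier.

```c
/* GCC-specific: an empty asm statement with a "memory" clobber stops the
   compiler moving memory accesses across it. No instruction is emitted. */
static inline void demo_compiler_barrier(void)
{
    __asm__ __volatile__ ( "" : : : "memory" );
}

/* A function call is an expression, so unlike a bare asm statement it can
   be joined with the comma operator inside a while() condition. */
static int demo_count_until_zero(int const *flags)
{
    int count = 0;

    while (demo_compiler_barrier(), flags[count] != 0)
        count++;

    return count;
}
```

This only constrains the compiler; it is not a processor barrier, and it relies on the compiler actually inlining the call to cost nothing.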

2015-12-23

Ci20

So, Ci20 is fabulously fantastic. No DWCAS and GCC < 4.7.3.

Done a bunch of fixing and now everything compiles.

Going to ditch the NOP stuff, and just use a volatile counting loop - right now there needs to be a NOP per platform, so you actually need the assembly code for a platform to be able to compile!
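Roughly what a volatile counting loop could look like, with no per-platform assembly needed. The constants, the doubling per failed attempt and the function name are all my assumptions for the sake of the sketch, not the values or the scheme used in the library.

```c
/* Hypothetical tuning constants, hardcoded rather than run-time configurable. */
#define DEMO_BACKOFF_ITERATIONS_PER_TIMESLOT 10u
#define DEMO_BACKOFF_MAX_SHIFT               8u

/* Spin for a number of iterations which doubles with each failed attempt,
   up to a cap. The volatile counter stops the compiler deleting the loop.
   Returns the number of iterations spun. */
static unsigned long demo_backoff(unsigned int failed_attempts)
{
    unsigned int const shift = failed_attempts < DEMO_BACKOFF_MAX_SHIFT
                             ? failed_attempts
                             : DEMO_BACKOFF_MAX_SHIFT;
    unsigned long const limit = (unsigned long) DEMO_BACKOFF_ITERATIONS_PER_TIMESLOT << shift;
    volatile unsigned long loop;

    for (loop = 0; loop < limit; loop++)
        ;

    return limit;
}
```

The time a given iteration count takes still varies by processor and clock speed, so this trades precision for portability.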

Great - Ci20 tests just passed on release. Thankyou Imagination Technologies!

CODE COMPLETE

long exhalated breath…

Okay, so there is actually a certain bit of documentation to complete - second drafts of the hash, queues and lists. They’ve all had their first drafts though and I’m confident enough now that there’s no surprises in there - the APIs as they are will be unchanged after they’re documented.

So I need to do those docs, need to review all the other docs to get them in line with the changes made over the last week or three, and I need to do all the build config work (MSVC solution files, etc).

Then it’s release time.

And THEN it’s back to work - get SMR in, the SMR versions of freelist and stack, then get the benchmark app back into play. THEN it’ll be real new data structures (linked list with real delete) and RPM packaging and a ton of other stuff I can’t remember right now :-)

2015-12-25

Docs

Okay, docs are good enough for release - well, I mean, I’ve got to spend a day or two now getting all the builds working again and running test, once that’s done I think I’ll give the guide documents another look over, coming to them a bit more freshly than now.

As I appreciated before, when I documented release 6, mediawiki is in its native form wholly unsuitable for documentation. The simple lack of a global search-and-replace is enough for that. Maybe there are scripts which can do this… documentation has taken up an improperly large effort, in that documentation necessarily takes up a certain effort but I’ve ended up spending a lot more effort than that because of the limitations of mediawiki.

I need to make one final set of code changes now - the COUNT queries all need to be safe to use on an in-use data structure (currently they’re all singlethreaded).

Update

I am now this moment beginning to work on bringing all the build files up to date.

I think it’s a day of work.

I want to do a thread sanitizer run too, it’s been a while since I have. That’ll take an hour or two, as well (it’s quick enough to run, but it takes a while to think through all the results).

2015-12-26

I’m impressed

I’ve been working on getting the build files up to date.

I’ve been working this morning on the WDK 7.1 build.

I’ve made it compile for ARM and IA64 - i.e. sorted out the porting abstraction layers for these processors, which is really nice.

However, the IA64 build was failing to compile - claimed unreachable code in the ringbuffer init.

I’ve just figured out why - and I’m impressed by the compiler.

IA64 does not support DWCAS. As such, the dummy DWCAS macro is in use. The macro takes an argument, result, which normally is set to 1 or 0, but in the dummy macro, is simply set to itself to remove a compiler warning.

The ringbuffer init, inits and then pushes elements to a freelist. The lfds700_freelist_push() function calls the DWCAS macro, from the inside of a loop, where the loop only ends if result is set to 1.

The compiler noticed result would never be set to 1, and was calling - rightly - unreachable code in the ringbuffer init function!

Microsoft and pure total mental agony

I hate Microsoft.

Working with MSVC is torture. Working with the web-site is torture.

It takes fucking forever to do anything and it’s either so much work to find out what you need to know, or you can’t find out at all.

So; I’m done with WDK 7.1 builds, now I’m looking at the MSVC builds.

I want to support an ARM build. First question; what’s the earliest version of MSVC which supports ARM? I can’t find out.

It looks like it’s probably MSVC 2013, given what people have posted on various forums (!)

I go to download 2013.

There are EIGHT DIFFERENT VERSIONS, AND NO INFORMATION ON HOW THEY DIFFER.

There are three “express” versions, which I think from prior knowledge are the free versions, and five non-express versions, such as “premium”, “ultimate” and “professional”.

Anyone have any ideas about the difference between premium, ultimate and professional? ’cause there’s fuck all on the download page.

Next question - I know already that the WDK 8.0 will only work with the professional version of 2012.

Is this still the case with 8.1 and 2013? I can’t find out.

Next question - will solution files made with a non-express version load into an express version? I can’t find out.

Next question - which versions of Windows are supported by the different WDKs? I can’t find out, in part because on the download page for WDKs, the link in each WDK’s section, which should according to the text go to the page about that WDK and tell you, in fact ALL go to the WDK 10 download page.

Every time MS ditch a WDK which is the only WDK for a given OS, I have another platform to maintain - another Windows VM with its own MSVC and WDK install and its own build files. I don’t have this problem on Linux, because there it’s just a matter of having the correct version of GCC and header files.

I hate Microsoft. Working with their tools is pure bloody torture - and I’ve not even spoken here about THE OBSCENE MENTAL TORTURE OF MSVC SOLUTION FILE CONFIGURATION. If it wasn’t for the fact I want to support people using this platform, I wouldn’t touch it with a fucking barge pole, in exactly the same way as I wouldn’t stab myself in the eyes with forks, because, you know, IT HURTS.

Addendum - it gets worse. I have just discovered - not on the MS site, or the download page for it which makes no mention of this at all, but on Wikipedia - that 2013 requires Windows 8.1. I am not happy. I have by pure luck avoided wasting quite a bit of effort downloading the installer, setting up a W7 VM and trying to install. I do not appreciate my time being wasted by gross incompetence.

Update

Working on the Linux user-mode builds now.

Have a script which builds every variant and runs test for debug and release.

This script is running at this moment on ARM32 and MIPS32.

When they’re done, I’ll run on x86 and x64.

Once that’s done, Linux kernel-mode.

Looking into seeing if I can get hold of a Windows 8 ISO… sigh.

2015-12-27

Update

Wrote a script which makes every build type and runs test for the debug and release builds.

This passes fully on ARM32 and MIPS32 (GCC and gnumake).

Downloaded a 90-day Windows 8.1 trial ISO and Visual Studio 2013 Professional (I’m hoping it will offer a trial install), and the 8.1 web installer (can’t find an ISO and just like ABN AMRO, THE worst bank in the world, every time I touch MS they mess something up, be it big or small - this time I’m told I have to log in to download - only in fact I did not have to do so).

I have to say, most profoundly, Microsoft have made the process of porting to their platform excruciatingly painful and above all TIME CONSUMING. It is SO MUCH HASSLE - and then, even once you’ve done it, there’s little scope for automation, since each WDK has to run in a different VM and I suspect now I may need to support three different WDKs, maybe even four to get Windows 10. It’s a joke. It’s totally insane and I boggle that MS have ended up offering this to developers.

On Linux, I wrote a script in about five mins which built and ran every variant.

Microsoft, it takes DAYS - that’s MULTIPLE DAYS - to obtain, install and configure the build environments.

Anyways, ARM32 and MIPS32 are happy. Tomoz I’ll confirm x86 and x64 on Linux, which means all the Linux builds are in shape and ready to go.

Then it’ll be a case of deciding what to do about Windows and the MS problem.

I may for now just update the VS2012 and WDK 8.0 build files. This would mean no ARM32 support on Windows, because I need VS2013 (which means Windows 8.1) and WDK 8.1 to build for ARM. I can do those later - or I could try to do it tomorrow, install 8.1, etc.

I’ve also downloaded an Android image for the Ci20 (MIPS32)! It boots right from the SD card, so I should be able to check the library and test apps build okay on Android (even if it is MIPS and not ARM - Android for the Pi is apparently quite a bit more dicey).

Update

Test runs valgrind clean - almost no work to get back to clean, as I’d done this some time ago.

Added another test for the new ability to free queue elements.

Shortly going to run the full build and tests on linux x64, linux x86 and, hopefully, Android MIPS32.

Windows 8.1

Well.

I’ve installed 8.1.

I’ve installed VS2013.

Know what?

I cannot work out how to start Visual Studio.

Srsly.

Updatez

I think I’m done with Linux build config.

I can’t build/test on x86 any more - Amazon no longer offer x86 machines, far as I can tell.

This leaves MS.

It’s looking like I need build files for both express and professional versions of every current Visual Studio, plus a second set for each of VS+WDK 8.0/8.1/10.0. These build files also are a screaming nightmare to produce. A billion different GUI switches which you have to click on and review - repeated each time for each build variant, which is debug/release, ARM/x86/x64, DLL/LIB, user-mode/kernel - twenty-four variants - and I need to do this three times, for MS 2012, 2013 and 2015.
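
To check that arithmetic, a small sketch which walks every combination. The variant names are the ones listed above; enumerate_variants() and ARRAY_LENGTH are hypothetical names made up for the illustration, and have nothing to do with the real build files.

```c
#include <assert.h>
#include <stdio.h>

#define ARRAY_LENGTH( array )  ( sizeof(array) / sizeof((array)[0]) )

/* The four axes of a build variant, as listed in the text. */
static const char *const build_types[]   = { "debug", "release" };
static const char *const architectures[] = { "ARM", "x86", "x64" };
static const char *const library_types[] = { "DLL", "LIB" };
static const char *const modes[]         = { "user-mode", "kernel" };

/* Writes one line per variant to out (pass NULL to only count)
   and returns the number of variants. */
static unsigned int enumerate_variants( FILE *out )
{
  size_t b, a, l, m;
  unsigned int count = 0;

  for( b = 0 ; b < ARRAY_LENGTH(build_types) ; b++ )
    for( a = 0 ; a < ARRAY_LENGTH(architectures) ; a++ )
      for( l = 0 ; l < ARRAY_LENGTH(library_types) ; l++ )
        for( m = 0 ; m < ARRAY_LENGTH(modes) ; m++ )
        {
          if( out != NULL )
            fprintf( out, "%s %s %s %s\n", build_types[b], architectures[a], library_types[l], modes[m] );
          count++;
        }

  /* Sanity check: the count must equal the product of the axis sizes. */
  assert( count == ARRAY_LENGTH(build_types) * ARRAY_LENGTH(architectures) * ARRAY_LENGTH(library_types) * ARRAY_LENGTH(modes) );

  return count;
}
```

Calling enumerate_variants( stdout ) lists all twenty-four (2 x 3 x 2 x 2), and doing that once each for 2012, 2013 and 2015 comes to seventy-two sets of switches to click through.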

Microsoft, this is absolutely insane. I’m not even going to get angry because this is beyond anger. There are no words to describe how much of a problem this actually is, and this is the environment MS is offering so people can write software on their platform.

2015-12-28

Update

Forgot to sort out the Linux kernel build. Done that.

Okaaaaaaaaaaaaaaaay.

What to do about Windows?

2015-12-29

7.0.0 release in the morning

Windows - I got hold of an 8.1 eval, installed it (hour or two on a VM) and installed a trial 2013, copied that VM and then installed WDK 8.1 as well. Also copied my base fresh Windows 7 VM, installed VC 2012, copied that again and installed 8.0. Took a day or so, all told.

(Took me all of five minutes to write a script on Linux to build every variant and about ten minutes to run it. I’ve been watching “Yes, Minister” recently. Microsoft are the British Civil Service).

THEN found out ARM support exists in 2012.

Could not however figure out how to build an ARM library using 2013. Keeps telling me desktop apps not supported for ARM. Googling a lot didn’t help. Given up.

Building an ARM lib for the kernel with WDK 8.1 was straightforward. Will try it tomorrow with 2012.

So I pretty much know where I am. Will finish this in the morning and release, and then get on with polishing the docs, and then the next thing will be getting the benchmark app back into play.

7.0.0

It’s out.

Heaven help us all :-)


admin at liblfds dot org