I’ve made a small but important improvement to the unbounded/many/many and unbounded/single/single queue APIs.
The queue elements now have a user_state, which you can get and set.
Those of you (both of you ;-) who have used the unbounded/many/many queue (the single/single is in 7.2.0, which isn’t out yet) will know that the dummy element is a pain in the ass.
When you dequeue, you get a queue element and a key/value pair. The problem is that the queue element you’re given is what was the dummy element in the queue; you can reuse it, for sure, but often what you’ll want to do now is put it on a freelist, and for that you need a freelist element. So you may well have defined a struct of your own which contains both a queue element and a freelist element - only now you have a problem. You’ve just dequeued, fine, you have a queue element, fine, you have a value, fine, the value is your struct with the freelist element and queue element in it, fine… only it’s not fine. That struct of yours is NOT the one containing the queue element you’ve been handed by the dequeue call - it belongs to the next element. So you can use the freelist element in there, but now you’re physically separating the freelist element from the queue element it puts on the freelist. This is bad for performance - you get a cache line miss.
Now though, we have a solution. We set the user state for the queue element to be the struct it is in. Bingo. We dequeue, we get the value (which is what we need for the queue to be correct) but we ALSO get back the correct freelist/queue struct for the queue element the dequeue function just gave us.
Problem solved. I need to update the ringbuffer now with this, too, it’ll improve performance - shame there’s no benchmark yet for the ringbuffer; I might need to make one now to see how much difference this makes.
Well, maybe disappointing.
The benchmark has two threads (naturally enough) and we see when they run on separate physical cores we get about 8.8m ops (the producer thread enqueues 8.8m elements, the consumer thread dequeues 8.8m elements).
This is about the same as the unbounded/many/many running two threads on two physical cores (although the benchmark there is a bit different - there’s one queue, and each thread does an enqueue followed by a dequeue, so one op is an enqueue and a dequeue, and there we get about 5m ops on one core and about 8m on the other).
What I’m thinking though is that single/single will scale, where many/many does not. Many/many, as the core counts go up, dives into a deep hole in the ground. I think single/single might scale linearly - there will be many queues, inherently, but each one will still manage the 8.8m no matter how many of them are running.
The benchmark needs more enhancement to test this though.
I’ve pinned down the SMR bug.
A thread would miss an SMR generation and thereby end up two generations behind, and the code wasn’t coping with this.
Performance hit from SMR is about 50%.
This is appalling - I suspect hazard pointers are about free - but hazard pointers are someone else’s invention and I’d need permission to use them. The SMR I have now is purely my own work.
I was wondering how the Black Mesa mod was getting on - if they’d published Xen yet. I googled, went to their site - no, but Black Mesa is now on Steam, 20 euro.
Sounds alright to me. I liked what they did, I love Half-life, I’d buy it.
I go to Steam. It’s selling for 14.99 GBP, not Euro.
Ah, I think. They think I’m in the UK. I’ll change it to Germany so I can pay in euros.
I change to Germany. No problem.
Then I think - ah, wait. The German version is censored, right? I google. Yes, it is.
Okay, no problem, also 20 euro is a bit more than 15 GBP, I’ll change it back to the UK.
Here’s the punch-line, folks!
I CANNOT CHANGE THE COUNTRY AGAIN.
Steam is telling me they will only accept my country as being Germany (presumably based on IP lookup, so if I was still going to purchase, I’d run Steam through Tor with a UK exit node selected).
Important lesson here. One I have already learned with regard to banks and which I now know I need to apply to other organizations - probably all. NEVER TELL ORGANIZATIONS WHERE YOU LIVE. In fact, in general, tell them NOTHING. If you tell a bank you’re in a different country, for example, they will usually go bonkers in some bizarre and totally crazy ways - ways which you could never predict, because they make no sense whatsoever. Just don’t do it.
So I have a UK account, I want to buy the UK version, etc, etc, etc - buttttttttttttttttttt no. Steam and Valve know better! I told them ONCE I live in Germany, so it MUST BE TRUE. And now I can only change it back when I physically return to the UK and a nice UK IP address.
Reminds me of Skype. One day years ago the money I had in the account disappeared. Only about five quid. I contacted Skype and said I’ve changed my password, but there’s been fraud, what do you do?
Their reply was that they had locked my account until I changed my password and provided a copy of my passport or ID to prove who I am (facepalm) and that since it was fraud, it was my fault, not theirs, and they do not give refunds.
So because I said it was fraud, it was absolutely and definitely fraud. And not, say, a bug in their accounting software. Total unmitigated fuckwits. This was of course also roughly around the time they stopped being peer-to-peer and became server based, so the NSA could record all the calls, so I had what was probably a lucky escape, as I - as you can imagine given their response - certainly never went to the risk and trouble of sending them a copy of my ID after they needlessly locked the account. In fact, googling around that time, I read of people who’d had auto-topup on and had lost sizeable chunks of their current accounts to fraud. I had never given my password to anyone, of course - it was either a brute force attack which Skype never picked up on, or Skype were hacked.
Back to Valve being fuckwits. I mean, I understand it, I think - I’m sure this is forced on them by Governments. But it means Valve/Steam are now in the category of “major bank”, i.e. major fucking morons due to utterly insane behaviour from totally unexpected causes, so you can’t trust them with anything. Achievement unlocked.
In other news, I’m still waiting for my volume-unlocked iPod shuffle from the USA, to get around the EU regulation which sharply limits maximum volume.
Pertinent aside : I pay about 6000 euros a month, in total, to the German State, to live here. This is part of what I get.
I’ve been working on SMR.
There’s a bug, and it’s maddening, and I’ve been working it and it alone for a week now. I mean, every spare hour, including all Monday (public holiday).
I’ve finally narrowed it down to this: a CAS sometimes (after a few seconds of 100% CPU with four cores) seemingly returns 1 for success but does not actually change the target. Yes, really.
This seemed to indicate it was my mistake - I had misunderstood __ATOMIC_RELAXED, and the lack of a compiler barrier might be the cause. However, changing over to __ATOMIC_ACQ_REL didn’t fix the problem.
Now I’m googling for GCC bugs - and I found something unrelated but which finally answers an ages old question of mine as to GCC support for 16-byte CAS on x64.
The bug is a docs bug, and the plaintiff says “the docs don’t say a word about 16 byte CAS being supported”. He’s right, too, they don’t. However, this is only half the problem - the GCC atomic intrinsics work on types, not arrays, so you need a 128 bit type. The GCC docs for this are even worse;
Reading that, I come away thinking I need native support for 128 bit types. This, as I have now discovered, is not the case. x64 has __int128 just fine, and with “-march=native” you can compile a 16 byte GCC atomic CAS just fine.
Niiiiiiiiiiiiiiiiiiice. So all the arrays can go away - except - it’s GCC 4.6.0 and later and what about Microsoft? MS do not have a native 128 bit type.
However, even with keeping the arrays, I can use the GCC atomic CAS for DWCAS on x64 and ARM64 - so I can remove the inline assembly on both platforms (the x64 inline assembly works; for ARM64 I’ve had a first pass at writing it, but not yet having an ARM64 to test on, it doesn’t work).
So I could do that… GCC 4.1.2 was released 13th Feb 2007, 4.6.0 was released March 25, 2011. So it would move the minimum compiler version forward by four years.
I’ve just compiled everything with clang.
Almost effortless - there’s one GCC argument I’m using which is not supported.
I have found however it’s only with the VERY latest clang (2nd Sep 2016 - about a month ago!) that the __atomic* GCC intrinsics are supported. With the version I have with Debian Jessie, 3.8, I only get __sync*. I note the GNUC versions set by clang are 4.1.2, which is exactly right for the __sync* stuff - that’s the version it was introduced. The latest clang (with __atomic*) is 3.9.0.
I’ve not figured out the SMR bug yet. It’s weird shit. I can read the value before the CAS, get a 1 from the CAS (CAS happened), read the value immediately after and see it’s correct - and then one or two lines later, when I read it again - IN THE SAME THREAD - it’s reverted to the previous value.
As far as I can tell from logging, no other thread has set the value, and also no other thread according to the logic of the code COULD set the value.
I’ve tried with GCC 6.2.0 and I get the same problem. About to try now with clang…
Ah, one other thing, my final comment in the previous post was wrong. I do not need to move on from GCC 4.1.2 - I simply have to write the abstraction layer for the compiler such that I have one for DWCAS for 4.1.2 to 4.6.0 (using inline assembly) and then from 4.6.0 to 4.7.3 I can use __sync* with __int128 and then with 4.7.3 and greater I use __atomic* with __int128.
Either way, I have ARM64 support now, for GCC 4.6.0 and later.
Oh, I also looked at downloading and building all the formally released GCC versions from 4.1.2 onwards. It looks pretty straightforward, except I have to see how much the configure options may have changed over time.
So I expect to end with a script to set up the build environment, which downloads, builds and installs every GCC version. Benchmarking across versions will be very interesting!
I’ve worked out the SMR bug.
Lying in bed at 23:40, taking what I had learned today about the problem and thinking again about the algorithm itself - and there it was. It’s the entry/exit macros for the lock-free sections. They’re reverting the SMR thread state generation counter. I’ve not tested it to prove it, but I am certain.
Have to think how to modify the design now.
What a marvellous and wonderful relief!
Working on the build system.
Writing a script which downloads every release of GCC from the latest back to 4.1.2, building the version in question and then using that build to build itself again (i.e. so I have the given version of the compiler used to build the given version of the compiler, just as in the official releases).
I’m then going to build, test and benchmark against all versions, for every release.
I need to do the same for clang, and that’s more pressing, because the test app goes into a permanent loop with clang 3.8.0 in one of the elimination layer freelist tests. It might be a compiler issue, or it might be an EA bug.
Using multiple compilers is a Bloody Good Thing.
So, two lines of work are in progress.
First, setting up every version of GCC, so I can compile, test and benchmark against them all.
Second, Safe Memory Reclamation.
With regard to GCC, as I’ve been progressing with this work, I’ve been learning more about the GCC/compiler eco-system. Stuff I vaguely knew, but which I did not particularly know, and now I am particularly learning about; there are in fact three pieces in this puzzle. There is GCC itself, then there is binutils (which provides the linker and assembler, and GCC calls both when compiling C) and then there is libc. Each has their own - long - version history, with lots of releases. A canonical combination might be those versions which were available at the time a given GCC release was made - but this is by no means the only possible combination; I think there is a lot of scope for different versions to be used together.
The script I’m putting together to generate all the GCC versions currently has two steps; build a given release with the local compiler, and then build with that newly built compiler itself, i.e. download 4.1.2, build with the installed GCC (4.9.2), then use the 4.1.2 I now have (which was built with 4.9.2) to build 4.1.2 again - so I have a 4.1.2 which was compiled by 4.1.2.
This means I’m using my local libc (2.19-18, compiled with GCC 4.8.4 on a Linux 3.16.7 system) and my local binutils (2.25).
I don’t actually myself care about libc, because liblfds itself does not use it; nor do libtest or libbenchmark, although they do use libpthread (huh - maybe a fourth piece in the puzzle). Only the test and benchmark veneers use libc, and they use it very lightly - printf and malloc, mainly. Now I think of it, they also use libnuma.
I have something of a memory now about not mixing libc versions with different compiler versions. I need to check on this. If it’s true, then I HAVE to recompile libc for every GCC I use. That would then leave the question of whether I can use binutils across GCC versions.
So, basically, pulling on a thread here. I care most about compilation problems, since they are fatal to users, and any user compiling with a given compiler version is likely to be on a coherent system and so will have the appropriate versions of the other libraries and tools. However, I do rather care about getting a sane final compile, so I can benchmark across compiler versions. Actually, I think I also care about libc versions for benchmarking, because I benchmark locking-based mechanisms too.
So, some Googling indicates in theory you should only use a library with the compiler it was compiled with. This is because in theory each compiler has its own ABI. In practice the C ABI is very stable. Nevertheless, I actually want to compile all the different libpthread versions to see how their locking mechanisms vary in performance. (In fact, I’m using GCC 4.9 now with libc compiled by 4.8.4, which looks wrong!)
Also looks like binutils has a relationship to GCC and also libc - certain versions needed for certain versions.
So; GCC, libc, libnuma, libpthread, binutils.
If I only care about compilation errors, I only need GCC. I don’t even need to bootstrap, really - it’s not very likely there will be compilation problems which occur only if the compiler is compiled with itself.
If I care about benchmarking, then I need the lot.
(More Googling - pthreads is part of libc, apparently - the libc compile I guess producing more than one library).
Moving on - second, Safe Memory Reclamation.
I’ve been trying to get my head around the core implementation of hazard pointers as presented by McKenney, here;
The three core functions are provided, and they’re only a few lines of code each.
I’ve seen the white paper, of course, from Michael. For me, it is utterly impenetrable. I understand - or think I understand - the concept of hazard pointers, and the concept is as simple and straightforward as the white paper is impenetrable. I don’t think much of the white paper.
The code itself - I didn’t get it. Probably part of the problem has been it’s out of context; when you’re dealing with memory barriers and so on, you often need to see what all the code is doing, to be able to follow the paths of execution, to see what gets flushed to memory when, and so on.
However, a day or so ago, I had an insight.
The code itself only issues memory barriers, so I couldn’t understand how it could work. When other threads came to check to see if the per-thread hazard pointers were pointing at something, how could they be guaranteed to have seen the values set by other threads?
Then it came to me - or at least, I think it came to me :-)
The code is using the atomic operations performed BY THE DATA STRUCTURE to flush its writes to memory.
In my epoch-based SMR, on the entry to a read section, I’m issuing my own exchange to guarantee my status flag change is pushed to memory. In fact I need to find a way to use the atomic op performed by the data structure to do that job.
Oh - and there’s a third thing.
I read a post by Drepper, about static linking. He said - never do it. Always use shared objects - and he provided arguments which convinced me; indeed, there is one simple argument which is by itself enough. What happens if you have to issue a security fix? With a shared object, you replace the shared object. With static linking, you have to recompile all the dependent programmes.
With this thought in mind, it seems to me now I am going about versioning in the wrong way. When I release a bugfix, I bump the API name prefix (7.1.0 to 7.1.1), i.e. I never in fact release fixes to any already released libraries. That WAS the idea, in fact - but I’m going about it in the wrong way, because this way forces recompilation of the dependent programmes.
Turns out GCC self-bootstraps when you build it, AND if you put a binutils source tree in the root of the GCC source, it will build that too!
So that leaves libc. Looks like this might be coming together… …time for some chocolate!
I spent today figuring out release dates for GCC, binutils, glibc and numactl, going back to the release of GCC 4.1.2, so I can build the appropriate matching versions of these packages, to build liblfds.
This is the result!
I’ve spent the day trying to compile old versions of GCC.
GCC itself seems okay - but binutils is a freaking catastrophe.
Binutils is basically fucked. AFAICT, all versions compile with warnings, and all versions have warnings-as-errors turned on.
Remember that I’m building binutils through GCC - just putting the binutils source dir into the GCC source dir and letting GCC do the work.
I think now that --disable-werror passed to the GCC ./configure propagates to the binutils ./configure; this has got me to the point now where the binutils (2.17) build seems to be breaking because… it’s trying to call something called “cc1obj” and it can’t - rightly so, because it’s not there - but I think cc1obj is actually the Objective C compiler…
I’ve also found out GCC depends on a couple of external math libraries, which I also need to download and have GCC build, so I need to add them to the version matrix in the docs.
2.27 will build, but it emits its “ar”, etc, into a directory which is different to the directory GCC looks for them in.
I guess maybe later GCCs look in the right place, so I should stick to the chronologically matching binutils version (2.17). So I go back to 2.17. I install the Objective-C compiler and the “can’t exec cc1obj” bug is fixed. So binutils is in fact dependent on Objective fucking C. It would be nice if I could find some - ANY - documentation on this. We are however talking Linux, which means broken-out-of-the-box-with-no-docs. They say there’s no guarantees with Linux. Believe me, this isn’t true. There really is a guarantee.
So, Objective C installed and now we’re on to the next problem - which turns out to be that modern gnumake has a built-in rule for the suffix “.m”, and that built-in rule is wrong for the gprof makefile.
I mean, shit. If you’re going to fuck it up, fuck it up REAL GOOD.
More later. So I checked the release dates on gnumake. Sure enough, looks like binutils 2.17 would have been built with 3.80, and 3.80 doesn’t have an implicit rule for “.m”.
So I download 3.80, make it, put it in /usr/local/bin.
Now here’s the thing; which make gives me /usr/local/bin/make.
But “make -v” gives me version 4.0.0 - the make in /usr/bin.
If I rename /usr/bin/make to /usr/bin/make.old, I get a “can’t find file error” when I do “make -v”.
In other words, there is a totally unexpected hidden extra layer masking which make you use. WHY THE FUCK DOES THIS EVEN EXIST?
Why do I even HAVE /local IF THE OS IS GOING TO HARD CODE USE OF /USR/BIN?
It’s bullshit enough trying to compile binutils without the operating fucking system pulling shit like this AS WELL.
I think I’ve hit the end of the road with 2.17. I can’t make it compile. I think it probably is acting properly on the .m files; it’s just that one of those files is actually really broken - or anyway, broken in a way you could perhaps fix if you could telepathically know the right config, which means all seven billion humans on planet Earth can’t fix it.
So I decided to have a look at version 2.17a - maybe it’s a bug fix?
Guess fucking what. When you untar it, it untars to “binutils-2.17”, not “binutils-2.17a”, thus over-writing your existing directory. FUCKING MORONS.
If this is what I can expect from binutils, then I am a fucking idiot for trying to make it compile.
More more later.
The earliest version I can make build is 2.20.
However, this has the same problem (at least with GCC 4.1.2, assuming GCC version is relevant) of emitting its binaries into a directory which is different to the directory GCC is looking in.
I’ll try a later GCC to see if it makes a difference.
So, GCC compiles and binutils compiles. I presume libgmp is compiling too, since it’s in there.
The problem now though is that the GCC docs are fucked. They won’t make.
Turns out the doc files with 4.4.3 (and prolly a bunch of older versions) won’t work with more modern versions of texinfo. Looks like there is a workaround.
Basic lesson here. If your code has a dependency on ANYTHING - A N Y T H I N G - then in a couple of years, your code will not build because of that dependency.
Well fuck me - the workaround does fuck all. I’m so surprised. The idea of the workaround is right though - explain to configure there’s no texinfo and not to make docs.
So I’ve uninstalled texinfo. That seems to have enabled me to move on to the next set of errors, which are that libmpfr is missing - which is a valid error, as I’ve not put that library into the GCC source tree.
I need to figure out which version to use though, and it’s bed time now - it was bed time 1.5 hours ago.
GCC docs say this;
“If you also intend to build binutils (either to upgrade an existing installation or for use in place of the corresponding tools of your OS), unpack the binutils distribution either in the same directory or a separate one.”
And then explain about the separate dir case.
So, what do I think they mean? I think they mean put the root binutils dir in the GCC sources, because otherwise I’ll be spaffing the ton of directories in binutils into the GCC source root and that’s nuts.
Nuts is the order of the day.
Now that wouldn’t be too bad - well, except it is - except that if you in fact do what I did, and put binutils into the GCC source dir, GCC WILL FIND IT, CONFIGURE IT AND BUILD IT. It will ONLY complain at the very end, when it will say it can’t find “ar”, because it’s looking now - finally, despite getting it all right up till now - in the wrong directory.
But it gets worse.
You see, now, having finally worked out how this way of building GCC worked, I now simultaneously find out IT DOES NOT WORK.
You see, binutils has an include directory, and so does GCC, and they share some files, and the files differ.
I now understand a post I read on Stackoverflow, where the poster was talking about making softlinks to all the binaries made by binutils.
Open Source : broken out of the box and no docs.
I’ve improved and extended the GCC article into a general article on building (curse, spit, etc) GCC.
Matrix of GCC/dependency release dates, so you can try to figure out which versions to use, general advice, specific advice and currently explicit and tested instructions to build 4.4.3, which I will (God help me) be extending to cover all the versions of GCC which I can manage to make build.
So, basically two lines of work in the air right now.
First, which is making progress, is building every GCC starting with 4.1.2, first on x64, then ARM32, ARM64 and MIPS32. Once done, then building liblfds with every version of GCC on all those platforms, and making sure test passes and then comparing the benchmarks.
Tell ya somethin’ - GCC 6.x.x takes freaking FOREVER to build. Four core i5 with SSD, takes about two hours.
Second, which is not making much progress, is thinking over the SMR design. I’ve been trying to think up a lock-free epoch based design, which would mean not being forced to process generations sequentially - being able where necessary skip generations, until they became releasable - but I think it’s not possible in any obvious way. The information just can’t be obtained, even more so when thinking in terms of memory barriers only, which depend on the atomic op of the data structure operation to publish.
I see very clearly why now hazard pointers are so good, but I don’t just want to use someone else’s idea - but given the limits imposed by hardware capabilities, it is hard to see any other approach.
I’m still waiting for the PINE64 to arrive, which is a blocker for the next release.
I need also to sort out the mysterious performance behaviour of the unbounded/single/single queue and also try to get the elimination layer on the freelist scaling.
I have also been told (and I think it’s true - I knew, really) that casting away volatile is undefined behaviour. That sucks, because being able to do so gives me the ability to do single-threaded ops on the data structures (have them work single-threaded, then flip back to thread-safe behaviour with a function call).
Maybe I can determine behaviour for GCC and clang, and offer it on those compilers.
Well, after building GCC, my tolerance for fucked up builds has massively increased.
I’ve built 6.2.0, 6.1.0 and 5.4.0 now and I thought it prudent to get a glibc build passing, in case there’s something amiss with my GCC builds.
glibc build is fucked, or maybe debian is fucked, but it seems - compared to my new high tolerance - to be dead simple; although I’ve yet to get it working and there’s no end of scope for totally fucked up insanity yet.
Problem is that glibc looks at /usr/include/linux/version.h, to know the version of the kernel headers, and it looks like debian, despite being on kernel 4.7.0 now, is only shipping kernel headers for 3.16.36.
So now off to fucking sort out the fucking fucked up fucking Linux build process once again, and indeed every fucking time I ever use it for anything, ever, amen.
Well, I think I made some progress, but I can’t be sure yet.
I’ve stopped trying to build binutils as part of a combined build with GCC. The docs say you can, but they’re wrong, AFAICT. What you have to do (the docs don’t explain this and it took me the first weekend to find this out) is dump the entire contents of binutils into the GCC dir. Problem is, both have an include dir, and they share some files, and the versions differ.
I read that the GCC files take precedence, but when I tried that, it did not build.
I also tried putting the binutils dir actually in the GCC dir (rather than the contents of the binutils dir) and you can make this work with some hackery - you have to make softlinks in the GCC dir to the build results of the binutils dir (which configures and builds just fine - just its output is in the wrong place, and you only find out you’ve done the wrong thing during the make - this is completely typical of GCC building), and those softlinks get wiped during every build stage, and a generated build script is built wrongly - it doesn’t know where nm is - so you need to fix it by hand. The upshot is that although this binutils builds, it’s fucked; when you come to make glibc, you get freaky errors about CFI instructions, and there’s NOTHING in google about what causes that problem…
I think what’s happening is maybe the OS native binutils are being used in part during the build - so the only binutils build I’ve made work is a stand-alone build, making binutils on its own, with the OS native compiler, and then installing it in /usr/local and using update-alternatives to get it into use.
That’s been the main advance I made this weekend (two solid twelve hours days - again).
That lets me build glibc up to a point - it ends up complaining it can’t find “crt1.o”. This was a surprise, as I expected glibc wouldn’t be linking binaries, but it does. But it has already at this point made crt1.o (with the GCC and binutils I’ve made) but it’s not seeing them. However, if I tell it about the stuff it’s made (LIBRARY_PATH, LD_LIBRARY_PATH), what I find is the make seems to be calling iconv, which is then core dumping because it’s using the libc shared object the build has just made…
So, obviously, I’m messing it up somehow, but fuck me if I have any idea where. There are no meaningful docs (there is a page or two about building glibc; it’s like having one or two pages explaining, say, Egyptian hieroglyphs) and making pretty much any change induces a range of new, bizarre, wholly unexpected build errors - and you haven’t even got ANY build going yet. Like adding “--disable-shared”, which you might think would simplify things, causes the make to almost immediately fail due to a missing header file… which apparently is not a problem if you ENABLE shared. How does that even make any sense…
However, I can see that glibc builds fine with the OS native compiler and tools, so the problem is down to me somewhere in how I’m building or installing GCC, and/or how I build glibc. It’s just that these are complex build systems, with no docs and completely meaningless error messages which are I think often massively downstream of the real problem.
I am basically now of the view that the only people who can make GCC and glibc build are the developers on those two projects.