Mailing list blues

The mailing list has been down for a little while – I’ve been too busy to fix it until just now.

It used to work, and then it started throwing an error, and now that I’ve debugged it, I *think* (but I could be wrong) Python’s behaviour changed – it became unable to call a function (“email.utils.parseaddr”) because *later* in the same function a variable named “email” was assigned. Of course, using an undeclared variable is an error, but I’m surprised Python confused *the leading part* of a module name with a variable name – and that it did not seem to do so before. But I am no Python hacker – I’ve never sat down and really read the docs – so I may be completely wrong and it’s just me, not understanding.
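A minimal sketch of the failure mode as I understand it (invented function and variable names – this is not the actual list code). In Python, assigning to a name anywhere in a function makes that name local for the whole function, so an access to the same-named module earlier in the function raises UnboundLocalError:

```python
import email.utils

def handle(raw_from):
    # "email" is local throughout this function, because of the
    # assignment below - so this line fails with UnboundLocalError,
    # even though the module was imported at top level.
    name, addr = email.utils.parseaddr(raw_from)
    email = "reply to " + addr  # the shadowing assignment
    return email
```

Renaming the local variable (or doing `from email.utils import parseaddr`) avoids the clash.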

OH YEAH!

MAC                IP              hostname       valid until         manufacturer        
===============================================================================================
00:13:20:fe:2f:1d  10.0.0.6        minnow         2020-06-17 12:31:46 -NA-                
00:1e:06:42:30:d0  10.0.0.4        odroid-buster6 2020-06-17 12:32:14 -NA-                
70:b3:d5:92:f2:e0  10.0.0.7        freedom-u540   2020-06-17 12:36:53 -NA-                
b8:27:eb:b9:81:e4  10.0.0.2        raspberrypi    2020-06-17 12:30:10 -NA-                
d0:31:10:ff:73:17  10.0.0.5        ci20           2020-06-17 12:32:59 -NA-    

Dev board setup update

Three of the five dev boards are now running – ARM64, ARM32, MIPS32. I built the SD card image for the RISC-V board yesterday and this morning (which actually involves building an entire Linux distro!), but it looks like the power button on the SBC is broken, so I can’t power the board up right now!

I tried powering up the Intel x86 SBC (actual 32-bit only) a few days ago, but it didn’t play ball – that’s next to investigate, while I figure out what to do about the power button on the RISC-V board.

Finally, my laptop is working (go figure 🙂), so x86_64 is covered.

Green shoots!

Astoundingly, I have begun to make some progress bringing the dev board fleet back on its feet.

I bought a decent little ethernet hub ages ago, and then a few days ago a set of little U/FTP flat ethernet cables (I had some S/FTP, which were far too inflexible – about half the dev boards have their ethernet sockets one way up, the others the other way up). I’ve hooked up all the cabling, and now comes the difficult bit – I need to bring each dev board back to life.

My first problem was getting an IP address onto each dev board. Setting up a static IP on a headless dev board is a huge pain in the ass, because although configuring /etc/network/interfaces is trivial, turning off DHCP when you can’t boot the SD card is problematic, as far as I know.

In the end I finally understood why the recommended method for configuring a static IP is to use a DHCP server to do it – the boards can all be left configured for DHCP, and then they’ll work whether they go into a router or into a local subnet hanging off your laptop.

My setup is that my laptop runs a Tor proxy and blocks all outbound access except through the proxy. So I now run a DHCP server on the laptop’s ethernet port, which hooks into the hub, and the five dev boards are plugged into that; the dev boards get their IPs from the DHCP server, and they’re all configured to proxy via the Tor proxy on my machine – beautiful. Nice and secure – the dev boards don’t even *know* my public IP, let alone have any ability to access the internetwebs (except via the Tor proxy, which is fine for apt, but wget didn’t want to work first time, so I need to look at that).

That’s the plan anyway – and right now I’ve got almost all of it working for the old Raspberry Pi 3. I only need to set up the static DHCP address – everything else is fine.
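That static DHCP address is just a per-host entry in the DHCP server’s config. A sketch, assuming dnsmasq as the server (the interface name and dynamic range are my guesses; the MAC and IP for the Pi are the ones from the lease table above):

```
# dnsmasq.conf sketch - DHCP on the hub-facing port
interface=eth0
dhcp-range=10.0.0.100,10.0.0.250,12h
# pin the board to its address by MAC
dhcp-host=b8:27:eb:b9:81:e4,10.0.0.2,raspberrypi
```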

There is an enormous cost to this though.

Only more modern versions of apt can proxy via SOCKS, so I needed to move from – I think it was Jessie (it’s been a while!) – to the latest, Buster (and, tragically, Raspbian is now called Raspberry Pi OS – those whom the Gods would destroy, first they make mad).

This means that all the GCCs I built natively on the devices will no longer run, because the C library has changed. I spent many months building as many GCC versions as I could – and it can take days to build a single GCC.
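For reference, pointing apt at a SOCKS proxy is one line of configuration on Buster (apt grew SOCKS support in, I believe, version 1.5, which is why Jessie’s apt can’t do it – the file name, laptop address, and port below are my assumptions):

```
# /etc/apt/apt.conf.d/01proxy
Acquire::http::Proxy "socks5h://10.0.0.1:9050";
```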

Hola

I occasionally have emails from people asking me for the pre-release of 7.2.0, since it comes with ARM64.

I had one such email a couple of days ago, and it inspired me to get working again on liblfds – at least, in parallel with the other stuff that I’m working on (I’ve been writing a book for more than a year).

I’ve changed the ARM64 platform from a PINE64 to an Odroid N2 and this means I need to set up the N2 – get it into the IP network, configure it to be a build client, start building GCCs.

Well, it’s been 2.5 days now – about 20 hours – and I have failed to configure the N2 to have a single, static IP address. I’ve failed to do this on three different distros (Debian Buster, the current Armbian, and the current Ubuntu).

The cables work fine, and so does the N2 – it works fine with DHCP when connected to the router.

So putting this one down for a bit.

Now going to see about getting the RISC-V dev board up. I think I can get Debian going, but I’m wondering if I will run into the same problems with IP addresses.

Currently it looks to me like I will need to run a DHCP server to hand out static IP addresses, which is insane, but if madness is the only option remaining, well, what do you do?

Proving correctness

I’ve been working on the new test/benchmark programme.

It’s a lot of work. There’s a -lot- of code.

Problem is, the test code is not serious. All it does is run lots of threads doing lots of work and try to make things go wrong – there’s no directed effort to cause unusual cases, it’s just run-and-hope – and there’s fairly limited checking you can do at the end of the test to see if the data is correct.

It does find bugs, but that’s the most you can say.

I need – and I’ve known I’ve needed for a long time – a more formal, rigorous test.

This is not a straightforward problem to solve. There’s ongoing research into how to do this in a timely and practical manner – which is to say, on unmodified or essentially unmodified source code, with no translating of the source code into a model for a model checker (since such translation obviously introduces bugs of its own).

I just read a white paper which is fairly recent (2014). What’s being done, basically, is enumeration of execution paths, with culling of paths which are equivalent, and with the compiler back-end being co-opted to notice when memory accesses occur (although I don’t know how it really knows, since what the hardware does differs from what the compiler does – on the other hand, in theory the world looks normal to a single thread, so maybe that all works out, and the processor ensuring the world looks normal is enough?). Then there’s a correctness checker, which I’ve not yet read about, so I’m not sure how it works.

The code is on github, it’s for C++, but you know open source – it doesn’t work. I’m going to need to write my own.

Duration inspiration

So I’ve been banging away at getting the threading test/benchmark code working.

I had originally looked to use the Windows-style CreateProcess() approach on Linux (you can do so), but this is wrong – the way Windows does it is more limited, and I can imagine use cases on Linux where the more flexible fork() method will be -necessary-. So now Windows will use CreateProcess() (which needs a command-line binary to execute) and Linux will use fork(), and that choice then defines the porting abstraction layer.
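As an illustration of the Linux side of that abstraction (my own sketch, not the actual liblfds porting layer): the parent forks, the child runs a worker function and exits with its return value, and the parent reaps the child and reports that value.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run worker(arg) in a child process; return its result, or -1 on error.
   Sketch only - a real porting layer would return the pid and separate
   spawn from wait, so many workers can run concurrently. */
int spawn_and_wait( int (*worker)(void *), void *arg )
{
  pid_t pid;
  int status;

  pid = fork();

  if( pid == 0 )
    _exit( worker(arg) );  /* child: run the work, exit with its result */

  if( pid < 0 )
    return -1;             /* fork failed */

  if( waitpid(pid, &status, 0) < 0 )
    return -1;

  if( WIFEXITED(status) )
    return WEXITSTATUS(status);

  return -1;               /* child was killed by a signal, etc. */
}
```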

I also decided to move all memory allocation fully back out to the test/benchmark library caller – there was helper code in the library, but it’s just making life complicated and is potentially going to get in the way for some users.

So I’ve sorted out the fundamentals of the porting abstraction layer.

That leaves the duration problem – bare C89, so no timers. How do we know how long to run tests for? See the previous post for details.

What’s just occurred to me is to classify each test as slow or fast, and then on the command line the user can specify the number of iterations for each class, to override the defaults.

It’s some complexity, but there has to be some user involvement because there is no way to measure time, so it’s impossible in the code to figure out how long to run a test for.
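A sketch of how that slow/fast classification might look (invented names and flags – this is not the actual command line): each test is tagged with a class, and each class has a default iteration count which the user can override.

```c
#include <string.h>

enum test_class
{
  TEST_CLASS_FAST,
  TEST_CLASS_SLOW
};

struct test_config
{
  unsigned long int iterations[2];
};

/* defaults - the real numbers would come from experience per platform */
void test_config_init( struct test_config *tc )
{
  tc->iterations[TEST_CLASS_FAST] = 1000000;
  tc->iterations[TEST_CLASS_SLOW] = 10000;
}

/* hypothetical flags: "-if" N (fast iterations), "-is" N (slow iterations)
   returns 1 if the flag was recognized, 0 otherwise */
int test_config_parse_arg( struct test_config *tc, const char *flag, unsigned long int count )
{
  if( 0 == strcmp(flag, "-if") )
  {
    tc->iterations[TEST_CLASS_FAST] = count;
    return 1;
  }

  if( 0 == strcmp(flag, "-is") )
  {
    tc->iterations[TEST_CLASS_SLOW] = count;
    return 1;
  }

  return 0;
}
```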

Update

Working on the test and benchmark programme.

There’s a lot of functionality, so a lot of work.

Currently getting thread-based testing off the ground.

Making progress. Two main things to sort out – how to handle durations and memory allocation.

The standard library isn’t used, so there are no time functions.

Duration is number of iterations.

Problem is, tasks (tests and benchmarks) vary in how long an iteration takes, and will vary as the platform varies.

So some users will want fewer iterations, some more, and the number of iterations will still need to vary by task.

Memory allocation is a problem for the command line wrapper.

The test and benchmark code itself performs no allocations, but the caller has to.

The test and benchmark programme, being general, then needs to deal with NUMA, and with NUMA plus shared memory, on both Windows and Linux.

So, there’s quite a few variations of allocation methods.

Once that’s airborne, then process-based.

After that, then re-implementation of all tests, and also then a complete set of benchmarks.

At that point I can get back to actually coding lock-free data structures… =-)

(This WordPress post was brought to you by Mousepad.)

Addendum – it’s even worse than I realised. You have to click “Publish” twice to publish.

Minor site update

Finally got rid of the frames.

Moved the site back over to https (I had time to sort out the certs from Let’s Encrypt).  There’s still an http server for the site; it redirects to https.  I’m fairly sure everything is working.

WordPress updated to v5 and the new editor is absolutely appalling, with no way to use the previous editor.  It’s all but unusable.  It uses a tiny part of the screen for editing, the text and cursor keep being moved around when you perform operations in the UI, and the cursor disappears at times – not to mention the usual bizarro-world, totally unexpected jumping around when you’re editing and moving the cursor (that’s really the worst thing – the unpredictability and inconsistency of cursor movements – you have to *think* and *pay attention* to *move the cursor*, because it can’t be done on autopilot, because the movements are not consistent).  The answer is editing in a real text editor and just pasting here.

I’ve had to write a book, the last few months.

Another month or two to go.

Then back to working on the next release.

Cute little problem

Running into a cute little problem, to do with initializing the test and benchmark programme.

The test and benchmark code is in a library – there’s a command line wrapper provided for convenience. The code is in a library so users on embedded platforms and the like can use the test and benchmark functionality.

The code in the library performs no allocations – the user passes in memory. The user could after all be an embedded platform with no malloc and just global arrays.

The library code is complex enough that there needs to be some flexibility in memory allocation, so the user-provided memory is the store, which is placed into an API, and that API offers malloc()-like behaviour.
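A sketch of that malloc()-like API over user-provided store (my illustration, not the liblfds code): the user hands in a buffer once, and the library carves aligned allocations out of it, with no calls to malloc() anywhere – so it works just as well with a global array on an embedded platform.

```c
#include <stddef.h>

struct store
{
  unsigned char *base;

  size_t
    size,
    used;
};

/* the user hands in their buffer once, up front */
void store_init( struct store *s, void *buffer, size_t size )
{
  s->base = (unsigned char *) buffer;
  s->size = size;
  s->used = 0;
}

/* malloc()-like: carve the next chunk out of the user's buffer,
   or return NULL if the store is exhausted */
void *store_alloc( struct store *s, size_t bytes )
{
  void *p;

  /* round up so every allocation stays aligned (C89-friendly) */
  bytes = (bytes + sizeof(long) - 1) & ~(sizeof(long) - 1);

  if( s->used + bytes > s->size )
    return NULL;

  p = s->base + s->used;
  s->used += bytes;

  return p;
}
```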

The test and benchmark code, being NUMA aware, needs an allocation on each NUMA node.

Asking the user to do the work to figure out his NUMA arrangement is quite onerous, though – and in fact the porting layer already has this code in it.

So what we really want it to get the topology info from the porting layer.

To do this though we need… some store from the user.

So it kinda looks like first we make a topology instance, and then the user uses this to find out about his NUMA layout and make allocs.

To make a topology instance though the user needs to know how much store to allocate – and that’s the cute little problem.

How do you write code which can either work and do its job, *or*, tell you how many bytes of store it will need to do its job?

Now if the function is “flat”, in that it needs no allocations to *find out* how much store it needs, then it’s straightforward enough.

However, if the function is “deep”, and it needs allocations as it goes along to find out how much store it needs, then life is more complicated – in fact, what must happen is that the user calls the function repeatedly, passing in each time as much store as the function asked for the time before, until he gets the same result twice.

There are Windows functions like this.
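A sketch of that calling convention (invented names; the Windows analogue is e.g. GetLogicalProcessorInformation, which when called with too little store reports how much it needed): the function either does its job, or fails and writes out the required byte count.

```c
#include <stddef.h>

/* stand-in for however much store the real topology walk needs */
#define TOPOLOGY_SIZE_IN_BYTES 96

/* Do the job, *or* say how much store is needed to do it.
   Returns 1 on success; returns 0 and sets *size_in_bytes on failure. */
int topology_query( void *store, size_t *size_in_bytes )
{
  if( store == NULL || *size_in_bytes < TOPOLOGY_SIZE_IN_BYTES )
  {
    *size_in_bytes = TOPOLOGY_SIZE_IN_BYTES;
    return 0;
  }

  /* ...the real code would walk the NUMA topology into the store here... */

  return 1;
}
```

A “deep” function has the same shape, except the size it reports can grow from call to call, so the caller loops – reallocating to the reported size each time – until a call succeeds.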

Problem is… now it’s quite a bit of extra work, and I’m not sure I’m *getting* very much for all this.