Pi Death

My Raspberry Pi 2 just… died?

It was downloading GCC 4.9.2 over NFS and then it just… stopped.

And now I can’t ping it.

Reboots not helping.

Hopefully rebuilding the SD card will fix it!

Building GCC, or, how to hack your way through jungle with a blunt keyboard

So, I’ve spent the last ten or so hours dealing with GCC build failures, as they’ve cropped up, on the three dev boards.

None of them have yet successfully built a GCC.

MIPS32 is trying to build 4.2.0 and it’s failing because the system GCC has sufficiently different header files to the 4.2.0 code base that there’s a missing struct.

Looks like the way to deal with this is to work your way backwards from your current version, through earlier versions of GCC, until you’re compiling with a version so close to the one you want that it can compile it.

What this means ideally (and I shall do this) is that if your system has say 4.9.2, then you use that to build 4.9.1 and 4.9.3, and then you use those to build 4.9.0 and 4.9.4 – and so on. (Remember GCC builds itself first with the system compiler, then it builds itself again with that compiler, and then it builds itself *again* with that compiler – so we’re only talking here about the system compiler.)
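A minimal sketch of that ordering, assuming the versions can simply be sorted numerically and that the nearest already-built neighbour is always the right compiler to use. The names `parse` and `build_plan` are made up for illustration; they are not from build_gcc.py.

```python
def parse(version):
    """Turn '4.9.2' into (4, 9, 2) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def build_plan(system_version, wanted):
    """Return (version_to_build, compiler_to_use) pairs, nearest first.

    All versions go into one sorted list. Each step builds the next
    release below or above what is already in hand, using the
    neighbour one step closer to the system compiler.
    """
    ordered = sorted(set(wanted) | {system_version}, key=parse)
    start = ordered.index(system_version)
    plan = []
    lo, hi = start - 1, start + 1
    while lo >= 0 or hi < len(ordered):
        if lo >= 0:
            plan.append((ordered[lo], ordered[lo + 1]))
            lo -= 1
        if hi < len(ordered):
            plan.append((ordered[hi], ordered[hi - 1]))
            hi += 1
    return plan


if __name__ == "__main__":
    versions = ["4.9.0", "4.9.1", "4.9.3", "4.9.4"]
    for target, compiler in build_plan("4.9.2", versions):
        print(f"build {target} with {compiler}")
```

With 4.9.2 as the system compiler this gives 4.9.1 and 4.9.3 first (both built with 4.9.2), then 4.9.0 (with 4.9.1) and 4.9.4 (with 4.9.3).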

This means, though, that I will now need to figure out how to build glibc. That was going to have to happen anyway, but it looks like another long, painful – no, *agonizing* – journey of discovery. In the same way that going to the dentist and having a tooth removed is a journey of discovery.

So that’s the blocker for MIPS32.

ARM32 is currently trying also to build 4.2.0, and the problem is a missing header file relating to soft floating point. I wonder now actually if this is also a too-large-a-version-gap problem.

ARM64 has a stranger problem. The “aarch64” architecture was introduced in 4.8.0, so that’s the earliest version I can build on this platform. I run into the problem of not finding sys/cdefs.h – but I had this problem on MIPS32 and ARM32, and solved it by setting C_INCLUDE_PATH. That isn’t working now, *even though the required file is under that tree*. Actually I think this might be because it’s trying to compile something to do with C++ (even though it should only be doing C) and I’ve not set CPATH, only C_INCLUDE_PATH. Trying this fix now.
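For reference, GCC reads CPATH for every language it preprocesses, while C_INCLUDE_PATH only applies to C (C++ has its own CPLUS_INCLUDE_PATH). A sketch of how a build script might set CPATH for its child processes; the function name and the include directory in the usage note are assumptions, not what the script actually does.

```python
import os


def build_environment(include_dir, base_env=None):
    """Return a copy of the environment with include_dir prepended to CPATH.

    CPATH is honoured for C, C++ and Objective-C alike, so it also covers
    the case where part of the build is compiled as C++.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("CPATH")
    if existing:
        env["CPATH"] = include_dir + os.pathsep + existing
    else:
        env["CPATH"] = include_dir
    return env
```

The result would be passed as `env=` to `subprocess.run` when calling configure and make, e.g. `build_environment("/usr/include/aarch64-linux-gnu")`, where that directory is only a guess at where sys/cdefs.h lives on a Debian ARM64 system.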

Build system beauty

All three dev boards are concurrently running the build_gcc.py script.

Turns out running the build directly over NFS was waaaaaaay slow. I mean, I’m only on 100 Mbit Ethernet anyway, let alone that it’s also got to go over wifi, over USB, and then to a hard disk rather than flash based media.

So I changed the tmp dir to be local, and then cached the current source dir being built locally, so now I’m basically just using NFS as a store for sources (which I could download from the net, so nothing vital) *but* also as a store for the completed binaries – and that’s like half a gig a shot for the later versions of GCC, so that is vital, because the dev boards typically have 10 GB or less of store (and I want to build clang in the future too, and glibc).
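Roughly, the arrangement looks like the sketch below: one bulk copy off NFS, all the small-file I/O on local disk, one bulk copy back. The function name and the `build` callback (standing in for the real configure/make/install steps) are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def build_locally(nfs_source, nfs_output, build):
    """Copy sources off NFS, build on local disk, copy the results back.

    `build` is a caller-supplied function taking (source_dir, install_dir).
    """
    nfs_source, nfs_output = Path(nfs_source), Path(nfs_output)
    with tempfile.TemporaryDirectory(prefix="gcc-build-") as scratch:
        local_source = Path(scratch) / "src"
        local_install = Path(scratch) / "install"
        # One bulk read from NFS instead of thousands of small ones.
        shutil.copytree(nfs_source, local_source)
        local_install.mkdir()
        # The compile itself only touches local storage.
        build(local_source, local_install)
        # One bulk write of the finished binaries back to the NFS store.
        shutil.copytree(local_install, nfs_output, dirs_exist_ok=True)
    return nfs_output
```

The scratch directory is deleted when the block exits, which matters on boards with 10 GB or less of storage. `dirs_exist_ok` needs Python 3.8 or later.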

So I’ll leave them compiling overnight. They’re all doing 4.2.0 right now, the earliest (and so smallest) version which I know can compile.

Going to take a long time to compile all the GCCs, I have to say.

Of course I could build on my laptop using --host and --target, but I don’t trust it to work, not after the massive problems I’ve had getting anywhere anyway with GCC building.

Anyway, most people will be building with native compilers, so it’s the best choice for me for that reason as well.

Build system update

Installed automake 1.15, which gives aclocal-1.15, and now GCC 6.2.0 builds.

Had a long drawn out “ahhhhhhhhhhhhhhhh” moment earlier. I had downloaded the 20GB of source which is every released version of GCC – in the /tmp dir. I had to reboot, after accidentally removing my normal user from sudo group (and so losing access to root).

Ahhhhhhhhhhhhhhhhhhhhhh-h-h-h-h…!

The build script downloads the sources, of course. I’ll just need to run it again.

So it’s kinda looking like I can prolly build 4.2.0 up to 6.2.0, inclusive.

4.1.2 is probably going to be a headache, but we’ll see.

I expect to spend the weekend trying to build on the other platforms.

Update

Friend at work suggested using NFS with the dev boards.

I’ve set it up – bit of pain but not much by Linux standards – and tonight I’ll use it to build a GCC, and see how fast it is.

I had to reflash the NAND in the Ci20 to get Debian 8, though – Debian 7 didn’t want to play with my NFS mount, and I figured this was the easiest way. It worked, too.

I’m quite happy I’ve done that too – hadn’t done it before, and the Ci20 supports Android, so I think in theory if I buy another Ci20, I can have an Android platform (although it’ll be MIPS rather than ARM). It’s a shame the Pi never got a real Android port.

I still need a case for the Ci20 :-) the only ones I’ve seen are for 3D printing and they seem to cost like 40 euro or something, which is nuts.

Update

Worked the last two days till 5am. Getting to bed at a sane time tonight ’cause work tomorrow.

Worked on the script to build GCC. It’s done, except for one thing – turns out the source of every published version of GCC comes to about 20 GB of disk. Compiled it will come also to about 20 GB of binaries.

The SD cards I have for the Pi2 and PINE64 are 16 GB each!

So I need a pair of 64 GB cards for them, and another (non-micro) for the Ci20 (it only has I think 8 GB built-in store).

The latest version of GCC I’ve been able to build, so far, is 5.2.0. The later versions need aclocal-1.15, and I have 1.14. So close =-)

I’ve just built 4.2.0, which is the oldest version using make 3.81.

I want to build 4.1.2, since it’s the first version offering atomic intrinsics, but then I’ll need a new make, and I’ll prolly need to try to get it to build, too, since I failed when trying before.

I fixed up the normal build system to use update-alternatives (and set up the various sudoer permissions to let normal users use it from a script, i.e. sans root password) and built 7.2.0 using GCC 5.2.0. It built, no warnings, nice.
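A sketch of what the update-alternatives step could look like from a script, assuming each GCC is installed under its own prefix and that sudoers allows the command without a password. The prefix layout and function name are made up; `--install <link> <name> <path> <priority>` and `--set <name> <path>` are the standard update-alternatives forms, and `sudo -n` fails instead of prompting.

```python
def alternatives_commands(prefix, priority=50):
    """Return the two commands that register and select a GCC build.

    `prefix` is the install prefix of one GCC version (hypothetical layout).
    """
    gcc_path = f"{prefix}/bin/gcc"
    register = [
        "sudo", "-n", "update-alternatives",
        "--install", "/usr/bin/gcc", "gcc", gcc_path, str(priority),
    ]
    select = ["sudo", "-n", "update-alternatives", "--set", "gcc", gcc_path]
    return [register, select]


if __name__ == "__main__":
    for command in alternatives_commands("/opt/gcc/5.2.0"):
        print(" ".join(command))
```

Each list can be handed straight to `subprocess.run(command, check=True)`.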

Building a late version GCC BTW takes quite a long time. I have four logical cores, I think for me it takes easily more than an hour. There are 52 releases of GCC which have to be built.

Building is much slower on a Pi2…

I’ve not tried building on the Ci20 or Pi2. I’ve failed to build on the PINE64.

I’ve pretty much proven the x64 platform now and the build code for it, so the next thing to do is get GCC builds working on the other platforms. Once I’ve got a build on each of the other platforms, I can order the bigger SD cards.

Of course, once all this is done I still then need to figure out how to build glibc, so I can actually link safely and then run the test and benchmark binaries.

The payoff then though is awesome : benchmark gnuplots over platforms and over all GCC versions. I can then extend that to clang as well.

Kinda sucks I need to go back to work tomorrow!

Linux is fabulous and appalling

I need to execute sudo over SSH.

Looks like I want to modify the sudoers file. There is a man page for the sudoers file, but I’ve seen encrypted telegrams which made more sense.

I google and find someone who explains it in a line or two (it’s not hard – it’s amazing in fact that people can write page after page of man docs and completely fail to communicate any information *at all*).

I add a file in /etc/sudoers.d and add a line, using visudo.

I save, all seems well.

I later try to run the package manager.

Guess what? authentication as root is now broken.

No error messages, no warnings, no obvious connections. Undoing the changes made to /etc/sudoers.d/ does not fix things.

In general, with Linux, once something works, don’t touch it. The lack of docs (sorry – the lack of docs *not* encrypted with the meaningless-longwinded-arbitrary-insane-scribblings algorithm), warnings, errors or information means that you are taking risks without knowing, and that if a risk does occur, you have no way to fix it and/or you will now spend at least 15 minutes (and maybe an hour, or six) trying to figure out through Google what bizarro-world dribbling madness just landed on your lap.

Update

I’ve spent the day rewriting from scratch the python script to build all GCC versions.

It’s coming along nicely. I sorted out the problem from yesterday – it’s another (I’ve never seen anything but) spurious, meaningless error message. It was actually induced by the experimentation I’d been doing to build glibc. When I removed that, things were “fine” – I say in quotes because I have no idea if what I’m building is sane. If you run the test suite, you always get tons of errors anyway – they’re expected. Expected by the people who know what’s expected, which makes the test suite fractionally useful to the rest of us.

So the script now downloads all the source code, then makes all binutils and installs them, then makes all GCCs (using the right binutils, but the wrong glibc – uses the system glibc – because I’ve not figured out (yet?) how to build glibc, since it’s even worse than GCC, and believe me, that’s sayin’ something).
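One way to point a GCC build at a particular binutils is through configure's `--with-as` and `--with-ld` options. The sketch below only assembles the argument list; the directory layout, the function name and the choice of `--enable-languages=c` and `--disable-multilib` are assumptions, not necessarily what the script passes.

```python
def gcc_configure_args(source_dir, gcc_prefix, binutils_prefix):
    """Build the configure command line for one GCC version.

    binutils_prefix is where the matching binutils was installed
    (which binutils matches which GCC is left to the caller).
    """
    return [
        f"{source_dir}/configure",
        f"--prefix={gcc_prefix}",
        # Use the assembler and linker from the chosen binutils build.
        f"--with-as={binutils_prefix}/bin/as",
        f"--with-ld={binutils_prefix}/bin/ld",
        # C only, single ABI: keeps the build smaller and quicker.
        "--enable-languages=c",
        "--disable-multilib",
    ]
```

GCC wants to be configured from a separate, empty build directory rather than from inside the source tree, so the list would be run with `cwd` set to that directory.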

I’ve been building all the binutils.

It’s a bloody mess. The problem in particular is the docs. Binutils (like GCC) uses “makeinfo”, from package “texinfo”. Problem is newer texinfos don’t work with older binutils, and there’s no way to turn off doc generation.

So as a result, I cannot build 2.23.2, 2.20.1, 2.19 and 2.18. I also can’t build 2.17, because the build is actually broken.

One thing which is puzzling me is that – well, there seem to be two sources for binutils downloads. There’s ftp.gnu.org, which you’d think was canonical, only it’s missing some versions, which I have found over at ftp://sourceware.org/pub/binutils/releases/. Thing is, both of them have releases with the same versions but where there’s a second file with an “a” suffix, i.e. “2.19.1” and “2.19.1a”.

I can find no information at all as to what the “a” is supposed to mean. I suspect it may mean “we buggered up the first build, here’s the real one”, in which case “a” releases should always supersede. I think it’s not very likely this will solve the makeinfo problem though.
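If the guess is right and “a” releases should win, picking the tarballs is a few lines. This is a sketch under that assumption; the filename pattern is inferred from the usual `binutils-<version>.tar.bz2` naming and the function name is made up.

```python
import re

# Version digits, an optional "a" suffix, then the archive extension.
TARBALL = re.compile(r"^binutils-(\d+(?:\.\d+)*)(a?)\.tar\.(?:gz|bz2)$")


def pick_tarballs(filenames):
    """Map each version to one tarball, preferring the 'a' re-release."""
    chosen = {}
    for name in filenames:
        match = TARBALL.match(name)
        if not match:
            continue  # not a binutils release tarball
        version, suffix = match.groups()
        # Take the first file seen for a version, but let an 'a' file
        # replace a plain one.
        if version not in chosen or suffix == "a":
            chosen[version] = name
    return chosen
```

Given `binutils-2.19.1.tar.bz2` and `binutils-2.19.1a.tar.bz2`, only the second is kept for version 2.19.1.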

GCC actually suffers from the same problem, but if texinfo is not present, it doesn’t try to build the docs, so you can work around the problem (you just don’t get any docs).

Maybe the end of the line for GCC

So, I’ve been running my GCC build script on ARM64, just for 4.9.2. That script works on Debian on x64. It doesn’t work on ARM64. There’s a weird error, fairly far into the build process, that it can’t find pthreads.h (which is present, and in the usual place).

Googling just leads to a bunch of other people, over the course of many years, saying they’ve found the same problem and are just about at their wits’ end (because there’s no apparent cause, and so no apparent fix).

I’ve spent six weeks getting that build script to the point it’s at on x64, and now I run it on ARM64, it doesn’t work.

My line of thought now runs like this : it is not possible to build GCC.

Because GCC cannot be built, it is only possible to test code with the GCC version on the build platforms you have. You can support no other versions, because you have no access to them, so you can’t compile with them.

That’s completely intolerable. No serious software package can be presented to users in this way – where the supported compiler versions are not under the control of the package developers.

I had a look at the build instructions for clang, and they look normal and sane. Thank God for choice!

Update

So, I started writing a HTTP server. Single thread, async I/O for network and disk. Simple, right? Wrong. Linux async I/O support is a fabulous mess. I *think* I can just about get away with it, with two threads, one for epoll and one for io_getevents (kernel AIO). I *think* the kernel version of AIO will work on regular files (but I’m not certain, actually).

The HTTP 1.0 spec is easy to implement – but there’s no pipelining, and pipelining matters. I looked at the 2.0 spec, and it looks complicated. One of the beautiful things about 1.0 was that it was *simple*. I think 1.1 allows clients to send say chunked encoding, so already implementation is a headache. So we’ll see where I go with this.
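To give a feel for what chunked encoding adds, here is a minimal decoder sketch. It assumes the whole body is already in memory and ignores trailers; a real server would have to do this incrementally as data arrives, which is where the headache comes in.

```python
def decode_chunked(body):
    """Decode an HTTP/1.1 chunked message body held entirely in memory."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The size line is hexadecimal, optionally followed by ';extension'.
        size = int(body[pos:line_end].split(b";", 1)[0].strip(), 16)
        pos = line_end + 2
        if size == 0:
            break  # last chunk; any trailer fields are ignored here
        out += body[pos:pos + size]
        if body[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        pos += size + 2
    return bytes(out)


if __name__ == "__main__":
    print(decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"))
```

Each chunk is a hex length, CRLF, that many bytes, CRLF; a zero-length chunk ends the body.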

So, given that I’ve got to read the HTTP specs, what I’m actually working on now is compiling all the GCC versions on the various build platforms. It’s a lot of work – GCCs seem to come in tranches, so there’s going to be a number of separate sets of build problems to deal with.

Right now I’m compiling 4.9.2 on ARM64. Will make sure -march=native works. Then it’ll be time to build all the versions and find out about the build problems.

And then doing it all again on ARM32 and MIPS32 (but it should be pretty easy by then, hopefully they’ll run through without any extra work).