I’ve spent most of my spare time over the last few days trying to figure out why gcc-4.3-built kernels were so unhappy on parisc. The symptom was that IPv4 networking operations would only complete once in a while; ping, for example, would drop 90% of packets. This turned out to be a really troublesome bug to track down, and the fix was only 7 characters long.
gcc-4.2 wasn’t problematic, so that provided a useful baseline. A bit of thinking ruled out the parts of the kernel that were unlikely to be at fault. For instance, ARP appeared to be fine, so it likely wasn’t an issue at the network driver level. Sure enough, rebuilding net/built-in.o with gcc-4.2, copying it into the gcc-4.3 tree, and rebuilding the tree around it resulted in a working kernel. Ok, excellent, we know it’s likely an issue in net/; what else can we rule out, and what do we know? ICMP is affected, so it’s probably not a TCP problem… Rebuild the kernel with IPv6 enabled and test ping6? Ok, works. So it looks like an IPv4 issue. Test that assertion… yup, net/ipv4/built-in.o compiled with gcc-4.2 works.
Ok, peachy. What now? Bisecting the directory’s contents turns up ip_output.c as the problematic file… Not all that helpfully, the code generated for it by 4.2 and 4.3 differs extensively. Well, ok, let’s try turning off a variety of the newly added compiler options… nope, no luck, but the file works when compiled at -O0.
Next, I bisected the file until I found the problematic function (unfortunately for me, having started in the middle, it turned out to be the first function in the file). It was part of a chain of inline functions that eventually called some architecture-specific inline assembly for ip_fast_csum. Ok, that looks like our canary in the coal mine… Look it up… SIGH. The routine reads memory through its pointer argument, but its asm statement didn’t list “memory” as a clobber.
Add the clobber, and everything magically works again. Wasn’t that a party?
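For anyone who hasn’t been bitten by this before, here is a minimal, portable C sketch of the pattern. It is not the parisc code: the names csum_ref and send_check are hypothetical, send_check is loosely modeled on a caller that clears the header’s checksum field before checksumming it, and the checksum itself is the standard IPv4 header checksum (RFC 1071 one’s-complement sum) written in plain C. The empty asm statement only marks where an opaque asm block that reads the header would sit; its “memory” clobber is what tells GCC that the stores before it must actually happen first.

```c
#include <assert.h>
#include <stdint.h>

/* Standard IPv4 header checksum (RFC 1071): one's-complement sum of the
   header's big-endian 16-bit words, folded to 16 bits and inverted.
   ihl is the header length in 32-bit words, as in ip_fast_csum(). */
static uint16_t csum_ref(const void *hdr, unsigned int ihl)
{
    const uint8_t *p = hdr;
    uint32_t sum = 0;

    assert(ihl >= 5 && ihl <= 15);          /* valid IPv4 IHL range */
    for (unsigned int i = 0; i < ihl * 4; i += 2)
        sum += (uint32_t)((p[i] << 8) | p[i + 1]);
    while (sum >> 16)                       /* end-around carry */
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical stand-in for a caller that clears the checksum field and
   then checksums the whole header. */
static void send_check(uint8_t *hdr)
{
    unsigned int ihl = hdr[0] & 0x0fu;      /* low nibble of byte 0 */

    hdr[10] = 0;                            /* checksum field, bytes 10-11 */
    hdr[11] = 0;

    /* Empty template standing in for an asm block that reads *hdr through
       a pointer held in a register. The "memory" clobber tells GCC the asm
       may read or write memory not named in its operands, so the two stores
       above must be emitted before it and no memory value may be cached
       across it. Leaving it out is the bug class described above. */
    __asm__ __volatile__("" : : "r"(hdr) : "memory");

    uint16_t c = csum_ref(hdr, ihl);
    hdr[10] = (uint8_t)(c >> 8);            /* store in network byte order */
    hdr[11] = (uint8_t)(c & 0xffu);
}
```

Because the arithmetic in this runnable version is plain C that GCC can see reading the buffer, the result would be correct even without the clobber; the clobber only matters when the reads happen inside the asm itself, as they do in ip_fast_csum.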
[ Of course, this makes it all sound almost easy; at around 10 minutes per cycle to boot a test kernel, reboot, and boot back into a working kernel, it gets extremely tedious. I also didn’t bother to mention how many hours I probably wasted twiddling compiler flags trying to figure out which optimization pass might be broken... ]
Later, on the trail, a spotted towhee burst out of a tree and flew past us. Then a small woodpecker emerged from the same cluster of branches the towhee had just left. As we drew nearer we could hear quite a commotion up in the branches ... a dozen or more small birds, mostly chickadees, chattering and darting in and out like bees around a hive. It seemed centered on ... that unmoving spot there ... wait, doesn't it look a bit owl-shaped to you?
And as long as I'm posting nature pictures, the bullfrogs are back at the Walden West Scum Lake. Just floatin' there, though ... they weren't making any noise or moving around.