[PATCH 00/62] initrd: remove classic initrd support

Rob Landley rob at landley.net
Fri Sep 19 04:10:22 AEST 2025


On 9/17/25 13:00, Andy Lutomirski wrote:
> On Mon, Sep 15, 2025 at 10:09 AM Rob Landley <rob at landley.net> wrote:
> 
>> While you're at it, could you fix static/builtin initramfs so PID 1 has
>> a valid stdin/stdout/stderr?
>>
>> A static initramfs won't create /dev/console if the embedded initramfs
>> image doesn't contain it, which a non-root build can't mknod, so the
>> kernel plumbing won't see the device node in the directory we point it at unless
>> we build with root access.
> 
> I have no current insight as to whether there's a kernel issue here,

They fixed the behavior in one codepath. They left it broken in the 
other codepath. The kernel's behavior is inconsistent.

Look:

$ mkdir sub; cc --static -xc - <<<'int main() {puts("hello\n");if (fork()) reboot(0x01234567); for(;;);}' -o sub/init
$ (cd sub; cpio -o -H newc <<<init | gzip) > sub.cpio.gz
$ make allnoconfig KCONFIG_ALLCONFIG=<(tr ' ' \\n <<<'PANIC_TIMEOUT=1 RD_GZIP BINFMT_ELF BLK_DEV_INITRD EARLY_PRINTK 64BIT SERIAL_8250 SERIAL_8250_CONSOLE UNWINDER_FRAME_POINTER' | sed 's/^/CONFIG_/;/=/!s/$/=y/')
$ make -j $(nproc)
$ qemu-system-x86_64 -kernel arch/x86/boot/bzImage -nographic -no-reboot -append console=ttyS0 -initrd sub.cpio.gz

You get a "hello" output near the end there. (You can add "quiet" to the 
-append but given that qemu can't NOT output its bios spam there's not 
much point.)
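
For anyone unfamiliar with the KCONFIG_ALLCONFIG trick above: the tr/sed pipeline only turns a list of symbol names into miniconfig lines. A minimal sketch of the same transformation as a function (miniconfig is a made-up helper name here, not a kernel tool):

```shell
# Expand a space-separated list of config symbols into miniconfig lines.
# Symbols that already carry "=value" keep it; bare symbols get "=y".
miniconfig() {
  echo "$1" | tr ' ' '\n' | sed 's/^/CONFIG_/;/=/!s/$/=y/'
}

# Example: prints one CONFIG_ line per symbol.
miniconfig 'PANIC_TIMEOUT=1 RD_GZIP BLK_DEV_INITRD'
```

The output of that is what make reads through KCONFIG_ALLCONFIG; everything not listed stays at its allnoconfig default.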

Now add INITRAMFS_SOURCE="sub" to the config and remove -initrd 
sub.cpio.gz from the qemu invocation:

$ make clean allnoconfig KCONFIG_ALLCONFIG=<(tr ' ' \\n <<<'PANIC_TIMEOUT=1 RD_GZIP BINFMT_ELF BLK_DEV_INITRD EARLY_PRINTK 64BIT SERIAL_8250 SERIAL_8250_CONSOLE UNWINDER_FRAME_POINTER INITRAMFS_SOURCE="sub"' | sed 's/^/CONFIG_/;/=/!s/$/=y/')
$ make -j $(nproc)
$ qemu-system-x86_64 -kernel arch/x86/boot/bzImage -nographic -no-reboot -append 'console=ttyS0'

No "hello" output, but it DOES shut down cleanly instead of giving you a 
panic trace so you know it ran the init binary.

All that changed was statically linking the initramfs instead of feeding 
it in through the initrd mechanism: the kernel behaves differently in 
those two codepaths, as I explained in the message you replied to.

(The above instructions assume an x86-64 host toolchain, poke me if you 
want arm64 instead...)

> but why are you trying to put actual device nodes in an actual
> filesystem as part of a build process?

I'm not. Doing that would require root access on the build machine to 
mknod in the "sub" directory above. I build new images WITHOUT root access 
on the host.

There used to be a way to feed the kernel config a text file listing 
what to make in the cpio file instead of just pointing it at a 
directory, and my old Aboriginal Linux build used that mechanism 
(generating such a file by hand, borrowing the kernel infrastructure but 
driving it manually) 15 years ago:

https://landley.net/aboriginal/about.html

https://github.com/landley/aboriginal/blob/master/sources/functions.sh#L403
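
For reference, the text format in question is the one usr/gen_init_cpio reads: one line per entry, using "dir", "nod", "file" and "slink" records. A minimal sketch of generating such a list from a script, assuming the init binary sits at sub/init as in the example above (initramfs_list is a hypothetical helper name, and the modes are illustrative):

```shell
# Print a gen_init_cpio-style description of a minimal root filesystem.
# $1: path to the init binary on the build host (assumed: sub/init)
initramfs_list() {
  cat <<EOF
dir /dev 0755 0 0
nod /dev/console 0600 0 0 c 5 1
file /init $1 0755 0 0
EOF
}

# Example: write the description to stdout.
initramfs_list sub/init
```

Nothing in that needs root: the device node only exists as a line of text until the archive is generated.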

But kernel commit 469e87e89fd6 broke that mechanism because somebody 
dunning-krugered it away ("I don't understand why we need this therefore 
nobody needs it"). I had a patch to unbreak it for a while:

https://landley.net/bin/mkroot/0.8.10/linux-patches/0011-gen_init_cpio-regression.patch

But as with so many patches, lkml wasn't interested. (I mostly post them 
so when copyright trolls try to rattle sabers I can point to an lkml web 
archive entry that got ignored, and explain precisely HOW much bad PR 
they're in for when they proceed.)

And again: you ONLY need this for static initramfs. Dynamic initramfs 
has code to create /dev/console (at boot time, not build time):

https://github.com/torvalds/linux/blob/v6.16/init/noinitramfs.c#L27

That code ONLY gets called for the external initrd loader, it does NOT 
get called when a static initramfs image built into the kernel has a 
runnable /init. This is an inconsistency in the kernel behavior, which 
is what I'm objecting to.

> It's extremely straightforward
> to emit device nodes in cpio format, and IMO it's far *more*
> straightforward to do that than to make a whole directory, try to get
> all the modes right, and cpio it up.

You mean like commit 595a22acee26 from 2017?

> I wrote an absolutely trivial tool for this several years ago:
> 
> https://github.com/amluto/virtme/blob/master/virtme/cpiowriter.py

Let's see, I wrote the initramfs documentation in 2005:

https://lwn.net/Articles/157676/

Was already correcting kernel developers on how it actually worked 
(rather than theoretically worked) in 2006:

https://lkml.iu.edu/hypermail//linux/kernel/0603.2/2760.html

I added tmpfs support to it in 2013 (because nobody else had bothered 
for EIGHT YEARS):

https://lkml.iu.edu/hypermail/linux/kernel/1306.3/04204.html

I've maintained my own cpio implementation in toybox for over a decade:

https://github.com/landley/toybox/commit/a2d558151a63

The successor to aboriginal (above) is a 400 line bash script that 
builds a dozen architectures that each boot to a shell prompt in qemu:

https://github.com/landley/toybox/blob/master/mkroot/mkroot.sh
https://landley.net/bin/mkroot/latest/

With automated regression test infrastructure to boot them all under 
qemu and confirm that it runs, the clocks are set right, the network 
works, and it can read from -hda:

https://github.com/landley/toybox/blob/master/mkroot/testroot.sh

So yes I _can_ create my own bespoke C program to modify the file in 
arbitrary ways, I have my reasons not to do that, and have thought about 
them for a while now.

> it would be barely more complicated to strip the trailer off a cpio
> file from some other source, add some device nodes, and stick the
> trailer back on.

So you're unaware that the kernel accepts concatenated archives, and you 
can just cat together two cpio.gz files and they'll extract. (In gzip 
anyway, I haven't tested the other compression formats. That's why I 
needed to do https://github.com/landley/toybox/commit/dafb9211c777 and 
95a15d238120 by the way.)
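
As a rough sketch of what the concatenation approach looks like for the external initrd case: the newc header is plain ASCII hex, so an archive holding just /dev and /dev/console can be written with printf and no root access. The newc_entry helper below is hypothetical (not an existing tool), assumes /dev/console is character device 5,1, only handles entries with no file data, and has not been tested against an actual kernel boot:

```shell
# Emit one newc cpio record with no file data.
# Usage: newc_entry NAME MODE_OCTAL RDEV_MAJOR RDEV_MINOR
newc_entry() {
  name=$1 mode=$2 rmaj=$3 rmin=$4
  namesize=$(( ${#name} + 1 ))                 # name length plus trailing NUL
  pad=$(( (4 - (110 + namesize) % 4) % 4 ))    # header+name padded to 4 bytes
  newc_ino=$(( ${newc_ino:-0} + 1 ))           # give each entry its own inode
  # magic, then 13 eight-digit hex fields: ino mode uid gid nlink mtime
  # filesize devmajor devminor rdevmajor rdevminor namesize check
  printf '070701%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X' \
    "$newc_ino" "$(($mode))" 0 0 1 0 0 0 0 "$rmaj" "$rmin" "$namesize" 0
  printf '%s\0' "$name"
  i=0
  while [ "$i" -lt "$pad" ]; do printf '\0'; i=$((i + 1)); done
}

# Intended use (commented out so this block has no side effects):
# { newc_entry dev 040755 0 0
#   newc_entry dev/console 020600 5 1
#   newc_entry 'TRAILER!!!' 0 0 0; } | gzip > dev.cpio.gz
# cat sub.cpio.gz dev.cpio.gz > combined.cpio.gz
```

The mode argument is given as an octal literal with a leading zero (040755 for a directory, 020600 for a character device) so that shell arithmetic interprets it as octal.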

The problem is there's no portable existing userspace tool to create a 
cpio archive from non-filesystem data. Partly because there WAS a 
mechanism built into the kernel... until that guy broke it in 2020. When 
I'm making a squashfs I've got the -p option (presumably modeled on what 
the kernel used to do before it broke), but the host cpio hasn't got a 
way to specify that and adding my own bespoke format to toybox... I'm 
still trying to get 
https://lists.gnu.org/archive/html/coreutils/2023-08/msg00009.html into 
coreutils. (Alas lkml isn't the only 30 year old community that's gotten 
stiff and hard of hearing.)

I could emit cpio contents with xxd -r from a HERE document hexdump or 
something to append to the generated file, but xxd isn't installed by 
default on debian and echo \x is WAY ugly, and "here's a giant hex dump 
you're not expected to understand" isn't really something I want to add 
to an otherwise understandable build. Writing, building, and running my 
own bespoke tool in C to do it isn't really an improvement over the hexdump.

The kernel ALMOST already does this. The code just needs to be 
refactored a bit, preferably so there aren't two codepaths each with 
half the testing.

> But it's also really, really, really easy to emit an
> entire, functioning cpio-formatted initramfs from plain user code with
> no filesystem manipulation at all.  This also makes that portion of
> the build reproducible, which is worth quite a bit IMO.

Sigh. When I started working on reproducible builds they weren't called 
that yet, but I don't think digging for more links would help here. I 
did do a rollup of what I'm trying to accomplish 5 years ago though 
http://lists.landley.net/pipermail/toybox-landley.net/2020-July/011898.html 
and long long ago, there was https://landley.net/aboriginal/history.html 
and...

Query: is your "plain user code" built with "cc"? Do you reliably have a 
"cc" link, or do you need to explicitly say "gcc" or "clang"? The kernel 
needs to do the latter for some reason, and my patch to GET the 
kernel to at least _try_ "cc" before falling back to the others was 
explicitly rejected...

> --Andy

Rob


More information about the Linuxppc-dev mailing list