[PATCH 00/62] initrd: remove classic initrd support
Rob Landley
rob at landley.net
Fri Sep 19 04:10:22 AEST 2025
On 9/17/25 13:00, Andy Lutomirski wrote:
> On Mon, Sep 15, 2025 at 10:09 AM Rob Landley <rob at landley.net> wrote:
>
>> While you're at it, could you fix static/builtin initramfs so PID 1 has
>> a valid stdin/stdout/stderr?
>>
>> A static initramfs won't create /dev/console if the embedded initramfs
>> image doesn't contain it, which a non-root build can't mknod, so the
>> kernel plumbing won't see it dev in the directory we point it at unless
>> we build with root access.
>
> I have no current insight as to whether there's a kernel issue here,
They fixed the behavior in one codepath. They left it broken in the
other codepath. The kernel's behavior is inconsistent.
Look:
$ mkdir sub; cc --static -xc - <<<'int main() {puts("hello\n");if
(fork()) reboot(0x01234567); for(;;);}' -o sub/init
$ (cd sub; cpio -o -H newc <<<init | gzip) > sub.cpio.gz
$ make allnoconfig KCONFIG_ALLCONFIG=<(tr ' ' \\n <<<'PANIC_TIMEOUT=1
RD_GZIP BINFMT_ELF BLK_DEV_INITRD EARLY_PRINTK 64BIT SERIAL_8250
SERIAL_8250_CONSOLE UNWINDER_FRAME_POINTER' | sed 's/^/CONFIG_/;/=/!s/$/=y/')
$ make -j $(nproc)
$ qemu-system-x86_64 -kernel arch/x86/boot/bzImage -nographic -no-reboot \
  -append console=ttyS0 -initrd sub.cpio.gz
You get a "hello" output near the end there. (You can add "quiet" to the
-append but given that qemu can't NOT output its bios spam there's not
much point.)
Now add INITRAMFS_SOURCE="sub" to the config and remove -initrd
sub.cpio.gz from the qemu invocation:
$ make clean allnoconfig KCONFIG_ALLCONFIG=<(tr ' ' \\n \
<<<'PANIC_TIMEOUT=1 RD_GZIP BINFMT_ELF BLK_DEV_INITRD EARLY_PRINTK 64BIT
SERIAL_8250 SERIAL_8250_CONSOLE UNWINDER_FRAME_POINTER
INITRAMFS_SOURCE="sub"' | sed 's/^/CONFIG_/;/=/!s/$/=y/')
$ make -j $(nproc)
$ qemu-system-x86_64 -kernel arch/x86/boot/bzImage -nographic -no-reboot \
  -append 'console=ttyS0'
No "hello" output, but it DOES shut down cleanly instead of giving you a
panic trace so you know it ran the init binary.
All that changed was statically linking the initramfs instead of feeding
it in through the initrd mechanism: the kernel behaves differently in
those two codepaths, as I explained in the message you replied to.
(The above instructions assume an x86-64 host toolchain, poke me if you
want arm64 instead...)
> but why are you trying to put actual device nodes in an actual
> filesystem as part of a build process?
I'm not. Doing that would require root access on the build machine to
mknod in the "sub" directory above. I build new images WITHOUT root access
on the host.
There used to be a way to feed the kernel config a text file listing
what to make in the cpio file instead of just pointing it at a
directory, and my old Aboriginal Linux build used that mechanism
(generating such a file by hand, borrowing the kernel infrastructure but
driving it manually) 15 years ago:
https://landley.net/aboriginal/about.html
https://github.com/landley/aboriginal/blob/master/sources/functions.sh#L403
But kernel commit 469e87e89fd6 broke that mechanism because somebody
dunning-krugered it away ("I don't understand why we need this therefore
nobody needs it"). I had a patch to unbreak it for a while:
https://landley.net/bin/mkroot/0.8.10/linux-patches/0011-gen_init_cpio-regression.patch
But as with so many patches, lkml wasn't interested. (I mostly post them
so when copyright trolls try to rattle sabers I can point to an lkml web
archive entry that got ignored, and explain precisely HOW much bad PR
they're in for when they proceed.)
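For readers who have not seen the description-file mechanism referred to above, here is a minimal sketch of the list format read by the kernel's usr/gen_init_cpio helper. The file names initramfs.list and initramfs.cpio are made up for illustration, the entries assume the "sub" directory from the earlier example, and the helper is only invoked if it has already been built in the current kernel tree:

```shell
#!/bin/sh
# Sketch only: a description file in the format read by usr/gen_init_cpio.
# Each line is "<type> <name> ..." with mode, uid and gid spelled out, so no
# root access is needed to describe a device node.
cat > initramfs.list <<'EOF'
dir /dev 0755 0 0
nod /dev/console 0600 0 0 c 5 1
file /init sub/init 0755 0 0
EOF

# Assumption: we are in a kernel tree where usr/gen_init_cpio has been built.
# If it is not there, the list file is still written and this step is skipped.
if [ -x usr/gen_init_cpio ]; then
  usr/gen_init_cpio initramfs.list > initramfs.cpio
fi
```

The "nod" line is the part a plain directory copy cannot express without mknod: character device 5:1, mode 0600, owned by uid 0 and gid 0.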
And again: you ONLY need this for static initramfs. Dynamic initramfs
has code to create /dev/console (at boot time, not build time):
https://github.com/torvalds/linux/blob/v6.16/init/noinitramfs.c#L27
That code ONLY gets called for the external initrd loader, it does NOT
get called when a static initramfs image built into the kernel has a
runnable /init. This is an inconsistency in the kernel behavior, which
is what I'm objecting to.
> It's extremely straightforward
> to emit devices nodes in cpio format, and IMO it's far *more*
> straightforward to do that than to make a whole directory, try to get
> all the modes right, and cpio it up.
You mean like commit 595a22acee26 from 2017?
> I wrote an absolutely trivial tool for this several years ago:
>
> https://github.com/amluto/virtme/blob/master/virtme/cpiowriter.py
Let's see, I wrote the initramfs documentation in 2005:
https://lwn.net/Articles/157676/
Was already correcting kernel developers on how it actually worked
(rather than theoretically worked) in 2006:
https://lkml.iu.edu/hypermail//linux/kernel/0603.2/2760.html
I added tmpfs support to it in 2013 (because nobody else had bothered
for EIGHT YEARS):
https://lkml.iu.edu/hypermail/linux/kernel/1306.3/04204.html
I've maintained my own cpio implementation in toybox for over a decade:
https://github.com/landley/toybox/commit/a2d558151a63
The successor to aboriginal (above) is a 400 line bash script that
builds a dozen architectures that each boot to a shell prompt in qemu:
https://github.com/landley/toybox/blob/master/mkroot/mkroot.sh
https://landley.net/bin/mkroot/latest/
With automated regression test infrastructure to boot them all under
qemu and confirm that it runs, the clocks are set right, the network
works, and it can read from -hda:
https://github.com/landley/toybox/blob/master/mkroot/testroot.sh
So yes, I _can_ create my own bespoke C program to modify the file in
arbitrary ways. I have my reasons not to do that, and have thought about
them for a while now.
> it would be barely more complicated to strip the trailer off an cpio
> file from some other source, add some device nodes, and stick the
> trailer back on.
So you're unaware that the kernel accepts concatenated archives, and you
can just cat together two cpio.gz files and they'll extract. (In gzip
anyway, I haven't tested the other compression formats. That's why I
needed to do https://github.com/landley/toybox/commit/dafb9211c777 and
95a15d238120 by the way.)
The problem is there's no portable existing userspace tool to create a
cpio archive from non-filesystem data. Partly because there WAS a
mechanism built into the kernel... until that guy broke it in 2020. When
I'm making a squashfs I've got the -p option (presumably modeled on what
the kernel used to do before it broke), but the host cpio hasn't got a
way to specify that and adding my own bespoke format to toybox... I'm
still trying to get
https://lists.gnu.org/archive/html/coreutils/2023-08/msg00009.html into
coreutils. (Alas lkml isn't the only 30 year old community that's gotten
stiff and hard of hearing.)
I could emit cpio contents with xxd -r from a HERE document hexdump or
something to append to the generated file, but xxd isn't installed by
default on debian and echo \x is WAY ugly, and "here's a giant hex dump
you're not expected to understand" isn't really something I want to add
to an otherwise understandable build. Writing, building, and running my
own bespoke tool in C to do it isn't really an improvement over the hexdump.
The kernel ALMOST already does this. The code just needs to be
refactored a bit, preferably so there aren't two codepaths each with
half the testing.
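For illustration only, here is a minimal sketch of emitting the missing entries from a shell script without a hex dump. It relies on the newc header being ASCII hex digits, so the only binary bytes are the NUL that terminates each name and the NUL padding after it. The names newc_entry, emit_console_cpio and console.cpio are hypothetical, uid, gid and mtime are simply set to 0, and this has not been checked against every kernel version:

```shell
#!/bin/sh
# Sketch: emit zero-length "newc" cpio entries using only printf.
# usage: newc_entry NAME MODE NLINK RDEV_MAJOR RDEV_MINOR
# MODE is octal with the file type bits included (printf parses the leading 0).
ino=0
newc_entry() {
  ino=$((ino + 1))
  namesize=$(( ${#1} + 1 ))                   # name length plus its NUL terminator
  pad=$(( (4 - (110 + namesize) % 4) % 4 ))   # 110-byte header + name, rounded up to 4
  # Fields after the magic: ino mode uid gid nlink mtime filesize
  # devmajor devminor rdevmajor rdevminor namesize check
  printf '070701%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%s' \
    "$ino" "$2" 0 0 "$3" 0 0 0 0 "$4" "$5" "$namesize" 0 "$1"
  n=$(( pad + 1 ))                            # terminator plus padding
  while [ "$n" -gt 0 ]; do printf '\0'; n=$((n - 1)); done
}

emit_console_cpio() {
  newc_entry dev          040755 2 0 0   # directory
  newc_entry dev/console  020600 1 5 1   # character device 5:1, mode 0600
  newc_entry 'TRAILER!!!' 0      1 0 0   # end-of-archive marker
}

emit_console_cpio > console.cpio
```

Assuming the concatenation behaviour described earlier in this message, the result could be gzipped and appended to an existing image, along the lines of "gzip -c console.cpio | cat sub.cpio.gz - > combined.cpio.gz". Whether that is nicer than having the kernel create the node itself is a separate question.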
> But it's also really, really, really easy to emit an
> entire, functioning cpio-formatted initramfs from plain user code with
> no filesystem manipulation at all. This also makes that portion of
> the build reproducible, which is worth quite a bit IMO.
Sigh. When I started working on reproducible builds they weren't called
that yet, but I don't think digging for more links would help here. I
did do a rollup of what I'm trying to accomplish 5 years ago though
http://lists.landley.net/pipermail/toybox-landley.net/2020-July/011898.html
and long long ago, there was https://landley.net/aboriginal/history.html
and...
Query: is your "plain user code" built with "cc"? Do you reliably have a
"cc" link, or do you need to explicitly say "gcc" or "clang"? The kernel
needs to do the latter for some reason, and my patch to GET the
kernel to at least _try_ "cc" before falling back to the others was
explicitly rejected...
> --Andy
Rob