[PATCH kernel] prom_init: Fetch flatten device tree from the system firmware
Stewart Smith
stewart at linux.ibm.com
Fri May 3 10:10:57 AEST 2019
David Gibson <david at gibson.dropbear.id.au> writes:
> On Wed, May 01, 2019 at 01:42:21PM +1000, Alexey Kardashevskiy wrote:
>> At the moment, on 256CPU + 256 PCI devices guest, it takes the guest
>> about 8.5sec to fetch the entire device tree via the client interface
>> as the DT is traversed twice - for strings blob and for struct blob.
>> Also, "getprop" is quite slow too as SLOF stores properties in a linked
>> list.
>>
>> However, since [1] SLOF builds flattened device tree (FDT) for another
>> purpose. [2] adds a new "fdt-fetch" client interface for the OS to fetch
>> the FDT.
>>
>> This tries the new method; if not supported, this falls back to
>> the old method.
>>
>> There is a change in the FDT layout - the old method produced
>> (reserved map, strings, structs), the new one receives only strings and
>> structs from the firmware and adds the final reserved map to the end,
>> so it is (fw reserved map, strings, structs, reserved map).
>> This still produces the same unflattened device tree.
>>
>> This merges the reserved map from the firmware into the kernel's reserved
>> map. At the moment SLOF generates an empty reserved map so this does not
>> change the existing behaviour in regard of reservations.
>>
>> This supports only v17 onward as only that version provides dt_struct_size
>> which works as "fdt-fetch" only produces v17 blobs.
>>
>> If "fdt-fetch" is not available, the old method of fetching the DT is used.
>>
>> [1] https://git.qemu.org/?p=SLOF.git;a=commitdiff;h=e6fc84652c9c00
>> [2] https://git.qemu.org/?p=SLOF.git;a=commit;h=ecda95906930b80
>>
>> Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
>
> Hrm. I've gotta say I'm not terribly convinced that it's worth adding
> a new interface we'll need to maintain to save 8s on a somewhat
> contrived testcase.
256 CPUs aren't that many anymore though. Although I guess that many PCI
devices is still a little uncommon.
A 4 socket POWER8 or POWER9 can easily be that large, and a small test
kernel/userspace will boot in ~2.5-4 seconds. So it's possible that
the device tree fetch could be a surprisingly non-trivial percentage of
boot time, at least on some machines.
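
To put rough numbers on that, here is a minimal sketch in C. It assumes
(and this is only an assumption for illustration) that the 8.5 s fetch
quoted above would be added on top of a boot that otherwise takes
2.5-4 s; the two figures come from different setups, so treat the result
as an order-of-magnitude estimate. fetch_share() is a hypothetical helper,
not anything in the kernel.

```c
#include <assert.h>

/*
 * Fraction of total boot time spent fetching the device tree,
 * assuming the fetch time is simply added to the rest of the boot.
 * Both arguments are in seconds. Hypothetical helper for illustration.
 */
static double fetch_share(double fetch_s, double rest_of_boot_s)
{
    /* Negative durations or a zero total make no sense here. */
    assert(fetch_s >= 0.0 && rest_of_boot_s >= 0.0);
    assert(fetch_s + rest_of_boot_s > 0.0);

    return fetch_s / (fetch_s + rest_of_boot_s);
}
```

Under that assumption, fetch_share(8.5, 2.5) is about 0.77 and
fetch_share(8.5, 4.0) is 0.68, i.e. the fetch would account for roughly
two thirds to three quarters of the total.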
--
Stewart Smith
OPAL Architect, IBM.