[PATCH kernel] prom_init: Fetch flattened device tree from the system firmware

Alexey Kardashevskiy aik at ozlabs.ru
Mon May 6 12:21:28 AEST 2019



On 03/05/2019 12:35, David Gibson wrote:
> On Fri, May 03, 2019 at 10:10:57AM +1000, Stewart Smith wrote:
>> David Gibson <david at gibson.dropbear.id.au> writes:
>>> On Wed, May 01, 2019 at 01:42:21PM +1000, Alexey Kardashevskiy wrote:
>>>> At the moment, on a guest with 256 CPUs and 256 PCI devices, fetching
>>>> the entire device tree via the client interface takes about 8.5
>>>> seconds, as the DT is traversed twice - once for the strings blob and
>>>> once for the struct blob. Also, "getprop" is quite slow as SLOF stores
>>>> properties in a linked list.
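
For context, a rough sketch of what the old per-property walk looks like;
call_prom(), "nextprop" and "getprop" are the usual OF client-interface
pieces, while the helper name, buffer names and sizes are made up for
illustration:

/*
 * Rough sketch only, not the actual prom_init code: every property of
 * every node costs separate client-interface calls, and the whole tree
 * is walked twice (once for the strings pass, once for the struct pass).
 */
static char prop_buf[4096];		/* scratch buffer, size made up */

static void __init sketch_copy_properties(phandle node)
{
	char pname[32], prev[32] = "";

	for (;;) {
		/* "nextprop" yields the next property name, <= 0 when done */
		if (call_prom("nextprop", 3, 1, node, prev, pname) <= 0)
			break;

		/* one more firmware round trip to fetch the value itself */
		call_prom("getprop", 4, 1, node, pname, prop_buf,
			  sizeof(prop_buf));
		memcpy(prev, pname, sizeof(prev));
	}
}
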
>>>>
>>>> However, since [1], SLOF has been building a flattened device tree
>>>> (FDT) for another purpose anyway. [2] adds a new "fdt-fetch" client
>>>> interface for the OS to fetch that FDT.
>>>>
>>>> This patch tries the new method first; if it is not supported, it
>>>> falls back to the old method.
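
A minimal sketch of the "try the new method, fall back otherwise" step;
the exact argument convention of "fdt-fetch" is an assumption here
(buffer + size in, error code out), not something taken from the patch:

/*
 * Sketch: assume "fdt-fetch" takes (buffer, size) and returns 0 on
 * success.  Any failure, including the service being unknown to older
 * firmware, means we fall back to the old two-pass walk.
 */
static int __init sketch_fetch_fdt(void *buf, u32 size)
{
	if (call_prom("fdt-fetch", 2, 1, buf, size) != 0)
		return -1;		/* not supported: use the old method */
	return 0;
}
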
>>>>
>>>> There is a change in the FDT layout - the old method produced
>>>> (reserved map, strings, structs), while the new one receives
>>>> (fw reserved map, strings, structs) from the firmware and appends
>>>> the kernel's reserved map to the end, so the result is
>>>> (fw reserved map, strings, structs, reserved map).
>>>> This still produces the same unflattened device tree.
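
The blocks are located purely via the offsets in the FDT header, so their
order in memory does not matter to the unflattening code. For reference,
this is the stock v17 header layout from fdt.h (quoted here only as a
reminder, not part of the patch):

struct fdt_header {
	fdt32_t magic;			/* magic word FDT_MAGIC */
	fdt32_t totalsize;		/* total size of DT block */
	fdt32_t off_dt_struct;		/* offset to structure block */
	fdt32_t off_dt_strings;		/* offset to strings block */
	fdt32_t off_mem_rsvmap;		/* offset to memory reserve map */
	fdt32_t version;		/* format version */
	fdt32_t last_comp_version;	/* last compatible version */
	fdt32_t boot_cpuid_phys;	/* physical id of the boot CPU */
	fdt32_t size_dt_strings;	/* size of the strings block */
	fdt32_t size_dt_struct;		/* size of the structure block, v17+ */
};
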
>>>>
>>>> This merges the reserved map from the firmware into the kernel's
>>>> reserved map. At the moment SLOF generates an empty reserved map,
>>>> so this does not change the existing behaviour with regard to
>>>> reservations.
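
A sketch of that merge step: fdt_num_mem_rsv()/fdt_get_mem_rsv() are
standard libfdt accessors, while reserve_mem() stands in for whatever
early reservation helper is actually used (an assumption, not the patch
itself):

/*
 * Walk the firmware blob's reserve map and add each entry to the
 * kernel's own reservations.  With SLOF's currently empty map this
 * loop is effectively a no-op.
 */
static void __init sketch_merge_rsvmap(const void *fw_fdt)
{
	int i, n = fdt_num_mem_rsv(fw_fdt);

	for (i = 0; i < n; i++) {
		u64 base, size;

		if (fdt_get_mem_rsv(fw_fdt, i, &base, &size) < 0)
			break;
		if (size)
			reserve_mem(base, size);	/* assumed helper */
	}
}
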
>>>>
>>>> This supports only v17 onward, as only that version provides
>>>> dt_struct_size; that is not a limitation in practice, since
>>>> "fdt-fetch" only produces v17 blobs.
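
The corresponding version gate is tiny; a sketch, using the generic
libfdt header accessors fdt_version() and fdt_size_dt_struct():

	/* Sketch: only accept v17+ blobs, where size_dt_struct is valid */
	if (fdt_version(fdt) < 17 || fdt_size_dt_struct(fdt) == 0)
		return -1;	/* fall back to the old fetch method */
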
>>>>
>>>> If "fdt-fetch" is not available, the old method of fetching the DT is used.
>>>>
>>>> [1] https://git.qemu.org/?p=SLOF.git;a=commitdiff;h=e6fc84652c9c00
>>>> [2] https://git.qemu.org/?p=SLOF.git;a=commit;h=ecda95906930b80
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
>>>
>>> Hrm.  I've gotta say I'm not terribly convinced that it's worth adding
>>> a new interface we'll need to maintain to save 8s on a somewhat
>>> contrived testcase.
>>
>> 256 CPUs aren't that many any more, though. Although I guess that many
>> PCI devices are still a little uncommon.
> 
> Yeah, it was the PCI devices I was meaning, not the cpus.


Each node (device, cpu, memory/numa) has a dozen properties, so any
~500 nodes will slow booting down more or less equally.


> 
>> A 4-socket POWER8 or POWER9 can easily be that large, and a small test
>> kernel/userspace will boot in ~2.5-4 seconds. So it's possible that
>> the device tree fetch could be a surprisingly non-trivial percentage of
>> boot time, at least on some machines.
>>
>>
> 

-- 
Alexey

