[SLOF] [Qemu-ppc] [PATCH v4] board-qemu: add private hcall to inform host on "phandle" update

Mon Sep 11 04:14:14 AEST 2017

On 09/09/17 07:48, David Gibson wrote:

> On Fri, Sep 08, 2017 at 03:00:36PM +0100, Mark Cave-Ayland wrote:
>> On 08/09/17 14:20, Greg Kurz wrote:
>>> On Fri, 8 Sep 2017 13:51:24 +0100
>>> Mark Cave-Ayland <mark.cave-ayland at ilande.co.uk> wrote:
>>>
>>>> On 08/09/17 12:59, David Gibson wrote:
>>>>
>>>>>> If you're looking for a way to reference a node outside of OF then the
>>>>>> only way to consistently do this is via an OF path. What if when the DT
>>>>>> blob for PHB was created in QEMU you create a fake interrupt-parent-path
>>>>>> string property containing the OF path to the interrupt controller, and
>>>>>> move the generation of interrupt-map to SLOF?  
>>>>>   
>>>>>> In SLOF you could then do something like below to get the phandle from
>>>>>> the OF path:
>>>>>> "interrupt-parent-path" get-package-property dev ihandle>phandle
>>>>>> and from there, substituting the phandle into interrupt-map is trivial.  
>>>>>
>>>>> Nope.  At the time of hotplug, SLOF no longer exists - it's handed
>>>>> over to the guest.  
>>>>
>>>> Yes, I understand that. This would be the process for getting the
>>>> initial DT information to SLOF to generate interrupt-map upon boot.
>>>>
>>>>>> Similarly for the guest, it should be easy to iterate over the kernel DT
>>>>>> to locate the interrupt controller device based upon OF path, and then
>>>>>> use the interrupt-map information to update its routing information for
>>>>>> the hotplugged PHB accordingly.  
>>>>>
>>>>> That requires a non-PAPR-compliant guest change.  Existing guests
>>>>> already support this when running under PowerVM.  
>>>>
>>>> My understanding from the thread was that hotplugging PHBs is a new
>>>> feature? In that case the transition is simple: if the
>>>
>>> The feature is mentioned in the PAPR spec but not yet implemented in QEMU.
>>
>> Meh. So in that case if this hacking of phandles is already part of the
>> PAPR specification, I guess we are too late :(
> 
> Well, yes and no.  In PAPR the hotplug handling is framed in terms of
> RTAS requests - the runtime portion of the guest firmware.
> 
> PowerVM has its own (proprietary) guest OF implementation.  Its
> version of RTAS is a reasonably substantial piece of software that has
> access to the device tree built by the boot-time portion of OF.  That
> way it's able to generate suitable DT fragments for plugged PHBs,
> including phandle references.
> 
> Now hotplug clearly requires communication with the hypervisor, not
> just guest firmware; and in fact that's true of nearly everything RTAS
> does.  How the RTAS <-> hypervisor communication happens is not
> specified by PAPR, and I don't know how the PowerVM implementation
> does so.
> 
> For qemu/KVM, we decided - and I'm confident we were right to do so -
> that having separate hypervisor <-> RTAS and RTAS <-> guest OS
> protocols was silly.  So, our RTAS is a minuscule (literally 20 bytes
> long) shim which simply forwards all RTAS requests to the hypervisor
> (i.e. qemu).
> 
> This makes life much easier: it means we don't need to invent an
> RTAS<->hypervisor protocol (for this and many other situations).  It
> means we don't need to worry about updating such a protocol in sync
> between the components.  It means we don't need a complicated piece of
> RTAS code to be compiled with a guest-targeting toolchain.  It means
> we don't need to jump through toolchain hoops to make code that's
> relocatable and callable using the somewhat weird conventions that
> RTAS uses.
> 
> But, it means the RTAS calls implemented in qemu don't have access to
> the output-from-SLOF version of the device tree.
> 
> So, how do we address that?
> 
> One option is the one proposed earlier in the thread: a special
> hypercall lets OF update qemu with the phandles of nodes as it
> allocates them.  For now - and very likely, forever - the changed
> phandles between the qemu generated "seed" tree and the OF-output tree
> are the only changes that matter to us.
> 
> Another approach would be to snapshot the OF tree at the point we
> instantiate RTAS.  We could do that either by having another special
> hcall which lets OF report the whole revised tree to qemu, or by
> just having it dump the tree as FDT at a known location within the RTAS
> blob (expanding it as necessary, obviously).

Thanks for the detailed explanation. I think having access to the DT would
be the safest option because some device properties may be generated
outside of QEMU within OF, e.g. by FCode ROMs which can also make
changes to the DT. And presumably this would then work regardless of the
device being hotplugged.
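
For comparison, the first option quoted above (OF reporting phandle
changes to the host) can be modelled roughly as below. This is only an
illustrative sketch in Python, not QEMU or SLOF code: the PhandleMap
class, the fixup_phandle_cells helper, the cell layout and the phandle
values are all hypothetical names and numbers chosen for the example.
It assumes the host keeps a table from "seed" phandles to the ones OF
allocated, and rewrites the phandle cells of a property before handing
a DT fragment to the guest.

```python
from typing import Dict, List


class PhandleMap:
    """Hypothetical host-side record of phandle changes reported by firmware."""

    def __init__(self) -> None:
        self._seed_to_final: Dict[int, int] = {}

    def update(self, seed: int, final: int) -> None:
        # Models what a handler for a private "phandle update" hcall
        # might record: seed-tree phandle -> phandle allocated by OF.
        self._seed_to_final[seed] = final

    def resolve(self, seed: int) -> int:
        # Assumption: a phandle that was never reported is unchanged.
        return self._seed_to_final.get(seed, seed)


def fixup_phandle_cells(cells: List[int], phandle_indices: List[int],
                        pmap: PhandleMap) -> List[int]:
    """Return a copy of 'cells' with the cells at 'phandle_indices'
    translated from seed phandles to the firmware-assigned ones."""
    out = list(cells)
    for i in phandle_indices:
        out[i] = pmap.resolve(out[i])
    return out


# Example with made-up values: one property whose cell 4 is assumed to
# hold the interrupt controller's phandle.
pmap = PhandleMap()
pmap.update(0x1111, 0x7E5F0)
entry = [0x0, 0x0, 0x0, 0x1, 0x1111, 0x5, 0x1]
fixed = fixup_phandle_cells(entry, [4], pmap)
```

A table like this only covers phandles, so properties created or changed
inside OF (the FCode case above) would still be invisible to the host,
which is why a full snapshot of the tree looks safer to me.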

ATB,

Mark.