[SLOF] [Qemu-ppc] [PATCH v4] board-qemu: add private hcall to inform host on "phandle" update

David Gibson david at gibson.dropbear.id.au
Sat Sep 9 16:48:33 AEST 2017

On Fri, Sep 08, 2017 at 03:00:36PM +0100, Mark Cave-Ayland wrote:
> On 08/09/17 14:20, Greg Kurz wrote:
> > On Fri, 8 Sep 2017 13:51:24 +0100
> > Mark Cave-Ayland <mark.cave-ayland at ilande.co.uk> wrote:
> > 
> >> On 08/09/17 12:59, David Gibson wrote:
> >>
> >>>> If you're looking for a way to reference a node outside of OF then the
> >>>> only way to consistently do this is via an OF path. What if when the DT
> >>>> blob for PHB was created in QEMU you create a fake interrupt-parent-path
> >>>> string property containing the OF path to the interrupt controller, and
> >>>> move the generation of interrupt-map to SLOF?  
> >>>   
> >>>> In SLOF you could then do something like below to get the phandle from
> >>>> the OF path:
> >>>> "interrupt-parent-path" get-package-property dev ihandle>phandle
> >>>> and from there, substituting the phandle into interrupt-map is trivial.  
> >>>
> >>> Nope.  At the time of hotplug, SLOF no longer exists - it's handed
> >>> over to the guest.  
> >>
> >> Yes, I understand that. This would be the process for getting the
> >> initial DT information to SLOF to generate interrupt-map upon boot.
> >>
> >>>> Similarly for the guest, it should be easy to iterate over the kernel DT
> >>>> to locate the interrupt controller device based upon OF path, and then
> >>>> use the interrupt-map information to update its routing information for
> >>>> the hotplugged PHB accordingly.  
> >>>
> >>> That requires a non-PAPR-compliant guest change.  Existing guests
> >>> already support this when running under PowerVM.  
> >>
> >> My understanding from the thread was that hotplugging PHBs is a new
> >> feature? In that case the transition is simple: if the
> > 
> > The feature is mentioned in the PAPR spec but not yet implemented in QEMU.
> Meh. So in that case if this hacking of phandles is already part of the
> PAPR specification, I guess we are too late :(

Well, yes and no.  In PAPR the hotplug handling is framed in terms of
RTAS requests - the runtime portion of the guest firmware.

PowerVM has its own (proprietary) guest OF implementation.  Its
version of RTAS is a reasonably substantial piece of software that has
access to the device tree built by the boot-time portion of OF.  That
way it's able to generate suitable DT fragments for plugged PHBs,
including phandle references.

Now hotplug clearly requires communication with the hypervisor, not
just guest firmware; and in fact that's true of nearly everything RTAS
does.  How the RTAS <-> hypervisor communication happens is not
specified by PAPR, and I don't know how the PowerVM implementation
does so.

For qemu/KVM, we decided - and I'm confident we were right to do so -
that having separate hypervisor <-> RTAS and RTAS <-> guest OS
protocols was silly.  So, our RTAS is a minuscule (literally 20 bytes
long) shim which simply forwards all RTAS requests to the hypervisor
(i.e. qemu).

This makes life much easier: it means we don't need to invent an
RTAS<->hypervisor protocol (for this and many other situations).  It
means we don't need to worry about updating such a protocol in sync
between the components.  It means we don't need a complicated piece of
RTAS code to be compiled with a guest-targeting toolchain.  It means
we don't need to jump through toolchain hoops to make code that's
relocatable and callable using the somewhat weird conventions that
RTAS uses.
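
The forwarding arrangement described above can be modelled roughly as
follows.  This is only a toy sketch in Python, not the real shim (which
is a handful of machine instructions): the class, the function names,
the token numbers and the UNKNOWN_CALL status value are all made up for
illustration.

```python
UNKNOWN_CALL = -3  # hypothetical status for an unregistered RTAS token


class Hypervisor:
    """Stand-in for qemu: owns every RTAS service implementation."""

    def __init__(self):
        self._services = {}  # RTAS token -> handler function

    def register(self, token, handler):
        self._services[token] = handler

    def hcall_rtas(self, token, args):
        """Hypothetical single hypercall that carries any RTAS request."""
        handler = self._services.get(token)
        if handler is None:
            return UNKNOWN_CALL
        return handler(args)


def rtas_shim(hv, token, args):
    """Guest-side RTAS entry point: no logic of its own, it only forwards."""
    return hv.hcall_rtas(token, args)
```

The point of the model is that rtas_shim() never needs to change when
a new RTAS service is added; only the hypervisor side does.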

But, it means the RTAS calls implemented in qemu don't have access to
the output-from-SLOF version of the device tree.

So, how do we address that?

One option is the one proposed earlier in the thread: a special
hypercall lets OF update qemu with the phandles of nodes as it
allocates them.  For now - and very likely, forever - the changed
phandles between the qemu generated "seed" tree and the OF-output tree
are the only changes that matter to us.
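
As a rough illustration of the bookkeeping this first option implies on
the qemu side, here is a minimal Python model.  It is a sketch under
stated assumptions, not the patch's actual interface: PhandleMap,
update(), translate() and fixup_fragment() are hypothetical names, and
phandle references are simplified to single integer property values
(a real interrupt-map embeds them among other cells).

```python
class PhandleMap:
    """Toy host-side table of phandles changed by OF (hypothetical)."""

    def __init__(self):
        # phandle in the host-generated "seed" tree -> phandle chosen by OF
        self._remap = {}

    def update(self, seed_phandle, of_phandle):
        """Record one update, as a handler for the private hcall might."""
        self._remap[seed_phandle] = of_phandle

    def translate(self, seed_phandle):
        """Return the phandle the guest sees; unchanged if never updated."""
        return self._remap.get(seed_phandle, seed_phandle)


def fixup_fragment(props, phandle_props, pmap):
    """Return a copy of a node's properties with phandle references fixed.

    props         -- dict of property name -> integer value
    phandle_props -- names of the properties that hold a phandle reference
    pmap          -- PhandleMap filled in while OF was running
    """
    fixed = dict(props)
    for name in phandle_props:
        if name in fixed:
            fixed[name] = pmap.translate(fixed[name])
    return fixed
```

With such a table in place, a DT fragment generated for a hotplugged
PHB long after OF has handed over could still be rewritten to use the
phandles the guest actually knows about.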

Another approach would be to snapshot the OF tree at the point we
instantiate RTAS.  We could do that either by having another special
hcall which lets OF report the whole revised tree to qemu, or by just
having OF dump the tree as FDT at a known location within the RTAS
blob (expanding the blob as necessary, obviously).
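
For the FDT-in-the-blob variant, the host side could recover the
snapshot along these lines.  Again this is only a sketch: the function
name and the idea of passing the "known location" as a plain byte
offset are assumptions, and it validates only the first two fields of
the FDT header (the magic number and totalsize, both big-endian 32-bit
words) rather than parsing the tree.

```python
import struct

FDT_MAGIC = 0xD00DFEED  # magic number of the flattened device tree format


def extract_fdt(blob, offset):
    """Return the FDT bytes stored at `offset` inside `blob`.

    `offset` stands in for the agreed "known location"; how that
    location would really be agreed on is not modelled here.
    """
    if offset < 0 or offset + 8 > len(blob):
        raise ValueError("offset leaves no room for an FDT header")
    # The FDT header begins with: magic, totalsize (big-endian u32 each).
    magic, totalsize = struct.unpack_from(">II", blob, offset)
    if magic != FDT_MAGIC:
        raise ValueError("no FDT magic at the given offset")
    if totalsize < 8 or offset + totalsize > len(blob):
        raise ValueError("FDT totalsize is inconsistent with the blob")
    return bytes(blob[offset:offset + totalsize])
```

The extracted bytes could then be handed to whatever FDT parser the
host already uses, giving it the same tree the guest booted with.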

David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!