[SLOF] [Qemu-ppc] [PATCH v4] board-qemu: add private hcall to inform host on "phandle" update

David Gibson david at gibson.dropbear.id.au
Wed Sep 27 16:15:43 AEST 2017

On Sun, Sep 10, 2017 at 07:14:14PM +0100, Mark Cave-Ayland wrote:
> On 09/09/17 07:48, David Gibson wrote:
> > On Fri, Sep 08, 2017 at 03:00:36PM +0100, Mark Cave-Ayland wrote:
> >> On 08/09/17 14:20, Greg Kurz wrote:
> >>> On Fri, 8 Sep 2017 13:51:24 +0100
> >>> Mark Cave-Ayland <mark.cave-ayland at ilande.co.uk> wrote:
> >>>
> >>>> On 08/09/17 12:59, David Gibson wrote:
> >>>>
> >>>>>> If you're looking for a way to reference a node outside of OF then the
> >>>>>> only way to consistently do this is via an OF path. What if when the DT
> >>>>>> blob for PHB was created in QEMU you create a fake interrupt-parent-path
> >>>>>> string property containing the OF path to the interrupt controller, and
> >>>>>> move the generation of interrupt-map to SLOF?  
> >>>>>   
> >>>>>> In SLOF you could then do something like below to get the phandle from
> >>>>>> the OF path:
> >>>>>> "interrupt-parent-path" get-package-property dev ihandle>phandle
> >>>>>> and from there, substituting the phandle into interrupt-map is trivial.  
> >>>>>
> >>>>> Nope.  At the time of hotplug, SLOF no longer exists - it's handed
> >>>>> over to the guest.  
> >>>>
> >>>> Yes, I understand that. This would be the process for getting the
> >>>> initial DT information to SLOF to generate interrupt-map upon boot.
> >>>>
> >>>>>> Similarly for the guest, it should be easy to iterate over the kernel DT
> >>>>>> to locate the interrupt controller device based upon OF path, and then
> >>>>>> use the interrupt-map information to update its routing information for
> >>>>>> the hotplugged PHB accordingly.  
> >>>>>
> >>>>> That requires a non-PAPR-compliant guest change.  Existing guests
> >>>>> already support this when running under PowerVM.  
> >>>>
> >>>> My understanding from the thread was that hotplugging PHBs is a new
> >>>> feature? In that case the transition is simple: if the
> >>>
> >>> The feature is mentioned in the PAPR spec but not yet implemented in QEMU.
> >>
> >> Meh. So in that case if this hacking of phandles is already part of the
> >> PAPR specification, I guess we are too late :(
> > 
> > Well, yes and no.  In PAPR the hotplug handling is framed in terms of
> > RTAS requests - the runtime portion of the guest firmware.
> > 
> > PowerVM has its own (proprietary) guest OF implementation.  Its
> > version of RTAS is a reasonably substantial piece of software that has
> > access to the device tree built by the boot-time portion of OF.  That
> > way it's able to generate suitable DT fragments for plugged PHBs,
> > including phandle references.
> > 
> > Now hotplug clearly requires communication with the hypervisor, not
> > just guest firmware; and in fact that's true of nearly everything RTAS
> > does.  How the RTAS <-> hypervisor communication happens is not
> > specified by PAPR, and I don't know how the PowerVM implementation
> > does so.
> > 
> > For qemu/KVM, we decided - and I'm confident we were right to do so -
> > that having separate hypervisor <-> RTAS and RTAS <-> guest OS
> > protocols was silly.  So, our RTAS is a minuscule (literally 20 bytes
> > long) shim which simply forwards all RTAS requests to the hypervisor
> > (i.e. qemu).
> > 
> > This makes life much easier: it means we don't need to invent an
> > RTAS<->hypervisor protocol (for this and many other situations).  It
> > means we don't need to worry about updating such a protocol in sync
> > between the components.  It means we don't need a complicated piece of
> > RTAS code to be compiled with a guest-targeting toolchain.  It means
> > we don't need to jump through toolchain hoops to make code that's
> > relocatable and callable using the somewhat weird conventions that
> > RTAS uses.
> > 
> > But, it means the RTAS calls implemented in qemu don't have access to
> > the output-from-SLOF version of the device tree.
> > 
> > So, how do we address that?
> > 
> > One option is the one proposed earlier in the thread: a special
> > hypercall lets OF update qemu with the phandles of nodes as it
> > allocates them.  For now - and very likely, forever - the changed
> > phandles between the qemu generated "seed" tree and the OF-output tree
> > are the only changes that matter to us.
> > 
> > Another approach would be to snapshot the OF tree at the point we
> > instantiate RTAS.  We could either do that by having another special
> > hcall which lets OF report the whole revised tree to qemu.  Or we
> > could just have it dump it as FDT at a known location within the RTAS
> > blob (expanding it as necessary, obviously).
> 
> Thanks for the detailed explanation. I think having access to the DT would
> be the safest option because some device properties may be generated
> outside of QEMU within OF, e.g. by FCode ROMs which can also make
> changes to the DT. And presumably this would then work regardless of the
> device being hotplugged.

Heh, so Greg's voted for the existing phandle-update proposal based on
simplicity, you've voted for the give DT to qemu proposal based on

I've also discussed this with Alexey and Michael Ellerman who had some
thoughts.  Here's what I propose as our way forwards.

On the qemu side

1. Implement a new vendor-specific hcall KVMPPC_H_UPDATE_DT which
   takes a single parameter, pointing at an FDT blob.

2. On H_UPDATE_DT, qemu will sanity check the provided fdt blob and
   store it away in the MachineState (this will need to be migrated).

3. When qemu needs the xics phandle for PHB hotplug it will look it up
   in the DT supplied to it by H_UPDATE_DT, if available.  If it's not
   available it will fall back to PHANDLE_XICP and hope for the best
   (this is for corner/test cases where the user bypasses SLOF and
    boots directly into a kernel - kvm-unit-tests may want to use this

4. If we ever need more information in qemu from the DT, which SLOF
   might have altered, we can get it from the same place.
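
As a rough illustration of steps 2 and 3 above, here is a minimal sketch
of what the sanity check and the phandle fallback could look like.  It
only checks the flattened device tree header (magic 0xd00dfeed and the
big-endian size/offset fields); a real implementation would presumably
use libfdt and actually walk the tree.  The names fdt_blob_looks_sane,
struct dt_state, xics_phandle_for_hotplug and the value of
PHANDLE_XICP_FALLBACK are all hypothetical, not taken from qemu.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC             0xd00dfeedu /* flattened device tree magic */
#define FDT_HEADER_SIZE       40u         /* ten big-endian 32-bit fields (v17) */
#define PHANDLE_XICP_FALLBACK 0x1111u     /* placeholder value, NOT qemu's */

/* Read a big-endian 32-bit value from an unaligned byte pointer. */
static uint32_t be32_at(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Step 2 (sketch): header-only plausibility check on a guest-supplied blob.
 * 'avail' is how many bytes the host may safely read at 'blob'. */
static int fdt_blob_looks_sane(const uint8_t *blob, size_t avail)
{
    if (blob == NULL || avail < FDT_HEADER_SIZE) {
        return 0;
    }
    if (be32_at(blob) != FDT_MAGIC) {
        return 0;
    }
    uint32_t totalsize   = be32_at(blob + 4);
    uint32_t off_struct  = be32_at(blob + 8);
    uint32_t off_strings = be32_at(blob + 12);
    if (totalsize < FDT_HEADER_SIZE || totalsize > avail) {
        return 0;
    }
    if (off_struct > totalsize || off_strings > totalsize) {
        return 0;
    }
    return 1;
}

/* Hypothetical per-machine state: whether the guest ever submitted a DT,
 * and the xics phandle found in it (the lookup itself is not shown). */
struct dt_state {
    int      have_updated_dt;
    uint32_t xics_phandle_from_dt;
};

/* Step 3 (sketch): prefer the phandle from the submitted DT, else fall back. */
static uint32_t xics_phandle_for_hotplug(const struct dt_state *s)
{
    assert(s != NULL);
    return s->have_updated_dt ? s->xics_phandle_from_dt
                              : PHANDLE_XICP_FALLBACK;
}
```

The fallback branch is the "hope for the best" case from step 3: no
H_UPDATE_DT was ever received, so the seed-tree phandle is all qemu has.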

On the guest side

1. Typically the last thing the guest does with OF before killing it
   off is to call the (quiesce) word.

2. At quiesce time, SLOF will linearize its current version of the DT
   and submit it to H_UPDATE_DT.
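
The guest-side flow is simple enough to sketch in a few lines of C.
This is only an illustration of the ordering (linearize first, then one
hcall with a pointer to the blob); SLOF's real entry points are not
named in this thread, so quiesce_submit_dt and the two hook types are
hypothetical, and the fake_* functions are stand-ins so the sketch can
be exercised without firmware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks: one flattens the live tree into 'buf' and returns
 * the number of bytes written (0 on failure); the other issues the hcall
 * with the blob's address as its single parameter. */
typedef size_t (*dt_linearize_fn)(uint8_t *buf, size_t cap);
typedef long   (*hcall_update_dt_fn)(const uint8_t *fdt);

/* Called at quiesce time: linearize the current DT, then hand it to the
 * host.  Returns the hcall's result, or -1 if there is nothing to send. */
static long quiesce_submit_dt(dt_linearize_fn linearize,
                              hcall_update_dt_fn update_dt,
                              uint8_t *buf, size_t cap)
{
    assert(linearize != NULL && update_dt != NULL && buf != NULL);

    size_t len = linearize(buf, cap);
    if (len == 0 || len > cap) {
        return -1;
    }
    return update_dt(buf);
}

/* Stand-ins for illustration only. */
static const uint8_t *last_submitted;

static size_t fake_linearize(uint8_t *buf, size_t cap)
{
    if (cap < 4) {
        return 0;               /* not enough room even for the magic */
    }
    buf[0] = 0xd0; buf[1] = 0x0d; buf[2] = 0xfe; buf[3] = 0xed;
    return 4;
}

static long fake_update_dt(const uint8_t *fdt)
{
    last_submitted = fdt;       /* record what would be sent to the host */
    return 0;                   /* 0 is the PAPR success code (H_SUCCESS) */
}
```

Calling quiesce_submit_dt(fake_linearize, fake_update_dt, buf, sizeof(buf))
with a 16-byte buffer goes through both hooks; with a 2-byte buffer the
linearize step fails and no hcall is made.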

David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!