[RFC/PATCH 2/3] pseries/kvm: Clear PSSCR[ESL|EC] bits before guest entry

David Gibson david at gibson.dropbear.id.au
Wed Apr 8 12:29:57 AEST 2020


On Tue, Apr 07, 2020 at 06:55:26PM +0530, Gautham R Shenoy wrote:
> Hello David,
> 
> On Mon, Apr 06, 2020 at 07:58:19PM +1000, David Gibson wrote:
> > On Fri, Apr 03, 2020 at 03:01:03PM +0530, Gautham R Shenoy wrote:
> > > On Fri, Apr 03, 2020 at 12:20:26PM +1000, Nicholas Piggin wrote:
> > > > Gautham R. Shenoy's on March 31, 2020 10:10 pm:
> > > > > From: "Gautham R. Shenoy" <ego at linux.vnet.ibm.com>
> > > > > 
> > > > > ISA v3.0 allows the guest to execute a stop instruction. For this, the
> > > > > PSSCR[ESL|EC] bits need to be cleared by the hypervisor before
> > > > > scheduling in the guest vCPU.
> > > > > 
> > > > > Currently we always schedule in a vCPU with PSSCR[ESL|EC] bits
> > > > > set. This patch changes the behaviour to enter the guest with
> > > > > PSSCR[ESL|EC] bits cleared. This is a RFC patch where we
> > > > > unconditionally clear these bits. Ideally this should be done
> > > > > conditionally on platforms where the guest stop instruction has no
> > > > > Bugs (starting POWER9 DD2.3).
> > > > 
> > > > How will guests know that they can use this facility safely after your
> > > > series? You need both DD2.3 and a patched KVM.
> > > 
> > > 
> > > Yes, this is something that isn't addressed in this series (mentioned
> > > in the cover letter), which is a POC demonstrating that the stop0lite
> > > state in guest works.
> > > 
> > > However, to answer your question, this is the scheme that I had in
> > > mind :
> > > 
> > > OPAL:
> > >    On Procs >= DD2.3 : we publish a dt-cpu-feature "idle-stop-guest"
> > > 
> > > Hypervisor Kernel:
> > >     1. If "idle-stop-guest" dt-cpu-feature is discovered, then
> > >        we set bool enable_guest_stop = true;
> > > 
> > >     2. During KVM guest entry, clear PSSCR[ESL|EC] iff
> > >        enable_guest_stop == true.
> > > 
> > >     3. In kvm_vm_ioctl_check_extension(), for a new capability
> > >        KVM_CAP_STOP, return true iff enable_guest_top == true.
> > > 
> > > QEMU:
> > >    Check with the hypervisor if KVM_CAP_STOP is present. If so,
> > >    indicate the presence to the guest via device tree.
> > 
> > Nack.  Presenting different capabilities to the guest depending on
> > host capabilities (rather than explicit options) is never ok.  It
> > means that depending on the system you start on you may or may not be
> > able to migrate to other systems that you're supposed to be able to,
> 
> I agree that blocking migration for the unavailability of this feature
> is not desirable. Could you point me to some other capabilities in KVM
> which have been implemented via explicit options?

TBH, most of the options for the 'pseries' machine type are in this
category: cap-vsx, cap-dfp, cap-htm, a bunch related to various
Spectre mitigations, cap-hpt-max-page-size (maximum page size for hash
guests), cap-nested-hv, cap-large-decr, cap-fwnmi, resize-hpt (HPT
resizing extension), ic-mode (which irq controllers are available to
the guest).

> The ISA 3.0 allows the guest to execute the "stop" instruction.

So, this was a bug in DD2.2's implementation of the architecture?

> If the
> Hypervisor hasn't cleared the PSSCR[ESL|EC] then, guest executing the
> "stop" instruction in the causes a Hypervisor Facility Unavailable
> Exception, thus giving the hypervisor a chance to emulate the
> instruction. However, in the current code, when the hypervisor
> receives this exception, it sends a PROGKILL to the guest which
> results in crashing the guest.
> 
> Patch 1 of this series emulates wakeup from the "stop"
> instruction. Would the following scheme be ok?
> 
> OPAL:
> 	On Procs >= DD2.3 : we publish a dt-cpu-feature "idle-stop-guest"
> 
> Hypervisor Kernel:
> 
> 	   If "idle-stop-guest" dt feature is available, then, before
> 	   entering the guest, the hypervisor clears the PSSCR[EC|ESL]
> 	   bits allowing the guest to safely execute stop instruction.
> 
> 	   If "idle-stop-guest" dt feature is not available, then, the
> 	   Hypervisor sets the PSSCR[ESL|EC] bits, thereby causing a
> 	   guest "stop" instruction execution to trap back into the
> 	   hypervisor. We then emulate a wakeup from the stop
> 	   instruction (Patch 1 of this series).
> 
> Guest Kernel:
>       If (cpu_has_feature(CPU_FTR_ARCH_300)) only then use the
>       stop0lite cpuidle state.
> 
> This allows us to migrate the KVM guest across any POWER9
> Hypervisor. The minimal addition that the Hypervisor would need is
> Patch 1 of this series.

That could be workable.  Some caveats, though:

 * How does the latency of the trap-and-emulate compare to the guest
   using H_CEDE in the first place?  i.e. how big a negative impact
   will this have for guests running on DD2.2 hosts?

 * We'll only be able to enable this in a new qemu machine type
   version (say, pseries-5.1.0).  Otherwise a guest could start
   thinking it can use stop states, then be migrated to an older qemu
   or host kernel without the support and crash.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20200408/02d2c3b8/attachment-0002.sig>


More information about the Linuxppc-dev mailing list