[RFC/PATCH 2/3] pseries/kvm: Clear PSSCR[ESL|EC] bits before guest entry

Gautham R Shenoy ego at linux.vnet.ibm.com
Fri Apr 3 20:31:03 AEDT 2020


On Fri, Apr 03, 2020 at 12:20:26PM +1000, Nicholas Piggin wrote:
> Gautham R. Shenoy's on March 31, 2020 10:10 pm:
> > From: "Gautham R. Shenoy" <ego at linux.vnet.ibm.com>
> > 
> > ISA v3.0 allows the guest to execute a stop instruction. For this, the
> > PSSCR[ESL|EC] bits need to be cleared by the hypervisor before
> > scheduling in the guest vCPU.
> > 
> > Currently we always schedule in a vCPU with PSSCR[ESL|EC] bits
> > set. This patch changes the behaviour to enter the guest with
> > PSSCR[ESL|EC] bits cleared. This is a RFC patch where we
> > unconditionally clear these bits. Ideally this should be done
> > conditionally on platforms where the guest stop instruction has no
> > Bugs (starting POWER9 DD2.3).
> 
> How will guests know that they can use this facility safely after your
> series? You need both DD2.3 and a patched KVM.


Yes, this is something that isn't addressed in this series (mentioned
in the cover letter), which is a POC demonstrating that the stop0lite
state in guest works.

However, to answer your question, this is the scheme that I had in
mind :

OPAL:
   On Procs >= DD2.3 : we publish a dt-cpu-feature "idle-stop-guest"

Hypervisor Kernel:
    1. If "idle-stop-guest" dt-cpu-feature is discovered, then
       we set bool enable_guest_stop = true;

    2. During KVM guest entry, clear PSSCR[ESL|EC] iff
       enable_guest_stop == true.

    3. In kvm_vm_ioctl_check_extension(), for a new capability
       KVM_CAP_STOP, return true iff enable_guest_top == true.

QEMU:
   Check with the hypervisor if KVM_CAP_STOP is present. If so,
   indicate the presence to the guest via device tree.

Guest Kernel:
   Check for the presence of guest stop state support in
   device-tree. If available, enable the stop0lite in the cpuidle
   driver. 
   

We still have a challenge of migrating a guest which started on a
hypervisor supporting guest stop state to a hypervisor without it.
The target hypervisor should atleast have Patch 1 of this series, so
that we don't crash the guest.

> 
> > 
> > Signed-off-by: Gautham R. Shenoy <ego at linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/kvm/book3s_hv.c            |  2 +-
> >  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 25 +++++++++++++------------
> >  2 files changed, 14 insertions(+), 13 deletions(-)
> > 
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index cdb7224..36d059a 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -3424,7 +3424,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
> >  	mtspr(SPRN_IC, vcpu->arch.ic);
> >  	mtspr(SPRN_PID, vcpu->arch.pid);
> >  
> > -	mtspr(SPRN_PSSCR, vcpu->arch.psscr | PSSCR_EC |
> > +	mtspr(SPRN_PSSCR, (vcpu->arch.psscr  & ~(PSSCR_EC | PSSCR_ESL)) |
> >  	      (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG));
> >  
> >  	mtspr(SPRN_HFSCR, vcpu->arch.hfscr);
> > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > index dbc2fec..c2daec3 100644
> > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > @@ -823,6 +823,18 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
> >  	mtspr	SPRN_PID, r7
> >  	mtspr	SPRN_WORT, r8
> >  BEGIN_FTR_SECTION
> > +	/* POWER9-only registers */
> > +	ld	r5, VCPU_TID(r4)
> > +	ld	r6, VCPU_PSSCR(r4)
> > +	lbz	r8, HSTATE_FAKE_SUSPEND(r13)
> > +	lis 	r7, (PSSCR_EC | PSSCR_ESL)@h /* Allow guest to call stop */
> > +	andc	r6, r6, r7
> > +	rldimi	r6, r8, PSSCR_FAKE_SUSPEND_LG, 63 - PSSCR_FAKE_SUSPEND_LG
> > +	ld	r7, VCPU_HFSCR(r4)
> > +	mtspr	SPRN_TIDR, r5
> > +	mtspr	SPRN_PSSCR, r6
> > +	mtspr	SPRN_HFSCR, r7
> > +FTR_SECTION_ELSE
> 
> Why did you move these around? Just because the POWER9 section became
> larger than the other?

Yes.

> 
> That's a real wart in the instruction patching implementation, I think
> we can fix it by padding with nops in the macros.
> 
> Can you just add the additional required nops to the top branch without
> changing them around for this patch, so it's easier to see what's going
> on? The end result will be the same after patching. Actually changing
> these around can have a slight unintended consequence in that code that
> runs before features were patched will execute the IF code. Not a
> problem here, but another reason why the instruction patching 
> restriction is annoying.

Sure, I will repost this patch with additional nops instead of
moving them around.

> 
> Thanks,
> Nick
> 
> >  	/* POWER8-only registers */
> >  	ld	r5, VCPU_TCSCR(r4)
> >  	ld	r6, VCPU_ACOP(r4)
> > @@ -833,18 +845,7 @@ BEGIN_FTR_SECTION
> >  	mtspr	SPRN_CSIGR, r7
> >  	mtspr	SPRN_TACR, r8
> >  	nop
> > -FTR_SECTION_ELSE
> > -	/* POWER9-only registers */
> > -	ld	r5, VCPU_TID(r4)
> > -	ld	r6, VCPU_PSSCR(r4)
> > -	lbz	r8, HSTATE_FAKE_SUSPEND(r13)
> > -	oris	r6, r6, PSSCR_EC at h	/* This makes stop trap to HV */
> > -	rldimi	r6, r8, PSSCR_FAKE_SUSPEND_LG, 63 - PSSCR_FAKE_SUSPEND_LG
> > -	ld	r7, VCPU_HFSCR(r4)
> > -	mtspr	SPRN_TIDR, r5
> > -	mtspr	SPRN_PSSCR, r6
> > -	mtspr	SPRN_HFSCR, r7
> > -ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
> > +ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
> >  8:
> >  
> >  	ld	r5, VCPU_SPRG0(r4)
> > -- 
> > 1.9.4
> > 
> > 

--
Thanks and Regards
gautham.


More information about the Linuxppc-dev mailing list