[RFC/PATCH 2/3] pseries/kvm: Clear PSSCR[ESL|EC] bits before guest entry
Gautham R Shenoy
ego at linux.vnet.ibm.com
Fri Apr 3 20:31:03 AEDT 2020
On Fri, Apr 03, 2020 at 12:20:26PM +1000, Nicholas Piggin wrote:
> Gautham R. Shenoy's on March 31, 2020 10:10 pm:
> > From: "Gautham R. Shenoy" <ego at linux.vnet.ibm.com>
> >
> > ISA v3.0 allows the guest to execute a stop instruction. For this, the
> > PSSCR[ESL|EC] bits need to be cleared by the hypervisor before
> > scheduling in the guest vCPU.
> >
> > Currently we always schedule in a vCPU with PSSCR[ESL|EC] bits
> > set. This patch changes the behaviour to enter the guest with
> > PSSCR[ESL|EC] bits cleared. This is a RFC patch where we
> > unconditionally clear these bits. Ideally this should be done
> > conditionally on platforms where the guest stop instruction has no
> > Bugs (starting POWER9 DD2.3).
>
> How will guests know that they can use this facility safely after your
> series? You need both DD2.3 and a patched KVM.
Yes, this is something that isn't addressed in this series (mentioned
in the cover letter), which is a POC demonstrating that the stop0lite
state in guest works.
However, to answer your question, this is the scheme that I had in
mind :
OPAL:
On Procs >= DD2.3 : we publish a dt-cpu-feature "idle-stop-guest"
Hypervisor Kernel:
1. If "idle-stop-guest" dt-cpu-feature is discovered, then
we set bool enable_guest_stop = true;
2. During KVM guest entry, clear PSSCR[ESL|EC] iff
enable_guest_stop == true.
3. In kvm_vm_ioctl_check_extension(), for a new capability
KVM_CAP_STOP, return true iff enable_guest_top == true.
QEMU:
Check with the hypervisor if KVM_CAP_STOP is present. If so,
indicate the presence to the guest via device tree.
Guest Kernel:
Check for the presence of guest stop state support in
device-tree. If available, enable the stop0lite in the cpuidle
driver.
We still have a challenge of migrating a guest which started on a
hypervisor supporting guest stop state to a hypervisor without it.
The target hypervisor should atleast have Patch 1 of this series, so
that we don't crash the guest.
>
> >
> > Signed-off-by: Gautham R. Shenoy <ego at linux.vnet.ibm.com>
> > ---
> > arch/powerpc/kvm/book3s_hv.c | 2 +-
> > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 25 +++++++++++++------------
> > 2 files changed, 14 insertions(+), 13 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index cdb7224..36d059a 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -3424,7 +3424,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
> > mtspr(SPRN_IC, vcpu->arch.ic);
> > mtspr(SPRN_PID, vcpu->arch.pid);
> >
> > - mtspr(SPRN_PSSCR, vcpu->arch.psscr | PSSCR_EC |
> > + mtspr(SPRN_PSSCR, (vcpu->arch.psscr & ~(PSSCR_EC | PSSCR_ESL)) |
> > (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG));
> >
> > mtspr(SPRN_HFSCR, vcpu->arch.hfscr);
> > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > index dbc2fec..c2daec3 100644
> > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > @@ -823,6 +823,18 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
> > mtspr SPRN_PID, r7
> > mtspr SPRN_WORT, r8
> > BEGIN_FTR_SECTION
> > + /* POWER9-only registers */
> > + ld r5, VCPU_TID(r4)
> > + ld r6, VCPU_PSSCR(r4)
> > + lbz r8, HSTATE_FAKE_SUSPEND(r13)
> > + lis r7, (PSSCR_EC | PSSCR_ESL)@h /* Allow guest to call stop */
> > + andc r6, r6, r7
> > + rldimi r6, r8, PSSCR_FAKE_SUSPEND_LG, 63 - PSSCR_FAKE_SUSPEND_LG
> > + ld r7, VCPU_HFSCR(r4)
> > + mtspr SPRN_TIDR, r5
> > + mtspr SPRN_PSSCR, r6
> > + mtspr SPRN_HFSCR, r7
> > +FTR_SECTION_ELSE
>
> Why did you move these around? Just because the POWER9 section became
> larger than the other?
Yes.
>
> That's a real wart in the instruction patching implementation, I think
> we can fix it by padding with nops in the macros.
>
> Can you just add the additional required nops to the top branch without
> changing them around for this patch, so it's easier to see what's going
> on? The end result will be the same after patching. Actually changing
> these around can have a slight unintended consequence in that code that
> runs before features were patched will execute the IF code. Not a
> problem here, but another reason why the instruction patching
> restriction is annoying.
Sure, I will repost this patch with additional nops instead of
moving them around.
>
> Thanks,
> Nick
>
> > /* POWER8-only registers */
> > ld r5, VCPU_TCSCR(r4)
> > ld r6, VCPU_ACOP(r4)
> > @@ -833,18 +845,7 @@ BEGIN_FTR_SECTION
> > mtspr SPRN_CSIGR, r7
> > mtspr SPRN_TACR, r8
> > nop
> > -FTR_SECTION_ELSE
> > - /* POWER9-only registers */
> > - ld r5, VCPU_TID(r4)
> > - ld r6, VCPU_PSSCR(r4)
> > - lbz r8, HSTATE_FAKE_SUSPEND(r13)
> > - oris r6, r6, PSSCR_EC at h /* This makes stop trap to HV */
> > - rldimi r6, r8, PSSCR_FAKE_SUSPEND_LG, 63 - PSSCR_FAKE_SUSPEND_LG
> > - ld r7, VCPU_HFSCR(r4)
> > - mtspr SPRN_TIDR, r5
> > - mtspr SPRN_PSSCR, r6
> > - mtspr SPRN_HFSCR, r7
> > -ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
> > +ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
> > 8:
> >
> > ld r5, VCPU_SPRG0(r4)
> > --
> > 1.9.4
> >
> >
--
Thanks and Regards
gautham.
More information about the Linuxppc-dev
mailing list