[PATCH v4 29/46] KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C
Alexey Kardashevskiy
aik at ozlabs.ru
Fri Apr 2 15:36:32 AEDT 2021
On 01/04/2021 21:35, Nicholas Piggin wrote:
> Excerpts from Alexey Kardashevskiy's message of April 1, 2021 3:30 pm:
>>
>>
>> On 3/23/21 12:02 PM, Nicholas Piggin wrote:
>>> Almost all logic is moved to C, by introducing a new in_guest mode that
>>> selects and branches very early in the interrupt handler to the P9 exit
>>> code.
>
> [...]
>
>>> +/*
>>> + * kvmppc_p9_exit_hcall and kvmppc_p9_exit_interrupt are branched to from
>>> + * above if the interrupt was taken for a guest that was entered via
>>> + * kvmppc_p9_enter_guest().
>>> + *
>>> + * This code recovers the host stack and vcpu pointer, saves all GPRs and
>>> + * CR, LR, CTR, XER as well as guest MSR and NIA into the VCPU, then re-
>>> + * establishes the host stack and registers to return from the
>>> + * kvmppc_p9_enter_guest() function.
>>
>> What does "this code" refer to? If it is the asm below, then it does not
>> save CTR; that is done in the C code. Otherwise it is confusing (to me) :)
>
> Yes you're right, CTR is saved in C.
>
>>> + */
>>> +.balign IFETCH_ALIGN_BYTES
>>> +kvmppc_p9_exit_hcall:
>>> + mfspr r11,SPRN_SRR0
>>> + mfspr r12,SPRN_SRR1
>>> + li r10,0xc00
>>> + std r10,HSTATE_SCRATCH0(r13)
>>> +
>>> +.balign IFETCH_ALIGN_BYTES
>>> +kvmppc_p9_exit_interrupt:
>
> [...]
>
>>> +static inline void slb_invalidate(unsigned int ih)
>>> +{
>>> + asm volatile("slbia %0" :: "i"(ih));
>>> +}
>>
>> This one is not used.
>
> It gets used in a later patch, I guess I should move it there.
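Fair enough. For my own notes, I assume the later patch ends up calling it
with a constant IH hint, along these lines (a hypothetical call site, not
from this series; the IH value and the isync are my guesses):

/* Hypothetical use of the helper (illustration only): flush the SLB
 * with a compile-time constant IH hint before returning to the host
 * MMU context. The "i" constraint means IH has to be a literal. */
static void example_clear_slb(void)
{
	slb_invalidate(6);	/* IH value picked for illustration only */
	asm volatile("isync" : : : "memory");
}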
>
> [...]
>
>>> +int __kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu)
>>> +{
>>> + u64 *exsave;
>>> + unsigned long msr = mfmsr();
>>> + int trap;
>>> +
>>> + start_timing(vcpu, &vcpu->arch.rm_entry);
>>> +
>>> + vcpu->arch.ceded = 0;
>>> +
>>> + WARN_ON_ONCE(vcpu->arch.shregs.msr & MSR_HV);
>>> + WARN_ON_ONCE(!(vcpu->arch.shregs.msr & MSR_ME));
>>> +
>>> + mtspr(SPRN_HSRR0, vcpu->arch.regs.nip);
>>> + mtspr(SPRN_HSRR1, (vcpu->arch.shregs.msr & ~MSR_HV) | MSR_ME);
>>> +
>>> + /*
>>> + * On POWER9 DD2.1 and below, sometimes on a Hypervisor Data Storage
>>> + * Interrupt (HDSI), the HDSISR is not updated at all.
>>> + *
>>> + * To work around this we put a canary value into the HDSISR before
>>> + * returning to a guest and then check for this canary when we take a
>>> + * HDSI. If we find the canary on a HDSI, we know the hardware didn't
>>> + * update the HDSISR. In this case we return to the guest to retake the
>>> + * HDSI, which should correctly update the HDSISR on the second HDSI
>>> + * entry.
>>> + *
>>> + * Just do this on all p9 processors for now.
>>> + */
>>> + mtspr(SPRN_HDSISR, HDSISR_CANARY);
>>> +
>>> + accumulate_time(vcpu, &vcpu->arch.guest_time);
>>> +
>>> + local_paca->kvm_hstate.in_guest = KVM_GUEST_MODE_GUEST_HV_FAST;
>>> + kvmppc_p9_enter_guest(vcpu);
>>> + // Radix host and guest means host never runs with guest MMU state
>>> + local_paca->kvm_hstate.in_guest = KVM_GUEST_MODE_NONE;
>>> +
>>> + accumulate_time(vcpu, &vcpu->arch.rm_intr);
>>> +
>>> + /* Get these from r11/12 and paca exsave */
>>> + vcpu->arch.shregs.srr0 = mfspr(SPRN_SRR0);
>>> + vcpu->arch.shregs.srr1 = mfspr(SPRN_SRR1);
>>> + vcpu->arch.shregs.dar = mfspr(SPRN_DAR);
>>> + vcpu->arch.shregs.dsisr = mfspr(SPRN_DSISR);
>>> +
>>> + /* 0x2 bit for HSRR is only used by PR and P7/8 HV paths, clear it */
>>> + trap = local_paca->kvm_hstate.scratch0 & ~0x2;
>>> + if (likely(trap > BOOK3S_INTERRUPT_MACHINE_CHECK)) {
>>> + exsave = local_paca->exgen;
>>> + } else if (trap == BOOK3S_INTERRUPT_SYSTEM_RESET) {
>>> + exsave = local_paca->exnmi;
>>> + } else { /* trap == 0x200 */
>>> + exsave = local_paca->exmc;
>>> + }
>>> +
>>> + vcpu->arch.regs.gpr[1] = local_paca->kvm_hstate.scratch1;
>>> + vcpu->arch.regs.gpr[3] = local_paca->kvm_hstate.scratch2;
>>> + vcpu->arch.regs.gpr[9] = exsave[EX_R9/sizeof(u64)];
>>> + vcpu->arch.regs.gpr[10] = exsave[EX_R10/sizeof(u64)];
>>> + vcpu->arch.regs.gpr[11] = exsave[EX_R11/sizeof(u64)];
>>> + vcpu->arch.regs.gpr[12] = exsave[EX_R12/sizeof(u64)];
>>> + vcpu->arch.regs.gpr[13] = exsave[EX_R13/sizeof(u64)];
>>> + vcpu->arch.ppr = exsave[EX_PPR/sizeof(u64)];
>>> + vcpu->arch.cfar = exsave[EX_CFAR/sizeof(u64)];
>>> + vcpu->arch.regs.ctr = exsave[EX_CTR/sizeof(u64)];
>>> +
>>> + vcpu->arch.last_inst = KVM_INST_FETCH_FAILED;
>>> +
>>> + if (unlikely(trap == BOOK3S_INTERRUPT_MACHINE_CHECK)) {
>>> + vcpu->arch.fault_dar = exsave[EX_DAR/sizeof(u64)];
>>> + vcpu->arch.fault_dsisr = exsave[EX_DSISR/sizeof(u64)];
>>> + kvmppc_realmode_machine_check(vcpu);
>>> +
>>> + } else if (unlikely(trap == BOOK3S_INTERRUPT_HMI)) {
>>> + kvmppc_realmode_hmi_handler();
>>> +
>>> + } else if (trap == BOOK3S_INTERRUPT_H_EMUL_ASSIST) {
>>> + vcpu->arch.emul_inst = mfspr(SPRN_HEIR);
>>> +
>>> + } else if (trap == BOOK3S_INTERRUPT_H_DATA_STORAGE) {
>>> + vcpu->arch.fault_dar = exsave[EX_DAR/sizeof(u64)];
>>> + vcpu->arch.fault_dsisr = exsave[EX_DSISR/sizeof(u64)];
>>> + vcpu->arch.fault_gpa = mfspr(SPRN_ASDR);
>>> +
>>> + } else if (trap == BOOK3S_INTERRUPT_H_INST_STORAGE) {
>>> + vcpu->arch.fault_gpa = mfspr(SPRN_ASDR);
>>> +
>>> + } else if (trap == BOOK3S_INTERRUPT_H_FAC_UNAVAIL) {
>>> + vcpu->arch.hfscr = mfspr(SPRN_HFSCR);
>>> +
>>> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>>> + /*
>>> + * Softpatch interrupt for transactional memory emulation cases
>>> + * on POWER9 DD2.2. This is early in the guest exit path - we
>>> + * haven't saved registers or done a treclaim yet.
>>> + */
>>> + } else if (trap == BOOK3S_INTERRUPT_HV_SOFTPATCH) {
>>> + vcpu->arch.emul_inst = mfspr(SPRN_HEIR);
>>> +
>>> + /*
>>> + * The cases we want to handle here are those where the guest
>>> + * is in real suspend mode and is trying to transition to
>>> + * transactional mode.
>>> + */
>>> + if (local_paca->kvm_hstate.fake_suspend &&
>>> + (vcpu->arch.shregs.msr & MSR_TS_S)) {
>>> + if (kvmhv_p9_tm_emulation_early(vcpu)) {
>>> + /* Prevent it being handled again. */
>>> + trap = 0;
>>> + }
>>> + }
>>> +#endif
>>> + }
>>> +
>>> + radix_clear_slb();
>>> +
>>> + __mtmsrd(msr, 0);
>>
>>
>> The asm code only sets RI, but this potentially sets more bits, including
>> MSR_EE. Is EE expected to be 0 when __kvmhv_vcpu_entry_p9() is called?
>
> Yes.
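Ok, so the caller runs with interrupts hard-disabled and the mfmsr()
snapshot taken at entry already has EE clear; writing it back here only
brings RI back. Something like the check below would capture that
assumption (just a sketch of how I read it, not anything in the patch):

/* Sketch of the entry-time assumption (not from the patch): the host
 * MSR captured by mfmsr() must already have EE clear, so restoring it
 * at exit sets RI again without enabling external interrupts. */
static void check_host_msr(unsigned long msr)
{
	WARN_ON_ONCE(msr & MSR_EE);	/* caller has interrupts hard-disabled */
	WARN_ON_ONCE(!(msr & MSR_RI));	/* RI comes back via __mtmsrd(msr, 0) */
}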
>
>>> + mtspr(SPRN_CTRLT, 1);
>>
>> What is this for? ISA does not shed much light:
>> ===
>> 63 RUN This bit controls an external I/O pin.
>> ===
>
> I don't think it even does that these days. It interacts with the PMU.
> I was looking at whether it's feasible to move it into PMU code entirely,
> but apparently some tool or something might sample it. I'm a bit
> suspicious about that, because an untrusted guest could be running and
> claim not to be, so I don't know what said tool really achieves, but I'll
> go through that fight another day.
>
> But KVM has to set it to 1 at exit because Linux host has it set to 1
> except in CPU idle.
Is this CTRLT setting a new thing, or does the asm do it too? I could not
spot it.
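Either way, if I follow the explanation, the host convention is
CTRL[RUN]=1 whenever it is not in the idle loop, so the exit path puts
it back unconditionally, roughly (my own sketch, not the actual patch):

/* Sketch of the host convention described above (illustration only):
 * the RUN bit in CTRL stays 1 while the host does useful work and is
 * cleared only in CPU idle, so guest exit restores it to 1. */
static void restore_host_ctrl_run(void)
{
	mtspr(SPRN_CTRLT, 1);	/* CTRL[RUN] = 1: host is not idle */
}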
>>
>>
>>> +
>>> + accumulate_time(vcpu, &vcpu->arch.rm_exit);
>>
>> This should not compile without CONFIG_KVM_BOOK3S_HV_EXIT_TIMING.
>
> It has an ifdef wrapper so it should work (it does on my local tree
> which is slightly newer than what you have but I don't think I fixed
> anything around this recently).
You are absolutely right, my bad.
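(For the record, I take it the wrappers compile away without the option,
something like the sketch below, which is my guess at the shape, not
copied from the tree:)

#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
static void accumulate_time(struct kvm_vcpu *vcpu,
			    struct kvmhv_tb_accumulator *next)
{
	/* real version reads the timebase and charges the delta; omitted here */
}
#else
#define accumulate_time(vcpu, next) do {} while (0)
#endif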
>
>>> +
>>> + end_timing(vcpu);
>>> +
>>> + return trap;
>>
>>
>> The asm does "For hash guest, read the guest SLB and save it away", but this
>> code does not. Is this new fast-path-in-C only for radix-on-radix, or are
>> hash VMs supported too?
>
> That asm code does not run for the "guest_exit_short_path" case (aka the
> P9 path, aka the fast path).
>
> Upstream code only supports radix host and radix guest in this path.
> The old path supports hash and radix. That's unchanged with this patch.
>
> After the series, the new path supports all P9 modes (hash/hash,
> radix/radix, and radix/hash), and the old path supports P7 and P8 only.
Thanks for the clarification. Besides that CTRLT question, I checked that
the new C code matches the old asm code (which made diving into the ISA
incredibly fun :) ), so fwiw
Reviewed-by: Alexey Kardashevskiy <aik at ozlabs.ru>
I'd really like to see longer commit logs clarifying all the intended
changes, but it is probably just me.
>
> Thanks,
> Nick
>
--
Alexey