[PATCH v4 29/46] KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C

Alexey Kardashevskiy aik at ozlabs.ru
Fri Apr 2 15:36:32 AEDT 2021



On 01/04/2021 21:35, Nicholas Piggin wrote:
> Excerpts from Alexey Kardashevskiy's message of April 1, 2021 3:30 pm:
>>
>>
>> On 3/23/21 12:02 PM, Nicholas Piggin wrote:
>>> Almost all logic is moved to C, by introducing a new in_guest mode that
>>> selects and branches very early in the interrupt handler to the P9 exit
>>> code.
> 
> [...]
> 
>>> +/*
>>> + * kvmppc_p9_exit_hcall and kvmppc_p9_exit_interrupt are branched to from
>>> + * above if the interrupt was taken for a guest that was entered via
>>> + * kvmppc_p9_enter_guest().
>>> + *
>>> + * This code recovers the host stack and vcpu pointer, saves all GPRs and
>>> + * CR, LR, CTR, XER as well as guest MSR and NIA into the VCPU, then re-
>>> + * establishes the host stack and registers to return from the
>>> + * kvmppc_p9_enter_guest() function.
>>
>> What does "this code" refer to? If it is the asm below, then it does not
>> save CTR; that is done in the C code. Otherwise it is confusing (to me) :)
> 
> Yes you're right, CTR is saved in C.
> 
>>> + */
>>> +.balign	IFETCH_ALIGN_BYTES
>>> +kvmppc_p9_exit_hcall:
>>> +	mfspr	r11,SPRN_SRR0
>>> +	mfspr	r12,SPRN_SRR1
>>> +	li	r10,0xc00
>>> +	std	r10,HSTATE_SCRATCH0(r13)
>>> +
>>> +.balign	IFETCH_ALIGN_BYTES
>>> +kvmppc_p9_exit_interrupt:
> 
> [...]
> 
>>> +static inline void slb_invalidate(unsigned int ih)
>>> +{
>>> +	asm volatile("slbia %0" :: "i"(ih));
>>> +}
>>
>> This one is not used.
> 
> It gets used in a later patch, I guess I should move it there.
> 
> [...]
> 
>>> +int __kvmhv_vcpu_entry_p9(struct kvm_vcpu *vcpu)
>>> +{
>>> +	u64 *exsave;
>>> +	unsigned long msr = mfmsr();
>>> +	int trap;
>>> +
>>> +	start_timing(vcpu, &vcpu->arch.rm_entry);
>>> +
>>> +	vcpu->arch.ceded = 0;
>>> +
>>> +	WARN_ON_ONCE(vcpu->arch.shregs.msr & MSR_HV);
>>> +	WARN_ON_ONCE(!(vcpu->arch.shregs.msr & MSR_ME));
>>> +
>>> +	mtspr(SPRN_HSRR0, vcpu->arch.regs.nip);
>>> +	mtspr(SPRN_HSRR1, (vcpu->arch.shregs.msr & ~MSR_HV) | MSR_ME);
>>> +
>>> +	/*
>>> +	 * On POWER9 DD2.1 and below, sometimes on a Hypervisor Data Storage
>>> +	 * Interrupt (HDSI) the HDSISR is not updated at all.
>>> +	 *
>>> +	 * To work around this we put a canary value into the HDSISR before
>>> +	 * returning to a guest and then check for this canary when we take an
>>> +	 * HDSI. If we find the canary on an HDSI, we know the hardware didn't
>>> +	 * update the HDSISR. In this case we return to the guest to retake the
>>> +	 * HDSI, which should correctly update the HDSISR on the second HDSI
>>> +	 * entry.
>>> +	 *
>>> +	 * Just do this on all p9 processors for now.
>>> +	 */
>>> +	mtspr(SPRN_HDSISR, HDSISR_CANARY);
>>> +
>>> +	accumulate_time(vcpu, &vcpu->arch.guest_time);
>>> +
>>> +	local_paca->kvm_hstate.in_guest = KVM_GUEST_MODE_GUEST_HV_FAST;
>>> +	kvmppc_p9_enter_guest(vcpu);
>>> +	// Radix host and guest means host never runs with guest MMU state
>>> +	local_paca->kvm_hstate.in_guest = KVM_GUEST_MODE_NONE;
>>> +
>>> +	accumulate_time(vcpu, &vcpu->arch.rm_intr);
>>> +
>>> +	/* Get these from r11/12 and paca exsave */
>>> +	vcpu->arch.shregs.srr0 = mfspr(SPRN_SRR0);
>>> +	vcpu->arch.shregs.srr1 = mfspr(SPRN_SRR1);
>>> +	vcpu->arch.shregs.dar = mfspr(SPRN_DAR);
>>> +	vcpu->arch.shregs.dsisr = mfspr(SPRN_DSISR);
>>> +
>>> +	/* 0x2 bit for HSRR is only used by PR and P7/8 HV paths, clear it */
>>> +	trap = local_paca->kvm_hstate.scratch0 & ~0x2;
>>> +	if (likely(trap > BOOK3S_INTERRUPT_MACHINE_CHECK)) {
>>> +		exsave = local_paca->exgen;
>>> +	} else if (trap == BOOK3S_INTERRUPT_SYSTEM_RESET) {
>>> +		exsave = local_paca->exnmi;
>>> +	} else { /* trap == 0x200 */
>>> +		exsave = local_paca->exmc;
>>> +	}
>>> +
>>> +	vcpu->arch.regs.gpr[1] = local_paca->kvm_hstate.scratch1;
>>> +	vcpu->arch.regs.gpr[3] = local_paca->kvm_hstate.scratch2;
>>> +	vcpu->arch.regs.gpr[9] = exsave[EX_R9/sizeof(u64)];
>>> +	vcpu->arch.regs.gpr[10] = exsave[EX_R10/sizeof(u64)];
>>> +	vcpu->arch.regs.gpr[11] = exsave[EX_R11/sizeof(u64)];
>>> +	vcpu->arch.regs.gpr[12] = exsave[EX_R12/sizeof(u64)];
>>> +	vcpu->arch.regs.gpr[13] = exsave[EX_R13/sizeof(u64)];
>>> +	vcpu->arch.ppr = exsave[EX_PPR/sizeof(u64)];
>>> +	vcpu->arch.cfar = exsave[EX_CFAR/sizeof(u64)];
>>> +	vcpu->arch.regs.ctr = exsave[EX_CTR/sizeof(u64)];
>>> +
>>> +	vcpu->arch.last_inst = KVM_INST_FETCH_FAILED;
>>> +
>>> +	if (unlikely(trap == BOOK3S_INTERRUPT_MACHINE_CHECK)) {
>>> +		vcpu->arch.fault_dar = exsave[EX_DAR/sizeof(u64)];
>>> +		vcpu->arch.fault_dsisr = exsave[EX_DSISR/sizeof(u64)];
>>> +		kvmppc_realmode_machine_check(vcpu);
>>> +
>>> +	} else if (unlikely(trap == BOOK3S_INTERRUPT_HMI)) {
>>> +		kvmppc_realmode_hmi_handler();
>>> +
>>> +	} else if (trap == BOOK3S_INTERRUPT_H_EMUL_ASSIST) {
>>> +		vcpu->arch.emul_inst = mfspr(SPRN_HEIR);
>>> +
>>> +	} else if (trap == BOOK3S_INTERRUPT_H_DATA_STORAGE) {
>>> +		vcpu->arch.fault_dar = exsave[EX_DAR/sizeof(u64)];
>>> +		vcpu->arch.fault_dsisr = exsave[EX_DSISR/sizeof(u64)];
>>> +		vcpu->arch.fault_gpa = mfspr(SPRN_ASDR);
>>> +
>>> +	} else if (trap == BOOK3S_INTERRUPT_H_INST_STORAGE) {
>>> +		vcpu->arch.fault_gpa = mfspr(SPRN_ASDR);
>>> +
>>> +	} else if (trap == BOOK3S_INTERRUPT_H_FAC_UNAVAIL) {
>>> +		vcpu->arch.hfscr = mfspr(SPRN_HFSCR);
>>> +
>>> +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>>> +	/*
>>> +	 * Softpatch interrupt for transactional memory emulation cases
>>> +	 * on POWER9 DD2.2.  This is early in the guest exit path - we
>>> +	 * haven't saved registers or done a treclaim yet.
>>> +	 */
>>> +	} else if (trap == BOOK3S_INTERRUPT_HV_SOFTPATCH) {
>>> +		vcpu->arch.emul_inst = mfspr(SPRN_HEIR);
>>> +
>>> +		/*
>>> +		 * The cases we want to handle here are those where the guest
>>> +		 * is in real suspend mode and is trying to transition to
>>> +		 * transactional mode.
>>> +		 */
>>> +		if (local_paca->kvm_hstate.fake_suspend &&
>>> +				(vcpu->arch.shregs.msr & MSR_TS_S)) {
>>> +			if (kvmhv_p9_tm_emulation_early(vcpu)) {
>>> +				/* Prevent it being handled again. */
>>> +				trap = 0;
>>> +			}
>>> +		}
>>> +#endif
>>> +	}
>>> +
>>> +	radix_clear_slb();
>>> +
>>> +	__mtmsrd(msr, 0);
>>
>>
>> The asm code only sets RI but this potentially sets more bits including
>> MSR_EE, is it expected to be 0 when __kvmhv_vcpu_entry_p9() is called?
> 
> Yes.
> 
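
Ah, so the caller is expected to have interrupts hard-disabled already. A
minimal sketch of how I understand the call site, assumed rather than taken
from the patch:

	hard_irq_disable();			/* MSR[EE] is 0 from here on */
	trap = __kvmhv_vcpu_entry_p9(vcpu);	/* restores the full host MSR on exit */

That way the mfmsr() at the top captures an MSR with EE clear, and the
__mtmsrd(msr, 0) at exit cannot accidentally enable interrupts.
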
>>> +	mtspr(SPRN_CTRLT, 1);
>>
>> What is this for? ISA does not shed much light:
>> ===
>> 63 RUN This  bit  controls  an  external  I/O  pin.
>> ===
> 
> I don't think it even does that these days. It interacts with the PMU.
> I was looking at whether it's feasible to move it into PMU code entirely,
> but apparently some tool or something might sample it. I'm a bit
> suspicious about that because an untrusted guest could be running and
> claim not to be, so I don't know what said tool really achieves, but I'll
> go through that fight another day.
> 
> But KVM has to set it to 1 at exit because the Linux host has it set to 1
> except in CPU idle.


Is this CTRLT setting a new thing, or does the asm do it too? I could not
spot it.
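
For context, my reading of the host side of this (a sketch only, based on
CTRL_RUNLATCH and __ppc64_runlatch_on() in my tree, not on this patch) is
that the run latch is the least significant CTRL bit, kept at 1 while the
host does useful work and cleared only in the idle loop:

	/* Sketch: set the CTRL RUN latch the way the host side does;
	 * assumes the SPRN_CTRLF/SPRN_CTRLT/CTRL_RUNLATCH definitions
	 * from asm/reg.h. */
	static inline void runlatch_on_sketch(void)
	{
		unsigned long ctrl = mfspr(SPRN_CTRLF);	/* CTRL is read via CTRLF */

		mtspr(SPRN_CTRLT, ctrl | CTRL_RUNLATCH);	/* and written via CTRLT */
	}

So I take mtspr(SPRN_CTRLT, 1) here to be forcing that bit back on; I just
could not find the equivalent in the old asm.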

>>
>>
>>> +
>>> +	accumulate_time(vcpu, &vcpu->arch.rm_exit);
>>
>> This should not compile without CONFIG_KVM_BOOK3S_HV_EXIT_TIMING.
> 
> It has an ifdef wrapper so it should work (it does on my local tree,
> which is slightly newer than what you have, but I don't think I fixed
> anything around this recently).


You are absolutely right, my bad.
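
(For the archive, the shape of wrapper I overlooked, just a sketch with an
assumed signature, not copied from the tree: when the config is off the
helper becomes a no-op, so the call sites compile either way.)

	#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
	/* real version: fold the time since the last sample into the
	 * current accumulator and switch to the next one */
	void accumulate_time(struct kvm_vcpu *vcpu, struct kvmhv_tb_accumulator *next);
	#else
	/* timing disabled: calls compile away entirely */
	#define accumulate_time(vcpu, next) do {} while (0)
	#endif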

> 
>>> +
>>> +	end_timing(vcpu);
>>> +
>>> +	return trap;
>>
>>
>> The asm does "For hash guest, read the guest SLB and save it away", but this
>> code does not. Is this new fast-path-in-C only for radix-on-radix, or are
>> hash VMs supported too?
> 
> That asm code does not run for the "guest_exit_short_path" case (aka the
> P9 path, aka the fast path).
> 
> Upstream code only supports radix host and radix guest in this path.
> The old path supports hash and radix. That's unchanged with this patch.
> 
> After the series, the new path supports all P9 modes (hash/hash,
> radix/radix, and radix/hash), and the old path supports P7 and P8 only.


Thanks for the clarification. Besides that CTRLT question, I checked that
the new C code matches the old asm code (which made diving into the ISA
incredibly fun :) ), so fwiw

Reviewed-by: Alexey Kardashevskiy <aik at ozlabs.ru>


I'd really like to see longer commit logs clarifying all the intended
changes, but it is probably just me.


> 
> Thanks,
> Nick
> 

-- 
Alexey

