[PATCH v2 23/37] KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C

Nicholas Piggin npiggin at gmail.com
Sat Feb 27 11:21:04 AEDT 2021


Excerpts from Fabiano Rosas's message of February 27, 2021 8:37 am:
> Nicholas Piggin <npiggin at gmail.com> writes:
> 
> Hi, thanks for this. It helped me clarify a bunch of details that I
> haven't understood while reading the asm code.

Yes, me too :)

>> +/*
>> + * void kvmppc_p9_enter_guest(struct vcpu *vcpu);
>> + *
>> + * Enter the guest on an ISAv3.0 or later system where we have exactly
>> + * one vcpu per vcore, both the host and guest are radix, and threads
>> + * are set to "independent mode".
>> + */
>> +.balign	IFETCH_ALIGN_BYTES
>> +_GLOBAL(kvmppc_p9_enter_guest)
>> +EXPORT_SYMBOL_GPL(kvmppc_p9_enter_guest)
>> +	mflr	r0
>> +	std	r0,PPC_LR_STKOFF(r1)
>> +	stdu	r1,-SFS(r1)
>> +
>> +	std	r1,HSTATE_HOST_R1(r13)
>> +	std	r3,STACK_SLOT_VCPU(r1)
> 
> The vcpu was stored on the paca previously. I don't get the change,
> could you explain?

The stack is a nicer place to keep things. We only need one place to 
save the stack pointer, then most things can come from there. There's 
actually more paca stuff we could move into local variables or onto the 
stack too.

It was probably done like this because of PR KVM, which probably can't 
access the stack in real mode when running in an LPAR, and it came 
across to the HV code that way. The new path doesn't require it.

>> +kvmppc_p9_exit_interrupt:
>> +	std     r1,HSTATE_SCRATCH1(r13)
>> +	std     r3,HSTATE_SCRATCH2(r13)
>> +	ld	r1,HSTATE_HOST_R1(r13)
>> +	ld	r3,STACK_SLOT_VCPU(r1)
>> +
>> +	std	r9,VCPU_CR(r3)
>> +
>> +1:
>> +	std	r11,VCPU_PC(r3)
>> +	std	r12,VCPU_MSR(r3)
>> +
>> +	reg = 14
>> +	.rept	18
>> +	std	reg,__VCPU_GPR(reg)(r3)
>> +	reg = reg + 1
>> +	.endr
>> +
>> +	/* r1, r3, r9-r13 are saved to vcpu by C code */
> 
> If we just saved r1 and r3, why don't we put them in the vcpu right now?
> That would avoid having the C code knowing about scratch areas.

Putting it in C avoids having the asm code know about scratch areas.

>> @@ -4429,11 +4432,19 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
>>  		else
>>  			r = kvmppc_run_vcpu(vcpu);
>>
>> -		if (run->exit_reason == KVM_EXIT_PAPR_HCALL &&
>> -		    !(vcpu->arch.shregs.msr & MSR_PR)) {
>> -			trace_kvm_hcall_enter(vcpu);
>> -			r = kvmppc_pseries_do_hcall(vcpu);
>> -			trace_kvm_hcall_exit(vcpu, r);
>> +		if (run->exit_reason == KVM_EXIT_PAPR_HCALL) {
>> +			if (unlikely(vcpu->arch.shregs.msr & MSR_PR)) {
>> +				/*
>> +				 * Guest userspace executed sc 1, reflect it
>> +				 * back as a privileged program check interrupt.
>> +				 */
>> +				kvmppc_core_queue_program(vcpu, SRR1_PROGPRIV);
>> +				r = RESUME_GUEST;
> 
> This is in conflict with this snippet in kvmppc_handle_exit_hv:
> 
> 	case BOOK3S_INTERRUPT_SYSCALL:
> 	{
> 		/* hcall - punt to userspace */
> 		int i;
> 
> 		/* hypercall with MSR_PR has already been handled in rmode,
> 		 * and never reaches here.
> 		 */
> 
> That function already queues some 0x700s so maybe we could move this one
> in there as well.

I don't think it conflicts, but I think perhaps it should go in the 
patch which removed the real mode handlers.

kvmppc_handle_exit_hv is used by both HV paths, so for now it's a bit 
neater to try to get things into the same state, but we could do this in 
a later patch perhaps.

>> +	 *
>> +	 * To work around this we put a canary value into the HDSISR before
>> +	 * returning to a guest and then check for this canary when we take a
>> +	 * HDSI. If we find the canary on a HDSI, we know the hardware didn't
>> + * update the HDSISR. In this case we return to the guest to retake the
>> + * HDSI, which should correctly update the HDSISR on the second HDSI
>> + * entry.
>> +	 *
>> +	 * Just do this on all p9 processors for now.
>> +	 */
>> +	mtspr(SPRN_HDSISR, HDSISR_CANARY);
>> +
>> +	accumulate_time(vcpu, &vcpu->arch.guest_time);
>> +
>> +	local_paca->kvm_hstate.in_guest = KVM_GUEST_MODE_GUEST_HV_FAST;
>> +	kvmppc_p9_enter_guest(vcpu);
>> +	// Radix host and guest means host never runs with guest MMU state
>> +	local_paca->kvm_hstate.in_guest = KVM_GUEST_MODE_NONE;
>> +
>> +	accumulate_time(vcpu, &vcpu->arch.rm_intr);
>> +
>> +	/* Get these from r11/12 and paca exsave */
>> +	vcpu->arch.shregs.srr0 = mfspr(SPRN_SRR0);
>> +	vcpu->arch.shregs.srr1 = mfspr(SPRN_SRR1);
>> +	vcpu->arch.shregs.dar = mfspr(SPRN_DAR);
>> +	vcpu->arch.shregs.dsisr = mfspr(SPRN_DSISR);
>> +
>> +	trap = local_paca->kvm_hstate.scratch0 & ~0x2;
> 
> Couldn't we return the trap from kvmppc_p9_enter_guest? Seems like a
> nice pattern to have and avoids exposing the IVEC+0x2 magic which is hidden
> in asm already.

I use the 0x2 test in C in a later patch.

The IVEC+0x2 thing might just go away entirely for the new HV path, 
but we'd still clear it in C on the principle of minimal asm code.

>> +	radix_clear_slb();
>> +
>> +	__mtmsrd(msr, 0);
> 
> Isn't this the same as mtmsr(msr) ?

Yes. For 64s you can use the __ variant though. We use it in other 
places in this function with L=1 so it doesn't make sense to mix 
variants.

Thanks,
Nick

> 
>> +
>> +	accumulate_time(vcpu, &vcpu->arch.rm_exit);
>> +
>> +	end_timing(vcpu);
>> +
>> +	return trap;
>> +}
>> +EXPORT_SYMBOL_GPL(__kvmhv_vcpu_entry_p9);
> 
> <snip>
> 


More information about the Linuxppc-dev mailing list