[PATCH v3 19/41] KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9 path

Nicholas Piggin npiggin at gmail.com
Tue Mar 23 00:15:30 AEDT 2021


Excerpts from Alexey Kardashevskiy's message of March 22, 2021 5:30 pm:
> 
> 
> On 06/03/2021 02:06, Nicholas Piggin wrote:
>> In the interest of minimising the amount of code that is run in
>> "real-mode", don't handle hcalls in real mode in the P9 path.
>> 
>> POWER8 and earlier are much more expensive to exit from HV real mode
>> and switch to host mode, because on those processors HV interrupts get
>> to the hypervisor with the MMU off, and the other threads in the core
>> need to be pulled out of the guest, and SLBs all need to be saved,
>> ERATs invalidated, and host SLB reloaded before the MMU is re-enabled
>> in host mode. Hash guests also require a lot of hcalls to run. The
>> XICS interrupt controller requires hcalls to run.
>> 
>> By contrast, POWER9 has independent thread switching, and in radix mode
>> the hypervisor is already in a host virtual memory mode when the HV
>> interrupt is taken. Radix + xive guests don't need hcalls to handle
>> interrupts or manage translations.
>> 
>> So it's much less important to handle hcalls in real mode in P9.
> 
> So acde25726bc6034b (which added if(kvm_is_radix(vcpu->kvm))return 
> H_TOO_HARD) can be reverted, pretty much?

Yes. Although that calls attention to the fact I missed doing
a P9 h_random handler in this patch. I'll fix that, then I think
acde2572 could be reverted entirely.

[...]

>>   	} else {
>>   		kvmppc_xive_push_vcpu(vcpu);
>>   		trap = kvmhv_load_hv_regs_and_go(vcpu, time_limit, lpcr);
>> -		kvmppc_xive_pull_vcpu(vcpu);
>> +		/* H_CEDE has to be handled now, not later */
>> +		/* XICS hcalls must be handled before xive is pulled */
>> +		if (trap == BOOK3S_INTERRUPT_SYSCALL &&
>> +		    !(vcpu->arch.shregs.msr & MSR_PR)) {
>> +			unsigned long req = kvmppc_get_gpr(vcpu, 3);
>>   
>> +			if (req == H_CEDE) {
>> +				kvmppc_cede(vcpu);
>> +				kvmppc_xive_cede_vcpu(vcpu); /* may un-cede */
>> +				kvmppc_set_gpr(vcpu, 3, 0);
>> +				trap = 0;
>> +			}
>> +			if (req == H_EOI || req == H_CPPR ||
> 
> else if (req == H_EOI ... ?

Hummm, sure.

[...]

>> +void kvmppc_xive_cede_vcpu(struct kvm_vcpu *vcpu)
>> +{
>> +	void __iomem *esc_vaddr = (void __iomem *)vcpu->arch.xive_esc_vaddr;
>> +
>> +	if (!esc_vaddr)
>> +		return;
>> +
>> +	/* we are using XIVE with single escalation */
>> +
>> +	if (vcpu->arch.xive_esc_on) {
>> +		/*
>> +		 * If we still have a pending escalation, abort the cede,
>> +		 * and we must set PQ to 10 rather than 00 so that we don't
>> +		 * potentially end up with two entries for the escalation
>> +		 * interrupt in the XIVE interrupt queue.  In that case
>> +		 * we also don't want to set xive_esc_on to 1 here in
>> +		 * case we race with xive_esc_irq().
>> +		 */
>> +		vcpu->arch.ceded = 0;
>> +		/*
>> +		 * The escalation interrupts are special as we don't EOI them.
>> +		 * There is no need to use the load-after-store ordering offset
>> +		 * to set PQ to 10 as we won't use StoreEOI.
>> +		 */
>> +		__raw_readq(esc_vaddr + XIVE_ESB_SET_PQ_10);
>> +	} else {
>> +		vcpu->arch.xive_esc_on = true;
>> +		mb();
>> +		__raw_readq(esc_vaddr + XIVE_ESB_SET_PQ_00);
>> +	}
>> +	mb();
> 
> 
> Uff. Thanks for cut-n-pasting the comments, helped a lot to match this c 
> to that asm!

Glad it helped.

>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_xive_cede_vcpu);
>> +
>>   /*
>>    * This is a simple trigger for a generic XIVE IRQ. This must
>>    * only be called for interrupts that support a trigger page
>> @@ -2106,6 +2140,32 @@ static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
>>   	return 0;
>>   }
>>   
>> +int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
>> +{
>> +	struct kvmppc_vcore *vc = vcpu->arch.vcore;
> 
> 
> Can a XIVE enabled guest issue these hcalls? Don't we want if 
> (!kvmppc_xics_enabled(vcpu)) and
>   if (xics_on_xive()) here, as kvmppc_rm_h_xirr() have? Some of these 
> hcalls do write to XIVE registers but some seem to change 
> kvmppc_xive_vcpu. Thanks,

Yes I think you're right, good catch. I'm not completely sure about all 
the xive and xics modes but a guest certainly can make any kind of hcall 
it likes and we have to sanity check it.

We want to take the hcall here (in replacement of the real mode hcalls)
with the same condition. So it would be:

        if (!kvmppc_xics_enabled(vcpu))
                return H_TOO_HARD;
        if (!xics_on_xive())
		return H_TOO_HARD;
	
	[ ... process xive_vm_h_xirr / cppr / eoi / etc ]

Right?

Thanks,
Nick

> 
> 
> 
> 
>> +
>> +	switch (req) {
>> +	case H_XIRR:
>> +		return xive_vm_h_xirr(vcpu);
>> +	case H_CPPR:
>> +		return xive_vm_h_cppr(vcpu, kvmppc_get_gpr(vcpu, 4));
>> +	case H_EOI:
>> +		return xive_vm_h_eoi(vcpu, kvmppc_get_gpr(vcpu, 4));
>> +	case H_IPI:
>> +		return xive_vm_h_ipi(vcpu, kvmppc_get_gpr(vcpu, 4),
>> +					  kvmppc_get_gpr(vcpu, 5));
>> +	case H_IPOLL:
>> +		return xive_vm_h_ipoll(vcpu, kvmppc_get_gpr(vcpu, 4));
>> +	case H_XIRR_X:
>> +		xive_vm_h_xirr(vcpu);
>> +		kvmppc_set_gpr(vcpu, 5, get_tb() + vc->tb_offset);
>> +		return H_SUCCESS;
>> +	}
>> +
>> +	return H_UNSUPPORTED;
>> +}
>> +EXPORT_SYMBOL_GPL(kvmppc_xive_xics_hcall);
>> +
>>   int kvmppc_xive_debug_show_queues(struct seq_file *m, struct kvm_vcpu *vcpu)
>>   {
>>   	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
>> 
> 
> -- 
> Alexey
> 


More information about the Linuxppc-dev mailing list