[PATCH v7 4/6] KVM: PPC: Book3S HV: Nested support in H_RPT_INVALIDATE

Aneesh Kumar K.V aneesh.kumar at linux.ibm.com
Fri May 7 20:30:08 AEST 2021


Nicholas Piggin <npiggin at gmail.com> writes:


...

> + */
>> +long do_h_rpt_invalidate_pat(struct kvm_vcpu *vcpu, unsigned long lpid,
>> +			     unsigned long type, unsigned long pg_sizes,
>> +			     unsigned long start, unsigned long end)
>> +{
>> +	struct kvm_nested_guest *gp;
>> +	long ret;
>> +	unsigned long psize, ap;
>> +
>> +	/*
>> +	 * If L2 lpid isn't valid, we need to return H_PARAMETER.
>> +	 *
>> +	 * However, nested KVM issues a L2 lpid flush call when creating
>> +	 * partition table entries for L2. This happens even before the
>> +	 * corresponding shadow lpid is created in HV which happens in
>> +	 * H_ENTER_NESTED call. Since we can't differentiate this case from
>> +	 * the invalid case, we ignore such flush requests and return success.
>> +	 */
>> +	gp = kvmhv_find_nested(vcpu->kvm, lpid);
>> +	if (!gp)
>> +		return H_SUCCESS;
>> +
>> +	/*
>> +	 * A flush all request can be handled by a full lpid flush only.
>> +	 */
>> +	if ((type & H_RPTI_TYPE_NESTED_ALL) == H_RPTI_TYPE_NESTED_ALL)
>> +		return do_tlb_invalidate_nested_all(vcpu, lpid, RIC_FLUSH_ALL);
>> +
>> +#if 0
>> +	/*
>> +	 * We don't need to handle a PWC flush like process table here,
>> +	 * because intermediate partition scoped table in nested guest doesn't
>> +	 * really have PWC. Only level we have PWC is in L0 and for nested
>> +	 * invalidate at L0 we always do kvm_flush_lpid() which does
>> +	 * radix__flush_all_lpid(). For range invalidate at any level, we
>> +	 * are not removing the higher level page tables and hence there is
>> +	 * no PWC invalidate needed.
>> +	 */
>> +	if (type & H_RPTI_TYPE_PWC) {
>> +		ret = do_tlb_invalidate_nested_all(vcpu, lpid, RIC_FLUSH_PWC);
>> +		if (ret)
>> +			return H_P4;
>> +	}
>> +#endif
>
> I think removing this #if 0 and the unnecessary code is fine, just a bit 
> more explanation in the comment would help. And "doesn't really" implies
> it sort of might a little bit, I think what you want is "really doesn't" 
> :)

yes.

>
> As I understand it, the L0 does not cache any intermediate levels of the
> nested guest's partition scope at all. Only the nested HV's pte entries
> are copied into the shadow page table, so we only care if the PTEs are
> changed, and the PWCs that the processor creates for the shadow page
> table are managed by the kvmppc_unmap_pte() etc functions... I think?

That is correct. I added the comment to clarify why the PWC type is not
handled for a partition-scoped invalidate the way it is for a
process-scoped invalidate. The code fragment was left in as an
indication of what should happen in theory.

All higher-level guests (L1, L2, etc.) have partition tables that are
not actually used for the hardware page table walk. The H_RPT_INVALIDATE
hcall is used as a hint to free those page table entries. On receiving
the hcall, L0 forwards it to the higher-level guest, which, after
invalidating its shadow pte, issues a further H_RPT_INVALIDATE hcall to
clear the partition-scoped entries of the current guest.

If it is a range TLB flush, we just clear the shadow pte. The higher-level
page tables are not modified, and hence no PWC flush is required.

If it is a full lpid flush, because of RIC=1/2 or because the range is
0 -> -1, we free the full partition table and do a kvmhv_flush_lpid(),
which eventually ends up calling radix__flush_all_lpid().
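
As a rough sketch of the cases described above, here is a standalone model
of the decision. It is not the kernel code: the enum, the struct and
classify_flush() are hypothetical names used only for illustration, and
the booleans stand in for the kvmhv_find_nested() lookup and the
flush-all type check in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the cases above; none of these names exist in KVM. */
enum flush_action {
	FLUSH_IGNORED,    /* no shadow lpid yet: request ignored, success returned */
	FLUSH_RANGE_PTE,  /* clear shadow ptes only; upper levels are untouched */
	FLUSH_FULL_LPID,  /* free the shadow partition table, flush the whole lpid */
};

struct flush_request {
	bool shadow_lpid_exists;  /* stands in for the kvmhv_find_nested() lookup */
	bool flush_all;           /* stands in for a flush-all request (RIC=1/2) */
	unsigned long start;
	unsigned long end;
};

static enum flush_action classify_flush(const struct flush_request *req)
{
	assert(req != NULL);

	if (!req->shadow_lpid_exists)
		return FLUSH_IGNORED;

	/* A 0 -> -1 range is treated like an explicit flush-all request. */
	if (req->flush_all || (req->start == 0 && req->end == ~0UL))
		return FLUSH_FULL_LPID;

	/* Range case: only ptes change, so no separate PWC flush is modelled. */
	return FLUSH_RANGE_PTE;
}
```

Note there is no separate PWC action in the model: the range case leaves
the higher-level page tables alone, and the full case goes through the
whole-lpid flush path.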

These function names are kept in the comment so that someone new to the
code can easily follow the code path.


-aneesh

