[PATCH v7 3/6] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE

Thu May 6 16:31:28 AEST 2021

On Thu, May 06, 2021 at 03:45:21PM +1000, Nicholas Piggin wrote:
> Excerpts from Bharata B Rao's message of May 6, 2021 1:46 am:
> >  
> > +static long kvmppc_h_rpt_invalidate(struct kvm_vcpu *vcpu,
> > +				    unsigned long id, unsigned long target,
> > +				    unsigned long type, unsigned long pg_sizes,
> > +				    unsigned long start, unsigned long end)
> > +{
> > +	unsigned long psize;
> > +	struct mmu_psize_def *def;
> > +
> > +	if (!kvm_is_radix(vcpu->kvm))
> > +		return H_UNSUPPORTED;
> > +
> > +	if (end < start)
> > +		return H_P5;
> > +
> > +	/*
> > +	 * Partition-scoped invalidation for nested guests.
> > +	 * Not yet supported
> > +	 */
> > +	if (type & H_RPTI_TYPE_NESTED)
> > +		return H_P3;
> > +
> > +	/*
> > +	 * Process-scoped invalidation for L1 guests.
> > +	 */
> > +	for (psize = 0; psize < MMU_PAGE_COUNT; psize++) {
> > +		def = &mmu_psize_defs[psize];
> > +		if (!(pg_sizes & def->h_rpt_pgsize))
> > +			continue;
> 
> Not that it really matters but why did you go this approach rather than
> use a bitmask iteration over h_rpt_pgsize?

If you are asking why I am not just looping over the hcall argument
@pg_sizes bitmask then, I was doing that in my earlier version. But
David suggested that it would be good to have page size encodings
of H_RPT_INVALIDATE within mmu_pgsize_defs[]. Based on this, I am
populating mmu_pgsize_defs[] during radix page size initialization
and using that here to check for those page sizes that have been set
in @pg_sizes.

> 
> I would actually prefer to put this loop into the TLB invalidation code
> itself.

Yes, I could easily move it there.

> 
> The reason is that not all flush types are based on page size. You only
> need to do IS=1/2/3 flushes once and it takes out all page sizes.

I see. So we have to do explicit flushing for different page sizes
only if we are doing range based invalidation (IS=0). For rest of
the cases (IS=1/2/3), that's not necessary.

> 
> You don't need to do all these optimisations right now, but it would
> be good to make them possible to implement.

Sure.

> > +void do_h_rpt_invalidate_prt(unsigned long pid, unsigned long lpid,
> > +			     unsigned long type, unsigned long page_size,
> > +			     unsigned long psize, unsigned long start,
> > +			     unsigned long end)
> > +{
> > +	/*
> > +	 * A H_RPTI_TYPE_ALL request implies RIC=3, hence
> > +	 * do a single IS=1 based flush.
> > +	 */
> > +	if ((type & H_RPTI_TYPE_ALL) == H_RPTI_TYPE_ALL) {
> > +		_tlbie_pid_lpid(pid, lpid, RIC_FLUSH_ALL);
> > +		return;
> > +	}
> > +
> > +	if (type & H_RPTI_TYPE_PWC)
> > +		_tlbie_pid_lpid(pid, lpid, RIC_FLUSH_PWC);
> > +
> > +	if (start == 0 && end == -1) /* PID */
> > +		_tlbie_pid_lpid(pid, lpid, RIC_FLUSH_TLB);
> > +	else /* EA */
> > +		_tlbie_va_range_lpid(start, end, pid, lpid, page_size,
> > +				     psize, false);
> 
> At least one thing that is probably needed is to use the 
> single_page_flush_ceiling to flip the va range flush over to a pid 
> flush, so the guest can't cause problems in the hypervisor with an 
> enormous range.

Yes, makes sense. I shall do this and the above as later optimizations.

Regards,
Bharata.