[PATCH kernel v5 10/10] KVM: PPC: VFIO: Add in-kernel acceleration for VFIO

Tue Feb 28 11:54:32 AEDT 2017

On Mon, Feb 27, 2017 at 02:20:13PM +1100, Alexey Kardashevskiy wrote:
> On 27/02/17 12:53, David Gibson wrote:
> > On Fri, Feb 24, 2017 at 02:43:05PM +1100, Alexey Kardashevskiy wrote:
> >> On 24/02/17 14:36, David Gibson wrote:
> >>> On Fri, Feb 24, 2017 at 02:29:14PM +1100, Alexey Kardashevskiy wrote:
> >>>> On 24/02/17 13:14, David Gibson wrote:
> >>>>> On Wed, Feb 22, 2017 at 07:21:33PM +1100, Alexey Kardashevskiy
> >> wrote:
> > [snip]
> >>>>>> +static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm,
> >>>>>> +		struct iommu_table *tbl, unsigned long entry)
> >>>>>> +{
> >>>>>> +	enum dma_data_direction dir = DMA_NONE;
> >>>>>> +	unsigned long hpa = 0;
> >>>>>> +	long ret;
> >>>>>> +
> >>>>>> +	if (iommu_tce_xchg_rm(tbl, entry, &hpa, &dir))
> >>>>>> +		return H_HARDWARE;
> >>>>>
> >>>>> To avoid a double WARN() (and to make the warnings easier to
> >>>>> understand) I'd suggest putting a WARN_ON() here, rather than in the
> >>>>> callers when they receieve an H_HARDWARE.  IIUC this really shouldn't
> >>>>> ever happen, and it certainly can't be the guest's fault?
> >>>>
> >>>>
> >>>> Makes sense.
> >>>
> >>> I guess it might want WARN_ON_ONCE() to avoid spamming the user with
> >>> errors for every TCE, though.
> >>
> >>
> >> We do not expect this to happen at all :) I can convert all of them to
> >> _ONCE really as the purpose of WARN_ON is mostly to document what we do not
> >> expect.
> > 
> > Sure, seems reasonable.
> > 
> > [snip]
> >>>>>> @@ -220,9 +338,10 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
> >>>>>>  {
> >>>>>>  	struct kvmppc_spapr_tce_table *stt;
> >>>>>>  	long i, ret = H_SUCCESS;
> >>>>>> -	unsigned long tces, entry, ua = 0;
> >>>>>> +	unsigned long tces, entry, ua = 0, tce, gpa;
> >>>>>>  	unsigned long *rmap = NULL;
> >>>>>>  	bool prereg = false;
> >>>>>> +	struct kvmppc_spapr_tce_iommu_table *stit;
> >>>>>>  
> >>>>>>  	stt = kvmppc_find_table(vcpu->kvm, liobn);
> >>>>>>  	if (!stt)
> >>>>>> @@ -287,12 +406,24 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
> >>>>>>  	}
> >>>>>>  
> >>>>>>  	for (i = 0; i < npages; ++i) {
> >>>>>> -		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
> >>>>>> +		tce = be64_to_cpu(((u64 *)tces)[i]);
> >>>>>>  
> >>>>>>  		ret = kvmppc_tce_validate(stt, tce);
> >>>>>>  		if (ret != H_SUCCESS)
> >>>>>>  			goto unlock_exit;
> >>>>>>  
> >>>>>> +		gpa = tce & ~(TCE_PCI_READ | TCE_PCI_WRITE);
> >>>>>> +
> >>>>>> +		list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
> >>>>>> +			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm,
> >>>>>> +					stit->tbl, entry + i, gpa,
> >>>>>> +					iommu_tce_direction(tce));
> >>>>>> +			if (WARN_ON_ONCE(ret == H_HARDWARE))
> >>>>>
> >>>>> I don't think you need the WARN() here - the only H_HARDWARE failure
> >>>>> path in iommu_map() already includes a WARN().
> >>>>
> >>>>
> >>>> True, I can drop it here.
> >>>>
> >>>>
> >>>>>
> >>>>>> +				kvmppc_rm_clear_tce(stit->tbl, entry);
> >>>>>> +			else if (ret != H_SUCCESS)
> >>>>>> +				goto unlock_exit;
> >>>>>
> >>>>> It's also not clear to me why the H_HARDWARE error path clears the
> >>>>> entry, but the other failure paths don't.  Or why an H_HARDWARE will
> >>>>> result in continuing to set the rest of the TCEs, but other failures
> >>>>> won't.
> >>>>
> >>>>
> >>>> The idea was that other failures still have some chance that handling may
> >>>> succeed in virtual mode or via QEMU, H_HARDWARE is fatal.
> >>>
> >>> Um... yes.. but the logic seems to be backwards for that: on
> >>> H_HARDWARE you warn and keep going, on other errors you bail out
> >>> entirely.
> >>
> >> By "fatal" I means fatal for this particular hardware TCE(s), no hope in
> >> trying this particular TCE in virtual mode.
> > 
> > Ok... still not following why that means the "fatal" error results in
> > continuing to attempt for the rest of the updated TCEs, whereas the
> > "non fatal" one bails out.
> 
> I was applying the principle that if after all checks done we still cannot
> update the hardware table, then just clear the TCE and move on. Or I
> misunderstood the idea?

*Still* not seeing why if we cannot update the hardware table we keep
trying with the rest of the entries, but on other failures we don't.

> > Especially since the bail out will only go
> > to virtual mode if ret == H_TOO_HARD, which it isn't clear is the only
> > possibility.
> 
> 
> H_TOO_HARD goes to virtual mode, H_TOO_HARD in virtual goes to the
> userspace (QEMU).
> 
> Will "if (WARN_ON_ONCE(ret != H_SUCCESS && ret != H_TOO_HARD))" make more
> sense?

Probably, but depends what's in the if.

> > 
> >>>> I am just not sure if H_PARAMETER is what I want to return at [1], to make
> >>>> the calling code simplier, I could return H_HARDWARE there as well (instead
> >>>> of H_PARAMETER).
> >>>
> >>> That sounds right, IIUC the gpa to ua translation shouldn't ever
> >>> fail because of something the guest did. 
> >>
> >>
> >> The guest can easily pass bad TCE/GPA which is not in any registered slot.
> >> So it is rather H_PARAMETER.
> > 
> > Ah, yes.
> > 
> >>> So I'd expect either
> >>> H_HARDWARE, or H_TOO_HARD (if there's some hope that virtual mode can
> >>> make the translation when real mode couldn't).
> >>
> >> No, virtual mode uses the exact same helper.
> > 
> > Ok.
> > 
> 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20170228/8df31a53/attachment.sig>