[PATCH kernel] powerpc/pseries/iommu: Add cond_resched() for huge updates

Alexey Kardashevskiy aik at ozlabs.ru
Wed Jul 24 00:50:13 AEST 2019

On 23/07/2019 21:46, Michael Ellerman wrote:
> Alexey Kardashevskiy <aik at ozlabs.ru> writes:
>> Mapping ~5.000.000 TCEs currently takes about 40s; this is the amount
>> required for a 300GB VM with 64k IOMMU page size. Anything bigger than
>> this produces RCU stall warnings.
> OK. Are we sure we're not doing anything stupid in that code to make it
> go that slowly?

Each tce_setrange_multi_pSeriesLP() is a hypercall and KVM (if it is KVM
and the call was not bounced to QEMU which I believe does not happen)
walks through all 512 TCEs in the request and does TCE Kill for each of
those TCEs which in turn are OPAL calls, and not just one per TCE but
two - one for PHB's TCE and one for NPU's TCE. And I have a test patch
to do TCE kills in powernv (not in OPAL), 40s figure was taken with this

So I agree it should be faster but it won't be 1-2s and for longer
operations we will need this resched.

>> This adds cond_resched() to allow the scheduler to do context switching
>> when it decides to.
>> This loop is called from dma_set_mask() which is a sleepable context.
>> Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
>> ---
>>  arch/powerpc/platforms/pseries/iommu.c | 1 +
>>  1 file changed, 1 insertion(+)
>> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
>> index 889dc2e44b89..2b8de822272f 100644
>> --- a/arch/powerpc/platforms/pseries/iommu.c
>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>> @@ -459,6 +459,7 @@ static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn,
>>  static int tce_setrange_multi_pSeriesLP_walk(unsigned long start_pfn,
>>  		unsigned long num_pfn, void *arg)
>>  {
>> +	cond_resched();
>>  	return tce_setrange_multi_pSeriesLP(start_pfn, num_pfn, arg);
>>  }
> Why there and not in tce_setrange_multi_pSeriesLP() ?

The other caller is iommu_mem_notifier and I am unsure about locking
there, there may be nasty surprises.

> I'm not sure what the maximum granularity walk_system_ram_range() will
> ever call us with is.

Contiguous blocks, which are too big. You're right, this resched better
go to tce_setrange_multi_pSeriesLP.


More information about the Linuxppc-dev mailing list