[PATCH kernel 2/2] powerpc/iommu: Do not immediately panic when failed IOMMU table allocation

Alexey Kardashevskiy aik at ozlabs.ru
Mon Feb 22 16:24:44 AEDT 2021



On 18/02/2021 06:32, Leonardo Bras wrote:
> On Tue, 2021-02-16 at 14:33 +1100, Alexey Kardashevskiy wrote:
>> Most platforms allocate IOMMU table structures (specifically it_map)
>> at the boot time and when this fails - it is a valid reason for panic().
>>
>> However the powernv platform allocates it_map after a device is returned
>> to the host OS after being passed through and this happens long after
>> the host OS booted. It is quite possible to trigger the it_map allocation
>> panic() and kill the host even though it is not necessary - the host OS
>> can still use the DMA bypass mode (requires a tiny fraction of it_map's
>> memory) and even if that fails, the host OS is runnnable as it was without
>> the device for which allocating it_map causes the panic.
>>
>> Instead of immediately crashing in a powernv/ioda2 system, this prints
>> an error and continues. All other platforms still call panic().
>>
>> Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
> 
> Hello Alexey,
> 
> This looks like a good change, that passes panic() decision to platform
> code. Everything looks pretty straightforward, but I have a question
> regarding this:
> 
>> @@ -1930,16 +1931,16 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
>>   		res_start = pe->phb->ioda.m32_pci_base >> tbl->it_page_shift;
>>   		res_end = min(window_size, SZ_4G) >> tbl->it_page_shift;
>>   	}
>> -	iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end);
>> -	rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
>>
>> +	if (iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end))
>> +		rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
>> +	else
>> +		rc = -ENOMEM;
>>   	if (rc) {
>> -		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n",
>> -				rc);
>> +		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n", rc);
>>   		iommu_tce_table_put(tbl);
>> -		return rc;
>> +		tbl = NULL; /* This clears iommu_table_base below */
>>   	}
>> -
>>   	if (!pnv_iommu_bypass_disabled)
>>   		pnv_pci_ioda2_set_bypass(pe, true);
>>   
>>
> 
> If I could understand correctly, previously if iommu_init_table() did
> not panic(), and pnv_pci_ioda2_set_window() returned something other
> than 0, it would return rc in the if (rc) clause, but now it does not
> happen anymore, going through if (!pnv_iommu_bypass_disabled) onwards.
> 
> Is that desired?


Yes. A PE (==device, pretty much) has 2 DMA windows:
- the default one which requires some RAM to operate
- a bypass mode which tells the hardware that PCI addresses are 
statically mapped to RAM 1:1.

This bypass mode does not require extra memory to work and is used in 
the most cases on the bare metal as long as the device supports 64bit 
DMA which is everything except GPUs. Since it is cheap to enable and 
this what we prefer anyway, no urge to fail.


> As far as I could see, returning rc there seems a good procedure after
> iommu_init_table returning -ENOMEM.

This change is intentional and yes it could be done by a separate patch 
but I figured there is no that much value in splitting.



-- 
Alexey


More information about the Linuxppc-dev mailing list