[PATCH kernel 2/2] powerpc/iommu: Do not immediately panic when failed IOMMU table allocation

Leonardo Bras leobras.c at gmail.com
Tue Feb 23 05:39:34 AEDT 2021


On Mon, 2021-02-22 at 16:24 +1100, Alexey Kardashevskiy wrote:
> 
> On 18/02/2021 06:32, Leonardo Bras wrote:
> > On Tue, 2021-02-16 at 14:33 +1100, Alexey Kardashevskiy wrote:
> > > Most platforms allocate IOMMU table structures (specifically it_map)
> > > at the boot time and when this fails - it is a valid reason for panic().
> > > 
> > > However the powernv platform allocates it_map after a device is returned
> > > to the host OS after being passed through and this happens long after
> > > the host OS booted. It is quite possible to trigger the it_map allocation
> > > panic() and kill the host even though it is not necessary - the host OS
> > > can still use the DMA bypass mode (requires a tiny fraction of it_map's
> > > memory) and even if that fails, the host OS is runnnable as it was without
> > > the device for which allocating it_map causes the panic.
> > > 
> > > Instead of immediately crashing in a powernv/ioda2 system, this prints
> > > an error and continues. All other platforms still call panic().
> > > 
> > > Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
> > 
> > Hello Alexey,
> > 
> > This looks like a good change, that passes panic() decision to platform
> > code. Everything looks pretty straightforward, but I have a question
> > regarding this:
> > 
> > > @@ -1930,16 +1931,16 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
> > >   		res_start = pe->phb->ioda.m32_pci_base >> tbl->it_page_shift;
> > >   		res_end = min(window_size, SZ_4G) >> tbl->it_page_shift;
> > >   	}
> > > -	iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end);
> > > -	rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
> > > 
> > > +	if (iommu_init_table(tbl, pe->phb->hose->node, res_start, res_end))
> > > +		rc = pnv_pci_ioda2_set_window(&pe->table_group, 0, tbl);
> > > +	else
> > > +		rc = -ENOMEM;
> > >   	if (rc) {
> > > -		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n",
> > > -				rc);
> > > +		pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n", rc);
> > >   		iommu_tce_table_put(tbl);
> > > -		return rc;
> > > +		tbl = NULL; /* This clears iommu_table_base below */
> > >   	}
> > > -
> > >   	if (!pnv_iommu_bypass_disabled)
> > >   		pnv_pci_ioda2_set_bypass(pe, true);
> > >   
> > > 
> > > 
> > > 
> > > 
> > 
> > If I could understand correctly, previously if iommu_init_table() did
> > not panic(), and pnv_pci_ioda2_set_window() returned something other
> > than 0, it would return rc in the if (rc) clause, but now it does not
> > happen anymore, going through if (!pnv_iommu_bypass_disabled) onwards.
> > 
> > Is that desired?
> 
> 
> Yes. A PE (==device, pretty much) has 2 DMA windows:
> - the default one which requires some RAM to operate
> - a bypass mode which tells the hardware that PCI addresses are 
> statically mapped to RAM 1:1.
> 
> This bypass mode does not require extra memory to work and is used in 
> the most cases on the bare metal as long as the device supports 64bit 
> DMA which is everything except GPUs. Since it is cheap to enable and 
> this what we prefer anyway, no urge to fail.
> 
> 
> > As far as I could see, returning rc there seems a good procedure after
> > iommu_init_table returning -ENOMEM.
> 
> This change is intentional and yes it could be done by a separate patch 
> but I figured there is no that much value in splitting.

Ok then, thanks for clarifying.
FWIW:

Reviewed-by: Leonardo Bras <leobras.c at gmail.com>




More information about the Linuxppc-dev mailing list