[PATCH kernel] powerpc/powernv/ioda2: Update iommu table base on ownership change
aik at ozlabs.ru
Wed Feb 22 14:05:15 AEDT 2017
On 22/02/17 10:28, Gavin Shan wrote:
> On Tue, Feb 21, 2017 at 01:41:31PM +1100, Alexey Kardashevskiy wrote:
>> On POWERNV platform, in order to do DMA via IOMMU (i.e. 32bit DMA in
>> our case), a device needs an iommu_table pointer set via
>> The codeflow is:
>> - pnv_pci_ioda2_setup_dma_pe()
>> - pnv_pci_ioda2_setup_default_config()
>> - pnv_ioda_setup_bus_dma() 
>> pnv_pci_ioda2_setup_dma_pe() creates IOMMU groups,
>> pnv_pci_ioda2_setup_default_config() does default DMA setup,
>> pnv_ioda_setup_bus_dma() takes a bus PE (on IODA2, all physical function
>> PEs as bus PEs except NPU), walks through all underlying buses and
>> devices, adds all devices to an IOMMU group and sets iommu_table.
>> On IODA2, when VFIO is used, it takes ownership over a PE which means it
>> removes all tables and creates new ones (with a possibility of sharing
>> them among PEs). So when the ownership is returned from VFIO to
>> the kernel, the iommu_table pointer written to a device at  is
>> stale and needs an update.
>> This adds an "add_to_group" parameter to pnv_ioda_setup_bus_dma()
>> (in fact re-adds as it used to be there a while ago for different
>> reasons) to tell the helper if a device needs to be added to
>> an IOMMU group with an iommu_table update or just the latter.
>> This calls pnv_ioda_setup_bus_dma(..., false) from
>> pnv_ioda2_release_ownership() so when the ownership is restored,
>> 32bit DMA can work again for a device. This does the same thing
>> on obtaining ownership as the iommu_table point is stale at this point
>> anyway and it is safer to have NULL there.
>> We did not hit this earlier as all tested devices in recent years were
>> only using 64bit DMA; the rare exception for this is MPT3 SAS adapter
>> which uses both 32bit and 64bit DMA access and it has not been tested
>> with VFIO much.
>> Cc: Gavin Shan <gwshan at linux.vnet.ibm.com>
>> Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
> Acked-by: Gavin Shan <gwshan at linux.vnet.ibm.com>
> One thing would be improved in future, which isn't relevant to
> this patch if my understanding is correct enough: The TCE table for
> DMA32 space created during system boot is destroyed when VFIO takes
> the ownership. The same TCE table (same level, page size, window size
> etc) is created and associated to the PE again. Some CPU cycles would
> be saved if the original table is picked up without creating a new one.
It is not necessary same levels or window size, could be something
different. Also carrying a table will just make code bit more complicated
and it is complicated enough already - we need to consider very possible
case of IOMMU tables sharing.
> The involved function is pnv_pci_ioda2_create_table(). Its primary work
> is to allocate pages from buddy.
It allocates pages via alloc_pages_node(), not buddy.
> It's usually fast if there are enough
> free pages. Otherwise, it would be relatively slow. It also has the risk
> to fail the allocation. I guess it's not bad to save CPU cycles in this
> critical (maybe hot?) path.
It is not a critical path - it happens on a guest (re)boot only.
More information about the Linuxppc-dev