[PATCH kernel] powerpc/powernv/ioda2: Update iommu table base on ownership change

Gavin Shan gwshan at linux.vnet.ibm.com
Wed Feb 22 15:17:56 AEDT 2017

On Wed, Feb 22, 2017 at 02:05:15PM +1100, Alexey Kardashevskiy wrote:
>On 22/02/17 10:28, Gavin Shan wrote:
>> On Tue, Feb 21, 2017 at 01:41:31PM +1100, Alexey Kardashevskiy wrote:

[The subsequent discussion isn't related to the patch itself anymore]

>> One thing that could be improved in future, which isn't relevant to
>> this patch if my understanding is correct: the TCE table for the
>> DMA32 space created during system boot is destroyed when VFIO takes
>> ownership. The same TCE table (same levels, page size, window size
>> etc.) is created and associated with the PE again. Some CPU cycles would
>> be saved if the original table were picked up instead of creating a new one.
>It is not necessarily the same levels or window size, it could be something
>different. Also, carrying a table over would just make the code a bit more
>complicated, and it is complicated enough already - we need to consider every
>possible case of IOMMU table sharing.

Right after the host boots up and before VFIO is involved, each PE is associated
with a DMA32 space (0 - 2G) and the IO page size is 4KB. If the whole (window)
size, IO page size or number of levels changed after the PE is released from
the guest back to the host, that wouldn't make much sense, as the device
(including its TCE table) needs to be restored to its previous state. Or are
we talking about a different DMA space (TCE tables)?

Regarding the possibility of sharing IOMMU tables, I don't quite understand.
Do you mean the situation of a multi-function adapter, where some functions
are passed to the guest and the rest are owned by the host? I don't see how
that works from the DMA path. Would you please explain a bit?

>> The involved function is pnv_pci_ioda2_create_table(). Its primary work
>> is to allocate pages from buddy.
>It allocates pages via alloc_pages_node(), not buddy.

The page allocator, maybe? It fetches pages from the PCP (per-CPU pages)
lists or from the buddy allocator's freelists, depending on the requested
allocation order.

>> It's usually fast if there are enough
>> free pages. Otherwise, it would be relatively slow. It also has the risk
>> of failing the allocation. I guess it's not bad to save CPU cycles in this
>> critical (maybe hot?) path.
>It is not a critical path - it happens on a guest (re)boot only.

My point is: it would be nice if the guest needed less time to (re)boot. I
don't know how much time could actually be saved, though.
