[PATCH kernel RFC 0/4] powerpc/powenv/ioda: Allow huge DMA window at 4GB
Alexey Kardashevskiy
aik at ozlabs.ru
Mon Dec 2 16:58:15 AEDT 2019
On 02/12/2019 16:36, Alistair Popple wrote:
> On Monday, 2 December 2019 12:59:49 PM AEDT Alexey Kardashevskiy wrote:
>> Here is an attempt to support bigger DMA space for devices
>> supporting DMA masks less than 59 bits (GPUs come to mind
>> first). POWER9 PHBs have an option to map 2 windows at 0
>> and select a window based on DMA address being below or above
>> 4GB.
>>
>> This adds the "iommu=iommu_bypass" kernel parameter and
>
> Would it be possible to just enable this by default if the platform supports
> it? Are there any downsides?
It changes the second DMA window location which is now assumed by QEMU
to be at 0x800.0000.0000.0000 and I do not see an easy way to work
around this.
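
To make the two layouts concrete, here is a minimal sketch of how a bus
address would select a window in each mode. The helper name and the
simple threshold comparison are assumptions for illustration only, not
code from the series; the two start addresses are the 1<<59 default and
the 4GB value from the cover letter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Start of the second DMA window in the default layout: 0x800.0000.0000.0000 */
#define WIN1_START_DEFAULT	(1ULL << 59)
/* Start of the second DMA window in the 4GB layout: 0x100000000 */
#define WIN1_START_4GB		(1ULL << 32)

/*
 * Hypothetical helper: returns 0 if bus_addr falls into the first
 * window and 1 if it falls into the second one. Window sizes and
 * TCE table limits are deliberately ignored.
 */
static int dma_window_index(uint64_t bus_addr, bool mode_4gb)
{
	uint64_t win1_start = mode_4gb ? WIN1_START_4GB : WIN1_START_DEFAULT;

	return bus_addr >= win1_start ? 1 : 0;
}
```

With this model, a device limited to a 40-bit DMA mask can never reach
the second window in the default layout but can in the 4GB layout.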
For example, we start QEMU without VFIO but with an emulated XHCI which
will ask for DDW. We (QEMU) have to pick a window location and then
stick to it, so if a user later hotplugs a VFIO-PCI device, that device's
physical IOMMU has to support the previously selected DMA window
address; otherwise hotplug is going to fail.
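
The constraint can be sketched as a check done at hotplug time. The
struct and function below are hypothetical and only illustrate the
reasoning; neither QEMU nor the kernel has these names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of what a host (physical) IOMMU can offer. */
struct host_iommu_caps {
	const uint64_t *win1_starts;	/* supported second-window start addresses */
	size_t count;
};

/*
 * Returns true if a device behind this host IOMMU can be hotplugged
 * into a guest whose second DMA window was already fixed at
 * guest_win1_start when the guest first asked for DDW.
 */
static bool hotplug_window_compatible(const struct host_iommu_caps *caps,
				      uint64_t guest_win1_start)
{
	for (size_t i = 0; i < caps->count; i++) {
		if (caps->win1_starts[i] == guest_win1_start)
			return true;
	}
	return false;
}
```

A host running in 4GB mode would list only 0x100000000 here, so a guest
that already committed to 1<<59 would fail the check.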
The question is how to tell QEMU about this new offset, and what to do
about migration from P8 (which, let's say, did have a VFIO device that we
unplugged before the migration) to P9 with the prospect of hotplugging a
VFIO device, but this time with this GTE4GB bit set.
> Adding it as an option seems like it would make
> things harder to support and reduces the amount of testing/use it would get.
Yeah, this is why this is an RFC...
>> supports VFIO+pseries machine - currently this requires telling
>> upstream+unmodified QEMU about this via
>> -global spapr-pci-host-bridge.dma64_win_addr=0x100000000
>> or per-phb property. 4/4 advertises the new option but
>> there is no automation around it in QEMU (should it be?).
>>
>> For now it is either 1<<59 or 4GB mode; dynamic switching is
>> not supported (could be via sysfs).
>>
>> This is based on sha1
>> a6ed68d6468b Linus Torvalds "Merge tag 'drm-next-2019-11-27' of git://anongit.freedesktop.org/drm/drm".
>
> Are you sure?
Almost. It should have been HEAD^^^^^..HEAD instead of HEAD^^^^..HEAD :)
I've posted 00/4 to the thread now, sorry about that. Thanks,
> I am getting the following rejected hunk trying to apply the
> first patch in the series:
>
> --- arch/powerpc/platforms/powernv/pci-ioda.c
> +++ arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2349,15 +2349,10 @@ static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
> pe->tce_bypass_enabled = enable;
> }
>
> -static long pnv_pci_ioda2_create_table(struct iommu_table_group *table_group,
> - int num, __u32 page_shift, __u64 window_size, __u32 levels,
> +static long pnv_pci_ioda2_create_table(int nid, int num, __u64 bus_offset,
> + __u32 page_shift, __u64 window_size, __u32 levels,
> bool alloc_userspace_copy, struct iommu_table **ptbl)
> {
> - struct pnv_ioda_pe *pe = container_of(table_group, struct pnv_ioda_pe,
> - table_group);
> - int nid = pe->phb->hose->node;
> - __u64 bus_offset = num ?
> - pe->table_group.tce64_start : table_group->tce32_start;
> long ret;
> struct iommu_table *tbl;
>
> - Alistair
>
>> Please comment. Thanks.
>>
>>
>>
>> Alexey Kardashevskiy (4):
>> powerpc/powernv/ioda: Rework for huge DMA window at 4GB
>> powerpc/powernv/ioda: Allow smaller TCE table levels
>> powerpc/powernv/phb4: Add 4GB IOMMU bypass mode
>> vfio/spapr_tce: Advertise and allow a huge DMA windows at 4GB
>>
>> arch/powerpc/include/asm/iommu.h | 1 +
>> arch/powerpc/include/asm/opal-api.h | 11 +-
>> arch/powerpc/include/asm/opal.h | 2 +
>> arch/powerpc/platforms/powernv/pci.h | 1 +
>> include/uapi/linux/vfio.h | 2 +
>> arch/powerpc/platforms/powernv/opal-call.c | 2 +
>> arch/powerpc/platforms/powernv/pci-ioda-tce.c | 4 +-
>> arch/powerpc/platforms/powernv/pci-ioda.c | 219 ++++++++++++++----
>> drivers/vfio/vfio_iommu_spapr_tce.c | 10 +-
>> 9 files changed, 202 insertions(+), 50 deletions(-)
>>
>>
>
>
>
>
--
Alexey