[PATCH v2] powerpc/iommu: DMA address offset is incorrectly calculated with 2MB TCEs
Alexey Kardashevskiy
aik at ozlabs.ru
Mon May 22 10:08:48 AEST 2023
Hi Gaurav,
Sorry I missed this. Please share the link to your fix; I do not see
it in my mail. In general, the problem can probably be solved by using
huge pages (anything more than 64K) only for 1:1 mappings.
On 03/05/2023 13:25, Gaurav Batra wrote:
> Hello Alexey,
>
> I recently joined the IOMMU team. There was a bug reported by the test
> team where the Mellanox driver was timing out during configuration. I
> proposed a fix for it, which is below in the email.
>
> You suggested a fix for the problem Srikar reported. Both fixes resolve
> the Srikar and Mellanox driver issues. The problem is with the 2MB DDW.
>
> Since you have extensive knowledge of IOMMU design and code, in your
> opinion, which patch should we adopt?
>
> Thanks a lot
>
> Gaurav
>
> On 4/20/23 2:45 PM, Gaurav Batra wrote:
>> Hello Michael,
>>
>> I was looking into the Bug: 199106
>> (https://bugzilla.linux.ibm.com/show_bug.cgi?id=199106).
>>
>> In the Bug, the Mellanox driver was timing out when enabling an SR-IOV device.
>>
>> I tested Alexey's patch and it fixes the issue with the Mellanox
>> driver. The downside to Alexey's fix is that even a small memory
>> request by the driver will be aligned up to 2MB. In my test, the
>> Mellanox driver is issuing multiple requests of 64K size. All of
>> these will get aligned up to 2MB, which is quite a waste of
>> resources.
>>
>>
>> In any case, both patches work. Let me know which approach you
>> prefer. In case we decide to go with my patch, I just realized that
>> I need to fix nio_pages in iommu_free_coherent() as well.
>>
>>
>> Thanks,
>>
>> Gaurav
>>
>> On 4/20/23 10:21 AM, Michael Ellerman wrote:
>>> Gaurav Batra <gbatra at linux.vnet.ibm.com> writes:
>>>> When the DMA window is backed by 2MB TCEs, the DMA address for a mapped
>>>> page should be the offset of the page relative to its 2MB TCE. The code
>>>> was incorrectly setting the DMA address to the beginning of the TCE
>>>> range.
>>>>
>>>> The Mellanox driver reports a timeout trying to ENABLE_HCA for an SR-IOV
>>>> ethernet port when the DMA window is backed by 2MB TCEs.
>>> I assume this is similar or related to the bug Srikar reported?
>>>
>>> https://lore.kernel.org/linuxppc-dev/20230323095333.GI1005120@linux.vnet.ibm.com/
>>>
>>> In that thread Alexey suggested a patch, have you tried his patch? He
>>> suggested rounding up the allocation size, rather than adjusting the
>>> dma_handle.
>>>
>>>> Fixes: 3872731187141d5d0a5c4fb30007b8b9ec36a44d
>>> That's not the right syntax, it's described in the documentation how to
>>> generate it.
>>>
>>> It should be:
>>>
>>> Fixes: 387273118714 ("powerps/pseries/dma: Add support for 2M IOMMU page size")
>>>
>>> cheers
>>>
>>>> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
>>>> index ee95937bdaf1..ca57526ce47a 100644
>>>> --- a/arch/powerpc/kernel/iommu.c
>>>> +++ b/arch/powerpc/kernel/iommu.c
>>>> @@ -517,7 +517,7 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl,
>>>>  		/* Convert entry to a dma_addr_t */
>>>>  		entry += tbl->it_offset;
>>>>  		dma_addr = entry << tbl->it_page_shift;
>>>> -		dma_addr |= (s->offset & ~IOMMU_PAGE_MASK(tbl));
>>>> +		dma_addr |= (vaddr & ~IOMMU_PAGE_MASK(tbl));
>>>>  		DBG("  - %lu pages, entry: %lx, dma_addr: %lx\n",
>>>>  			    npages, entry, dma_addr);
>>>> @@ -904,6 +904,7 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
>>>>  	unsigned int order;
>>>>  	unsigned int nio_pages, io_order;
>>>>  	struct page *page;
>>>> +	int tcesize = (1 << tbl->it_page_shift);
>>>>  	size = PAGE_ALIGN(size);
>>>>  	order = get_order(size);
>>>> @@ -930,7 +931,8 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
>>>>  	memset(ret, 0, size);
>>>>  	/* Set up tces to cover the allocated range */
>>>> -	nio_pages = size >> tbl->it_page_shift;
>>>> +	nio_pages = IOMMU_PAGE_ALIGN(size, tbl) >> tbl->it_page_shift;
>>>> +
>>>>  	io_order = get_iommu_order(size, tbl);
>>>>  	mapping = iommu_alloc(dev, tbl, ret, nio_pages, DMA_BIDIRECTIONAL,
>>>>  			      mask >> tbl->it_page_shift, io_order, 0);
>>>> @@ -938,7 +940,8 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
>>>>  		free_pages((unsigned long)ret, order);
>>>>  		return NULL;
>>>>  	}
>>>> -	*dma_handle = mapping;
>>>> +
>>>> +	*dma_handle = mapping | ((u64)ret & (tcesize - 1));
>>>>  	return ret;
>>>>  }
>>>> --
--
Alexey