[PATCH v2] powerpc/iommu: DMA address offset is incorrectly calculated with 2MB TCEs

Alexey Kardashevskiy aik at ozlabs.ru
Mon May 22 10:08:48 AEST 2023


Hi Gaurav,

Sorry I missed this. Please share the link to your fix; I do not see it 
in my mail. In general, the problem can probably be solved by using 
huge pages (anything larger than 64K) only for 1:1 mapping.


On 03/05/2023 13:25, Gaurav Batra wrote:
> Hello Alexey,
> 
> I recently joined the IOMMU team. There was a bug reported by the test 
> team where the Mellanox driver was timing out during configuration. I 
> proposed a fix for it, which is quoted below in the email.
> 
> You suggested a fix for the problem Srikar reported. Both fixes resolve 
> Srikar's issue and the Mellanox driver issue. The problem is with the 
> 2MB DDW.
> 
> Since you have extensive knowledge of IOMMU design and code, in your 
> opinion, which patch should we adopt?
> 
> Thanks a lot
> 
> Gaurav
> 
> On 4/20/23 2:45 PM, Gaurav Batra wrote:
>> Hello Michael,
>>
>> I was looking into the Bug: 199106 
>> (https://bugzilla.linux.ibm.com/show_bug.cgi?id=199106).
>>
>> In the Bug, the Mellanox driver was timing out when enabling an SR-IOV 
>> device.
>>
>> I tested Alexey's patch and it fixes the issue with the Mellanox 
>> driver. The downside to Alexey's fix is that even a small memory 
>> request by the driver will be aligned up to 2MB. In my test, the 
>> Mellanox driver is issuing multiple requests of 64K size. All of 
>> these will get aligned up to 2MB, which is quite a waste of 
>> resources.
>>
>>
>> In any case, both patches work. Let me know which approach you 
>> prefer. In case we decide to go with my patch, I just realized that 
>> I need to fix nio_pages in iommu_free_coherent() as well.
>>
>>
>> Thanks,
>>
>> Gaurav
>>
>> On 4/20/23 10:21 AM, Michael Ellerman wrote:
>>> Gaurav Batra <gbatra at linux.vnet.ibm.com> writes:
>>>> When DMA window is backed by 2MB TCEs, the DMA address for the mapped
>>>> page should be the offset of the page relative to the 2MB TCE. The code
>>>> was incorrectly setting the DMA address to the beginning of the TCE
>>>> range.
>>>>
>>>> Mellanox driver is reporting timeout trying to ENABLE_HCA for an SR-IOV
>>>> ethernet port, when DMA window is backed by 2MB TCEs.
>>> I assume this is similar or related to the bug Srikar reported?
>>>
>>> https://lore.kernel.org/linuxppc-dev/20230323095333.GI1005120@linux.vnet.ibm.com/
>>>
>>> In that thread Alexey suggested a patch, have you tried his patch? He
>>> suggested rounding up the allocation size, rather than adjusting the
>>> dma_handle.
>>>
>>>> Fixes: 3872731187141d5d0a5c4fb30007b8b9ec36a44d
>>> That's not the right syntax, it's described in the documentation how to
>>> generate it.
>>>
>>> It should be:
>>>
>>>    Fixes: 387273118714 ("powerpc/pseries/dma: Add support for 2M IOMMU page size")
>>>
>>> cheers
>>>
>>>> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
>>>> index ee95937bdaf1..ca57526ce47a 100644
>>>> --- a/arch/powerpc/kernel/iommu.c
>>>> +++ b/arch/powerpc/kernel/iommu.c
>>>> @@ -517,7 +517,7 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl,
>>>>  		/* Convert entry to a dma_addr_t */
>>>>  		entry += tbl->it_offset;
>>>>  		dma_addr = entry << tbl->it_page_shift;
>>>> -		dma_addr |= (s->offset & ~IOMMU_PAGE_MASK(tbl));
>>>> +		dma_addr |= (vaddr & ~IOMMU_PAGE_MASK(tbl));
>>>>  
>>>>  		DBG("  - %lu pages, entry: %lx, dma_addr: %lx\n",
>>>>  			npages, entry, dma_addr);
>>>> @@ -904,6 +904,7 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
>>>>  	unsigned int order;
>>>>  	unsigned int nio_pages, io_order;
>>>>  	struct page *page;
>>>> +	int tcesize = (1 << tbl->it_page_shift);
>>>>  
>>>>  	size = PAGE_ALIGN(size);
>>>>  	order = get_order(size);
>>>> @@ -930,7 +931,8 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
>>>>  	memset(ret, 0, size);
>>>>  
>>>>  	/* Set up tces to cover the allocated range */
>>>> -	nio_pages = size >> tbl->it_page_shift;
>>>> +	nio_pages = IOMMU_PAGE_ALIGN(size, tbl) >> tbl->it_page_shift;
>>>> +
>>>>  	io_order = get_iommu_order(size, tbl);
>>>>  	mapping = iommu_alloc(dev, tbl, ret, nio_pages, DMA_BIDIRECTIONAL,
>>>>  			      mask >> tbl->it_page_shift, io_order, 0);
>>>> @@ -938,7 +940,8 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
>>>>  		free_pages((unsigned long)ret, order);
>>>>  		return NULL;
>>>>  	}
>>>> -	*dma_handle = mapping;
>>>> +
>>>> +	*dma_handle = mapping | ((u64)ret & (tcesize - 1));
>>>>  	return ret;
>>>>  }
>>>> -- 

-- 
Alexey


More information about the Linuxppc-dev mailing list