[PATCH v2] powerpc/iommu: DMA address offset is incorrectly calculated with 2MB TCEs

Gaurav Batra gbatra at linux.vnet.ibm.com
Mon May 22 23:11:11 AEST 2023


Hello Alexey,

No worries. I resolved it with Michael's help. The patch is merged
upstream and fixes the issue.

Here is the link

https://lore.kernel.org/all/20230504175913.83844-1-gbatra@linux.vnet.ibm.com/

Thanks,

Gaurav

On 5/21/23 7:08 PM, Alexey Kardashevskiy wrote:
> Hi Gaurav,
>
> Sorry I missed this. Please share the link to your fix, I do not
> see it in my mail. In general, the problem can probably be solved by
> using huge pages (anything more than 64K) only for 1:1 mapping.
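>
> A minimal sketch of that idea (a hypothetical helper, not actual
> kernel code): pick the large TCE page size only when the window will
> carry a 1:1 direct mapping, and keep dynamically-mapped windows at
> 64K so small iommu_alloc() requests are not rounded up to 2MB:
>
>     static unsigned long pick_ddw_page_shift(bool direct_mapping,
>                                              unsigned long hw_shift)
>     {
>             if (direct_mapping)
>                     return hw_shift;  /* e.g. 21 for 2MB TCEs */
>             return 16;                /* 64K for dynamic windows */
>     }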
>
>
> On 03/05/2023 13:25, Gaurav Batra wrote:
>> Hello Alexey,
>>
>> I recently joined the IOMMU team. There was a bug reported by the
>> test team where the Mellanox driver was timing out during
>> configuration. I proposed a fix for it, which is below in this email.
>>
>> You suggested a fix for the problem Srikar reported. Either of these
>> fixes will resolve both Srikar's issue and the Mellanox driver issue.
>> The problem is with the 2MB DDW.
>>
>> Since you have extensive knowledge of IOMMU design and code, in your 
>> opinion, which patch should we adopt?
>>
>> Thanks a lot
>>
>> Gaurav
>>
>> On 4/20/23 2:45 PM, Gaurav Batra wrote:
>>> Hello Michael,
>>>
>>> I was looking into Bug 199106
>>> (https://bugzilla.linux.ibm.com/show_bug.cgi?id=199106).
>>>
>>> In the bug, the Mellanox driver was timing out when enabling an
>>> SR-IOV device.
>>>
>>> I tested Alexey's patch and it fixes the issue with the Mellanox
>>> driver. The downside of Alexey's fix is that even a small memory
>>> request by the driver will be aligned up to 2MB. In my test, the
>>> Mellanox driver is issuing multiple requests of 64K size. All of
>>> these will get aligned up to 2MB, which is quite a waste of
>>> resources.
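>>>
>>> (To put a number on that waste, assuming 2MB TCEs: each 64K request
>>> rounded up to 2MB leaves 2048K - 64K = 1984K, roughly 97% of the
>>> TCE, unused, so 32 such mappings would consume 64MB of the DMA
>>> window to cover only 2MB of data.)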
>>>
>>>
>>> In any case, both patches work. Let me know which approach you
>>> prefer. In case we decide to go with my patch, I just realized that
>>> I need to fix nio_pages in iommu_free_coherent() as well.
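>>>
>>> A minimal sketch of that follow-up fix, mirroring the alloc-side
>>> change so the free path is sized in IOMMU pages rather than kernel
>>> pages (my guess at the shape of the change, not the final patch):
>>>
>>>     /* in iommu_free_coherent(): with 2MB TCEs, size >> it_page_shift
>>>      * truncates to zero for small buffers; align up first so the
>>>      * same number of TCEs is freed as was mapped */
>>>     size = PAGE_ALIGN(size);
>>>     nio_pages = IOMMU_PAGE_ALIGN(size, tbl) >> tbl->it_page_shift;
>>>     iommu_free(tbl, dma_handle, nio_pages);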
>>>
>>>
>>> Thanks,
>>>
>>> Gaurav
>>>
>>> On 4/20/23 10:21 AM, Michael Ellerman wrote:
>>>> Gaurav Batra <gbatra at linux.vnet.ibm.com> writes:
>>>>> When a DMA window is backed by 2MB TCEs, the DMA address for a mapped
>>>>> page should be the offset of the page relative to the 2MB TCE. The
>>>>> code was incorrectly setting the DMA address to the beginning of the
>>>>> TCE range.
>>>>>
>>>>> The Mellanox driver reports a timeout trying to ENABLE_HCA for an
>>>>> SR-IOV ethernet port when the DMA window is backed by 2MB TCEs.
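>>>>>
>>>>> For example, with it_page_shift = 21 a 64K buffer may start, say,
>>>>> 0x150000 bytes into its 2MB TCE, but s->offset holds only the
>>>>> buffer's offset within its 64K page (always below 0x10000), so the
>>>>> upper bits of the offset were dropped and the device was handed an
>>>>> address near the start of the TCE instead of the buffer.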
>>>> I assume this is similar or related to the bug Srikar reported?
>>>>
>>>> https://lore.kernel.org/linuxppc-dev/20230323095333.GI1005120@linux.vnet.ibm.com/ 
>>>>
>>>>
>>>> In that thread Alexey suggested a patch; have you tried it? He
>>>> suggested rounding up the allocation size, rather than adjusting the
>>>> dma_handle.
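>>>>
>>>> Roughly like this, as a sketch of that approach rather than his
>>>> actual patch: round the request up to the IOMMU page size so a
>>>> buffer never shares a TCE and the dma_handle needs no offset, at
>>>> the cost of up to one TCE of slack per allocation:
>>>>
>>>>     /* in iommu_alloc_coherent(), before computing the order */
>>>>     size = ALIGN(size, IOMMU_PAGE_SIZE(tbl));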
>>>>
>>>>> Fixes: 3872731187141d5d0a5c4fb30007b8b9ec36a44d
>>>> That's not the right syntax; how to generate it is described in the
>>>> documentation.
>>>>
>>>> It should be:
>>>>
>>>>    Fixes: 387273118714 ("powerps/pseries/dma: Add support for 2M IOMMU page size")
>>>>
>>>> cheers
>>>>
>>>>> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
>>>>> index ee95937bdaf1..ca57526ce47a 100644
>>>>> --- a/arch/powerpc/kernel/iommu.c
>>>>> +++ b/arch/powerpc/kernel/iommu.c
>>>>> @@ -517,7 +517,7 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl,
>>>>>  		/* Convert entry to a dma_addr_t */
>>>>>  		entry += tbl->it_offset;
>>>>>  		dma_addr = entry << tbl->it_page_shift;
>>>>> -		dma_addr |= (s->offset & ~IOMMU_PAGE_MASK(tbl));
>>>>> +		dma_addr |= (vaddr & ~IOMMU_PAGE_MASK(tbl));
>>>>>  
>>>>>  		DBG("  - %lu pages, entry: %lx, dma_addr: %lx\n",
>>>>>  			    npages, entry, dma_addr);
>>>>> @@ -904,6 +904,7 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
>>>>>  	unsigned int order;
>>>>>  	unsigned int nio_pages, io_order;
>>>>>  	struct page *page;
>>>>> +	int tcesize = (1 << tbl->it_page_shift);
>>>>>  
>>>>>  	size = PAGE_ALIGN(size);
>>>>>  	order = get_order(size);
>>>>> @@ -930,7 +931,8 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
>>>>>  	memset(ret, 0, size);
>>>>>  
>>>>>  	/* Set up tces to cover the allocated range */
>>>>> -	nio_pages = size >> tbl->it_page_shift;
>>>>> +	nio_pages = IOMMU_PAGE_ALIGN(size, tbl) >> tbl->it_page_shift;
>>>>> +
>>>>>  	io_order = get_iommu_order(size, tbl);
>>>>>  	mapping = iommu_alloc(dev, tbl, ret, nio_pages, DMA_BIDIRECTIONAL,
>>>>>  			      mask >> tbl->it_page_shift, io_order, 0);
>>>>> @@ -938,7 +940,8 @@ void *iommu_alloc_coherent(struct device *dev, struct iommu_table *tbl,
>>>>>  		free_pages((unsigned long)ret, order);
>>>>>  		return NULL;
>>>>>  	}
>>>>> -	*dma_handle = mapping;
>>>>> +
>>>>> +	*dma_handle = mapping | ((u64)ret & (tcesize - 1));
>>>>>  
>>>>>  	return ret;
>>>>>  }
>>>>> --
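>>>
>>> (A worked example of the dma_handle line above, on my reading: with
>>> 2MB TCEs, tcesize - 1 is 0x1fffff, so for a buffer at kernel address
>>> 0x...40c0000 the low 21 bits, 0xc0000, are OR'ed back into the
>>> mapping, and the device gets the buffer's real position within the
>>> TCE rather than the TCE's start.)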
>

