[PATCH] powerpc/powernv/iommu: iommu incorrectly bypasses DMA APIs

Gaurav Batra gbatra at linux.ibm.com
Tue Apr 7 08:49:39 AEST 2026


On 4/3/26 10:53 PM, Shivaprasad G Bhat wrote:
> On 4/1/26 6:35 AM, Ritesh Harjani (IBM) wrote:
>> ++CC few people who would be interested in this fix.
>>
>> Gaurav Batra <gbatra at linux.ibm.com> writes:
>>
>>> In a PowerNV environment, for devices that support a DMA mask smaller
>>> than 64 bits but larger than 32 bits, the iommu incorrectly bypasses
>>> the DMA APIs when allocating and mapping buffers for DMA operations.
>>>
>>> Devices fail with -ENOMEM during probe, with the following messages:
>>>
>>> amdgpu 0000:01:00.0: [drm] Detected VRAM RAM=4096M, BAR=4096M
>>> amdgpu 0000:01:00.0: [drm] RAM width 128bits GDDR5
>>> amdgpu 0000:01:00.0: iommu: 64-bit OK but direct DMA is limited by 0
>>> amdgpu 0000:01:00.0: dma_iommu_get_required_mask: returning bypass mask 0xfffffffffffffff
>>> amdgpu 0000:01:00.0:  4096M of VRAM memory ready
>>> amdgpu 0000:01:00.0:  32570M of GTT memory ready.
>>> amdgpu 0000:01:00.0: (-12) failed to allocate kernel bo
>>> amdgpu 0000:01:00.0: [drm] Debug VRAM access will use slowpath MM access
>>> amdgpu 0000:01:00.0: [drm] GART: num cpu pages 4096, num gpu pages 65536
>>> amdgpu 0000:01:00.0: [drm] PCIE GART of 256M enabled (table at 0x000000F4FFF80000).
>>> amdgpu 0000:01:00.0: (-12) failed to allocate kernel bo
>>> amdgpu 0000:01:00.0: (-12) create WB bo failed
>>> amdgpu 0000:01:00.0: amdgpu_device_wb_init failed -12
>>> amdgpu 0000:01:00.0: amdgpu_device_ip_init failed
>>> amdgpu 0000:01:00.0: Fatal error during GPU init
>>> amdgpu 0000:01:00.0: finishing device.
>>> amdgpu 0000:01:00.0: probe with driver amdgpu failed with error -12
>>> amdgpu 0000:01:00.0:  ttm finalized
>>>
>>> Fixes: 1471c517cf7d ("powerpc/iommu: bypass DMA APIs for coherent allocations for pre-mapped memory")
>>> Reported-by: Dan Horák <dan at danny.cz>
>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5039
>> We could even add the lore link so that in future people can find the
>> discussion.
>> Closes: https://lore.kernel.org/linuxppc-dev/20260313142351.609bc4c3efe1184f64ca5f44@danny.cz/
>>
>>> Tested-by: Dan Horak <dan at danny.cz>
>>> Signed-off-by: Ritesh Harjani <ritesh.list at gmail.com>
>> Feel free to change this to the following; I mainly only suggested
>> the fix :)
>>
>> Suggested-by: Ritesh Harjani (IBM) <ritesh.list at gmail.com>
>>
>>
>>> Signed-off-by: Gaurav Batra <gbatra at linux.ibm.com>
>>> ---
>> Generally this info...
>>> I am working on testing the patch in an LPAR with AMDGPU. I will
>>> update the results soon.
>> ... goes below the three dashes in here ^^^
>> This is the same place where people also update the patch change log.
>>
>> But sure, thanks for updating. As for this patch, it looks good to me.
>> So, feel free to add:
>>
>> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list at gmail.com>
>>
>>
>>>   arch/powerpc/kernel/dma-iommu.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
>>> index 73e10bd4d56d..8b4de508d2eb 100644
>>> --- a/arch/powerpc/kernel/dma-iommu.c
>>> +++ b/arch/powerpc/kernel/dma-iommu.c
>>> @@ -67,7 +67,7 @@ bool arch_dma_unmap_sg_direct(struct device *dev, struct scatterlist *sg,
>>>   }
>>>   bool arch_dma_alloc_direct(struct device *dev)
>>>   {
>>> -    if (dev->dma_ops_bypass)
>>> +    if (dev->dma_ops_bypass && dev->bus_dma_limit)
>>>           return true;
>>>         return false;
>>> @@ -75,7 +75,7 @@ bool arch_dma_alloc_direct(struct device *dev)
>>>     bool arch_dma_free_direct(struct device *dev, dma_addr_t dma_handle)
>>>   {
>>> -    if (!dev->dma_ops_bypass)
>>> +    if (!dev->dma_ops_bypass || !dev->bus_dma_limit)
>>>           return false;
>
>
> While this works, looking more into why 1471c517cf7dae needed the
> arch_dma_[alloc|free]_direct() functions: it was because dev->bus_dma_limit
> is not updated when new memory pages are added in the add_pages() case,
> leading to the can_map_direct() checks failing later when needed.
>
> Looks like these functions were modeled after
> arch_dma_[map|unmap]_phys_direct(), which were introduced to handle a
> similar need.
>
> Do we get an iommu_mem_notifier() call for action MEM_GOING_ONLINE when
> device memory is made online? If yes, we can just update bus_dma_limit for
> each device in the iommu group to include the new max_pfn. This sounds
> wrong, though, as the pages, though configured as system memory, are still
> device private, and the memory notifiers are for the real memory
> hotplug/unplug cases.
>
> In commit 7170130e4c below, Balbir points this out to be a problem and
> prevents updating max_pfn for device private memory. He has mentioned it
> should be considered for PPC too.
>
> The dma_addressing_limited() check on x86 is the same as PPC's
> can_map_direct(), and with that in place the
> arch_dma_[alloc|free]_direct() functions would not be needed anymore.

When pmemory is converted to "system-ram", it moves "max_pfn" as well. The
command is "daxctl reconfigure-device --mode=system-ram dax0.0 --force".
This case needs to be considered too.


Thanks,

Gaurav

>
>
> commit 7170130e4c72ce0caa0cb42a1627c635cc262821
> Author: Balbir Singh <balbirs at nvidia.com>
> Date:   Tue Apr 1 11:07:52 2025 +1100
>
>     x86/mm/init: Handle the special case of device private pages in add_pages(), to not increase max_pfn and trigger dma_addressing_limited() bounce buffers
>
> So, I think we need to bring the changes from 7170130e4c72ce0c into PPC.
>
>
> With that, the original hunk setting the bypass unconditionally in
> 1471c517cf7dae can be changed like below, and the redundant
> arch_dma_[alloc|free]_direct() functions removed.
>
>
> diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
> index aa3689d61917..120800b962cf 100644
> --- a/arch/powerpc/kernel/dma-iommu.c
> +++ b/arch/powerpc/kernel/dma-iommu.c
> @@ -150,7 +150,7 @@ int dma_iommu_dma_supported(struct device *dev, u64 mask)
>                  * 1:1 mapping but it is somehow limited.
>                  * ibm,pmemory is one example.
>                  */
> -               dev->dma_ops_bypass = dev->bus_dma_limit == 0;
> +               dev->dma_ops_bypass = (dev->bus_dma_limit == 0) || (min_not_zero(mask, dev->bus_dma_limit) >= dma_direct_get_required_mask(dev));
>                 if (!dev->dma_ops_bypass)
>                         dev_warn(dev,
>                                  "iommu: 64-bit OK but direct DMA is limited by %llx\n",
>
> Suggested-by: Shivaprasad G Bhat <sbhat at linux.ibm.com>
>
>
> Thanks,
>
> Shivaprasad
>

