amdgpu driver fails to initialize on ppc64le in 7.0-rc1 and newer

Timothy Pearson tpearson at raptorengineering.com
Mon Mar 23 11:30:04 AEDT 2026



----- Original Message -----
> From: "Ritesh Harjani" <ritesh.list at gmail.com>
> To: "Dan Horák" <dan at danny.cz>, "linuxppc-dev" <linuxppc-dev at lists.ozlabs.org>, "Gaurav Batra" <gbatra at linux.ibm.com>
> Cc: "amd-gfx" <amd-gfx at lists.freedesktop.org>, "Donet Tom" <donettom at linux.ibm.com>
> Sent: Saturday, March 14, 2026 11:25:11 PM
> Subject: Re: amdgpu driver fails to initialize on ppc64le in 7.0-rc1 and newer

> Dan Horák <dan at danny.cz> writes:
> 
> +cc Gaurav,
> 
>> Hi,
>>
>> starting with 7.0-rc1 (meaning 6.19 is OK) the amdgpu driver fails to
>> initialize on my Linux/ppc64le Power9 based system (with Radeon Pro WX4100)
>> with the following in the log
>>
>> ...
>> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: GART: 256M
>> 0x000000FF00000000 - 0x000000FF0FFFFFFF
> 
>                  ^^^^
> So it looks like this is a PowerNV (Power9) machine.
> 
>> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: [drm] Detected VRAM
>> RAM=4096M, BAR=4096M
>> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: [drm] RAM width
>> 128bits GDDR5
>> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: iommu: 64-bit OK but
>> direct DMA is limited by 0
>> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0:
>> dma_iommu_get_required_mask: returning bypass mask 0xfffffffffffffff
>> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0:  4096M of VRAM
>> memory ready
>> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0:  32570M of GTT
>> memory ready.
>> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: (-12) failed to
>> allocate kernel bo
>> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: [drm] Debug VRAM
>> access will use slowpath MM access
>> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: [drm] GART: num cpu
>> pages 4096, num gpu pages 65536
>> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: [drm] PCIE GART of
>> 256M enabled (table at 0x000000F4FFF80000).
>> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: (-12) failed to
>> allocate kernel bo
>> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: (-12) create WB bo
>> failed
>> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0:
>> amdgpu_device_wb_init failed -12
>> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0:
>> amdgpu_device_ip_init failed
>> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: Fatal error during
>> GPU init
>> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: finishing device.
>> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: probe with driver
>> amdgpu failed with error -12
>> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0:  ttm finalized
>> ...
>>
>> After some hints from Alex and bisecting and other investigation I have
>> found that
>> https://github.com/torvalds/linux/commit/1471c517cf7dae1a6342fb821d8ed501af956dd0
>> is the culprit and reverting it makes amdgpu load (and work) again.
> 
> Thanks for confirming this. Yes, this was recently added [1]
> 
> [1]:
> https://lore.kernel.org/linuxppc-dev/20251107161105.85999-1-gbatra@linux.ibm.com/

As this patch appears to be aimed primarily at improving performance, and has introduced a serious regression for a large number of active users of the PowerNV platform, I would kindly ask that it be reverted until it can be reworked so that it no longer breaks PowerNV support. Bear in mind that other devices are also limited to 40-bit DMA, and they are likely to break on Linux 7.0 as well.
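
For readers less familiar with the term, here is a minimal sketch of what a 40-bit DMA limit means in practice. It is an illustration only, not a description of what the commit in question does or how the powerpc IOMMU code picks addresses. The helper names and the example bus addresses are hypothetical; only the mask arithmetic (the same idea as the kernel's DMA_BIT_MASK(40)) is taken as given.

```python
def dma_bit_mask(bits: int) -> int:
    """Return a mask with the low `bits` bits set (same idea as DMA_BIT_MASK)."""
    if not 1 <= bits <= 64:
        raise ValueError("bits must be in the range 1..64")
    return (1 << bits) - 1


def is_reachable(bus_addr: int, size: int, mask: int) -> bool:
    """True if every byte of [bus_addr, bus_addr + size) lies at or below mask."""
    if size <= 0:
        raise ValueError("size must be positive")
    return bus_addr + size - 1 <= mask


# A device limited to 40-bit DMA can only address the low 1 TiB of bus space.
MASK_40 = dma_bit_mask(40)

# Hypothetical bus addresses, chosen only to show both outcomes.
LOW_ADDR = 1 << 39    # below the 40-bit boundary
HIGH_ADDR = 1 << 59   # far above it, e.g. a high bypass-style address

if __name__ == "__main__":
    for addr in (LOW_ADDR, HIGH_ADDR):
        ok = is_reachable(addr, 0x1000, MASK_40)
        print(f"{addr:#018x}: {'reachable' if ok else 'NOT reachable'} with a 40-bit mask")
```

A buffer whose bus address lands above the mask cannot be used by such a device at all, so an allocation that can only be satisfied there has to fail.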

Thank you!

