[PATCH v2 00/11] Remove device private pages from physical address space

Jordan Niethe jniethe at nvidia.com
Wed Jan 14 16:41:46 AEDT 2026


Hi,

On 9/1/26 17:22, Matthew Brost wrote:
> On Fri, Jan 09, 2026 at 12:27:50PM +1100, Jordan Niethe wrote:
>> Hi
>> On 9/1/26 11:31, Matthew Brost wrote:
>>> On Fri, Jan 09, 2026 at 11:01:13AM +1100, Jordan Niethe wrote:
>>>> Hi,
>>>>
>>>> On 8/1/26 16:42, Jordan Niethe wrote:
>>>>> Hi,
>>>>>
>>>>> On 8/1/26 13:25, Jordan Niethe wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 8/1/26 05:36, Matthew Brost wrote:
>>>>>>>
>>>>>>> Thanks for the series. For some reason Intel's CI couldn't apply this
>>>>>>> series to drm-tip to get results [1]. I'll manually apply this and run
>>>>>>> all our SVM tests, then get back to you on results + review the changes
>>>>>>> here. For future reference, if you want to use our CI system, the
>>>>>>> series must apply to drm-tip; feel free to rebase this series and just
>>>>>>> send it to the intel-xe list if you want CI.
>>>>>>
>>>>>> Thanks, I'll rebase on drm-tip and send to the intel-xe list.
>>>>>
>>>>> For reference, the rebase onto drm-tip is on the intel-xe list:
>>>>>
>>>>> https://patchwork.freedesktop.org/series/159738/
>>>>>
>>>>> Will watch the CI results.
>>>>
>>>> The series causes some failures in the intel-xe tests:
>>>> https://patchwork.freedesktop.org/series/159738/#rev4
>>>>
>>>> Working through the failures now.
>>>>
>>>
>>> Yea, I saw the failures. I haven't had time to look at the patches on my
>>> end quite yet. Scrambling to get a few things into the 6.20/7.0 PR, so I
>>> may not have bandwidth to look in depth until mid next week, but digging
>>> is on my TODO list.
>>
>> Sure, that's completely fine. The failures seem pretty directly related
>> to the series, so I think I'll be able to make good progress.
>>
>> For example https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-159738v4/bat-bmg-2/igt@xe_evict@evict-beng-small.html
>>
>> It looks like I missed that xe_pagemap_destroy_work() needs to be updated
>> to remove the call to devm_release_mem_region() now that we are no longer
>> reserving a memory region.
> 
> +1
> 
> So this is the one I’d be most concerned about [1].
> xe_exec_system_allocator is our SVM test, which does almost all the
> ridiculous things possible in user space to stress SVM. It’s blowing up
> in the core MM—but the source of the bug could be anywhere (e.g., Xe
> SVM, GPU SVM, migrate device layer, or core MM). I’ll try to help when I
> have bandwidth.
> 
> Matt
> 
> [1] https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-159738v4/shard-bmg-9/igt@xe_exec_system_allocator@threads-many-large-execqueues-free-nomemset.html

A similar fault in lruvec_stat_mod_folio() can be repro'd if
memremap_device_private_pagemap() is called with NUMA_NO_NODE instead of
(say) numa_node_id() for the nid parameter.

The xe_svm driver uses devm_memremap_device_private_pagemap(), which uses
dev_to_node() for the nid parameter. I suspect this is causing something
similar to happen.
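
For reference, dev_to_node() just returns the device's numa_node, which
stays at NUMA_NO_NODE (-1) for devices without firmware-provided affinity,
and is hardwired to NUMA_NO_NODE on !CONFIG_NUMA builds (paraphrasing
mainline include/linux/device.h):

        /* Paraphrased from mainline include/linux/device.h: */
        #ifdef CONFIG_NUMA
        static inline int dev_to_node(struct device *dev)
        {
                return dev->numa_node;  /* NUMA_NO_NODE unless set */
        }
        #else
        static inline int dev_to_node(struct device *dev)
        {
                return NUMA_NO_NODE;
        }
        #endif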

When memremap_pages() calls pagemap_range(), we have the following logic:

        if (nid < 0)
                nid = numa_mem_id();

I think we might need to add the same fallback to
memremap_device_private_pagemap() to handle the NUMA_NO_NODE case. Still
confirming.
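
As a rough sketch of what I have in mind (hypothetical: the helper name and
placement are my assumptions, since memremap_device_private_pagemap() is
introduced by this series):

        #include <linux/numa.h>         /* NUMA_NO_NODE */
        #include <linux/topology.h>     /* numa_mem_id() */

        /*
         * Hypothetical helper, only to illustrate the idea: fall back to
         * the nearest node with memory when the caller passes a negative
         * nid (e.g. NUMA_NO_NODE from dev_to_node()), mirroring what
         * pagemap_range() already does.
         * memremap_device_private_pagemap() would run this on its nid
         * argument before using it for node-indexed allocations/stats.
         */
        static int devmem_sanitize_nid(int nid)
        {
                if (nid < 0)
                        nid = numa_mem_id();
                return nid;
        }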

Thanks,
Jordan.

> 
>>
>>
>> Thanks,
>> Jordan.
>>
>>>
>>> Matt
>>>
>>>> Thanks,
>>>> Jordan.
>>>>
>>>>>
>>>>> Thanks,
>>>>> Jordan.
>>>>>
>>>>>>
>>>>>> Jordan.
>>>>>>
>>>>>>>
>>>>>>> I was also wondering if Nvidia could help review one of our core MM
>>>>>>> patches [2], which is gating enabling 2M device pages too?
>>>>>>>
>>>>>>> Matt
>>>>>>>
>>>>>>> [1] https://patchwork.freedesktop.org/series/159738/
>>>>>>> [2] https://patchwork.freedesktop.org/patch/694775/?series=159119&rev=1
>>>>>>
>>>>>>
>>>>>
>>>>
>>


