[PATCH v6 13/13] mm: Remove device private pages from the physical address space
David Hildenbrand (Arm)
david at kernel.org
Sat Mar 7 03:11:13 AEDT 2026
On 2/2/26 12:36, Jordan Niethe wrote:
> The existing design of device private memory imposes limitations which
> render it non-functional for certain systems and configurations where
> the physical address space is limited.
>
> Device private memory is implemented by first reserving a region of the
> physical address space. This is a problem. The physical address space is
> not a resource that is directly under the kernel's control. Suitable
> physical address space is constrained by the underlying hardware and
> firmware and may not always be available.
>
> Device private memory assumes that it will be able to reserve a device
> memory sized chunk of physical address space. However, there is nothing
> guaranteeing that this will succeed, and there are a number of factors that
> increase the likelihood of failure. We need to consider what else may
> exist in the physical address space. It is observed that certain VM
> configurations place very large PCI windows immediately after RAM. Large
> enough that there is no physical address space available at all for
> device private memory. This is more likely to occur on systems with a
> 43-bit physical address width, which have less physical address space.
>
> Instead of using the physical address space, introduce a device private
> address space and allocate devices regions from there to represent the
> device private pages.
>
> Introduce a new interface memremap_device_private_pagemap() that
> allocates a requested amount of device private address space and creates
> the necessary device private pages.
>
> To support this new interface, struct dev_pagemap needs some changes:
>
> - Add a new dev_pagemap::nr_pages field as an input parameter.
> - Add a new dev_pagemap::pages array to store the device
> private pages.
>
> When using memremap_device_private_pagemap(), rather than passing in
> dev_pagemap::ranges[dev_pagemap::nr_ranges] of physical address space to
> be remapped, dev_pagemap::nr_ranges will always be 1, and the device
> private range that is reserved is returned in dev_pagemap::range.
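So if I'm reading the description right, the allocation side would look
roughly like this userspace model (a bump allocator stands in for the real
device private address space management; all names here are stand-ins, not
the actual kernel symbols):

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace model of the reworked struct dev_pagemap: nr_pages is now
 * an input, pages[] holds the device private pages, and the reserved
 * device private range comes back in range. */
struct dp_range { unsigned long start, end; };
struct dp_page { int dummy; };

struct dp_pagemap {
	unsigned long nr_pages;	/* input: number of pages wanted */
	int nr_ranges;		/* always 1 for device private */
	struct dp_range range;	/* output: reserved dp address range */
	struct dp_page *pages;	/* the device private pages */
};

#define DP_PAGE_SIZE 4096UL

/* Next free device private address; a trivial bump allocator stands in
 * for the real address space management. */
static unsigned long dp_cursor;

static int model_memremap_device_private_pagemap(struct dp_pagemap *pgmap)
{
	pgmap->pages = calloc(pgmap->nr_pages, sizeof(*pgmap->pages));
	if (!pgmap->pages)
		return -1;
	pgmap->nr_ranges = 1;
	pgmap->range.start = dp_cursor;
	pgmap->range.end = dp_cursor + pgmap->nr_pages * DP_PAGE_SIZE - 1;
	dp_cursor += pgmap->nr_pages * DP_PAGE_SIZE;
	return 0;
}
```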
>
> Forbid calling memremap_pages() with dev_pagemap::ranges::type =
> MEMORY_DEVICE_PRIVATE.
>
> Represent this device private address space using a new
> device_private_pgmap_tree maple tree. This tree maps a given device
> private address to a struct dev_pagemap, where a specific device private
> page may then be looked up in that dev_pagemap::pages array.
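The lookup path, as I understand it, would then be two steps: find the
owning pagemap by offset, then index into its pages[] array. A userspace
sketch (a linear scan stands in for the maple tree; names are illustrative):

```c
#include <assert.h>
#include <stddef.h>

#define DPL_PAGE_SHIFT 12

struct dpl_range { unsigned long start, end; };
struct dpl_page { int dummy; };

struct dpl_pagemap {
	struct dpl_range range;
	struct dpl_page *pages;
};

/* A linear scan over registered pagemaps stands in for the
 * device_private_pgmap_tree maple tree lookup. */
static struct dpl_pagemap *dpl_registered[8];
static int dpl_nr_registered;

static struct dpl_pagemap *dpl_tree_lookup(unsigned long offset)
{
	for (int i = 0; i < dpl_nr_registered; i++) {
		struct dpl_pagemap *pgmap = dpl_registered[i];

		if (offset >= pgmap->range.start && offset <= pgmap->range.end)
			return pgmap;
	}
	return NULL;
}

/* Model of device_private_offset_to_page(): find the owning pagemap,
 * then index into its pages[] by the page number within the range. */
static struct dpl_page *dpl_offset_to_page(unsigned long offset)
{
	struct dpl_pagemap *pgmap = dpl_tree_lookup(offset);

	if (!pgmap)
		return NULL;
	return &pgmap->pages[(offset - pgmap->range.start) >> DPL_PAGE_SHIFT];
}
```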
>
> Device private address space can be reclaimed and the associated device
> private pages freed using the corresponding new
> memunmap_device_private_pagemap() interface.
>
> Because the device private pages now live outside the physical address
> space, they no longer have a normal PFN. This means that page_to_pfn(),
> et al. are no longer meaningful.
>
> Introduce helpers:
>
> - device_private_page_to_offset()
> - device_private_folio_to_offset()
>
> to take a given device private page / folio and return its offset within
> the device private address space.
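The reverse direction presumably falls out of the pages[] array: a device
private page's offset is its index within the owning pagemap, scaled into
that pagemap's reserved range. A userspace sketch of that (the back-pointer
field and all names are stand-ins):

```c
#include <assert.h>

#define DPO_PAGE_SHIFT 12

/* Model of device_private_page_to_offset(): a device private page's
 * "address" is its index within the owning pagemap's pages[] array,
 * offset into the pagemap's reserved device private range. */
struct dpo_pagemap;

struct dpo_page {
	struct dpo_pagemap *pgmap;	/* back-pointer to the owner */
};

struct dpo_pagemap {
	unsigned long range_start;	/* start of reserved dp range */
	struct dpo_page pages[4];
};

static unsigned long dpo_page_to_offset(const struct dpo_page *page)
{
	const struct dpo_pagemap *pgmap = page->pgmap;

	return pgmap->range_start +
	       ((unsigned long)(page - pgmap->pages) << DPO_PAGE_SHIFT);
}
```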
>
> Update the places where we previously converted a device private page to
> a PFN to use these new helpers. When we encounter a device private
> offset, look up its page with device_private_offset_to_page() instead
> of within the pagemap.
>
> Update the existing users:
>
> - lib/test_hmm.c
> - ppc ultravisor
> - drm/amd/amdkfd
> - gpu/drm/xe
> - gpu/drm/nouveau
>
> to use the new memremap_device_private_pagemap() interface.
>
> Acked-by: Felix Kuehling <felix.kuehling at amd.com>
> Reviewed-by: Zi Yan <ziy at nvidia.com> # for MM changes
> Signed-off-by: Jordan Niethe <jniethe at nvidia.com>
> Signed-off-by: Alistair Popple <apopple at nvidia.com>
>
> ---
> v1:
> - Include NUMA node parameter for memremap_device_private_pagemap()
> - Add devm_memremap_device_private_pagemap() and friends
> - Update existing users of memremap_pages():
> - ppc ultravisor
> - drm/amd/amdkfd
> - gpu/drm/xe
> - gpu/drm/nouveau
> - Update for HMM huge page support
> - Guard device_private_offset_to_page and friends with CONFIG_ZONE_DEVICE
>
> v2:
> - Make sure last member of struct dev_pagemap remains DECLARE_FLEX_ARRAY(struct range, ranges);
>
> v3:
> - Use numa_mem_id() if memremap_device_private_pagemap is called with
> NUMA_NO_NODE. This fixes a null pointer deref in
> lruvec_stat_mod_folio().
> - drm/xe: Remove call to devm_release_mem_region() in xe_pagemap_destroy_work()
> - s/VM_BUG/VM_WARN/
>
> v4:
> - Use devm_memunmap_device_private_pagemap() in
> xe_pagemap_destroy_work()
> - Replace ^ with != for PVMW_DEVICE_PRIVATE comparisons
> - Minor style changes
> - remove discussion of aarch64 from commit message - not relevant post
> eeb8fdfcf090 ("arm64: Expose the end of the linear map in PHYSMEM_END")
>
> v6:
> - Fix a maybe-unused warning in kgd2kfd_init_zone_device()
> - Replace division by PAGE_SIZE with DIV_ROUND_UP() when setting
> nr_pages. This mirrors the align up that previously happened in
> get_free_mem_region()
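On that DIV_ROUND_UP() point: a plain division silently drops a sub-page
remainder, so the rounding change matters whenever the size is not
page-aligned. Quick illustration (DIV_ROUND_UP as in include/linux/math.h):

```c
#include <assert.h>

/* DIV_ROUND_UP as defined in the kernel (include/linux/math.h). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define MODEL_PAGE_SIZE 4096UL

/* Plain division truncates a sub-page remainder... */
static unsigned long nr_pages_truncated(unsigned long size)
{
	return size / MODEL_PAGE_SIZE;
}

/* ...while DIV_ROUND_UP() keeps the partial page, mirroring the
 * align-up that previously happened in get_free_mem_region(). */
static unsigned long nr_pages_rounded(unsigned long size)
{
	return DIV_ROUND_UP(size, MODEL_PAGE_SIZE);
}
```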
> ---
There is just too much in this patch to review it reasonably.
You should probably have a patch that just introduces the helpers and
have them just do what we do today.
E.g., device_private_page_to_offset() would just do a page_to_pfn().
Then you can convert the individual core-mm pieces so that people can
review them without making their brain hurt.
Afterwards, you can have a patch that does the real "mm: Remove device
private pages from the physical address space" and doesn't have to touch
too many core-mm pieces.
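Concretely, the transitional helpers I have in mind would just wrap
today's pfn conversions under the new names, so core-mm call sites can be
converted one by one before the address-space change lands. Userspace
sketch with stand-in types (the stubs model page_to_pfn()/pfn_to_page()):

```c
#include <assert.h>

/* Userspace stand-ins for struct page and the mem_map-based pfn
 * conversion; in the kernel these would be the real page_to_pfn()
 * and pfn_to_page(). */
struct page_stub { int dummy; };
static struct page_stub mem_map_stub[16];

static unsigned long stub_page_to_pfn(struct page_stub *page)
{
	return (unsigned long)(page - mem_map_stub);
}

static struct page_stub *stub_pfn_to_page(unsigned long pfn)
{
	return &mem_map_stub[pfn];
}

/* Transitional helpers: identical to today's behaviour, just behind
 * the new names, until the real device private address space exists. */
static unsigned long device_private_page_to_offset(struct page_stub *page)
{
	return stub_page_to_pfn(page);
}

static struct page_stub *device_private_offset_to_page(unsigned long offset)
{
	return stub_pfn_to_page(offset);
}
```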
[...]
> diff --git a/mm/util.c b/mm/util.c
> index 65e3f1a97d76..8482ebc5c394 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -1244,7 +1244,10 @@ void snapshot_page(struct page_snapshot *ps, const struct page *page)
> struct folio *foliop;
> int loops = 5;
>
> - ps->pfn = page_to_pfn(page);
> + if (is_device_private_page(page))
> + ps->pfn = device_private_page_to_offset(page);
> + else
> + ps->pfn = page_to_pfn(page);
> ps->flags = PAGE_SNAPSHOT_FAITHFUL;
Why is that not done by the caller?
--
Cheers,
David
More information about the Linuxppc-dev
mailing list