[PATCH v4 1/7] mm/zone_device: Add order argument to folio_free callback
Matthew Brost
matthew.brost at intel.com
Tue Jan 13 08:49:06 AEDT 2026
On Mon, Jan 12, 2026 at 09:45:10AM -0400, Jason Gunthorpe wrote:
Hi, catching up here.
> On Sun, Jan 11, 2026 at 07:51:01PM -0500, Zi Yan wrote:
> > On 11 Jan 2026, at 19:19, Balbir Singh wrote:
> >
> > > On 1/12/26 08:35, Matthew Wilcox wrote:
> > >> On Sun, Jan 11, 2026 at 09:55:40PM +0100, Francois Dugast wrote:
> > >>> The core MM splits the folio before calling folio_free(), restoring
> > >>> the ZONE_DEVICE pages associated with the folio to an initialized
> > >>> state (e.g., non-compound, pgmap valid, etc.). The order argument
> > >>> represents the folio’s order prior to the split, which can be used
> > >>> driver-side to know how many pages are being freed.
> > >>
> > >> This really feels like the wrong way to fix this problem.
> > >>
> >
> > Hi Matthew,
> >
> > I think the wording is confusing, since the actual issue is that:
> >
> > 1. zone_device_page_init() calls prep_compound_page() to form a large folio,
> > 2. but free_zone_device_folio() never reverses it,
> > 3. the undo of prep_compound_page() in free_zone_device_folio() needs to
> > be done before the driver callback ->folio_free(), since once ->folio_free()
> > is called, the folio can be reallocated immediately,
> > 4. after the undo of prep_compound_page(), folio_order() can no longer
> > provide the original order, thus ->folio_free() needs it passed in for
> > proper device-side refcount manipulation.
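To make Zi's point 4 concrete, here is a minimal sketch of the core-MM
side as I read the series (naming approximate, not the literal patch):
the order has to be snapshotted before the compound state is torn down,
because folio_order() reads 0 afterwards.

        /*
         * Sketch only, not the literal patch: capture the order before
         * free_zone_device_folio_prepare() undoes prep_compound_page(),
         * then hand it to the driver callback.
         */
        void free_zone_device_folio(struct folio *folio)
        {
                struct dev_pagemap *pgmap = folio->pgmap; /* accessor may differ */
                unsigned int order = folio_order(folio);  /* pre-split order */

                free_zone_device_folio_prepare(folio);    /* folio_order() is 0 now */
                pgmap->ops->folio_free(folio, order);     /* driver may reuse pages */
        }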
>
> There is something wrong with the driver if the "folio can be
> reallocated immediately".
>
> The flow generally expects there to be a driver allocator linked to
> folio_free()
>
> 1) Allocator finds free memory
> 2) zone_device_page_init() allocates the memory and makes refcount=1
> 3) __folio_put() sees the refcount drop to 0.
> 4) free_zone_device_folio() calls folio_free(), but it doesn't
> actually need to undo prep_compound_page() because *NOTHING* can
> use the page pointer at this point.
Correct: nothing can use the folio before folio_free() is called. Once
folio_free() returns, the driver side is free to immediately reallocate
the folio (or a subset of its pages).
> 5) Driver puts the memory back into the allocator and now #1 can
> happen. It knows how much memory to put back because folio->order
> is valid from #2
> 6) #1 happens again, then #2 happens again and the folio is in the
> right state for use. The successor #2 fully undoes the work of the
> predecessor #2.
>
> If you have races where #1 can happen immediately after #3 then the
> driver design is fundamentally broken and passing around order isn't
> going to help anything.
>
The above race does not exist; if it did, I agree we’d be solving
nothing here.
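To spell that lifecycle out in code, a rough sketch in driver terms;
my_vram_*() and my_devmem_from_folio() are made-up allocator hooks,
while zone_device_page_init() is the real core-MM entry point from
step #2 (its signature may differ with this series applied):

        /* #1 + #2: the allocator finds free device memory and core MM
         * initializes the page with refcount = 1. */
        static struct page *my_alloc_device_page(struct my_devmem *devmem)
        {
                struct page *page = my_vram_alloc(devmem);      /* #1 */

                zone_device_page_init(page);                    /* #2 */
                return page;
        }

        /* #3/#4: the last put drops the refcount to 0 and core MM calls
         * this callback. */
        static void my_folio_free(struct folio *folio, unsigned int order)
        {
                struct my_devmem *devmem = my_devmem_from_folio(folio);

                /* #5: give 1 << order pages back so #1 can happen again. */
                my_vram_free(devmem, folio_pfn(folio), 1UL << order);
        }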
> If the allocator is using the struct page memory then step #5 should
> also clean up the struct page with the allocator data before returning
> it to the allocator.
>
We could move the call to free_zone_device_folio_prepare() [1] into the
driver-side implementation of ->folio_free() and drop the order argument
here. Zi didn’t particularly like that; he preferred calling
free_zone_device_folio_prepare() [2] before invoking ->folio_free(),
which is why this patch exists.
FWIW, I do not have a strong opinion here; either way works. Xe doesn’t
actually need the order regardless of where
free_zone_device_folio_prepare() is called, but Nouveau does need the
order if free_zone_device_folio_prepare() is called before
->folio_free().
[1] https://patchwork.freedesktop.org/patch/697877/?series=159120&rev=4
[2] https://patchwork.freedesktop.org/patch/697709/?series=159120&rev=3#comment_1282405
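For clarity, under [1] the prepare call would live in the driver
callback, where folio_order() is still valid on entry, so no order
argument is needed. A sketch, reusing the hypothetical helpers from
above:

        /*
         * Sketch of option [1]: the driver calls the prepare helper
         * itself, so the compound state is intact on entry and
         * folio_order() still returns the allocation order.
         */
        static void my_folio_free(struct folio *folio)
        {
                unsigned int order = folio_order(folio);  /* still valid here */

                free_zone_device_folio_prepare(folio);    /* undo compound state */
                my_vram_free(my_devmem_from_folio(folio),
                             folio_pfn(folio), 1UL << order);
        }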
> I vaguely remember talking about this before in the context of the Xe
> driver... You can't just take an existing VRAM allocator and layer it
> on top of the folios and have it broadly ignore the folio_free
> callback.
>
We are definitely not ignoring the ->folio_free callback; that is the
point at which we tell our VRAM allocator (DRM buddy) it is okay to
release the allocation and make it available for reuse.
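Concretely, something along these lines (a simplified sketch, not Xe's
actual code; drm_buddy_free_block() is the real DRM buddy release entry
point, the folio-to-block lookup is made up):

        static void xe_devmem_folio_free(struct folio *folio, unsigned int order)
        {
                struct xe_vram_block *vb = xe_block_from_folio(folio); /* hypothetical */

                /*
                 * Nothing references the folio past this point, so the
                 * block can be handed out again immediately.
                 */
                drm_buddy_free_block(vb->mm, vb->block);
        }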
Matt
> Jason