[PATCH v6 1/5] mm/zone_device: Reinitialize large zone device private folios
Vlastimil Babka
vbabka at suse.cz
Thu Jan 22 19:00:49 AEDT 2026
On 1/22/26 08:19, Matthew Brost wrote:
> On Tue, Jan 20, 2026 at 10:01:18PM -0500, Zi Yan wrote:
>> On 20 Jan 2026, at 8:53, Jason Gunthorpe wrote:
>>
>
> This whole thread makes my head hurt, as does core MM.
>
> IMO the TL;DR is:
>
> - Why is Intel the only one proving this stuff works? We can debate all
> day about what should or should not work — but someone else needs to
> actually prove it, rather than type hypotheticals.
>
> - Intel has demonstrated that this works and is still getting blocked.
>
> - This entire thread is about a fixes patch for large device pages.
> Changing prep_compound_page is completely out of scope for a fixes
> patch, and honestly so is most of the rest of what’s being proposed.

FWIW I'm ok if this lands as a fix patch, and perceived the discussion to be
about how to refactor things more properly afterwards, going forward.

> - At a minimum, you must clear every page’s flags in the loop. So why not
> conservatively clear anything else a folio might have set before calling
> an existing core-MM function, ensuring the pages are in a known state?
> This is a fixes patch.
>
> - Given the current state of the discussion, I don’t think large device
> pages should be in 6.19. And if so, why didn’t the entire device pages
> series receive this level of scrutiny earlier? It’s my mistake for not
> saying “no” until the reallocation at different sizes issue was resolved.
>
> @Andrew - I'd revert large device pages in 6.19 as it doesn't work and
> we seemingly cannot close on this.
>
> Matt
>
>> > On Mon, Jan 19, 2026 at 09:50:16PM -0500, Zi Yan wrote:
>> >>>> I suppose we want some prep_single_page(page) and some reorg to share
>> >>>> code with the other prep function.
>> >>
>> >> This need is unnecessary; it comes from not knowing, or not wanting
>> >> to investigate, the core MM page and folio initialization code.
>> >
>> > It will be better to keep this related code together, not spread all
>> > around.
>>
>> Or clarify what code is for preparing pages, which would go away at memdesc
>> time, and what code is for preparing folios, which would stay.
>>
>> >
>> >>>> I don't think so. It should do the above job efficiently and iterate
>> >>>> over the page list exactly once.
>> >>
>> >> folio initialization should not iterate over any page list, since folio is
>> >> supposed to be treated as a whole instead of individual pages.
>> >
>> > The tail pages need to have the right data in them or compound_head
>> > won't work.
>>
>> That is done by set_compound_head() in prep_compound_tail().
>> prep_compound_page() takes care of it. As long as it is called, even if
>> the pages in that compound page have random states before, the compound
>> page should function correctly afterwards.
>>
>> >
>> >> folio->mapping = NULL;
>> >> folio->memcg_data = 0;
>> >> folio->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP;
>> >>
>> >> should be enough.
>> >
>> > This seems believable to me for setting up an order 0 page.
>>
>> It works for any folio, regardless of its order. Fields used in second
>> or third subpages are all taken care of by prep_compound_page().
>>
>> >
>> >> if (order)
>> >> folio_set_large_rmappable(folio);
>> >
>> > That one is in zone_device_folio_init()
>>
>> Yes. And the code location looks right to me.
>>
>> >
>> > And maybe the naming has got really confused if we have both functions
>> > now :\
>>
>> Yes. One of the issues is that device private code used to only handle
>> order-0 pages and was converted to use high order folios directly without
>> using high order pages (namely compound pages) as an intermediate step.
>> This two-step-in-one caused confusion. The key to avoiding the confusion
>> is this: to form a high order folio, a list of contiguous pages first
>> becomes a compound page by calling prep_compound_page(), then
>> the compound page becomes a folio by calling folio_set_large_rmappable().
>>
>> BTW, the code in prep_compound_head() after folio_set_order(folio, order)
>> should belong to folio_set_large_rmappable(). It is causing confusion,
>> since it is only applicable to rmappable large folios. I am going to
>> send a patch to fix it.
>>
>>
>> Best Regards,
>> Yan, Zi
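
For reference, the tail-page point quoted above (each tail getting a link
back to its head, plus the three-line reset of the head) can be sketched as
a userspace toy model. Everything named toy_* below, the struct layout, and
the 0xff mask are assumptions made up for illustration, not kernel
definitions. The only detail borrowed from the kernel is the convention of
marking a tail by setting bit 0 of its head link.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct page; NOT the kernel's layout. */
struct toy_page {
	unsigned long flags;
	uintptr_t compound_head;	/* bit 0 set: tail; other bits: head address */
	void *mapping;
	unsigned long memcg_data;
	unsigned int order;		/* meaningful in the head only */
};

/* Hypothetical mask standing in for PAGE_FLAGS_CHECK_AT_PREP. */
#define TOY_FLAGS_CHECK_AT_PREP 0xffUL

/* Tag a tail page with its head; pointer alignment keeps bit 0 free. */
static void toy_set_compound_head(struct toy_page *tail, struct toy_page *head)
{
	tail->compound_head = (uintptr_t)head | 1u;
}

/* Resolve any page of a compound page to its head. */
static struct toy_page *toy_compound_head(struct toy_page *page)
{
	uintptr_t head = page->compound_head;

	return (head & 1u) ? (struct toy_page *)(head - 1u) : page;
}

/* Reset the head-level fields, then rewrite every tail's head link. */
static void toy_reinit(struct toy_page *pages, unsigned int order)
{
	unsigned long i, nr = 1UL << order;

	pages[0].mapping = NULL;
	pages[0].memcg_data = 0;
	pages[0].flags &= ~TOY_FLAGS_CHECK_AT_PREP;
	pages[0].compound_head = 0;	/* toy-only choice: drop a stale tail tag */
	pages[0].order = order;

	for (i = 1; i < nr; i++)
		toy_set_compound_head(&pages[i], &pages[0]);
}

/* Fill four pages with stale state, reinit as order 2, count correct heads. */
static int toy_selfcheck(void)
{
	struct toy_page pages[4];
	int i, ok = 0;

	for (i = 0; i < 4; i++) {
		pages[i].flags = 0xffUL;
		pages[i].compound_head = (uintptr_t)&pages[3] | 1u;
		pages[i].mapping = &pages[i];
		pages[i].memcg_data = 42;
		pages[i].order = 0;
	}
	toy_reinit(pages, 2);
	for (i = 0; i < 4; i++)
		ok += (toy_compound_head(&pages[i]) == &pages[0]);
	assert(pages[0].mapping == NULL && pages[0].memcg_data == 0);
	return ok;
}
```

In this model toy_selfcheck() returns 4: the stale links in the tails are
overwritten by the loop, so every page resolves to pages[0]. It says nothing
about which additional per-page fields a real reinitialization has to reset.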