(bisected) [PATCH v2 08/37] mm/hugetlb: check for unreasonable folio sizes when registering hstate
    Christophe Leroy 
    christophe.leroy at csgroup.eu
       
    Thu Oct  9 23:08:05 AEDT 2025
    
    
  
On 09/10/2025 at 12:27, David Hildenbrand wrote:
> On 09.10.25 12:01, Christophe Leroy wrote:
>>
>>
>> On 09/10/2025 at 11:20, David Hildenbrand wrote:
>>> On 09.10.25 11:16, Christophe Leroy wrote:
>>>>
>>>>
>>>> On 09/10/2025 at 10:14, David Hildenbrand wrote:
>>>>> On 09.10.25 10:04, Christophe Leroy wrote:
>>>>>>
>>>>>>
>>>>>> On 09/10/2025 at 09:22, David Hildenbrand wrote:
>>>>>>> On 09.10.25 09:14, Christophe Leroy wrote:
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> On 01/09/2025 at 17:03, David Hildenbrand wrote:
>>>>>>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>>>>>>> index 1e777cc51ad04..d3542e92a712e 100644
>>>>>>>>> --- a/mm/hugetlb.c
>>>>>>>>> +++ b/mm/hugetlb.c
>>>>>>>>> @@ -4657,6 +4657,7 @@ static int __init hugetlb_init(void)
>>>>>>>>>           BUILD_BUG_ON(sizeof_field(struct page, private) * BITS_PER_BYTE <
>>>>>>>>>                   __NR_HPAGEFLAGS);
>>>>>>>>> +    BUILD_BUG_ON_INVALID(HUGETLB_PAGE_ORDER > MAX_FOLIO_ORDER);
>>>>>>>>>           if (!hugepages_supported()) {
>>>>>>>>>               if (hugetlb_max_hstate || default_hstate_max_huge_pages)
>>>>>>>>> @@ -4740,6 +4741,7 @@ void __init hugetlb_add_hstate(unsigned int order)
>>>>>>>>>           }
>>>>>>>>>           BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
>>>>>>>>>           BUG_ON(order < order_base_2(__NR_USED_SUBPAGE));
>>>>>>>>> +    WARN_ON(order > MAX_FOLIO_ORDER);
>>>>>>>>>           h = &hstates[hugetlb_max_hstate++];
>>>>>>>>>           __mutex_init(&h->resize_lock, "resize mutex", &h->resize_key);
>>>>>>>>>           h->order = order;
>>>>>>>
>>>>>>> We end up registering hugetlb folios that are bigger than
>>>>>>> MAX_FOLIO_ORDER. So we have to figure out how a config can trigger
>>>>>>> that (and if we have to support that).
>>>>>>>
>>>>>>
>>>>>> MAX_FOLIO_ORDER is defined as:
>>>>>>
>>>>>> #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
>>>>>> #define MAX_FOLIO_ORDER        PUD_ORDER
>>>>>> #else
>>>>>> #define MAX_FOLIO_ORDER        MAX_PAGE_ORDER
>>>>>> #endif
>>>>>>
>>>>>> MAX_PAGE_ORDER is the limit for dynamic creation of hugepages via
>>>>>> /sys/kernel/mm/hugepages/ but bigger pages can be created at boottime
>>>>>> with kernel boot parameters without CONFIG_ARCH_HAS_GIGANTIC_PAGE:
>>>>>>
>>>>>>       hugepagesz=64m hugepages=1 hugepagesz=256m hugepages=1
>>>>>>
>>>>>> Gives:
>>>>>>
>>>>>> HugeTLB: registered 1.00 GiB page size, pre-allocated 0 pages
>>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
>>>>>> HugeTLB: registered 64.0 MiB page size, pre-allocated 1 pages
>>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 64.0 MiB page
>>>>>> HugeTLB: registered 256 MiB page size, pre-allocated 1 pages
>>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 256 MiB page
>>>>>> HugeTLB: registered 4.00 MiB page size, pre-allocated 0 pages
>>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 4.00 MiB page
>>>>>> HugeTLB: registered 16.0 MiB page size, pre-allocated 0 pages
>>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 16.0 MiB page
>>>>>
>>>>> I think it's a violation of CONFIG_ARCH_HAS_GIGANTIC_PAGE. The existing
>>>>> __dump_folio() code would not handle it correctly either.
>>>>
>>>> I'm trying to dig into the history, and when looking at commit 4eb0716e868e
>>>> ("hugetlb: allow to free gigantic pages regardless of the
>>>> configuration") I understand that CONFIG_ARCH_HAS_GIGANTIC_PAGE is
>>>> needed to be able to allocate gigantic pages at runtime. It is not
>>>> needed to reserve gigantic pages at boottime.
>>>>
>>>> What am I missing?
>>>
>>> That CONFIG_ARCH_HAS_GIGANTIC_PAGE has nothing runtime-specific in its
>>> name.
>>
>> In its name for sure, but the commit I mention says:
>>
>>       On systems without CONTIG_ALLOC activated but that support gigantic
>>       pages, boottime reserved gigantic pages can not be freed at all.
>>       This patch simply enables the possibility to hand back those pages
>>       to memory allocator.
> 
> Right, I think it was a historical artifact.
> 
>>
>> And one of the hunks is:
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 7f7fbd8bd9d5b..7a1aa53d188d3 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -19,7 +19,7 @@ config ARM64
>>           select ARCH_HAS_FAST_MULTIPLIER
>>           select ARCH_HAS_FORTIFY_SOURCE
>>           select ARCH_HAS_GCOV_PROFILE_ALL
>> -       select ARCH_HAS_GIGANTIC_PAGE if CONTIG_ALLOC
>> +       select ARCH_HAS_GIGANTIC_PAGE
>>           select ARCH_HAS_KCOV
>>           select ARCH_HAS_KEEPINITRD
>>           select ARCH_HAS_MEMBARRIER_SYNC_CORE
>>
>> So I understand from the commit message that it was possible at that
>> time to have gigantic pages without ARCH_HAS_GIGANTIC_PAGE as long as
>> you didn't have to be able to free them during runtime.
> 
> Yes, I agree.
> 
>>
>>>
>>> Can't we just select CONFIG_ARCH_HAS_GIGANTIC_PAGE for the relevant
>>> hugetlb config that allows for *gigantic pages*?
>>>
>>
>> We probably can, but I'd really like to understand the history and how we
>> ended up in the situation we are in now, because blind fixes often lead
>> to more problems.
> 
> Yes, let's figure out how to do it cleanly.
> 
>>
>> If I follow things correctly, I see a helper gigantic_page_supported()
>> added by commit 944d9fec8d7a ("hugetlb: add support for gigantic page
>> allocation at runtime").
>>
>> And then commit 461a7184320a ("mm/hugetlb: introduce
>> ARCH_HAS_GIGANTIC_PAGE") was added to wrap gigantic_page_supported().
>>
>> Then commit 4eb0716e868e ("hugetlb: allow to free gigantic pages
>> regardless of the configuration") changed gigantic_page_supported() to
>> gigantic_page_runtime_supported().
>>
>> So where are we now?
> 
> In
> 
> commit fae7d834c43ccdb9fcecaf4d0f33145d884b3e5c
> Author: Matthew Wilcox (Oracle) <willy at infradead.org>
> Date:   Tue Feb 27 19:23:31 2024 +0000
> 
>      mm: add __dump_folio()
> 
> 
> We started assuming that a folio in the system (boottime, dynamic, 
> whatever)
> has a maximum of MAX_FOLIO_NR_PAGES.
> 
> Any other interpretation doesn't make any sense for MAX_FOLIO_NR_PAGES.
> 
> 
> So we have two questions:
> 
> 1) How to teach MAX_FOLIO_NR_PAGES that hugetlb supports gigantic pages
> 
> 2) How do we handle CONFIG_ARCH_HAS_GIGANTIC_PAGE
> 
> 
> We have the following options:
> 
> (A) Rename the existing CONFIG_ARCH_HAS_GIGANTIC_PAGE to something else
> that is clearer and add a new CONFIG_ARCH_HAS_GIGANTIC_PAGE.
> 
> (B) Rename the existing CONFIG_ARCH_HAS_GIGANTIC_PAGE to something else
> that is clearer and derive in some other way that hugetlb in that config
> supports gigantic pages.
> 
> (C) Just use CONFIG_ARCH_HAS_GIGANTIC_PAGE if hugetlb on an architecture
> supports gigantic pages.
> 
> 
> I don't quite see why an architecture should be able to opt in to
> dynamically allocating+freeing gigantic pages. That's just CONTIG_ALLOC
> magic and not some arch-specific thing IIRC.
> 
> 
> Note that in mm/hugetlb.c it is
> 
>      #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
>      #ifdef CONFIG_CONTIG_ALLOC
> 
> Meaning that at least the allocation side is guarded by CONTIG_ALLOC.
Yes, but not the freeing, since commit 4eb0716e868e ("hugetlb: allow to
free gigantic pages regardless of the configuration").
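To make that concrete, here is a tiny standalone sketch of the nesting
(the function names are placeholders, not the real mm/hugetlb.c symbols):
runtime allocation sits under both guards, while the freeing path only
needs the outer one.

#include <stdio.h>

#define CONFIG_ARCH_HAS_GIGANTIC_PAGE 1
/* #define CONFIG_CONTIG_ALLOC 1 */	/* toggle to see what drops out */

#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
#ifdef CONFIG_CONTIG_ALLOC
/* Runtime allocation of gigantic folios needs the contiguous allocator. */
static void alloc_gigantic(void)
{
	puts("runtime gigantic allocation available");
}
#else
static void alloc_gigantic(void)
{
	puts("no runtime gigantic allocation (CONTIG_ALLOC not set)");
}
#endif

/* Freeing only depends on the outer guard, so boottime-reserved pages
 * can be handed back even without CONTIG_ALLOC (commit 4eb0716e868e). */
static void free_gigantic(void)
{
	puts("gigantic freeing available");
}
#endif

int main(void)
{
#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
	alloc_gigantic();
	free_gigantic();
#endif
	return 0;
}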
> 
> So I think (C) is just the right thing to do.
> 
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 0bfdaecaa8775..12c11eb9279d3 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -283,6 +283,8 @@ config HUGETLB_PMD_PAGE_TABLE_SHARING
>          def_bool HUGETLB_PAGE
>          depends on ARCH_WANT_HUGE_PMD_SHARE && SPLIT_PMD_PTLOCKS
> 
> +# An architecture must select this option if there is any mechanism
> +# (esp. hugetlb) that could obtain gigantic folios.
>   config ARCH_HAS_GIGANTIC_PAGE
>          bool
> 
> 
I gave it a try. That's not enough: it fixes the problem for 64 Mbyte
and 256 Mbyte pages, but not for 1 Gbyte pages.
MAX_FOLIO_ORDER is defined from PUD_ORDER, but PUD_SIZE is 256 Mbytes
here, so we need to make MAX_FOLIO_ORDER larger. Do we change it to
P4D_ORDER, or is that too much? P4D_SIZE is 128 Gbytes.
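As a quick sanity check of the orders involved, here is a small userspace
sketch (assuming 4 KiB base pages and the PUD_SIZE/P4D_SIZE values above;
size_to_order() is just a local helper, not a kernel function). It shows
that a 1 GiB page is order 18, above PUD_ORDER (16) but far below
P4D_ORDER (25):

#include <stdio.h>

/* Order needed to cover 'size' bytes with 'page_size'-byte base pages. */
static unsigned int size_to_order(unsigned long long size,
				  unsigned long long page_size)
{
	unsigned int order = 0;

	while ((page_size << order) < size)
		order++;
	return order;
}

int main(void)
{
	const unsigned long long page_size = 4096;		/* assumed 4 KiB base pages */
	const unsigned long long pud_size = 256ULL << 20;	/* 256 Mbytes on this config */
	const unsigned long long p4d_size = 128ULL << 30;	/* 128 Gbytes on this config */
	const unsigned long long huge[] = { 64ULL << 20, 256ULL << 20, 1024ULL << 20 };

	printf("PUD_ORDER = %u, P4D_ORDER = %u\n",
	       size_to_order(pud_size, page_size),
	       size_to_order(p4d_size, page_size));

	for (unsigned int i = 0; i < 3; i++)
		printf("%4llu MiB page -> order %u\n",
		       huge[i] >> 20, size_to_order(huge[i], page_size));
	return 0;
}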
Christophe
    
    