[PATCH v5] mm/hugetlb: ignore hugepage kernel args if hugepages are unsupported

Sourabh Jain sourabhjain at linux.ibm.com
Fri Dec 19 19:21:05 AEDT 2025



On 19/12/25 11:43, David Hildenbrand (Red Hat) wrote:
> On 12/18/25 14:06, Sourabh Jain wrote:
>>
>>
>> On 18/12/25 17:32, David Hildenbrand (Red Hat) wrote:
>>> On 12/18/25 12:41, Sourabh Jain wrote:
>>>> Skip processing hugepage kernel arguments (hugepagesz, hugepages, and
>>>> default_hugepagesz) when hugepages are not supported by the
>>>> architecture.
>>>>
>>>> Some architectures may need to disable hugepages based on conditions
>>>> discovered during kernel boot. The hugepages_supported() helper allows
>>>> architecture code to advertise whether hugepages are supported.
>>>>
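For reference, hugepages_supported() is a per-architecture hook with a
generic fallback; roughly what that fallback in include/linux/hugetlb.h
looks like (simplified sketch, details may differ across versions):

/*
 * Architectures that only know at boot time whether huge pages work
 * provide their own hugepages_supported().  The generic fallback keys
 * off HPAGE_SHIFT, which e.g. powerpc sets to 0 when huge pages are
 * unavailable.
 */
#ifndef hugepages_supported
#define hugepages_supported() (HPAGE_SHIFT != 0)
#endif
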
>>>> Currently, normal hugepage allocation is guarded by
>>>> hugepages_supported(), but gigantic hugepages are allocated regardless
>>>> of this check. This causes problems on powerpc for fadump (firmware-
>>>> assisted dump).
>>>>
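In other words, the late pool setup is guarded but the early gigantic
allocation is not; very roughly (illustrative sketch only -- the names
match mm/hugetlb.c but the bodies are heavily simplified):

static int __init hugetlb_init(void)
{
	if (!hugepages_supported())
		return 0;	/* non-gigantic pools are guarded here */
	/* ... set up hstates and allocate normal hugepage pools ... */
	return 0;
}

static int __init hugepages_setup(char *s)
{
	/* ... parse "hugepages=N" ... */
	/*
	 * Gigantic pages must come from contiguous early memory, so
	 * they are allocated right here during command-line parsing,
	 * before hugetlb_init() and its hugepages_supported() check
	 * ever run.
	 */
	return 1;
}
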
>>>> In the fadump scenario, a production kernel
>>>> crash causes the system to boot into a special kernel whose sole
>>>> purpose is to collect the memory dump and reboot. Features such as
>>>> hugepages are not required in this environment and should be
>>>> disabled.
>>>>
>>>> For example, a fadump kernel booting with the kernel arguments
>>>> default_hugepagesz=1GB hugepagesz=1GB hugepages=200 prints the
>>>> following logs:
>>>>
>>>> HugeTLB: allocating 200 of page size 1.00 GiB failed.  Only allocated
>>>> 58 hugepages.
>>>> HugeTLB support is disabled!
>>>> HugeTLB: huge pages not supported, ignoring associated command-line
>>>> parameters
>>>> hugetlbfs: disabling because there are no supported hugepage sizes
>>>>
>>>> Even though the logs say that hugetlb support is disabled, gigantic
>>>> hugepages are still getting allocated, which causes the fadump kernel
>>>> to run out of memory during boot.
>>>
>>> Yeah, that's suboptimal.
>>>
>>>>
>>>> To fix this, gigantic hugepage allocation should also be guarded by
>>>> hugepages_supported().
>>>>
>>>> To bring gigantic hugepage allocation under hugepages_supported(), two
>>>> approaches were previously proposed:
>>>> [1] Check hugepages_supported() in the generic code before allocating
>>>> gigantic hugepages.
>>>> [2] Make arch_hugetlb_valid_size() return false for all hugetlb sizes.
>>>>
>>>> Approach [2] has two minor issues:
>>>> 1. It prints misleading logs about invalid hugepage sizes
>>>> 2. The kernel still processes hugepage kernel arguments unnecessarily
>>>>
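To illustrate, approach [2] would amount to rejecting every size from
the arch hook (powerpc-flavoured sketch; the exact placement of the
check is an assumption):

bool __init arch_hugetlb_valid_size(unsigned long size)
{
	/*
	 * Reject all sizes when huge pages are disabled (e.g. in the
	 * fadump capture kernel).  This works, but every rejected size
	 * is reported as an unsupported size and the parameters are
	 * still parsed.
	 */
	if (!hugepages_supported())
		return false;
	/* ... existing per-size validation ... */
	return true;
}
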
>>>> To control gigantic hugepage allocation, it is proposed to skip
>>>> processing the hugepage kernel arguments (hugepagesz, hugepages, and
>>>> default_hugepagesz) when hugepages_supported() returns false.
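The change is essentially an early bail-out in each parameter parser,
along the lines of the sketch below (simplified; the actual patch may
differ, and the same check would go into hugepagesz_setup() and
default_hugepagesz_setup()):

static int __init hugepages_setup(char *s)
{
	if (!hugepages_supported()) {
		pr_warn("HugeTLB: hugepages unsupported, ignoring hugepages=%s cmdline\n", s);
		return 1;
	}
	/* ... existing parsing and early gigantic allocation ... */
	return 1;
}
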
>>>
>>> You could briefly mention the new output here, so one has a
>>> before-after comparison.
>>
>> Here are the fadump kernel boot logs after this patch is applied:
>> kernel command line: default_hugepagesz=1GB hugepagesz=1GB hugepages=200
>>
>> HugeTLB: hugepages unsupported, ignoring default_hugepagesz=1GB cmdline
>> HugeTLB: hugepages unsupported, ignoring hugepagesz=1GB cmdline
>> HugeTLB: hugepages unsupported, ignoring hugepages=200 cmdline
>> HugeTLB support is disabled!
>> hugetlbfs: disabling because there are no supported hugepage sizes
>>
>> I will wait for a day or two before sending v2 with the above logs
>> included in the commit message.
>>
>>>
>>> Curious, should we at least add a Fixes: tag? Allocating memory when
>>> it's completely unusable sounds wrong.
>>
>> Not sure which commit I should use for Fixes. This issue has
>> been present for a long time, possibly since the beginning.
>
> I don't know the full history, but I would assume that support for 
> gigantic pages was added later?
>
> It would be great if you could dig a bit so we could add a Fixes:.

Sure, I will try to find it.

>
>>
>> I also noticed an interesting issue related to excessive memory
>> allocation, where the production/first kernel failed to boot.
>> While testing this patch, I configured a very high hugepages count
>> (hugepagesz=2M), and the first kernel failed to boot as a result.
>> I will report this issue separately.
>
> I'd say that's rather expected: if you steal too much memory from the 
> kernel it will not be able to function. It's the same when using the 
> mem= parameter, I would assume.
>
I reported this behavior as an issue yesterday; let's see what others 
think about it.
https://lore.kernel.org/all/cb9f3604-8a0a-478a-8bf7-2d139ccbc89d@linux.ibm.com/


Sourabh Jain


