[PATCH v4] powerpc/hugetlb: Disable gigantic hugepages if fadump is active

Sourabh Jain sourabhjain at linux.ibm.com
Thu Mar 6 19:30:12 AEDT 2025




On 06/03/25 00:47, Ritesh Harjani (IBM) wrote:
> Sourabh Jain <sourabhjain at linux.ibm.com> writes:
>
>> Hello Ritesh,
>>
>>
>> On 04/03/25 10:27, Ritesh Harjani (IBM) wrote:
>>> Sourabh Jain <sourabhjain at linux.ibm.com> writes:
>>>
>>>> Hello Ritesh,
>>>>
>>>> Thanks for the review.
>>>>
>>>> On 02/03/25 12:05, Ritesh Harjani (IBM) wrote:
>>>>> Sourabh Jain <sourabhjain at linux.ibm.com> writes:
>>>>>
>>>>>> The fadump kernel boots with limited memory solely to collect the kernel
>>>>>> core dump. Having gigantic hugepages in the fadump kernel is of no use.
>>>>> Sure got it.
>>>>>
>>>>>> The fadump kernel often encounters OOM (Out of Memory) issues if
>>>>>> gigantic hugepages are allocated.
>>>>>>
>>>>>> To address this, disable gigantic hugepages if fadump is active by
>>>>>> returning early from arch_hugetlb_valid_size() using
>>>>>> hugepages_supported(). When fadump is active, the global variable
>>>>>> hugetlb_disabled is set to true, which is later used by the
>>>>>> PowerPC-specific hugepages_supported() function to determine hugepage
>>>>>> support.
>>>>>>
>>>>>> Returning early from arch_hugetlb_valid_size() not only disables
>>>>>> gigantic hugepages but also avoids unnecessary hstate initialization for
>>>>>> every hugepage size supported by the platform.
>>>>>>
>>>>>> Kernel logs related to hugepages with this patch applied:
>>>>>> Kernel arguments passed: hugepagesz=1G hugepages=1
>>>>>>
>>>>>> First kernel: gigantic hugepage got allocated
>>>>>> ==============================================
>>>>>>
>>>>>> dmesg | grep -i "hugetlb"
>>>>>> -------------------------
>>>>>> HugeTLB: registered 1.00 GiB page size, pre-allocated 1 pages
>>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
>>>>>> HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
>>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page
>>>>>>
>>>>>> $ cat /proc/meminfo | grep -i "hugetlb"
>>>>>> -------------------------------------
>>>>>> Hugetlb:         1048576 kB
>>>>> Was this tested with patch [1] in your local tree?
>>>>>
>>>>> [1]: https://web.git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=d629d7a8efc33
>>>>>
>>>>> IIUC, this patch [1] disables the boot-time allocation of hugepages.
>>>>> Isn't it also disabling the boot-time allocation of gigantic huge
>>>>> pages passed via cmdline params like hugepagesz=1G and hugepages=2?
>>>> Yes, I had the patch [1] in my tree.
>>>>
>>>> My understanding is that gigantic pages are allocated before normal huge
>>>> pages.
>>>>
>>>> In hugepages_setup() in hugetlb.c, we have:
>>>>
>>>>        if (hugetlb_max_hstate && hstate_is_gigantic(parsed_hstate))
>>>>            hugetlb_hstate_alloc_pages(parsed_hstate);
>>>>
>>>> I believe the above code allocates memory for gigantic pages, and
>>>> hugetlb_init() is called later because it is a subsys_initcall.
>>>>
>>>> So, by the time the kernel reaches hugetlb_init(), the gigantic pages
>>>> are already allocated. Isn't that right?
>>>>
>>>> Please let me know your opinion.
>>> Yes, you are right. We are allocating hugepages from memblock, however
>>> this isn't getting advertised anywhere, i.e. there is no way to know
>>> from any user interface whether hugepages were allocated or not.
>>> So for the fadump kernel, when the hugepagesz= and hugepages= params
>>> are passed, it will allocate gigantic pages but won't advertise this
>>> in meminfo or anywhere else. This was adding to the confusion when I
>>> tested this (which wasn't clear from the commit msg either).
>>>
>>> And I guess this is happening in the fadump kernel because of our patch
>>> [1], which added a check for whether hugetlb_disabled is true in
>>> hugepages_supported(). Due to this, hugetlb_init() no longer does the
>>> rest of the initialization for those gigantic pages which were allocated
>>> via cmdline options in hugepages_setup().
>>>
>>> [1]: https://lore.kernel.org/linuxppc-dev/20241202054310.928610-1-sourabhjain@linux.ibm.com/
>>>
>>> Now, as we know, fadump can set hugetlb_disabled in early_setup(),
>>> i.e. fadump can mark hugetlb_disabled as true in
>>> early_setup() -> early_init_devtree() -> fadump_reserve_mem()
>>>
>>> And hugepages_setup() and hugepagesz_setup() get called late in
>>> start_kernel() -> parse_args()
>>>
>>>
>>> And we already check for hugepages_supported() in all the necessary
>>> places in mm/hugetlb.c. So IMO, this check should go in mm/hugetlb.c,
>>> in hugepagesz_setup() and hugepages_setup(), because otherwise every
>>> arch will end up duplicating this by adding a hugepages_supported()
>>> check in its arch implementation of arch_hugetlb_valid_size().
>>>
>>> e.g. references of hugepages_supported() checks in mm/hugetlb.c
>>>
>>> mm/hugetlb.c hugetlb_show_meminfo_node 4959 if (!hugepages_supported())
>>> mm/hugetlb.c hugetlb_report_node_meminfo 4943 if (!hugepages_supported())
>>> mm/hugetlb.c hugetlb_report_meminfo 4914 if (!hugepages_supported())
>>> mm/hugetlb.c hugetlb_overcommit_handler 4848 if (!hugepages_supported())
>>> mm/hugetlb.c hugetlb_sysctl_handler_common 4809 if (!hugepages_supported())
>>> mm/hugetlb.c hugetlb_init 4461 if (!hugepages_supported()) {
>>> mm/hugetlb.c dissolve_free_hugetlb_folios 2211 if (!hugepages_supported())
>>> fs/hugetlbfs/inode.c init_hugetlbfs_fs 1604 if (!hugepages_supported()) {
>>>
>>>
>>> Let me also see the history on why this wasn't done earlier though...
>>>
>>> ... Oh actually, there is more history to this. See [2]. We already had
>>> a hugepages_supported() check in hugepages_setup() and other places
>>> earlier, which was removed to fix some other problem on ppc ;)
>>>
>>> [2]: https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2833a5bf75b3657c4dd20b3709c8c702754cb1f
>>>
>>>
>>> Hence I believe this needs a wider cleanup than just fixing it for our
>>> arch. I see there is a patch series [3] already fixing these code paths,
>>> which also cleans up the path of gigantic hugepage allocation in
>>> hugepages_setup(). I think it is in the mm-unstable branch too. Can we
>>> please review & test that to make sure the fadump use case of disabling
>>> hugepages & gigantic pages is covered properly?
>>>
>>> [3]: https://lore.kernel.org/all/20250228182928.2645936-1-fvdl@google.com/
>> I evaluated patch series [3] for the fadump issue, and here are my
>> observations:
>>
>> Currently, patch series [3] does not fix the issue I am trying to
>> address with this patch.
>>
>> With patch series [3] applied, I see the following logs when booting
>> the fadump kernel with hugepagesz=1G hugepages=1:
>>
>> With just patch series [3]:
>> ------------------------------------
>>
>> kdump:/# dmesg | grep -i HugeTLB
>> [    0.000000] HugeTLB: allocating 10 of page size 1.00 GiB failed.
>> Only allocated 9 hugepages.
>> [    0.405964] HugeTLB support is disabled!
>> [    0.409162] HugeTLB: huge pages not supported, ignoring associated
>> command-line parameters
>> [    0.437740] hugetlbfs: disabling because there are no supported
>> hugepage sizes
>>
>> One good thing is that the kernel now at least reports the gigantic
>> pages allocated, which was not the case before. I think patch series
>> [3] introduced that improvement.
>>
>> Now, on top of patch series [3], I applied this fix, and the kernel
>> prints the following logs:
>>
>> Patch series [3] + this fix:
>> ------------------------------------
>>
>> [    0.000000] HugeTLB: unsupported hugepagesz=1G
>> [    0.000000] HugeTLB: hugepages=10 does not follow a valid hugepagesz,
>> ignoring
>> [    0.366158] HugeTLB support is disabled!
>> [    0.398004] hugetlbfs: disabling because there are no supported
>> hugepage sizes
>>
>> With these logs, one can clearly identify what is happening.
>>
>> What are your thoughts on this fix now?
>>
>> Do you still think handling this in generic code is better?
> I believe so, yes (unless we have a valid reason for not doing so).
> hugepages_supported() is an arch-specific call. If you look at the prints
> above, what we are essentially saying is that this is not a valid
> hugepagesz. But that's not really the case, right? What it should reflect
> is simply that hugepage support is disabled.

Yeah, better to just print that hugetlb support is disabled.

>
> That being said, I will have to go and look into that series to suggest
> where in that path it should use the hugepages_supported() arch call to
> see whether hugepages are supported before initializing. And hopefully,
> as you suggested, since our cmdline parsing problem was already solved,
> it should not be a problem to use hugepages_supported() during the
> cmdline parsing phase.
> But let me check that series and get back.
>
>
>> Given that I was already advised to handle this in arch
>> code [4].
>>
>> Or should we keep it this way?
>> One advantage of handling things in arch_hugetlb_valid_size() is that it
>> helps avoid populating hstates, since they are not required anyway. I am
>> not claiming that this is not possible in generic code.
> IMO, even adding the hugepages_supported() check at the right place
> should avoid populating hstates too. But let's confirm that.

Agree, I will explore that.

Thanks,
Sourabh Jain

>
> -ritesh
>
>> Thoughts?
>>
>> Thanks,
>> Sourabh Jain
>>
>>
>> [3]: https://lore.kernel.org/all/20250228182928.2645936-1-fvdl@google.com/
>> [4]: https://lore.kernel.org/all/20250122150613.28a92438@thinkpad-T15/
