[PATCH v4] powerpc/hugetlb: Disable gigantic hugepages if fadump is active
Sourabh Jain
sourabhjain at linux.ibm.com
Wed Mar 5 16:48:06 AEDT 2025
Hello Ritesh,
On 04/03/25 10:27, Ritesh Harjani (IBM) wrote:
> Sourabh Jain <sourabhjain at linux.ibm.com> writes:
>
>> Hello Ritesh,
>>
>> Thanks for the review.
>>
>> On 02/03/25 12:05, Ritesh Harjani (IBM) wrote:
>>> Sourabh Jain <sourabhjain at linux.ibm.com> writes:
>>>
>>>> The fadump kernel boots with limited memory solely to collect the kernel
>>>> core dump. Having gigantic hugepages in the fadump kernel is of no use.
>>> Sure got it.
>>>
>>>> Many times, the fadump kernel encounters OOM (Out of Memory) issues if
>>>> gigantic hugepages are allocated.
>>>>
>>>> To address this, disable gigantic hugepages if fadump is active by
>>>> returning early from arch_hugetlb_valid_size() using
>>>> hugepages_supported(). When fadump is active, the global variable
>>>> hugetlb_disabled is set to true, which is later used by the
>>>> PowerPC-specific hugepages_supported() function to determine hugepage
>>>> support.
>>>>
>>>> Returning early from arch_hugetlb_vali_size() not only disables
>>>> gigantic hugepages but also avoids unnecessary hstate initialization for
>>>> every hugepage size supported by the platform.
>>>>
>>>> kernel logs related to hugepages with this patch included:
>>>> kernel argument passed: hugepagesz=1G hugepages=1
>>>>
>>>> First kernel: gigantic hugepage got allocated
>>>> ==============================================
>>>>
>>>> dmesg | grep -i "hugetlb"
>>>> -------------------------
>>>> HugeTLB: registered 1.00 GiB page size, pre-allocated 1 pages
>>>> HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
>>>> HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
>>>> HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page
>>>>
>>>> $ cat /proc/meminfo | grep -i "hugetlb"
>>>> -------------------------------------
>>>> Hugetlb: 1048576 kB
>>> Was this tested with patch [1] in your local tree?
>>>
>>> [1]: https://web.git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=d629d7a8efc33
>>>
>>> IIUC, this patch [1] disables the boot time allocation of hugepages.
>>> Isn't it also disabling the boot time allocation for gigantic huge pages
>>> passed by the cmdline params like hugepagesz=1G and hugepages=2 ?
>> Yes, I had the patch [1] in my tree.
>>
>> My understanding is that gigantic pages are allocated before normal huge
>> pages.
>>
>> In hugepages_setup() in hugetlb.c, we have:
>>
>> if (hugetlb_max_hstate && hstate_is_gigantic(parsed_hstate))
>> hugetlb_hstate_alloc_pages(parsed_hstate);
>>
>> I believe the above code allocates memory for gigantic pages, and
>> hugetlb_init() is
>> called later because it is a subsys_initcall.
>>
>> So, by the time the kernel reaches hugetlb_init(), the gigantic pages
>> are already
>> allocated. Isn't that right?
>>
>> Please let me know your opinion.
> Yes, you are right. We are allocating hugepages from memblock, however
> this isn't getting advertized anywhere. i.e. there is no way one can
> know from any user interface on whether hugepages were allocated or not.
> i.e. for fadump kernel when hugepagesz= and hugepages= params are
> passed, though it will allocate gigantic pages, it won't advertize this
> in meminfo or anywhere else. This was adding the confusion when I tested
> this (which wasn't clear from the commit msg either).
Yeah I should have added that in my commit message.
>
> And I guess this is happening during fadump kernel because of our patch
> [1], which added a check to see whether hugetlb_disabled is true in-
> hugepages_supported(). Due to this hugetlb_init() is now not doing the
> rest of the initialization for those gigantic pages which were allocated
> due to cmdline options from hugepages_setup().
Yeah patch [1] has disabled the hugetlb initialization.
>
> [1]: https://lore.kernel.org/linuxppc-dev/20241202054310.928610-1-sourabhjain@linux.ibm.com/
>
> Now as we know from below that fadump can set hugetlb_disabled call in early_setup().
> i.e. fadump can mark hugetlb_disabled to true in
> early_setup() -> early_init_devtree() -> fadump_reserve_mem()
>
> And hugepages_setup() and hugepagesz_setup() gets called late in
> start_kernel() -> parse_args()
>
>
> And we already check for hugepages_supported() in all necessary calls in
> mm/hugetlb.c. So IMO, this check should go in mm/hugetlb.c in
> hugepagesz_setup() and hugepages_setup(). Because otherwise every arch
> implementation will end up duplicating this by adding
> hugepages_supported() check in their arch implementation of
> arch_hugetlb_valid_size().
>
> e.g. references of hugepages_supported() checks in mm/hugetlb.c
>
> mm/hugetlb.c hugetlb_show_meminfo_node 4959 if (!hugepages_supported())
> mm/hugetlb.c hugetlb_report_node_meminfo 4943 if (!hugepages_supported())
> mm/hugetlb.c hugetlb_report_meminfo 4914 if (!hugepages_supported())
> mm/hugetlb.c hugetlb_overcommit_handler 4848 if (!hugepages_supported())
> mm/hugetlb.c hugetlb_sysctl_handler_common 4809 if (!hugepages_supported())
> mm/hugetlb.c hugetlb_init 4461 if (!hugepages_supported()) {
> mm/hugetlb.c dissolve_free_hugetlb_folios 2211 if (!hugepages_supported())
> fs/hugetlbfs/inode.c init_hugetlbfs_fs 1604 if (!hugepages_supported()) {
v1 proposed to do it in generic code:
https://lore.kernel.org/all/20250121150419.1342794-1-sourabhjain@linux.ibm.com/
But it was suggested to try it in arch specific code.
>
>
> Let me also see the history on why this wasn't done earlier though...
>
> ... Oh actually there is more history to this. See [2]. We already had
> hugepages_supported() check in hugepages_setup() and other places
> earlier which was removed to fix some other problem in ppc ;)
>
> [2]: https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2833a5bf75b3657c4dd20b3709c8c702754cb1f
>
IIUC, I think it was done because the HPAGE_SHIFT was not initialized
early enough
on powrepc. But it is no longer the case after the below commit:
2354ad252b66 ("powerpc/mm: Update default hugetlb size early")
> Hence I believe this needs a wider cleanup than just fixing it for our
> arch. I see there is a patch series already fixing these code paths,
> which is also cleaning up the path of gigantic hugepage allocation in
> hugepages_setup(). I think it is in mm-unstable branch too. Can we
> please review & test that to make sure that the fadump usecase of
> disabling hugepages & gigantic are getting covered properly?
>
> [3]: https://lore.kernel.org/all/20250228182928.2645936-1-fvdl@google.com/
>
>
> Thoughts?
Agree, if there is cleanup happening already we should try to handle things
there itself. I will review the patch series [3].
Thanks, Ritesh, for the detailed review! Appreciate your time and effort.
Thanks,
Sourabh Jain
>
> -ritesh
>
>> Thanks,
>> Sourabh Jain
>>
>>
>>>
>>>> HugeTLB: registered 1.00 GiB page size, pre-allocated 1 pages
>>> This print comes from report_hugepages(). The only place from where
>>> report_hugepages() gets called is hugetlb_init(). hugetlb_init() is what
>>> is responsible for hugepages & gigantic hugepage allocations of the
>>> passed kernel cmdline params.
>>>
>>> But hugetlb_init() already checks for hugepages_supported() in the very
>>> beginning. So I am not sure whether we need this extra patch to disable
>>> gigantic hugepages allocation by the kernel cmdline params like
>>> hugepagesz=1G and hugepages=2 type of options.
>>>
>>> Hence I was wondering if you had this patch [1] in your tree when you were
>>> testing this?
>>>
>>> But I may be missing something. Could you please help clarify on whether
>>> we really need this patch to disable gigantic hugetlb page allocations?
>>>
>>>> Fadump kernel: gigantic hugepage not allocated
>>>> ===============================================
>>>>
>>>> dmesg | grep -i "hugetlb"
>>>> -------------------------
>>>> [ 0.000000] HugeTLB: unsupported hugepagesz=1G
>>>> [ 0.000000] HugeTLB: hugepages=1 does not follow a valid hugepagesz, ignoring
>>>> [ 0.706375] HugeTLB support is disabled!
>>>> [ 0.773530] hugetlbfs: disabling because there are no supported hugepage sizes
>>>>
>>>> $ cat /proc/meminfo | grep -i "hugetlb"
>>>> ----------------------------------
>>>> <Nothing>
> What I meant was, when we pass hugepagesz= and hugepages= and fadump=on,
> then during the fadump kernel, we still will see the above prints and
> nothing in meminfo even w/o this patch (because the user interfaces will
> be disabled since hugetlb_supported() is false).
> This observation should have been mentioned in the commit msg to avoid
> the confusion :)
>
>
>>>> Cc: Hari Bathini <hbathini at linux.ibm.com>
>>>> Cc: Madhavan Srinivasan <maddy at linux.ibm.com>
>>>> Cc: Mahesh Salgaonkar <mahesh at linux.ibm.com>
>>>> Cc: Michael Ellerman <mpe at ellerman.id.au>
>>>> Cc: Ritesh Harjani (IBM)" <ritesh.list at gmail.com>
>>> I guess the extra " in the above was not adding me in the cc list.
>>> Hence I missed to see this patch early.
>> Thanks for point it out. I will fix it.
>>
>>
>>> -ritesh
>>>
>>>
>>>> Reviewed-by: Christophe Leroy <christophe.leroy at csgroup.eu>
>>>> Signed-off-by: Sourabh Jain <sourabhjain at linux.ibm.com>
>>>> ---
>>>> Changelog:
>>>>
>>>> v1:
>>>> https://lore.kernel.org/all/20250121150419.1342794-1-sourabhjain@linux.ibm.com/
>>>>
>>>> v2:
>>>> https://lore.kernel.org/all/20250124103220.111303-1-sourabhjain@linux.ibm.com/
>>>> - disable gigantic hugepage in arch code, arch_hugetlb_valid_size()
>>>>
>>>> v3:
>>>> https://lore.kernel.org/all/20250125104928.88881-1-sourabhjain@linux.ibm.com/
>>>> - Do not modify the initialization of the shift variable
>>>>
>>>> v4:
>>>> - Update commit message to include how hugepages_supported() detects
>>>> hugepages support when fadump is active
>>>> - Add Reviewed-by tag
>>>> - NO functional change
>>>>
>>>> ---
>>>> arch/powerpc/mm/hugetlbpage.c | 3 +++
>>>> 1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
>>>> index 6b043180220a..88cfd182db4e 100644
>>>> --- a/arch/powerpc/mm/hugetlbpage.c
>>>> +++ b/arch/powerpc/mm/hugetlbpage.c
>>>> @@ -138,6 +138,9 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
>>>> int shift = __ffs(size);
>>>> int mmu_psize;
>>>>
>>>> + if (!hugepages_supported())
>>>> + return false;
>>>> +
>>>> /* Check that it is a page size supported by the hardware and
>>>> * that it fits within pagetable and slice limits. */
>>>> if (size <= PAGE_SIZE || !is_power_of_2(size))
>>>> --
>>>> 2.48.1
More information about the Linuxppc-dev
mailing list