[PATCH v2] powerpc/mm: Update default hugetlb size early
Aneesh Kumar K.V
aneesh.kumar at linux.ibm.com
Fri Feb 11 23:23:50 AEDT 2022
David Hildenbrand <david at redhat.com> writes:
> On 11.02.22 10:16, Aneesh Kumar K V wrote:
>> On 2/11/22 14:00, David Hildenbrand wrote:
>>> On 11.02.22 07:52, Aneesh Kumar K.V wrote:
>>>> commit: d9c234005227 ("Do not depend on MAX_ORDER when grouping pages by mobility")
>>>> introduced pageblock_order which will be used to group pages better.
>>>> The kernel now groups pages based on the value of HPAGE_SHIFT. Hence HPAGE_SHIFT
>>>> should be set before we call set_pageblock_order.
>>>>
>>>> set_pageblock_order happens early in the boot and default hugetlb page size
>>>> should be initialized before that to compute the right pageblock_order value.
>>>>
>>>> Currently, the default hugetlb page size is set via arch_initcalls, which happens
>>>> late in the boot as shown via the below callstack:
>>>>
>>>> [c000000007383b10] [c000000001289328] hugetlbpage_init+0x2b8/0x2f8
>>>> [c000000007383bc0] [c0000000012749e4] do_one_initcall+0x14c/0x320
>>>> [c000000007383c90] [c00000000127505c] kernel_init_freeable+0x410/0x4e8
>>>> [c000000007383da0] [c000000000012664] kernel_init+0x30/0x15c
>>>> [c000000007383e10] [c00000000000cf14] ret_from_kernel_thread+0x5c/0x64
>>>>
>>>> and the pageblock_order initialization is done early during the boot.
>>>>
>>>> [c0000000018bfc80] [c0000000012ae120] set_pageblock_order+0x50/0x64
>>>> [c0000000018bfca0] [c0000000012b3d94] sparse_init+0x188/0x268
>>>> [c0000000018bfd60] [c000000001288bfc] initmem_init+0x28c/0x328
>>>> [c0000000018bfe50] [c00000000127b370] setup_arch+0x410/0x480
>>>> [c0000000018bfed0] [c00000000127401c] start_kernel+0xb8/0x934
>>>> [c0000000018bff90] [c00000000000d984] start_here_common+0x1c/0x98
>>>>
>>>> Delaying default hugetlb page size initialization implies the kernel will
>>>> initialize pageblock_order to (MAX_ORDER - 1) which is not an optimal
>>>> value for mobility grouping. IIUC we always had this issue. But it was not
>>>> a problem for hash translation mode because (MAX_ORDER - 1) is the same as
>>>> HUGETLB_PAGE_ORDER (8) in the case of hash (16MB). With radix,
>>>> HUGETLB_PAGE_ORDER will be 5 (2M size) and hence pageblock_order should be
>>>> 5 instead of 8.
>>>
>>>
>>> A related question: Can we on ppc still have pageblock_order > MAX_ORDER
>>> - 1? We have some code for that and I am not so sure if we really need that.
>>>
>>
>> I also have been wondering about the same. On book3s64 I don't think we
>> need that support for both 64K and 4K page size because with hash
>> hugetlb size is MAX_ORDER -1. (16MB hugepage size)
>>
>> I am not sure about the 256K page support. Christophe may be able to
>> answer that.
>>
>> For the gigantic hugepage support we depend on cma based allocation or
>> firmware reservation. So I am not sure why we ever considered the
>> pageblock > MAX_ORDER - 1 scenario. If you have pointers w.r.t. why that was
>> ever needed, I could double-check whether ppc64 is still dependent on that.
>
> commit dc78327c0ea7da5186d8cbc1647bd6088c5c9fa5
> Author: Michal Nazarewicz <mina86 at mina86.com>
> Date: Wed Jul 2 15:22:35 2014 -0700
>
> mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER
>
> indicates that at least arm64 used to have cases for that as well.
>
> However, nowadays with ARM64_64K_PAGES we have FORCE_MAX_ZONEORDER=14 as
> default, corresponding to 512MiB.
>
> So I'm not sure if this is something worth supporting. If you want
> somewhat reliable gigantic pages, use CMA or preallocate them during boot.
>
> --
> Thanks,
>
> David / dhildenb

I could build a kernel with FORCE_MAX_ZONEORDER=8 and pageblock_order = 8.
We need to disable THP for such a kernel to boot, because THP checks for
PMD_ORDER < MAX_ORDER. I was able to boot that kernel on a virtualized
platform, but gigantic_page_runtime_supported() then reports no runtime
gigantic page support on such a config with hash translation.
On a non-virtualized platform I am hitting crashes like the one below during boot:
[ 47.637865][ C42] =============================================================================
[ 47.637907][ C42] BUG pgtable-2^11 (Not tainted): Object already free
[ 47.637925][ C42] -----------------------------------------------------------------------------
[ 47.637925][ C42]
[ 47.637945][ C42] Allocated in __pud_alloc+0x84/0x2a0 age=278 cpu=40 pid=1409
[ 47.637974][ C42] __slab_alloc.isra.0+0x40/0x60
[ 47.637995][ C42] kmem_cache_alloc+0x1a8/0x510
[ 47.638010][ C42] __pud_alloc+0x84/0x2a0
[ 47.638024][ C42] copy_page_range+0x38c/0x1b90
[ 47.638040][ C42] dup_mm+0x548/0x880
[ 47.638058][ C42] copy_process+0xdc0/0x1e90
[ 47.638076][ C42] kernel_clone+0xd4/0x9d0
[ 47.638094][ C42] __do_sys_clone+0x88/0xe0
[ 47.638112][ C42] system_call_exception+0x368/0x3a0
[ 47.638128][ C42] system_call_common+0xec/0x250
[ 47.638147][ C42] Freed in __tlb_remove_table+0x1d4/0x200 age=263 cpu=57 pid=326
[ 47.638172][ C42] kmem_cache_free+0x44c/0x680
[ 47.638187][ C42] __tlb_remove_table+0x1d4/0x200
[ 47.638204][ C42] tlb_remove_table_rcu+0x54/0xa0
[ 47.638222][ C42] rcu_core+0xdd4/0x15d0
[ 47.638239][ C42] __do_softirq+0x360/0x69c
[ 47.638257][ C42] run_ksoftirqd+0x54/0xc0
[ 47.638273][ C42] smpboot_thread_fn+0x28c/0x2f0
[ 47.638290][ C42] kthread+0x1a4/0x1b0
[ 47.638305][ C42] ret_from_kernel_thread+0x5c/0x64
[ 47.638320][ C42] Slab 0xc00c00000000d600 objects=10 used=9 fp=0xc0000000035a8000 flags=0x7ffff000010201(locked|slab|head|node=0|zone=0|lastcpupid=0x7ffff)
[ 47.638352][ C42] Object 0xc0000000035a8000 @offset=163840 fp=0x0000000000000000
[ 47.638352][ C42]
[ 47.638373][ C42] Redzone c0000000035a4000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[ 47.638394][ C42] Redzone c0000000035a4010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[ 47.638414][ C42] Redzone c0000000035a4020: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[ 47.638435][ C42] Redzone c0000000035a4030: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[ 47.638455][ C42] Redzone c0000000035a4040: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[ 47.638474][ C42] Redzone c0000000035a4050: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[ 47.638494][ C42] Redzone c0000000035a4060: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[ 47.638514][ C42] Redzone c0000000035a4070: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[ 47.638534][ C42] Redzone c0000000035a4080: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................