[PATCH v2] powerpc/mm: Update default hugetlb size early

Aneesh Kumar K.V aneesh.kumar at linux.ibm.com
Fri Feb 11 23:23:50 AEDT 2022


David Hildenbrand <david at redhat.com> writes:

> On 11.02.22 10:16, Aneesh Kumar K V wrote:
>> On 2/11/22 14:00, David Hildenbrand wrote:
>>> On 11.02.22 07:52, Aneesh Kumar K.V wrote:
>>>> commit: d9c234005227 ("Do not depend on MAX_ORDER when grouping pages by mobility")
>>>> introduced pageblock_order which will be used to group pages better.
>>>> The kernel now groups pages based on the value of HPAGE_SHIFT. Hence HPAGE_SHIFT
>>>> should be set before we call set_pageblock_order.
>>>>
>>>> set_pageblock_order happens early in the boot and default hugetlb page size
>>>> should be initialized before that to compute the right pageblock_order value.
>>>>
>>>> Currently, the default hugetlb page size is set via arch_initcalls, which
>>>> happen late in the boot, as shown by the callstack below:
>>>>
>>>> [c000000007383b10] [c000000001289328] hugetlbpage_init+0x2b8/0x2f8
>>>> [c000000007383bc0] [c0000000012749e4] do_one_initcall+0x14c/0x320
>>>> [c000000007383c90] [c00000000127505c] kernel_init_freeable+0x410/0x4e8
>>>> [c000000007383da0] [c000000000012664] kernel_init+0x30/0x15c
>>>> [c000000007383e10] [c00000000000cf14] ret_from_kernel_thread+0x5c/0x64
>>>>
>>>> and the pageblock_order initialization is done early during the boot.
>>>>
>>>> [c0000000018bfc80] [c0000000012ae120] set_pageblock_order+0x50/0x64
>>>> [c0000000018bfca0] [c0000000012b3d94] sparse_init+0x188/0x268
>>>> [c0000000018bfd60] [c000000001288bfc] initmem_init+0x28c/0x328
>>>> [c0000000018bfe50] [c00000000127b370] setup_arch+0x410/0x480
>>>> [c0000000018bfed0] [c00000000127401c] start_kernel+0xb8/0x934
>>>> [c0000000018bff90] [c00000000000d984] start_here_common+0x1c/0x98
>>>>
>>>> Delaying default hugetlb page size initialization implies the kernel will
>>>> initialize pageblock_order to (MAX_ORDER - 1) which is not an optimal
>>>> value for mobility grouping. IIUC we always had this issue. But it was not
>>>> a problem for hash translation mode because (MAX_ORDER - 1) is the same as
>>>> HUGETLB_PAGE_ORDER (8) in the case of hash (16MB). With radix,
>>>> HUGETLB_PAGE_ORDER will be 5 (2M size) and hence pageblock_order should be
>>>> 5 instead of 8.
>>>
>>>
>>> A related question: Can we on ppc still have pageblock_order > MAX_ORDER
>>> - 1? We have some code for that and I am not so sure if we really need that.
>>>
>> 
>> I also have been wondering about the same. On book3s64 I don't think we 
>> need that support for both 64K and 4K page size because with hash 
>> hugetlb size is MAX_ORDER -1. (16MB hugepage size)
>> 
>> I am not sure about the 256K page support. Christophe may be able to 
>> answer that.
>> 
>> For the gigantic hugepage support we depend on cma based allocation or
>> firmware reservation. So I am not sure why we ever considered pageblock 
>>  > MAX_ORDER -1 scenario. If you have pointers w.r.t why that was ever 
>> needed, I could double-check whether ppc64 is still dependent on that.
>
> commit dc78327c0ea7da5186d8cbc1647bd6088c5c9fa5
> Author: Michal Nazarewicz <mina86 at mina86.com>
> Date:   Wed Jul 2 15:22:35 2014 -0700
>
>     mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER
>
> indicates that at least arm64 used to have cases for that as well.
>
> However, nowadays with ARM64_64K_PAGES we have FORCE_MAX_ZONEORDER=14 as
> default, corresponding to 512MiB.
>
> So I'm not sure if this is something worth supporting. If you want
> somewhat reliable gigantic pages, use CMA or preallocate them during boot.
>
> -- 
> Thanks,
>
> David / dhildenb

I could build a kernel with FORCE_MAX_ZONEORDER=8 and pageblock_order =
8. We need to disable THP for such a kernel to boot, because THP
requires PMD_ORDER < MAX_ORDER. I was able to boot that kernel on a
virtualized platform, but then runtime allocation of gigantic pages
(gigantic_page_runtime_supported) is not supported on such a config
with hash translation.
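
For reference, the order arithmetic behind this can be sketched as a
small userspace C fragment. This is not kernel code, and the helper
names are made up for illustration. The values used in the note below
assume 64K base pages (PAGE_SHIFT = 16), a 16MB hash hugepage (shift 24)
and a 2MB radix hugepage (shift 21), and that MAX_ORDER is exclusive,
i.e. the largest buddy order is MAX_ORDER - 1.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: order of a huge page relative to the base page,
 * i.e. HPAGE_SHIFT - PAGE_SHIFT.
 */
static inline unsigned int hugetlb_page_order(unsigned int hpage_shift,
					      unsigned int page_shift)
{
	assert(hpage_shift >= page_shift);
	return hpage_shift - page_shift;
}

/*
 * Hypothetical helper: true when a block of this order fits in the
 * buddy allocator, assuming the largest valid order is max_order - 1.
 */
static inline bool buddy_can_allocate(unsigned int order,
				      unsigned int max_order)
{
	return order < max_order;
}
```

With these assumptions, hugetlb_page_order(24, 16) is 8 for hash and
hugetlb_page_order(21, 16) is 5 for radix. buddy_can_allocate(8, 8) is
false, which is the pageblock_order > MAX_ORDER - 1 case with
FORCE_MAX_ZONEORDER=8, and the same comparison is why THP has to be
disabled for that build.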

On a non-virtualized platform I am hitting crashes like the one below during boot.

[   47.637865][   C42] =============================================================================
[   47.637907][   C42] BUG pgtable-2^11 (Not tainted): Object already free
[   47.637925][   C42] -----------------------------------------------------------------------------
[   47.637925][   C42]
[   47.637945][   C42] Allocated in __pud_alloc+0x84/0x2a0 age=278 cpu=40 pid=1409
[   47.637974][   C42]  __slab_alloc.isra.0+0x40/0x60
[   47.637995][   C42]  kmem_cache_alloc+0x1a8/0x510
[   47.638010][   C42]  __pud_alloc+0x84/0x2a0
[   47.638024][   C42]  copy_page_range+0x38c/0x1b90
[   47.638040][   C42]  dup_mm+0x548/0x880
[   47.638058][   C42]  copy_process+0xdc0/0x1e90
[   47.638076][   C42]  kernel_clone+0xd4/0x9d0
[   47.638094][   C42]  __do_sys_clone+0x88/0xe0
[   47.638112][   C42]  system_call_exception+0x368/0x3a0
[   47.638128][   C42]  system_call_common+0xec/0x250
[   47.638147][   C42] Freed in __tlb_remove_table+0x1d4/0x200 age=263 cpu=57 pid=326
[   47.638172][   C42]  kmem_cache_free+0x44c/0x680
[   47.638187][   C42]  __tlb_remove_table+0x1d4/0x200
[   47.638204][   C42]  tlb_remove_table_rcu+0x54/0xa0
[   47.638222][   C42]  rcu_core+0xdd4/0x15d0
[   47.638239][   C42]  __do_softirq+0x360/0x69c
[   47.638257][   C42]  run_ksoftirqd+0x54/0xc0
[   47.638273][   C42]  smpboot_thread_fn+0x28c/0x2f0
[   47.638290][   C42]  kthread+0x1a4/0x1b0
[   47.638305][   C42]  ret_from_kernel_thread+0x5c/0x64
[   47.638320][   C42] Slab 0xc00c00000000d600 objects=10 used=9 fp=0xc0000000035a8000 flags=0x7ffff000010201(locked|slab|head|node=0|zone=0|lastcpupid=0x7ffff)
[   47.638352][   C42] Object 0xc0000000035a8000 @offset=163840 fp=0x0000000000000000
[   47.638352][   C42]
[   47.638373][   C42] Redzone  c0000000035a4000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
[   47.638394][   C42] Redzone  c0000000035a4010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
[   47.638414][   C42] Redzone  c0000000035a4020: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
[   47.638435][   C42] Redzone  c0000000035a4030: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
[   47.638455][   C42] Redzone  c0000000035a4040: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
[   47.638474][   C42] Redzone  c0000000035a4050: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
[   47.638494][   C42] Redzone  c0000000035a4060: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
[   47.638514][   C42] Redzone  c0000000035a4070: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
[   47.638534][   C42] Redzone  c0000000035a4080: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................

