[PATCH v2 3/3] x86: Support huge vmalloc mappings

Christophe Leroy christophe.leroy at csgroup.eu
Sat Jan 15 21:15:34 AEDT 2022



Le 28/12/2021 à 17:14, Dave Hansen a écrit :
> On 12/28/21 2:26 AM, Kefeng Wang wrote:
>>>> There are some disadvantages about this feature[2], one of the main
>>>> concerns is the possible memory fragmentation/waste in some scenarios,
>>>> also archs must ensure that any arch specific vmalloc allocations that
>>>> require PAGE_SIZE mappings(eg, module alloc with STRICT_MODULE_RWX)
>>>> use the VM_NO_HUGE_VMAP flag to inhibit larger mappings.
>>> That just says that x86 *needs* PAGE_SIZE allocations.  But, what
>>> happens if VM_NO_HUGE_VMAP is not passed (like it was in v1)?  Will the
>>> subsequent permission changes just fragment the 2M mapping?
>>
>> Yes, without VM_NO_HUGE_VMAP, it could fragment the 2M mapping.
>>
>> When module alloc with STRICT_MODULE_RWX on x86, it calls
>> __change_page_attr()
>>
>> from set_memory_ro/rw/nx which will split large page, so there is no
>> need to make
>>
>> module alloc with HUGE_VMALLOC.
> 
> This all sounds very fragile to me.  Every time a new architecture would
> get added for huge vmalloc() support, the developer needs to know to go
> find that architecture's module_alloc() and add this flag.  They next
> guy is going to forget, just like you did.

That's not correct from my point of view.

When powerpc added that, a clear comment explains why:


+	/*
+	 * Don't do huge page allocations for modules yet until more testing
+	 * is done. STRICT_MODULE_RWX may require extra work to support this
+	 * too.
+	 */

So as you can see, this is something specific to powerpc and temporary.

> 
> Considering that this is not a hot path, a weak function would be a nice
> choice:
> 
> /* vmalloc() flags used for all module allocations. */
> unsigned long __weak arch_module_vm_flags()
> {
> 	/*
> 	 * Modules use a single, large vmalloc().  Different
> 	 * permissions are applied later and will fragment
> 	 * huge mappings.  Avoid using huge pages for modules.
> 	 */

Why ? Not everybody use STRICT_MODULES_RWX.
Even if you do so, you can still benefit from huge pages for modules.

Why make what was initially a temporary precaution for powerpc become a 
definitive default limitation for all ?

> 	return VM_NO_HUGE_VMAP;
> }
> 
> Stick that in some the common module code, next to:
> 
>> void * __weak module_alloc(unsigned long size)
>> {
>>          return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
> ...
> 
> Then, put arch_module_vm_flags() in *all* of the module_alloc()
> implementations, including the generic one.  That way (even with a new
> architecture) whoever copies-and-pastes their module_alloc()
> implementation is likely to get it right.  The next guy who just does a
> "select HAVE_ARCH_HUGE_VMALLOC" will hopefully just work.
> 
> VM_FLUSH_RESET_PERMS could probably be dealt with in the same way.


More information about the Linuxppc-dev mailing list