[PATCH v6 11/12] mm/vmalloc: Hugepage vmalloc mappings

Nicholas Piggin npiggin at gmail.com
Sat Aug 22 02:05:35 AEST 2020


Excerpts from Eric Dumazet's message of August 22, 2020 1:38 am:
> 
> On 8/21/20 8:12 AM, Nicholas Piggin wrote:
>> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
>> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
>> supports PMD sized vmap mappings.
>> 
>> vmalloc will attempt to allocate PMD-sized pages if allocating PMD size or
>> larger, and fall back to small pages if that was unsuccessful.
>> 
>> Allocations that do not use PAGE_KERNEL prot are not permitted to use huge
>> pages, because not all callers expect this (e.g., module allocations vs
>> strict module rwx).
>> 
>> This reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node
>> POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.
>> 
>> This can result in more internal fragmentation and memory overhead for a
>> given allocation, an option nohugevmalloc is added to disable at boot.
>> 
>>
> 
> Thanks for working on this stuff, I tried something similar in the past,
> but could not really do more than a hack.
> ( https://lkml.org/lkml/2016/12/21/285 )

Oh nice. It might be possible to do some ideas from your patch
still. Higher order pages smaller than PMD size, or the memory
policy stuff, perhaps.

> Note that __init alloc_large_system_hash() is used at boot time,
> when NUMA policy is spreading allocations over all NUMA nodes.
> 
> This means that on a dual node system, a hash table should be 50/50 spread.
> 
> With your patch, if a hashtable is exactly the size of one huge page,
> the location of this hashtable will be not balanced, this might have some
> unwanted impact.

In that case it shouldn't because it divides by the number of nodes,
but it will in general have a bit larger granularity in balancing than
smaller pages of course.

There's probably a better way to size these important hashes on NUMA. I
suspect most of the time you have a NUMA machine you actually would
prefer to use large pages now, even if it means taking up to 2MB more
memory per node per hash. It's not a great amount and the allocation 
size is rather arbitrary anyway.

Thanks,
Nick


More information about the Linuxppc-dev mailing list