[PATCH v4 4/6] mm/hotplug: Allow pageblock alignment via altmap reservation

Tue Jul 25 02:02:20 AEST 2023

On 7/24/23 9:11 PM, David Hildenbrand wrote:
> On 24.07.23 17:16, Aneesh Kumar K V wrote:
> 
>>>
>>> /*
>>>   * In "forced" memmap_on_memory mode, we always align the vmemmap size up to cover
>>>   * full pageblocks. That way, we can add memory even if the vmemmap size is not properly
>>>   * aligned, however, we might waste memory.
>>>   */
>>
>> I am finding that confusing. We do want things to be pageblock_nr_pages aligned both ways.
>> With MEMMAP_ON_MEMORY_FORCE, we do that by allocating more space for memmap and
>> in the default case we do that by making sure only memory blocks of specific size supporting
>> that alignment can use MEMMAP_ON_MEMORY feature.
> 
> See the usage in mhp_supports_memmap_on_memory(), I guess that makes sense then.
> 
> But if you have any ideas on how to clarify that (terminology), I'm all ears!
> 

I updated the commit message:

mm/hotplug: Support memmap_on_memory when memmap is not aligned to pageblocks

Currently, the memmap_on_memory feature is only supported with memory block
sizes that result in vmemmap pages covering full pageblocks. This is
because the memory onlining/offlining code requires applicable ranges to be
pageblock-aligned, for example, to set the migratetypes properly.

This patch lifts that restriction by reserving more pages than required
for the vmemmap space, so that the start address stays pageblock-aligned
with different memory block sizes. This implies that the kernel will
reserve some pages for every memory block. It also makes the memmap on
memory feature usable with a wider range of memory block sizes.

For example, with a 64K page size and a 256MiB memory block size, we need 4
pages to hold the vmemmap. To align things correctly, we end up adding a
reserve of 28 pages, i.e. for every 4096 pages, 28 pages get reserved.
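
To make the numbers above easy to check, here is a small standalone sketch of
the arithmetic. It is not kernel code: the EX_* constants and ex_* helpers are
made up for this mail, and it assumes sizeof(struct page) is 64 bytes and
pageblock_nr_pages is 32 on this configuration (which is what 4 + 28 implies).

```c
/* Assumed example values; not taken from kernel headers. */
#define EX_PAGE_SIZE        (64UL * 1024)          /* 64K base pages */
#define EX_MEMBLOCK_SIZE    (256UL * 1024 * 1024)  /* 256MiB memory block */
#define EX_STRUCT_PAGE_SIZE 64UL                   /* assumed sizeof(struct page) */
#define EX_PAGEBLOCK_PAGES  32UL                   /* assumed pageblock_nr_pages */

/* Round x up to the next multiple of a. */
static unsigned long ex_align_up(unsigned long x, unsigned long a)
{
	return (x + a - 1) / a * a;
}

/* Number of base pages in one memory block. */
static unsigned long ex_block_pages(void)
{
	return EX_MEMBLOCK_SIZE / EX_PAGE_SIZE;
}

/* Pages needed to hold the struct pages describing one memory block. */
static unsigned long ex_memmap_pages(void)
{
	unsigned long bytes = ex_block_pages() * EX_STRUCT_PAGE_SIZE;

	return ex_align_up(bytes, EX_PAGE_SIZE) / EX_PAGE_SIZE;
}

/* Extra pages set aside so that memmap + reserve covers full pageblocks. */
static unsigned long ex_reserve_pages(void)
{
	unsigned long memmap = ex_memmap_pages();

	return ex_align_up(memmap, EX_PAGEBLOCK_PAGES) - memmap;
}
```

With these assumptions, ex_block_pages() gives 4096, ex_memmap_pages() gives 4
and ex_reserve_pages() gives 28, matching the example in the commit message.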

Also, while implementing your suggestion to use memory_block_memmap_on_memory_size(),
I found it not really useful, because in mhp_supports_memmap_on_memory() we check
whether remaining_size is pageblock_nr_pages aligned (dax_kmem may want to use that helper
later). I also still think altmap.reserve is easier because of the start_pfn calculation
(more on this below).
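
As a rough illustration of the remaining_size check, here is a page-unit
sketch. The function name and parameters are hypothetical and this is not the
code in mhp_supports_memmap_on_memory(); it only shows why the set-aside size
matters for the 64K page / 256MiB block case, assuming 32 pages per pageblock.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: returns true if the pages left over after setting
 * aside 'set_aside_pages' at the start of the block still form whole
 * pageblocks. All arguments are in units of base pages.
 */
static bool ex_remaining_is_pageblock_aligned(unsigned long block_pages,
					      unsigned long set_aside_pages,
					      unsigned long pageblock_pages)
{
	unsigned long remaining = block_pages - set_aside_pages;

	return remaining % pageblock_pages == 0;
}
```

Setting aside only the 4 memmap pages leaves 4092 pages, which is not a
multiple of 32, so the check fails. Setting aside 4 + 28 = 32 pages leaves
4064 pages, which is.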

> [...]
> 
>>>> +    return arch_supports_memmap_on_memory(size);
>>>>    }
>>>>      /*
>>>> @@ -1311,7 +1391,11 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>>>>    {
>>>>        struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
>>>>        enum memblock_flags memblock_flags = MEMBLOCK_NONE;
>>>> -    struct vmem_altmap mhp_altmap = {};
>>>> +    struct vmem_altmap mhp_altmap = {
>>>> +        .base_pfn =  PHYS_PFN(res->start),
>>>> +        .end_pfn  =  PHYS_PFN(res->end),
>>>> +        .reserve  = memory_block_align_base(resource_size(res)),
>>>
>>> Can you remind me why we have to set reserve here at all?
>>>
>>> IOW, can't we simply set
>>>
>>> .free = memory_block_memmap_on_memory_size();
>>>
>>> and then pass
>>>
>>> mhp_altmap.alloc + mhp_altmap.free
>>>
>>> to create_memory_block_devices() instead?
>>>
>>
>> But with the dax usage of altmap, altmap->reserve is what we use to reserve things to get
>> the required alignment. One difference is where we allocate the struct page at. For this specific
>> case it should not matter.
>>
>> static unsigned long __meminit vmem_altmap_next_pfn(struct vmem_altmap *altmap)
>> {
>>     return altmap->base_pfn + altmap->reserve + altmap->alloc
>>         + altmap->align;
>> }
>>
>> And the other is where we online a memory block.
>>
>> We find the start pfn using mem->altmap->alloc + mem->altmap->reserve;
>>
>> Considering altmap->reserve is what dax pfn_dev use, is there a reason you want to use altmap->free for this?
> 
> "Reserve" is all about "reserving that much memory for driver usage".
> 
> We don't care about that. We simply want vmemmap allocations coming from the pageblock(s) we set aside. Where exactly, we don't care.
> 
>> I find it confusing to update free when we haven't allocated any altmap blocks yet.
> 
> "
> @reserve: pages mapped, but reserved for driver use (relative to @base)
> @free: free pages set aside in the mapping for memmap storage
> @alloc: track pages consumed, private to vmemmap_populate()
> "
> 
> To me, that implies that we can ignore "reserve". We set @free to the aligned value and let the vmemmap get allocated from anything in there.
> 
> free + alloc should always sum up to our set-aside pageblock(s), no?
> 
>

The difference is:

 mhp_altmap.free = PHYS_PFN(size) - reserved blocks;

i.e., with a 256MiB memory block size and 64K pages, we need 4 memmap pages and we reserve 28 pages for alignment.

mhp_altmap.free = PHYS_PFN(size) - 28

So the 4 pages from which we allocate the memmap are still counted in free.

We could also make it work by doing

mhp_altmap.free = PHYS_PFN(size) -  (memory_block_memmap_on_memory_size() - memory_block_memmap_size())

But is that any better than what we have now? I understand that the term "reserved for driver use" is confusing for this use case,
but it really is reserving pages for the required alignment.
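
To spell out the accounting with reserve, here is a simplified sketch. The
struct is a cut-down stand-in for struct vmem_altmap with only the fields
discussed in this thread, and the ex_* helpers are made up for illustration.
ex_next_pfn() mirrors vmem_altmap_next_pfn() quoted earlier, and everything
is in units of pages.

```c
/* Cut-down stand-in for struct vmem_altmap (illustration only). */
struct ex_altmap {
	unsigned long base_pfn;
	unsigned long reserve;	/* pages set aside for alignment */
	unsigned long free;	/* pages available for memmap storage */
	unsigned long align;
	unsigned long alloc;	/* pages consumed by the memmap so far */
};

/* Same arithmetic as vmem_altmap_next_pfn(). */
static unsigned long ex_next_pfn(const struct ex_altmap *a)
{
	return a->base_pfn + a->reserve + a->alloc + a->align;
}

/* First pfn of the range that gets onlined: skip reserve and memmap. */
static unsigned long ex_online_start_pfn(const struct ex_altmap *a)
{
	return a->base_pfn + a->alloc + a->reserve;
}

/* Set up the altmap the way described above: free = block pages - reserve. */
static struct ex_altmap ex_make_altmap(unsigned long base_pfn,
				       unsigned long block_pages,
				       unsigned long reserve)
{
	struct ex_altmap a = {
		.base_pfn = base_pfn,
		.reserve  = reserve,
		.free     = block_pages - reserve,
	};

	return a;
}
```

For a 4096-page block with reserve = 28, free starts at 4068, the memmap is
allocated starting at base_pfn + 28, and once alloc reaches 4 the onlined
range starts at base_pfn + 32, which is pageblock aligned if base_pfn is
(assuming 32 pages per pageblock).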

-aneesh