[PATCH v4 4/6] mm/hotplug: Allow pageblock alignment via altmap reservation

David Hildenbrand david at redhat.com
Tue Jul 25 02:24:01 AEST 2023


On 24.07.23 18:02, Aneesh Kumar K V wrote:
> On 7/24/23 9:11 PM, David Hildenbrand wrote:
>> On 24.07.23 17:16, Aneesh Kumar K V wrote:
>>
>>>>
>>>> /*
>>>>    * In "forced" memmap_on_memory mode, we always align the vmemmap size up to cover
>>>>    * full pageblocks. That way, we can add memory even if the vmemmap size is not properly
>>>>    * aligned, however, we might waste memory.
>>>>    */
>>>
>>> I am finding that confusing. We do want things to be pageblock_nr_pages aligned both ways.
>>> With MEMMAP_ON_MEMORY_FORCE, we do that by allocating more space for memmap and
>>> in the default case we do that by making sure only memory blocks of specific size supporting
>>> that alignment can use MEMMAP_ON_MEMORY feature.
>>
>> See the usage in mhp_supports_memmap_on_memory(); I guess that makes sense then.
>>
>> But if you have any ideas on how to clarify that (terminology), I'm all ears!
>>
> 
> 
> I updated the commit message
> 
> mm/hotplug: Support memmap_on_memory when memmap is not aligned to pageblocks
> 
> Currently, the memmap_on_memory feature is only supported with memory block
> sizes that result in vmemmap pages covering full pageblocks. This is
> because the memory onlining/offlining code requires the applicable ranges
> to be pageblock-aligned, for example, to set the migratetypes properly.
> 
> This patch helps to lift that restriction by reserving more pages than
> required for the vmemmap space. This aligns the start address to a
> pageblock boundary with different memory block sizes. It implies the
> kernel will reserve some pages for every memory block. This also makes
> the memmap_on_memory feature more widely useful with different memory
> block sizes.
> 
> For example, with a 64K page size and a 256MiB memory block size, we
> require 4 pages to hold the vmemmap. To align things correctly, we end up
> adding a reserve of 28 pages, i.e., for every 4096 pages, 28 pages get
> reserved.
> 
> 

Much better.
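The numbers in the quoted example can be checked by hand. Assuming a 64-byte struct page and 32-page pageblocks (2MiB with 64K pages, which is what the 28-page figure implies), the arithmetic works out as:

```latex
\begin{aligned}
N_{\text{block}}   &= \frac{256\ \text{MiB}}{64\ \text{KiB}} = 4096 \text{ pages}\\
N_{\text{memmap}}  &= \frac{4096 \times 64\ \text{B}}{64\ \text{KiB}} = 4 \text{ pages}\\
N_{\text{aligned}} &= \left\lceil \frac{4}{32} \right\rceil \times 32 = 32 \text{ pages}\\
N_{\text{reserve}} &= N_{\text{aligned}} - N_{\text{memmap}} = 32 - 4 = 28 \text{ pages}
\end{aligned}
```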

> Also, while implementing your suggestion to use memory_block_memmap_on_memory_size(),
> I am not finding it very useful, because in mhp_supports_memmap_on_memory() we check
> whether remaining_size is pageblock_nr_pages aligned (dax_kmem may want to use that helper
> later).

Let's focus on this patchset here first.

Factoring out how many memmap pages we actually need vs. how many pages
we need when aligning up sounds very reasonable to me.


Can you elaborate what the problem is?
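A minimal userspace sketch of that split, under stated assumptions: memmap_pages_needed() and memmap_pages_aligned() are hypothetical stand-ins for the roles of memory_block_memmap_size() and memory_block_memmap_on_memory_size(); they take explicit parameters instead of reading kernel state, and struct page is assumed to be 64 bytes. remaining_is_aligned() models the non-forced check on remaining_size mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: sizeof(struct page) is 64 bytes. */
#define MODEL_STRUCT_PAGE_SIZE 64UL

/* Round x up to a multiple of a; a must be a power of two. */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Pages actually needed for the memmap of one memory block
 * (the role of memory_block_memmap_size()). */
static unsigned long memmap_pages_needed(unsigned long block_size,
					 unsigned long page_size)
{
	unsigned long nr_pages = block_size / page_size;
	unsigned long memmap_bytes = nr_pages * MODEL_STRUCT_PAGE_SIZE;

	return align_up(memmap_bytes, page_size) / page_size;
}

/* Pages set aside once that is rounded up to whole pageblocks
 * (the role of memory_block_memmap_on_memory_size()). */
static unsigned long memmap_pages_aligned(unsigned long block_size,
					  unsigned long page_size,
					  unsigned long pageblock_nr_pages)
{
	return align_up(memmap_pages_needed(block_size, page_size),
			pageblock_nr_pages);
}

/* Non-forced mode: the rest of the block must start on a pageblock
 * boundary without any padding. */
static bool remaining_is_aligned(unsigned long block_size,
				 unsigned long page_size,
				 unsigned long pageblock_nr_pages)
{
	unsigned long nr_pages = block_size / page_size;
	unsigned long remaining = nr_pages -
		memmap_pages_needed(block_size, page_size);

	return remaining % pageblock_nr_pages == 0;
}
```

With 64K pages, 256MiB blocks and 32-page pageblocks this gives 4 needed vs. 32 aligned pages, so the non-forced check fails; with 4K pages, 128MiB blocks and 512-page pageblocks the memmap is exactly 512 pages and the check passes.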

> Also I still think altmap.reserve is easier because of the start_pfn calculation.
> (more on this below)

Can you elaborate? Do you mean the try_remove_memory() change?

> 
> 
>> [...]
>>
>>>>> +    return arch_supports_memmap_on_memory(size);
>>>>>     }
>>>>>       /*
>>>>> @@ -1311,7 +1391,11 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>>>>>     {
>>>>>         struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
>>>>>         enum memblock_flags memblock_flags = MEMBLOCK_NONE;
>>>>> -    struct vmem_altmap mhp_altmap = {};
>>>>> +    struct vmem_altmap mhp_altmap = {
>>>>> +        .base_pfn =  PHYS_PFN(res->start),
>>>>> +        .end_pfn  =  PHYS_PFN(res->end),
>>>>> +        .reserve  = memory_block_align_base(resource_size(res)),
>>>>
>>>> Can you remind me why we have to set reserve here at all?
>>>>
>>>> IOW, can't we simply set
>>>>
>>>> .free = memory_block_memmap_on_memory_size();
>>>>
>>>> and then pass
>>>>
>>>> mhp_altmap.alloc + mhp_altmap.free
>>>>
>>>> to create_memory_block_devices() instead?
>>>>
>>>
>>> But with the dax usage of altmap, altmap->reserve is what we use to reserve pages to get
>>> the required alignment. One difference is where the struct pages get allocated. For this specific
>>> case it should not matter.
>>>
>>> static unsigned long __meminit vmem_altmap_next_pfn(struct vmem_altmap *altmap)
>>> {
>>>      return altmap->base_pfn + altmap->reserve + altmap->alloc
>>>          + altmap->align;
>>> }
>>>
>>> The other is where we online a memory block:
>>>
>>> We find the start pfn using mem->altmap->alloc + mem->altmap->reserve;
>>>
>>> Considering altmap->reserve is what dax pfn_dev uses, is there a reason you want to use altmap->free for this?
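To make the start-pfn point concrete, here is a simplified model of the two calculations quoted above. It is not the kernel's struct vmem_altmap; struct altmap_model, populate() and online_start_pfn() are hypothetical, and alignment padding is left at 0. It contrasts a reserve-based layout (reserve = 28, memmap in the last 4 pages of the first 32-page pageblock) with one that leaves reserve at 0 (memmap in the first 4 pages).

```c
#include <assert.h>

/* Simplified model of the struct vmem_altmap fields used here. */
struct altmap_model {
	unsigned long base_pfn;
	unsigned long reserve; /* pages skipped before the memmap */
	unsigned long align;   /* alignment padding (left at 0 here) */
	unsigned long alloc;   /* memmap pages handed out so far */
};

/* Same sum as vmem_altmap_next_pfn() quoted above. */
static unsigned long next_pfn(const struct altmap_model *a)
{
	return a->base_pfn + a->reserve + a->alloc + a->align;
}

/* Hand out nr pfns for the memmap; returns the first one. */
static unsigned long populate(struct altmap_model *a, unsigned long nr)
{
	unsigned long pfn = next_pfn(a);

	assert(nr > 0);
	a->alloc += nr;
	return pfn;
}

/* First pfn onlined: skip alloc + reserve pages, as described above. */
static unsigned long online_start_pfn(const struct altmap_model *a)
{
	return a->base_pfn + a->alloc + a->reserve;
}
```

In the reserve = 0 case the alloc + reserve rule lands at base + 4, which is not pageblock aligned, so a free-based scheme would need to record the aligned size for onlining separately.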
>>
>> "Reserve" is all about "reserving that much memory for driver usage".
>>
>> We don't care about that. We simply want vmemmap allocations coming from the pageblock(s) we set aside. Where exactly, we don't care.
>>
>>> I find it confusing to update free when we haven't allocated any altmap blocks yet.
>>
>> "
>> @reserve: pages mapped, but reserved for driver use (relative to @base)
>> @free: free pages set aside in the mapping for memmap storage
>> @alloc: track pages consumed, private to vmemmap_populate()
>> "
>>
>> To me, that implies that we can ignore "reserve". We set @free to the aligned value and let the vmemmap get allocated from anything in there.
>>
>> free + alloc should always sum up to our set-aside pageblock(s), no?
>>
>>
> 
> The difference is
> 
>   mhp_altmap.free = PHYS_PFN(size) - reserved blocks;
> 
> i.e., with a 256MiB memory block size and 64K pages, we need 4 memmap pages and we reserve 28 pages for alignment.
> 
> mhp_altmap.free = PHYS_PFN(size) - 28.
> 
> So the 4 pages from which we allocate the memmap are still counted as free pages.
> 
> We could make it all work by doing
> 
> mhp_altmap.free = PHYS_PFN(size) -  (memory_block_memmap_on_memory_size() - memory_block_memmap_size())
> 
> But is that any better than what we have now? I understand the term "reserved for driver use" is confusing for this use case.
> But it is really reserving things for required alignment.
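Plugging the example numbers into both variants (assuming 32-page pageblocks, 4 memmap pages per 256MiB block, and that the alternative leaves reserve at 0) shows that free comes out the same; the two differ only in reserve, and therefore in where within the first pageblock the memmap is placed:

```latex
\begin{aligned}
\text{free}_{\text{reserve}} &= 4096 - 28 = 4068, &\quad \text{reserve} &= 28\\
\text{free}_{\text{alt}}     &= 4096 - (32 - 4) = 4068, &\quad \text{reserve} &= 0
\end{aligned}
```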


Let's take a step back.

altmap->alloc tells us how much was already allocated.

altmap->free tells us how much memory we can allocate at max (confusing, 
but see vmem_altmap_nr_free()).

altmap->free should really have been named differently.


I think it's currently even *wrong* to set free = PHYS_PFN(size). We 
don't want to allocate beyond the first pageblock(s) we selected.
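A simplified model of that accounting, modeled on vmem_altmap_nr_free(); struct altmap_acct and take() are hypothetical, and the per-allocation alignment padding is left out. free acts as a cap and is never decremented, so with free = PHYS_PFN(size) nothing stops vmemmap allocations from running past the first pageblock, while free = 32 keeps them inside it.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the altmap accounting fields; not the kernel struct. */
struct altmap_acct {
	unsigned long free;  /* cap: most pages the altmap may hand out */
	unsigned long align; /* pages skipped for alignment */
	unsigned long alloc; /* pages handed out so far */
};

/* Pages still available under the cap; free itself never shrinks. */
static unsigned long nr_free(const struct altmap_acct *a)
{
	unsigned long allocated = a->alloc + a->align;

	return a->free > allocated ? a->free - allocated : 0;
}

/* Hand out nr pages if they fit under the cap. */
static bool take(struct altmap_acct *a, unsigned long nr)
{
	assert(nr > 0);
	if (nr > nr_free(a))
		return false;
	a->alloc += nr;
	return true;
}
```

With free = 32, after the 4 memmap pages are taken nr_free() reports 28 and a further 29-page request is refused; with free = 4096, a 64-page request succeeds even though it spills beyond a 32-page pageblock.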


Can't we set:

1) add_memory_resource():

	.base_pfn = PHYS_PFN(start);
	.free = PHYS_PFN(memory_block_memmap_on_memory_size());

2) try_remove_memory():
	.base_pfn = PHYS_PFN(start);
	.alloc = PHYS_PFN(memory_block_memmap_on_memory_size());

Faking that all was allocated and avoiding any reservation terminology?
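A sketch of what the proposal amounts to, in the same simplified terms (hypothetical struct and helper names, sizes in pages, a 32-page aligned memmap area assumed): the add path only caps allocations via free, and the remove path fakes alloc so that base + alloc + reserve still points just past the set-aside pageblock.

```c
#include <assert.h>

/* Simplified model of the altmap fields involved; not the kernel struct. */
struct altmap_model {
	unsigned long base_pfn;
	unsigned long reserve;
	unsigned long free;
	unsigned long alloc;
};

/* add_memory_resource() side: only cap allocations to the aligned area. */
static struct altmap_model add_side_altmap(unsigned long start_pfn,
					   unsigned long aligned_pages)
{
	struct altmap_model a = { .base_pfn = start_pfn, .free = aligned_pages };

	return a;
}

/* try_remove_memory() side: pretend the whole aligned area was allocated. */
static struct altmap_model remove_side_altmap(unsigned long start_pfn,
					      unsigned long aligned_pages)
{
	struct altmap_model a = { .base_pfn = start_pfn, .alloc = aligned_pages };

	return a;
}

/* First pfn past the memmap area, using the alloc + reserve rule. */
static unsigned long first_usable_pfn(const struct altmap_model *a)
{
	return a->base_pfn + a->alloc + a->reserve;
}
```

Note that reserve stays 0 on both sides, which is the "avoiding any reservation terminology" part of the proposal.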

-- 
Cheers,

David / dhildenb



More information about the Linuxppc-dev mailing list