[PATCH v2 4/4] powerpc/mm/radix: Create separate mappings for hot-plugged memory

Michael Ellerman mpe at ellerman.id.au
Wed Jul 8 22:14:44 AEST 2020


"Aneesh Kumar K.V" <aneesh.kumar at linux.ibm.com> writes:
> On 7/8/20 10:14 AM, Michael Ellerman wrote:
>> "Aneesh Kumar K.V" <aneesh.kumar at linux.ibm.com> writes:
>>> To enable memory unplug without splitting kernel page table
>>> mapping, we force the max mapping size to the LMB size. LMB
>>> size is the unit in which hypervisor will do memory add/remove
>>> operation.
>>>
>>> This implies on pseries system, we now end up mapping
>> 
>> Please expand on why it "implies" that for pseries.
>> 
>>> memory with 2M page size instead of 1G. To improve
>>> that we want hypervisor to hint the kernel about the hotplug
>>> memory range.  This was added that as part of
>>                   That
>>>
>>> commit b6eca183e23e ("powerpc/kernel: Enables memory
>>> hot-remove after reboot on pseries guests")
>>>
>>> But we still don't do that on PowerVM. Once we get PowerVM
>> 
>> I think you mean PowerVM doesn't provide that hint yet?
>> 
>> Realistically it won't until P10. So this means we'll always use 2MB on
>> Power9 PowerVM, doesn't it?
>> 
>> What about KVM?
>> 
>> Have you done any benchmarking on the impact of switching the linear
>> mapping to 2MB pages?
>> 
>
> The TLB impact should be minimal because, with a 256M LMB size, partition
> scoped entries are still 2M and hence we end up with TLB entries of 2M size.
>
>
>>> updated, we can then force the 2M mapping only to hot-pluggable
>>> memory region using memblock_is_hotpluggable(). Till then
>>> let's depend on LMB size for finding the mapping page size
>>> for linear range.
>>>
>
> updated
>
>
> powerpc/mm/radix: Create separate mappings for hot-plugged memory
>
> To enable memory unplug without splitting kernel page table
> mapping, we force the max mapping size to the LMB size. LMB
> size is the unit in which hypervisor will do memory add/remove
> operation.
>
> Pseries systems support a max LMB size of 256MB. Hence on pseries,
> we now end up mapping memory with 2M page size instead of 1G. To improve
> that we want the hypervisor to hint the kernel about the hotplug
> memory range.  That was added as part of
>
> commit b6eca183e23e ("powerpc/kernel: Enables memory
> hot-remove after reboot on pseries guests")
>
> But PowerVM doesn't provide that hint yet. Once we get PowerVM
> updated, we can then force the 2M mapping only to hot-pluggable
> memory region using memblock_is_hotpluggable(). Till then
> let's depend on LMB size for finding the mapping page size
> for linear range.
>
> With this change a KVM guest will also be doing linear mapping with
> 2M page size.
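
As a rough illustration of why capping the max mapping size at the LMB size leads to 2M mappings, here is a minimal standalone sketch. It is not the code from the patch: pick_mapping_size() and the SZ_* constants are hypothetical names, and a 64K base page size is assumed.

```c
#include <stdint.h>

#define SZ_64K	(UINT64_C(1) << 16)	/* assumed base page size */
#define SZ_2M	(UINT64_C(1) << 21)
#define SZ_1G	(UINT64_C(1) << 30)

/*
 * Hypothetical helper: pick the largest page size usable at addr, given
 * that no single mapping may exceed max_mapping_size (e.g. the LMB size).
 */
static uint64_t pick_mapping_size(uint64_t addr, uint64_t end,
				  uint64_t max_mapping_size)
{
	uint64_t gap = end - addr;

	/* Never let one mapping span more than one hot-unpluggable block. */
	if (gap > max_mapping_size)
		gap = max_mapping_size;

	if ((addr % SZ_1G) == 0 && gap >= SZ_1G)
		return SZ_1G;
	if ((addr % SZ_2M) == 0 && gap >= SZ_2M)
		return SZ_2M;
	return SZ_64K;
}
```

With a 256MB cap the 1G branch can never be taken, so an aligned range falls through to 2M. With a cap of 1G or more, a 1G-aligned range would get a 1G mapping.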

...
>>> @@ -494,17 +544,27 @@ void __init radix__early_init_devtree(void)
>>>   	 * Try to find the available page sizes in the device-tree
>>>   	 */
>>>   	rc = of_scan_flat_dt(radix_dt_scan_page_sizes, NULL);
>>> -	if (rc != 0)  /* Found */
>>> -		goto found;
>>> +	if (rc == 0) {
>>> +		/*
>>> +		 * no page size details found in device tree
>>> +		 * let's assume we have page 4k and 64k support
>> 
>> Capitals and punctuation please?
>> 
>>> +		 */
>>> +		mmu_psize_defs[MMU_PAGE_4K].shift = 12;
>>> +		mmu_psize_defs[MMU_PAGE_4K].ap = 0x0;
>>> +
>>> +		mmu_psize_defs[MMU_PAGE_64K].shift = 16;
>>> +		mmu_psize_defs[MMU_PAGE_64K].ap = 0x5;
>>> +	}
>> 
>> Moving that seems like an unrelated change. It's a reasonable change but
>> I'd rather you did it in a standalone patch.
>> 
>
> We needed that change so that we can call radix_memory_block_size() for
> both the found and !found cases.

But the found and !found cases converge at found:, which is where you
call it. So I don't understand.

But as I said below, it would be even simpler if you worked out the
memory block size first.

cheers

>>>   	/*
>>> -	 * let's assume we have page 4k and 64k support
>>> +	 * Max mapping size used when mapping pages. We don't use
>>> +	 * ppc_md.memory_block_size() here because this get called
>>> +	 * early and we don't have machine probe called yet. Also
>>> +	 * the pseries implementation only check for ibm,lmb-size.
>>> +	 * All hypervisor supporting radix do expose that device
>>> +	 * tree node.
>>>   	 */
>>> -	mmu_psize_defs[MMU_PAGE_4K].shift = 12;
>>> -	mmu_psize_defs[MMU_PAGE_4K].ap = 0x0;
>>> -
>>> -	mmu_psize_defs[MMU_PAGE_64K].shift = 16;
>>> -	mmu_psize_defs[MMU_PAGE_64K].ap = 0x5;
>>> -found:
>>> +	radix_mem_block_size = radix_memory_block_size();
>> 
>> If you did that earlier in the function, before
>> radix_dt_scan_page_sizes(), the logic would be simpler.
>> 
>>>   	return;
>>>   }
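
To make the ordering suggestion concrete, here is a minimal standalone sketch of the control flow when the memory block size is worked out before the page-size scan. It is only an illustration under stated assumptions, not the kernel code: the stub_* functions, early_init_sketch(), the psize_defs table and the 256MB value are hypothetical stand-ins for radix_memory_block_size(), of_scan_flat_dt(radix_dt_scan_page_sizes, NULL) and mmu_psize_defs.

```c
#include <stdbool.h>
#include <stdint.h>

struct psize_def {
	unsigned int shift;
	unsigned int ap;
};

enum { PAGE_4K, PAGE_64K, PAGE_COUNT };

static struct psize_def psize_defs[PAGE_COUNT];
static uint64_t mem_block_size;

/* Hypothetical stand-in for radix_memory_block_size(). */
static uint64_t stub_memory_block_size(void)
{
	return UINT64_C(256) << 20;	/* assumed 256MB LMB size */
}

/*
 * Hypothetical stand-in for the device-tree scan: returns non-zero when
 * page sizes were found, and fills the table in that case.
 */
static int stub_scan_page_sizes(bool dt_has_sizes)
{
	if (!dt_has_sizes)
		return 0;
	psize_defs[PAGE_4K] = (struct psize_def){ .shift = 12, .ap = 0x0 };
	psize_defs[PAGE_64K] = (struct psize_def){ .shift = 16, .ap = 0x5 };
	return 1;
}

static void early_init_sketch(bool dt_has_sizes)
{
	/* Work out the block size first; it does not depend on the scan. */
	mem_block_size = stub_memory_block_size();

	if (stub_scan_page_sizes(dt_has_sizes) != 0)
		return;	/* Found, nothing more to do. */

	/* No page size details found, so assume 4K and 64K support. */
	psize_defs[PAGE_4K].shift = 12;
	psize_defs[PAGE_4K].ap = 0x0;
	psize_defs[PAGE_64K].shift = 16;
	psize_defs[PAGE_64K].ap = 0x5;
}
```

With this ordering the block size is set on both the found and !found paths, and the fallback assignments do not need to move.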

