[PATCH v2 2/2] powerpc/mm: Add memory_block_size as a kernel parameter

David Hildenbrand david at redhat.com
Tue Jun 20 02:28:56 AEST 2023

On 19.06.23 18:17, Aneesh Kumar K.V wrote:
> David Hildenbrand <david at redhat.com> writes:
>> On 09.06.23 08:08, Aneesh Kumar K.V wrote:
>>> Certain devices can possess non-standard memory capacities, not constrained
>>> to multiples of 1GB. Provide a kernel parameter so that we can map the
>>> device memory completely on memory hotplug.
>> So, the unfortunate thing is that these devices would have worked out of
>> the box before the memory block size was increased from 256 MiB to 1 GiB
>> in these setups. Now, one has to fine-tune the memory block size. The
>> only other arch that I know, which supports setting the memory block
>> size, is x86 for special (large) UV systems -- and at least in the past
>> 128 MiB vs. 2 GiB memory blocks made a performance difference during
>> boot (maybe no longer today, who knows).
>> Obviously, less tunable and getting stuff simply working out of the box
>> is preferable.
>> Two questions:
>> 1) Isn't there a way to improve auto-detection to fallback to 256 MiB in
>> these setups, to avoid specifying these parameters?
> The patch does try to detect as much as possible by looking at device tree
> nodes and aperture window size. But there are still cases where we find
> a memory aperture of size X GB and device driver hotplug X.YGB memory.

Okay, and I assume we can't detect that case easily.

Which interface is that device driver using to hotplug memory? It's 
quite surprising I have to say ...

>> 2) Is the 256 MiB -> 1 GiB memory block size switch really worth it? On
>> x86-64, experiments (with direct map fragmentation) showed that the
>> effective performance boost is pretty insignificant, so I wonder how big
>> the 1 GiB direct map performance improvement is.
> Tarun is running some tests to evaluate the impact. We used to use 1GiB
> mapping always. This was later switched to use memory block size to fix
> issues with memory unplug
> commit af9d00e93a4f ("powerpc/mm/radix: Create separate mappings for hot-plugged memory")
> explains some details related to that change.

IIUC, that commit (conditionally) increased the memory block size to 
avoid the splitting, correct? By that, it broke the device driver use case.

>> I guess the only real issue with 256 MiB memory blocks and 1 GiB direct
>> mapping is memory unplug of boot memory: when unplugging a 256 MiB
>> block, one would have to remap the 1 GiB range using 2 MiB ranges.
>> ... I was wondering what would happen if you simply leave the direct
>> mapping in this corner case in place instead of doing this remapping.
>> IOW, remove the memory but keep the direct map pointing at the removed
>> memory. Nobody should be touching it, or are there any cases where that
>> could hurt?
>> Or is there any other reason why we really want 1 GiB memory blocks
>> instead of to defaulting to 256 MiB the way it used to be?
> The idea we are working towards is to keep the memory block size small

That would be preferable, yes ...

> but map the boot memory using 1G. An unplug request can split that 1G
> mapping later. We could look at the possibility of leaving that mapping
> without splitting. But not sure why we would want to do that if we can
> correctly split things. Right now there is no splitting support in powerpc.

If splitting over-complicates the matter (and well, it will even consume 
more memory), it might at least be worth looking into that. Yes, it's 

I think there is also the option to fail memory offlining (and therefore 
unplug) if we have a 1 GiB mapping and don't want to split. For 
hotplugged memory it would always work to unplug again. aarch64 blocks 
any boot memory from getting unplugged.

But I guess that might break existing use cases (unplug boot memory) on 
ppc64 that rely on ZONE_MOVABLE to have it working with guarantees, 
right? Could be optimized but not sure if that's the best approach.


David / dhildenb

More information about the Linuxppc-dev mailing list