[PATCH v2 2/2] powerpc/mm: Add memory_block_size as a kernel parameter

David Hildenbrand david at redhat.com
Tue Jun 20 22:53:35 AEST 2023

On 20.06.23 14:35, Michael Ellerman wrote:
> David Hildenbrand <david at redhat.com> writes:
>> On 09.06.23 08:08, Aneesh Kumar K.V wrote:
>>> Certain devices can possess non-standard memory capacities, not constrained
>>> to multiples of 1GB. Provide a kernel parameter so that we can map the
>>> device memory completely on memory hotplug.
>> So, the unfortunate thing is that these devices would have worked out of
>> the box before the memory block size was increased from 256 MiB to 1 GiB
>> in these setups. Now, one has to fine-tune the memory block size. The
>> only other arch that I know, which supports setting the memory block
>> size, is x86 for special (large) UV systems -- and at least in the past
>> 128 MiB vs. 2 GiB memory blocks made a performance difference during
>> boot (maybe no longer today, who knows).
>> Obviously, less tunable and getting stuff simply working out of the box
>> is preferable.
>> Two questions:
>> 1) Isn't there a way to improve auto-detection to fallback to 256 MiB in
>> these setups, to avoid specifying these parameters?
>> 2) Is the 256 MiB -> 1 GiB memory block size switch really worth it? On
>> x86-64, experiments (with direct map fragmentation) showed that the
>> effective performance boost is pretty insignificant, so I wonder how big
>> the 1 GiB direct map performance improvement is.
> The other issue is simply the number of sysfs entries.
> With 64TB of memory and a 256MB block size you end up with ~250,000
> directories in /sys/devices/system/memory.
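Right -- and for reference, that figure is just the block arithmetic (a quick shell sketch; the sysfs path in the comment is where the kernel exposes the current block size, e.g. 0x10000000 == 256 MiB):

```shell
# Current block size is readable from
#   /sys/devices/system/memory/block_size_bytes
# With 64 TiB of RAM at 256 MiB per block:
blocks=$(( (64 * 1024 * 1024) / 256 ))   # 64 TiB expressed in 256 MiB units
echo "$blocks"                            # 262144 memoryNNN directories
```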

Yes, and so far on other archs we only optimize for that on UV x86 
systems (with a default of 2 GiB). And that was added before we started 
to speed up memory device lookups significantly using a radix tree IIRC.

It's worth noting that there was a discussion on:

(a) not creating these device sysfs entries (when configured on the 
cmdline); often, nobody really ends up using them to online/offline 
memory blocks. Then, the only primary user is lsmem.

(b) exposing logical devices (e.g., a DIMM) that can only be 
offlined/removed as a whole, instead of their individual memblocks (when 
configured on the cmdline). But for PPC64 that won't help.

But (a) gets more tricky if device drivers (and things like dax/kmem) 
rely on user-space memory onlining/offlining.
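For context, the parameter from the patch under discussion would be passed on the kernel command line; the exact accepted syntax is defined by the patch itself, so the value below is illustrative only (256M would restore the pre-change granularity):

```shell
# /etc/default/grub (illustrative): ask the kernel for 256 MiB memory
# blocks instead of the 1 GiB default chosen on these PPC64 setups.
GRUB_CMDLINE_LINUX="... memory_block_size=256M"
```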


David / dhildenb
