powerpc: Set crashkernel offset to mid of RMA region
Sourabh Jain
sourabhjain at linux.ibm.com
Thu Feb 3 02:08:43 AEDT 2022
On 01/02/22 17:14, Michael Ellerman wrote:
> Sourabh Jain <sourabhjain at linux.ibm.com> writes:
>> On large config LPARs (having 192 and more cores), Linux fails to boot
>> due to insufficient memory in the first memblock. It is due to the
>> memory reservation for the crash kernel which starts at 128MB offset of
>> the first memblock. This memory reservation for the crash kernel doesn't
>> leave enough space in the first memblock to accommodate other essential
>> system resources.
>>
>> The crash kernel start address was set to 128MB offset by default to
>> ensure that the crash kernel get some memory below the RMA region which
>> is used to be of size 256MB. But given that the RMA region size can be
>> 512MB or more, setting the crash kernel offset to mid of RMA size will
>> leave enough space for kernel to allocate memory for other system
>> resources.
>>
>> Since the above crash kernel offset change is only applicable to the LPAR
>> platform, the LPAR feature detection is pushed before the crash kernel
>> reservation. The rest of LPAR specific initialization will still
>> be done during pseries_probe_fw_features as usual.
>>
>> Signed-off-by: Sourabh Jain <sourabhjain at linux.ibm.com>
>> Reported-and-tested-by: Abdul haleem <abdhalee at linux.vnet.ibm.com>
>>
>> ---
>> arch/powerpc/kernel/rtas.c | 4 ++++
>> arch/powerpc/kexec/core.c | 15 +++++++++++----
>> 2 files changed, 15 insertions(+), 4 deletions(-)
>>
>> ---
>> Change in v3:
>> Dropped 1st and 2nd patch from v2. 1st and 2nd patch from v2 patch
>> series [1] try to discover 1T segment MMU feature support
>> BEFORE boot CPU paca allocation ([1] describes why it is needed).
>> MPE has posted a patch [2] that achieves a similar objective by moving
>> boot CPU paca allocation after mmu_early_init_devtree().
>>
>> NOTE: This patch is dependent on the patch [2].
>>
>> [1] https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20211018084434.217772-3-sourabhjain@linux.ibm.com/
>> [2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239175.html
>> ---
>>
>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> index 733e6ef36758..06df7464fb57 100644
>> --- a/arch/powerpc/kernel/rtas.c
>> +++ b/arch/powerpc/kernel/rtas.c
>> @@ -1313,6 +1313,10 @@ int __init early_init_dt_scan_rtas(unsigned long node,
>> entryp = of_get_flat_dt_prop(node, "linux,rtas-entry", NULL);
>> sizep = of_get_flat_dt_prop(node, "rtas-size", NULL);
>>
>> + /* need this feature to decide the crashkernel offset */
>> + if (of_get_flat_dt_prop(node, "ibm,hypertas-functions", NULL))
>> + powerpc_firmware_features |= FW_FEATURE_LPAR;
>> +
> As you'd have seen this breaks the 32-bit build. It will need an #ifdef
> CONFIG_PPC64 around it.
>
>> if (basep && entryp && sizep) {
>> rtas.base = *basep;
>> rtas.entry = *entryp;
>> diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c
>> index 8b68d9f91a03..abf5897ae88c 100644
>> --- a/arch/powerpc/kexec/core.c
>> +++ b/arch/powerpc/kexec/core.c
>> @@ -134,11 +134,18 @@ void __init reserve_crashkernel(void)
>> if (!crashk_res.start) {
>> #ifdef CONFIG_PPC64
>> /*
>> - * On 64bit we split the RMO in half but cap it at half of
>> - * a small SLB (128MB) since the crash kernel needs to place
>> - * itself and some stacks to be in the first segment.
>> + * On the LPAR platform place the crash kernel to mid of
>> + * RMA size (512MB or more) to ensure the crash kernel
>> + * gets enough space to place itself and some stack to be
>> + * in the first segment. At the same time normal kernel
>> + * also get enough space to allocate memory for essential
>> + * system resource in the first segment. Keep the crash
>> + * kernel starts at 128MB offset on other platforms.
>> */
>> - crashk_res.start = min(0x8000000ULL, (ppc64_rma_size / 2));
>> + if (firmware_has_feature(FW_FEATURE_LPAR))
>> + crashk_res.start = ppc64_rma_size / 2;
>> + else
>> + crashk_res.start = min(0x8000000ULL, (ppc64_rma_size / 2));
> I think this will break on machines using Radix won't it? At this point
> in boot ppc64_rma_size will be == 0. Because we won't call into
> hash__setup_initial_memory_limit().
>
> That's not changed by your patch, but seems like this code needs to be
> more careful/clever.

Interesting, but in my testing I found that ppc64_rma_size did get
initialized before reserve_crashkernel() when using radix on LPAR.
I am not sure why, but hash__setup_initial_memory_limit() gets called
regardless of radix or hash. I am not sure whether that is by design,
but here is the flow:

setup_initial_memory_limit()
    static inline void setup_initial_memory_limit()
    (arch/powerpc/include/asm/book3s/64/mmu.h)
        if (!early_radix_enabled())   // early_radix_enabled() is FALSE
                                      // whether or not radix is enabled
            hash__setup_initial_memory_limit()   // initializes ppc64_rma_size
reserve_crashkernel()   // sets the crashkernel offset to mid of RMA size

For the sake of understanding: even if we restrict setting the
crashkernel offset to mid RMA (i.e. ppc64_rma_size / 2) to hash only,
it may not save radix, because even today we assign the crashkernel
offset using the ppc64_rma_size variable.
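
To make the dependency on ppc64_rma_size concrete, here is a minimal
user-space sketch of the default offset selection from the patch. It is
not kernel code: default_crashk_start() and CRASHK_LEGACY_CAP are
hypothetical names used only for illustration, rma_size stands in for
ppc64_rma_size, and is_lpar stands in for
firmware_has_feature(FW_FEATURE_LPAR).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 128MB: the existing cap on the default crashkernel offset. */
#define CRASHK_LEGACY_CAP 0x8000000ULL

/*
 * Hypothetical helper, not a kernel function: models the default
 * crashkernel start address picked in reserve_crashkernel() when no
 * explicit start is given.
 */
static uint64_t default_crashk_start(uint64_t rma_size, bool is_lpar)
{
	uint64_t half = rma_size / 2;

	/* LPAR: place the crash kernel at the middle of the RMA. */
	if (is_lpar)
		return half;

	/* Other platforms: keep min(128MB, rma_size / 2). */
	return half < CRASHK_LEGACY_CAP ? half : CRASHK_LEGACY_CAP;
}
```

With a 512MB RMA this gives a 256MB offset on LPAR and 128MB elsewhere.
With rma_size == 0 both branches return 0, so under these assumptions
the old and the new code depend equally on ppc64_rma_size being set
before reserve_crashkernel() runs.
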
Is the current flow, in which ppc64_rma_size is initialized before
reserve_crashkernel() on radix, expected?

Please provide your input.

Thanks,
Sourabh Jain