[PATCH v5] powerpc/kdump: Add support for crashkernel CMA reservation

Sourabh Jain sourabhjain at linux.ibm.com
Tue Nov 4 20:34:37 AEDT 2025



On 04/11/25 10:48, Sourabh Jain wrote:
>
>
> On 03/11/25 15:40, Ritesh Harjani (IBM) wrote:
>> Sourabh Jain <sourabhjain at linux.ibm.com> writes:
>>
>>> Commit 35c18f2933c5 ("Add a new optional ",cma" suffix to the
>>> crashkernel= command line option") and commit ab475510e042 ("kdump:
>>> implement reserve_crashkernel_cma") added CMA support for kdump
>>> crashkernel reservation.
>>>
>>> Extend crashkernel CMA reservation support to powerpc.
>>>
>>> The following changes are made to enable CMA reservation on powerpc:
>>>
>>> - Parse and obtain the CMA reservation size along with other crashkernel
>>>    parameters
>>> - Call reserve_crashkernel_cma() to allocate the CMA region for kdump
>>> - Include the CMA-reserved ranges in the usable memory ranges for the
>>>    kdump kernel to use.
>>> - Exclude the CMA-reserved ranges from the crash kernel memory to
>>>    prevent them from being exported through /proc/vmcore.
>>>
>>> With the introduction of the CMA crashkernel regions,
>>> crash_exclude_mem_range() needs to be called multiple times to exclude
>>> both crashk_res and crashk_cma_ranges from the crash memory ranges. To
>>> avoid repetitive logic for validating mem_ranges size and handling
>>> reallocation when required, this functionality is moved to a new wrapper
>>> function crash_exclude_mem_range_guarded().
>>>
>>> To ensure proper CMA reservation, reserve_crashkernel_cma() is called
>>> after pageblock_order is initialized.
>>>
>>> Update kernel-parameters.txt to document CMA support for crashkernel on
>>> powerpc architecture.
>>>
>>> Cc: Baoquan He <bhe at redhat.com>
>>> Cc: Jiri Bohac <jbohac at suse.cz>
>>> Cc: Hari Bathini <hbathini at linux.ibm.com>
>>> Cc: Madhavan Srinivasan <maddy at linux.ibm.com>
>>> Cc: Mahesh Salgaonkar <mahesh at linux.ibm.com>
>>> Cc: Michael Ellerman <mpe at ellerman.id.au>
>>> Cc: Ritesh Harjani (IBM) <ritesh.list at gmail.com>
>>> Cc: Shivang Upadhyay <shivangu at linux.ibm.com>
>>> Cc: kexec at lists.infradead.org
>>> Signed-off-by: Sourabh Jain <sourabhjain at linux.ibm.com>
>>> ---
>>> Changelog:
>>>
>>> v3 -> v4
>>>   - Removed repeated initialization to tmem in
>>>     crash_exclude_mem_range_guarded()
>>>   - Call crash_exclude_mem_range() with right crashk ranges
>>>
>>> v4 -> v5:
>>>   - Document CMA-based crashkernel support for ppc64 in kernel-parameters.txt
>>> ---
>>>   .../admin-guide/kernel-parameters.txt         |  2 +-
>>>   arch/powerpc/include/asm/kexec.h              |  2 +
>>>   arch/powerpc/kernel/setup-common.c            |  4 +-
>>>   arch/powerpc/kexec/core.c                     | 10 ++++-
>>>   arch/powerpc/kexec/ranges.c                   | 43 ++++++++++++++-----
>>>   5 files changed, 47 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
>>> b/Documentation/admin-guide/kernel-parameters.txt
>>> index 6c42061ca20e..0f386b546cec 100644
>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>> @@ -1013,7 +1013,7 @@
>>>               It will be ignored when crashkernel=X,high is not used
>>>               or memory reserved is below 4G.
>>>       crashkernel=size[KMG],cma
>>> -            [KNL, X86] Reserve additional crash kernel memory from
>>> +            [KNL, X86, ppc64] Reserve additional crash kernel memory from
>> Shouldn't this be PPC and not ppc64?
>>
>> If I see the crash_dump support...
>>
>> config ARCH_SUPPORTS_CRASH_DUMP
>>     def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP)
>>
>> The changes below aren't specific to ppc64 correct?
>
> The thing is that this feature is only supported with KEXEC_FILE, which is
> only supported on PPC64:
>
> config ARCH_SUPPORTS_KEXEC_FILE
>     def_bool PPC64
>
> Hence I kept it as ppc64.
>
> I think I should update that in the commit message.
>
> Also, do you think it is good to restrict this feature to KEXEC_FILE?

Restricting this feature to KEXEC_FILE may not help much, because KEXEC_FILE
is enabled by default in most configurations. Once it is enabled, the CMA
reservation happens regardless of which system call is used to load the
kdump kernel (kexec_load or kexec_file_load).

Not restricting the feature to KEXEC_FILE, on the other hand, allows the
kexec tool to independently add support for it in the future for the
kexec_load system call.

By that logic, if we do not restrict this feature to KEXEC_FILE, the support
will be available on ppc in general and not limited to ppc64.



>
>>
>>>               CMA. This reservation is usable by the first system's
>>>               userspace memory and kernel movable allocations (memory
>>>               balloon, zswap). Pages allocated from this memory range
>>> diff --git a/arch/powerpc/include/asm/kexec.h 
>>> b/arch/powerpc/include/asm/kexec.h
>>> index 4bbf9f699aaa..bd4a6c42a5f3 100644
>>> --- a/arch/powerpc/include/asm/kexec.h
>>> +++ b/arch/powerpc/include/asm/kexec.h
>>> @@ -115,9 +115,11 @@ int setup_new_fdt_ppc64(const struct kimage 
>>> *image, void *fdt, struct crash_mem
>>>   #ifdef CONFIG_CRASH_RESERVE
>>>   int __init overlaps_crashkernel(unsigned long start, unsigned long 
>>> size);
>>>   extern void arch_reserve_crashkernel(void);
>>> +extern void kdump_cma_reserve(void);
>>>   #else
>>>   static inline void arch_reserve_crashkernel(void) {}
>>>   static inline int overlaps_crashkernel(unsigned long start, 
>>> unsigned long size) { return 0; }
>>> +static inline void kdump_cma_reserve(void) { }
>>>   #endif
>>>     #if defined(CONFIG_CRASH_DUMP)
>>> diff --git a/arch/powerpc/kernel/setup-common.c 
>>> b/arch/powerpc/kernel/setup-common.c
>>> index 68d47c53876c..c8c42b419742 100644
>>> --- a/arch/powerpc/kernel/setup-common.c
>>> +++ b/arch/powerpc/kernel/setup-common.c
>>> @@ -35,6 +35,7 @@
>>>   #include <linux/of_irq.h>
>>>   #include <linux/hugetlb.h>
>>>   #include <linux/pgtable.h>
>>> +#include <asm/kexec.h>
>>>   #include <asm/io.h>
>>>   #include <asm/paca.h>
>>>   #include <asm/processor.h>
>>> @@ -995,11 +996,12 @@ void __init setup_arch(char **cmdline_p)
>>>       initmem_init();
>>>         /*
>>> -     * Reserve large chunks of memory for use by CMA for fadump, 
>>> KVM and
>>> +     * Reserve large chunks of memory for use by CMA for kdump, 
>>> fadump, KVM and
>>>        * hugetlb. These must be called after initmem_init(), so that
>>>        * pageblock_order is initialised.
>>>        */
>>>       fadump_cma_init();
>>> +    kdump_cma_reserve();
>>>       kvm_cma_reserve();
>>>       gigantic_hugetlb_cma_reserve();
>>>   diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c
>>> index d1a2d755381c..25744737eff5 100644
>>> --- a/arch/powerpc/kexec/core.c
>>> +++ b/arch/powerpc/kexec/core.c
>>> @@ -33,6 +33,8 @@ void machine_kexec_cleanup(struct kimage *image)
>>>   {
>>>   }
>>>   +unsigned long long cma_size;
>>> +
>> nit:
>> Since this is a global powerpc variable you are defining, can we
>> name it crashk_cma_size?
>
> Yeah, makes sense. I will update the variable name.
>
>
>>
>>>   /*
>>>    * Do not allocate memory (or fail in any way) in machine_kexec().
>>>    * We are past the point of no return, committed to rebooting now.
>>> @@ -110,7 +112,7 @@ void __init arch_reserve_crashkernel(void)
>>>         /* use common parsing */
>>>       ret = parse_crashkernel(boot_command_line, total_mem_sz, 
>>> &crash_size,
>>> -                &crash_base, NULL, NULL, NULL);
>>> +                &crash_base, NULL, &cma_size, NULL);
>>>         if (ret)
>>>           return;
>>> @@ -130,6 +132,12 @@ void __init arch_reserve_crashkernel(void)
>>>       reserve_crashkernel_generic(crash_size, crash_base, 0, false);
>>>   }
>>>   +void __init kdump_cma_reserve(void)
>>> +{
>>> +    if (cma_size)
>>> +        reserve_crashkernel_cma(cma_size);
>>> +}
>>> +
>> nit:
>> cma_size is already checked for zero within reserve_crashkernel_cma(),
>> so we don't really need the kdump_cma_reserve() function call as such.
>>
>> Also kdump_cma_reserve() only makes sense with #ifdef CRASHKERNEL_CMA..
>> so instead do you think we can directly call
>> reserve_crashkernel_cma(cma_size)?
>
> I think the above kdump_cma_reserve() definition should come under
> CONFIG_CRASH_RESERVE because of the way it is declared in
> arch/powerpc/include/asm/kexec.h.
>
> I would like to keep kdump_cma_reserve() as it is, for two reasons:
>
> - It keeps setup_arch() free from kdump #ifdefs
> - If we want to add some condition on this reservation later, it
>   would be straightforward.
>
> So let's keep kdump_cma_reserve() as is, unless you have a strong opinion
> against it.
>
>>>   int __init overlaps_crashkernel(unsigned long start, unsigned long 
>>> size)
>>>   {
>>>       return (start + size) > crashk_res.start && start <= 
>>> crashk_res.end;
>>> diff --git a/arch/powerpc/kexec/ranges.c b/arch/powerpc/kexec/ranges.c
>>> index 3702b0bdab14..3bd27c38726b 100644
>>> --- a/arch/powerpc/kexec/ranges.c
>>> +++ b/arch/powerpc/kexec/ranges.c
>>> @@ -515,7 +515,7 @@ int get_exclude_memory_ranges(struct crash_mem 
>>> **mem_ranges)
>>>    */
>>>   int get_usable_memory_ranges(struct crash_mem **mem_ranges)
>>>   {
>>> -    int ret;
>>> +    int ret, i;
>>>         /*
>>>        * Early boot failure observed on guests when low memory 
>>> (first memory
>>> @@ -528,6 +528,13 @@ int get_usable_memory_ranges(struct crash_mem 
>>> **mem_ranges)
>>>       if (ret)
>>>           goto out;
>>>   +    for (i = 0; i < crashk_cma_cnt; i++) {
>>> +        ret = add_mem_range(mem_ranges, crashk_cma_ranges[i].start,
>>> +                    crashk_cma_ranges[i].end - crashk_cma_ranges[i].start + 1);
>>> +        if (ret)
>>> +            goto out;
>>> +    }
>>> +
>>>       ret = add_rtas_mem_range(mem_ranges);
>>>       if (ret)
>>>           goto out;
>>> @@ -546,6 +553,22 @@ int get_usable_memory_ranges(struct crash_mem 
>>> **mem_ranges)
>>>   #endif /* CONFIG_KEXEC_FILE */
>>>     #ifdef CONFIG_CRASH_DUMP
>>> +static int crash_exclude_mem_range_guarded(struct crash_mem 
>>> **mem_ranges,
>>> +                       unsigned long long mstart,
>>> +                       unsigned long long mend)
>>> +{
>>> +    struct crash_mem *tmem = *mem_ranges;
>>> +
>>> +    /* Reallocate memory ranges if there is no space to split 
>>> ranges */
>>> +    if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) {
>>> +        tmem = realloc_mem_ranges(mem_ranges);
>>> +        if (!tmem)
>>> +            return -ENOMEM;
>>> +    }
>>> +
>>> +    return crash_exclude_mem_range(tmem, mstart, mend);
>>> +}
>>> +
>>>   /**
>>>    * get_crash_memory_ranges - Get crash memory ranges. This list 
>>> includes
>>>    *                           first/crashing kernel's memory 
>>> regions that
>>> @@ -557,7 +580,6 @@ int get_usable_memory_ranges(struct crash_mem 
>>> **mem_ranges)
>>>   int get_crash_memory_ranges(struct crash_mem **mem_ranges)
>>>   {
>>>       phys_addr_t base, end;
>>> -    struct crash_mem *tmem;
>>>       u64 i;
>>>       int ret;
>>>   @@ -582,19 +604,18 @@ int get_crash_memory_ranges(struct crash_mem 
>>> **mem_ranges)
>>>               sort_memory_ranges(*mem_ranges, true);
>>>       }
>>>   -    /* Reallocate memory ranges if there is no space to split 
>>> ranges */
>>> -    tmem = *mem_ranges;
>>> -    if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) {
>>> -        tmem = realloc_mem_ranges(mem_ranges);
>>> -        if (!tmem)
>>> -            goto out;
>>> -    }
>>> -
>>>       /* Exclude crashkernel region */
>>> -    ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end);
>>> +    ret = crash_exclude_mem_range_guarded(mem_ranges, crashk_res.start, crashk_res.end);
>>>       if (ret)
>>>           goto out;
>>>   +    for (i = 0; i < crashk_cma_cnt; ++i) {
>>> +        ret = crash_exclude_mem_range_guarded(mem_ranges, crashk_cma_ranges[i].start,
>>> +                          crashk_cma_ranges[i].end);
>>> +        if (ret)
>>> +            goto out;
>>> +    }
>>> +
>>>       /*
>>>        * FIXME: For now, stay in parity with kexec-tools but if 
>>> RTAS/OPAL
>>>        *        regions are exported to save their context at the 
>>> time of
>>> -- 
>>> 2.51.0
>
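
To illustrate why the exclusion needs a guard: excluding a region from the
middle of a range splits that range in two, so the list needs one free slot
before the call. Below is a minimal userspace sketch of that idea, not the
kernel code. The names model_exclude(), model_exclude_guarded(), model_grow()
and range_len() are made up for this example, struct crash_mem here is a
simplified stand-in (ranges are assumed non-overlapping, with inclusive ends),
and the doubling growth policy is an assumption of the sketch, not necessarily
what realloc_mem_ranges() does.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Simplified stand-in for the kernel structures; ends are inclusive. */
struct range {
	unsigned long long start, end;
};

struct crash_mem {
	unsigned int max_nr_ranges;	/* allocated slots */
	unsigned int nr_ranges;		/* used slots */
	struct range *ranges;
};

/* Length of an inclusive range, same arithmetic as "end - start + 1". */
unsigned long long range_len(const struct range *r)
{
	return r->end - r->start + 1;
}

/* Hypothetical growth helper: doubles the capacity (assumed policy). */
int model_grow(struct crash_mem *mem)
{
	unsigned int new_max = mem->max_nr_ranges ? mem->max_nr_ranges * 2 : 4;
	struct range *p = realloc(mem->ranges, new_max * sizeof(*p));

	if (!p)
		return -ENOMEM;
	mem->ranges = p;
	mem->max_nr_ranges = new_max;
	return 0;
}

/* Remove [mstart, mend] from the list; fails if a split has no free slot. */
int model_exclude(struct crash_mem *mem, unsigned long long mstart,
		  unsigned long long mend)
{
	unsigned int i = 0, j;

	assert(mstart <= mend);
	while (i < mem->nr_ranges) {
		struct range *r = &mem->ranges[i];
		unsigned long long start = r->start, end = r->end;

		if (mend < start || mstart > end) {	/* no overlap */
			i++;
			continue;
		}
		if (mstart <= start && mend >= end) {	/* fully covered: drop */
			for (j = i; j + 1 < mem->nr_ranges; j++)
				mem->ranges[j] = mem->ranges[j + 1];
			mem->nr_ranges--;
			continue;
		}
		if (mstart > start && mend < end) {	/* split in two */
			if (mem->nr_ranges == mem->max_nr_ranges)
				return -ENOMEM;
			for (j = mem->nr_ranges; j > i + 1; j--)
				mem->ranges[j] = mem->ranges[j - 1];
			mem->ranges[i].end = mstart - 1;
			mem->ranges[i + 1].start = mend + 1;
			mem->ranges[i + 1].end = end;
			mem->nr_ranges++;
			return 0;	/* nothing else can overlap */
		}
		if (mstart <= start)	/* trim the head */
			r->start = mend + 1;
		else			/* trim the tail */
			r->end = mstart - 1;
		i++;
	}
	return 0;
}

/* Guarded variant: make sure one slot is free before excluding. */
int model_exclude_guarded(struct crash_mem *mem, unsigned long long mstart,
			  unsigned long long mend)
{
	if (mem->nr_ranges == mem->max_nr_ranges) {
		int ret = model_grow(mem);

		if (ret)
			return ret;
	}
	return model_exclude(mem, mstart, mend);
}
```

With a one-slot list holding [0, 999], model_exclude(&mem, 100, 199) returns
-ENOMEM, while model_exclude_guarded() grows the list first and leaves
[0, 99] and [200, 999]. Calling the guarded variant once per excluded region
is what keeps the size check in one place.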


