[PATCH v5] powerpc/kdump: Add support for crashkernel CMA reservation
Sourabh Jain
sourabhjain at linux.ibm.com
Tue Nov 4 20:34:37 AEDT 2025
On 04/11/25 10:48, Sourabh Jain wrote:
>
>
> On 03/11/25 15:40, Ritesh Harjani (IBM) wrote:
>> Sourabh Jain <sourabhjain at linux.ibm.com> writes:
>>
>>> Commit 35c18f2933c5 ("Add a new optional ",cma" suffix to the
>>> crashkernel= command line option") and commit ab475510e042 ("kdump:
>>> implement reserve_crashkernel_cma") added CMA support for kdump
>>> crashkernel reservation.
>>>
>>> Extend crashkernel CMA reservation support to powerpc.
>>>
>>> The following changes are made to enable CMA reservation on powerpc:
>>>
>>> - Parse and obtain the CMA reservation size along with the other
>>> crashkernel parameters.
>>> - Call reserve_crashkernel_cma() to allocate the CMA region for kdump.
>>> - Include the CMA-reserved ranges in the usable memory ranges for the
>>> kdump kernel to use.
>>> - Exclude the CMA-reserved ranges from the crash memory ranges to
>>> prevent them from being exported through /proc/vmcore.
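A minimal userspace sketch of the third bullet. It assumes only what the diff below shows: crashk_cma_ranges[] entries use an inclusive end, while add_mem_range() takes a base and a size, so each CMA range contributes end - start + 1 bytes. The struct and function names are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>

/* Hypothetical model of an inclusive [start, end] range, like crashk_cma_ranges[i]. */
struct inclusive_range {
	unsigned long long start;
	unsigned long long end;		/* last byte, inclusive */
};

/* Hypothetical model of a (base, size) entry, like those add_mem_range() appends. */
struct usable_range {
	unsigned long long base;
	unsigned long long size;
};

/* An inclusive range covers end - start + 1 bytes. */
static unsigned long long range_size(const struct inclusive_range *r)
{
	assert(r->end >= r->start);
	return r->end - r->start + 1;
}

/*
 * Append every CMA range to the usable list, converting the inclusive end
 * into a size. Returns the new entry count, or -1 if the fixed-size output
 * array is full (the kernel helper grows its list instead).
 */
static int add_cma_to_usable(struct usable_range *out, int nr, int max,
			     const struct inclusive_range *cma, int cma_cnt)
{
	for (int i = 0; i < cma_cnt; i++) {
		if (nr == max)
			return -1;
		out[nr].base = cma[i].start;
		out[nr].size = range_size(&cma[i]);
		nr++;
	}
	return nr;
}
```

For example, an inclusive range 0x10000000-0x1fffffff yields a 256 MiB (0x10000000-byte) usable entry.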
>>>
>>> With the introduction of the CMA crashkernel regions,
>>> crash_exclude_mem_range() needs to be called multiple times to exclude
>>> both crashk_res and crashk_cma_ranges from the crash memory ranges. To
>>> avoid repeating the logic that checks the mem_ranges size and
>>> reallocates it when required, this functionality is moved to a new
>>> wrapper function, crash_exclude_mem_range_guarded().
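Why the guard is needed before every call can be sketched in userspace: excluding a hole from the middle of a range splits it into two entries, which needs one free slot. Because the ranges are non-overlapping, a single exclusion splits at most one entry, so growing a full list before each call is enough. This is a simplified model, not the kernel's crash_exclude_mem_range(); struct crash_mem_model, exclude_range(), grow(), exclude_range_guarded() and the doubling growth policy are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical simplified stand-in for struct crash_mem. */
struct mem_range {
	unsigned long long start, end;	/* inclusive */
};

struct crash_mem_model {
	unsigned int max_nr_ranges;
	unsigned int nr_ranges;
	struct mem_range *ranges;
};

/*
 * Remove [mstart, mend] from a list of non-overlapping ranges.
 * Fails with -ENOMEM only when a split is needed and the list is full.
 */
static int exclude_range(struct crash_mem_model *mem,
			 unsigned long long mstart, unsigned long long mend)
{
	unsigned int i = 0;

	assert(mstart <= mend);
	while (i < mem->nr_ranges) {
		struct mem_range *r = &mem->ranges[i];

		if (mend < r->start || mstart > r->end) {
			i++;				/* no overlap */
		} else if (mstart <= r->start && mend >= r->end) {
			/* Fully covered: drop the entry. */
			for (unsigned int j = i; j + 1 < mem->nr_ranges; j++)
				mem->ranges[j] = mem->ranges[j + 1];
			mem->nr_ranges--;
		} else if (mstart > r->start && mend < r->end) {
			/* Hole in the middle: split into two entries. */
			if (mem->nr_ranges == mem->max_nr_ranges)
				return -ENOMEM;
			for (unsigned int j = mem->nr_ranges; j > i + 1; j--)
				mem->ranges[j] = mem->ranges[j - 1];
			mem->ranges[i + 1].start = mend + 1;
			mem->ranges[i + 1].end = r->end;
			r->end = mstart - 1;
			mem->nr_ranges++;
			i += 2;
		} else if (mstart <= r->start) {
			r->start = mend + 1;		/* trim the head */
			i++;
		} else {
			r->end = mstart - 1;		/* trim the tail */
			i++;
		}
	}
	return 0;
}

/* Grow the array when full; doubling is an assumed policy. */
static int grow(struct crash_mem_model *mem)
{
	unsigned int new_max = mem->max_nr_ranges ? mem->max_nr_ranges * 2 : 4;
	struct mem_range *p = realloc(mem->ranges, new_max * sizeof(*p));

	if (!p)
		return -ENOMEM;
	mem->ranges = p;
	mem->max_nr_ranges = new_max;
	return 0;
}

/* Guarded variant: make room for one split, then exclude. */
static int exclude_range_guarded(struct crash_mem_model *mem,
				 unsigned long long mstart,
				 unsigned long long mend)
{
	if (mem->nr_ranges == mem->max_nr_ranges && grow(mem))
		return -ENOMEM;
	return exclude_range(mem, mstart, mend);
}
```

With a full one-entry list, the unguarded call fails with -ENOMEM, while the guarded call grows the list first and then splits the range; a second exclusion (for example, a CMA range) works the same way.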
>>>
>>> To ensure proper CMA reservation, reserve_crashkernel_cma() is called
>>> after pageblock_order is initialized.
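A rough illustration of why the ordering matters. As far as I know, mainline derives the minimum CMA alignment from the pageblock size (PAGE_SIZE << pageblock_order). A reservation made before pageblock_order is final could therefore be aligned and sized for the wrong granularity. The helpers below are hypothetical, and the pageblock_order values used with them are assumptions chosen only for the arithmetic.

```c
#include <assert.h>

/* Hypothetical: minimum CMA alignment is one pageblock. */
static unsigned long long cma_min_align(unsigned long long page_size,
					unsigned int pageblock_order)
{
	return page_size << pageblock_order;
}

/* Round size up to a multiple of align (align must be a power of two). */
static unsigned long long align_up(unsigned long long size,
				   unsigned long long align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}
```

With 64K pages and an assumed pageblock_order of 8, the alignment is 16 MiB, and a 100 MiB request would round up to 112 MiB. If pageblock_order were still 0, the same request would only be page-aligned.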
>>>
>>> Update kernel-parameters.txt to document CMA support for crashkernel on
>>> powerpc architecture.
>>>
>>> Cc: Baoquan he <bhe at redhat.com>
>>> Cc: Jiri Bohac <jbohac at suse.cz>
>>> Cc: Hari Bathini <hbathini at linux.ibm.com>
>>> Cc: Madhavan Srinivasan <maddy at linux.ibm.com>
>>> Cc: Mahesh Salgaonkar <mahesh at linux.ibm.com>
>>> Cc: Michael Ellerman <mpe at ellerman.id.au>
>>> Cc: Ritesh Harjani (IBM) <ritesh.list at gmail.com>
>>> Cc: Shivang Upadhyay <shivangu at linux.ibm.com>
>>> Cc: kexec at lists.infradead.org
>>> Signed-off-by: Sourabh Jain <sourabhjain at linux.ibm.com>
>>> ---
>>> Changelog:
>>>
>>> v3 -> v4:
>>> - Removed repeated initialization of tmem in crash_exclude_mem_range_guarded()
>>> - Call crash_exclude_mem_range() with the correct crashk ranges
>>>
>>> v4 -> v5:
>>> - Document CMA-based crashkernel support for ppc64 in kernel-parameters.txt
>>> ---
>>> .../admin-guide/kernel-parameters.txt | 2 +-
>>> arch/powerpc/include/asm/kexec.h | 2 +
>>> arch/powerpc/kernel/setup-common.c | 4 +-
>>> arch/powerpc/kexec/core.c | 10 ++++-
>>> arch/powerpc/kexec/ranges.c | 43 ++++++++++++++-----
>>> 5 files changed, 47 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>> index 6c42061ca20e..0f386b546cec 100644
>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>> @@ -1013,7 +1013,7 @@
>>> It will be ignored when crashkernel=X,high is not used
>>> or memory reserved is below 4G.
>>> crashkernel=size[KMG],cma
>>> - [KNL, X86] Reserve additional crash kernel memory from
>>> + [KNL, X86, ppc64] Reserve additional crash kernel memory from
>> Shouldn't this be PPC and not ppc64?
>>
>> Looking at the crash dump support:
>>
>> config ARCH_SUPPORTS_CRASH_DUMP
>> def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP)
>>
>> The changes below aren't specific to ppc64, correct?
>
> The thing is, this feature is only supported with KEXEC_FILE, which
> is only supported on PPC64:
>
> config ARCH_SUPPORTS_KEXEC_FILE
> def_bool PPC64
>
> Hence I kept it as ppc64.
>
> I think I should update that in the commit message.
>
> Also, do you think it is a good idea to restrict this feature to KEXEC_FILE?
Putting this under KEXEC_FILE may not help much, because KEXEC_FILE is
enabled by default in most configurations. Once it is enabled, the CMA
reservation happens regardless of which system call is used to load the
kdump kernel (kexec_load or kexec_file_load). On the other hand, not
restricting this feature to KEXEC_FILE would allow the kexec tool to
independently add support for it in the future via the kexec_load
system call. Following that logic, if we do not restrict this feature to
KEXEC_FILE, the support will be available for ppc in general, not only
ppc64.
>
>>
>>> CMA. This reservation is usable by the first system's
>>> userspace memory and kernel movable allocations (memory
>>> balloon, zswap). Pages allocated from this memory range
>>> diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
>>> index 4bbf9f699aaa..bd4a6c42a5f3 100644
>>> --- a/arch/powerpc/include/asm/kexec.h
>>> +++ b/arch/powerpc/include/asm/kexec.h
>>> @@ -115,9 +115,11 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, struct crash_mem
>>> #ifdef CONFIG_CRASH_RESERVE
>>> int __init overlaps_crashkernel(unsigned long start, unsigned long size);
>>> extern void arch_reserve_crashkernel(void);
>>> +extern void kdump_cma_reserve(void);
>>> #else
>>> static inline void arch_reserve_crashkernel(void) {}
>>> static inline int overlaps_crashkernel(unsigned long start, unsigned long size) { return 0; }
>>> +static inline void kdump_cma_reserve(void) { }
>>> #endif
>>> #if defined(CONFIG_CRASH_DUMP)
>>> diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
>>> index 68d47c53876c..c8c42b419742 100644
>>> --- a/arch/powerpc/kernel/setup-common.c
>>> +++ b/arch/powerpc/kernel/setup-common.c
>>> @@ -35,6 +35,7 @@
>>> #include <linux/of_irq.h>
>>> #include <linux/hugetlb.h>
>>> #include <linux/pgtable.h>
>>> +#include <asm/kexec.h>
>>> #include <asm/io.h>
>>> #include <asm/paca.h>
>>> #include <asm/processor.h>
>>> @@ -995,11 +996,12 @@ void __init setup_arch(char **cmdline_p)
>>> initmem_init();
>>> /*
>>> - * Reserve large chunks of memory for use by CMA for fadump, KVM and
>>> + * Reserve large chunks of memory for use by CMA for kdump, fadump, KVM and
>>> * hugetlb. These must be called after initmem_init(), so that
>>> * pageblock_order is initialised.
>>> */
>>> fadump_cma_init();
>>> + kdump_cma_reserve();
>>> kvm_cma_reserve();
>>> gigantic_hugetlb_cma_reserve();
>>> diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c
>>> index d1a2d755381c..25744737eff5 100644
>>> --- a/arch/powerpc/kexec/core.c
>>> +++ b/arch/powerpc/kexec/core.c
>>> @@ -33,6 +33,8 @@ void machine_kexec_cleanup(struct kimage *image)
>>> {
>>> }
>>> +unsigned long long cma_size;
>>> +
>> nit:
>> Since this is a global powerpc variable you are defining, can we
>> name it crashk_cma_size?
>
> Yeah, makes sense. I will update the variable name.
>
>
>>
>>> /*
>>> * Do not allocate memory (or fail in any way) in machine_kexec().
>>> * We are past the point of no return, committed to rebooting now.
>>> @@ -110,7 +112,7 @@ void __init arch_reserve_crashkernel(void)
>>> /* use common parsing */
>>> ret = parse_crashkernel(boot_command_line, total_mem_sz, &crash_size,
>>> - &crash_base, NULL, NULL, NULL);
>>> + &crash_base, NULL, &cma_size, NULL);
>>> if (ret)
>>> return;
>>> @@ -130,6 +132,12 @@ void __init arch_reserve_crashkernel(void)
>>> reserve_crashkernel_generic(crash_size, crash_base, 0, false);
>>> }
>>> +void __init kdump_cma_reserve(void)
>>> +{
>>> + if (cma_size)
>>> + reserve_crashkernel_cma(cma_size);
>>> +}
>>> +
>> nit:
>> cma_size is already checked for zero within reserve_crashkernel_cma(),
>> so we don't really need the kdump_cma_reserve() wrapper as such.
>>
>> Also, kdump_cma_reserve() only makes sense with #ifdef CRASHKERNEL_CMA,
>> so do you think we can call reserve_crashkernel_cma(cma_size) directly
>> instead?
>
> I think the above kdump_cma_reserve() definition should come under
> CONFIG_CRASH_RESERVE, because of the way it is declared in
> arch/powerpc/include/asm/kexec.h.
>
> I would like to keep kdump_cma_reserve() as is, for two reasons:
>
> - It keeps setup_arch() free from kdump #ifdefs.
> - If we want to add a condition to this reservation later, it will be
> straightforward.
>
> So let's keep kdump_cma_reserve() as is, unless you have a strong
> opinion against it.
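The header pattern in this patch (a real declaration under CONFIG_CRASH_RESERVE, a static inline no-op otherwise) is what keeps setup_arch() free of #ifdefs. Below is a minimal userspace sketch of that pattern with hypothetical stand-ins: reserve_crashkernel_cma_model() mimics the zero-size early return mentioned in the review, and setup_arch_model() plays the caller. Removing the #define selects the stub, and the caller compiles unchanged.

```c
/* Comment out this line to build the "crash reserve disabled" variant. */
#define CONFIG_CRASH_RESERVE 1

static unsigned long long crashk_cma_size;	/* filled in by cmdline parsing */
static int cma_reservations;			/* demo counter only */

#ifdef CONFIG_CRASH_RESERVE
/* Hypothetical stand-in for reserve_crashkernel_cma(). */
static void reserve_crashkernel_cma_model(unsigned long long size)
{
	if (!size)		/* zero size: nothing to reserve */
		return;
	cma_reservations++;
}

static void kdump_cma_reserve(void)
{
	reserve_crashkernel_cma_model(crashk_cma_size);
}
#else
/* Stub so that callers need no #ifdef of their own. */
static inline void kdump_cma_reserve(void) { }
#endif

/* Caller in the style of setup_arch(): no kdump #ifdefs here. */
static void setup_arch_model(void)
{
	kdump_cma_reserve();
}
```

The sketch also shows the trade-off discussed above: the wrapper adds no behaviour today beyond the callee's own zero check, but it gives a single place to add a condition later.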
>
>>> int __init overlaps_crashkernel(unsigned long start, unsigned long size)
>>> {
>>> return (start + size) > crashk_res.start && start <= crashk_res.end;
>>> diff --git a/arch/powerpc/kexec/ranges.c b/arch/powerpc/kexec/ranges.c
>>> index 3702b0bdab14..3bd27c38726b 100644
>>> --- a/arch/powerpc/kexec/ranges.c
>>> +++ b/arch/powerpc/kexec/ranges.c
>>> @@ -515,7 +515,7 @@ int get_exclude_memory_ranges(struct crash_mem **mem_ranges)
>>> */
>>> int get_usable_memory_ranges(struct crash_mem **mem_ranges)
>>> {
>>> - int ret;
>>> + int ret, i;
>>> /*
>>> * Early boot failure observed on guests when low memory (first memory
>>> @@ -528,6 +528,13 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges)
>>> if (ret)
>>> goto out;
>>> + for (i = 0; i < crashk_cma_cnt; i++) {
>>> + ret = add_mem_range(mem_ranges, crashk_cma_ranges[i].start,
>>> + crashk_cma_ranges[i].end - crashk_cma_ranges[i].start + 1);
>>> + if (ret)
>>> + goto out;
>>> + }
>>> +
>>> ret = add_rtas_mem_range(mem_ranges);
>>> if (ret)
>>> goto out;
>>> @@ -546,6 +553,22 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges)
>>> #endif /* CONFIG_KEXEC_FILE */
>>> #ifdef CONFIG_CRASH_DUMP
>>> +static int crash_exclude_mem_range_guarded(struct crash_mem **mem_ranges,
>>> + unsigned long long mstart,
>>> + unsigned long long mend)
>>> +{
>>> + struct crash_mem *tmem = *mem_ranges;
>>> +
>>> + /* Reallocate memory ranges if there is no space to split ranges */
>>> + if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) {
>>> + tmem = realloc_mem_ranges(mem_ranges);
>>> + if (!tmem)
>>> + return -ENOMEM;
>>> + }
>>> +
>>> + return crash_exclude_mem_range(tmem, mstart, mend);
>>> +}
>>> +
>>> /**
>>> * get_crash_memory_ranges - Get crash memory ranges. This list includes
>>> * first/crashing kernel's memory regions that
>>> @@ -557,7 +580,6 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges)
>>> int get_crash_memory_ranges(struct crash_mem **mem_ranges)
>>> {
>>> phys_addr_t base, end;
>>> - struct crash_mem *tmem;
>>> u64 i;
>>> int ret;
>>> @@ -582,19 +604,18 @@ int get_crash_memory_ranges(struct crash_mem **mem_ranges)
>>> sort_memory_ranges(*mem_ranges, true);
>>> }
>>> - /* Reallocate memory ranges if there is no space to split ranges */
>>> - tmem = *mem_ranges;
>>> - if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) {
>>> - tmem = realloc_mem_ranges(mem_ranges);
>>> - if (!tmem)
>>> - goto out;
>>> - }
>>> -
>>> /* Exclude crashkernel region */
>>> - ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end);
>>> + ret = crash_exclude_mem_range_guarded(mem_ranges, crashk_res.start, crashk_res.end);
>>> if (ret)
>>> goto out;
>>> + for (i = 0; i < crashk_cma_cnt; ++i) {
>>> + ret = crash_exclude_mem_range_guarded(mem_ranges, crashk_cma_ranges[i].start,
>>> + crashk_cma_ranges[i].end);
>>> + if (ret)
>>> + goto out;
>>> + }
>>> +
>>> /*
>>> * FIXME: For now, stay in parity with kexec-tools but if RTAS/OPAL
>>> * regions are exported to save their context at the time of
>>> --
>>> 2.51.0
>
More information about the Linuxppc-dev mailing list