[PATCH] powerpc/crashkernel: take mem option into account

Pingfan Liu kernelfans at gmail.com
Mon Sep 23 14:14:40 AEST 2019


On Wed, Sep 18, 2019 at 7:23 PM Michael Ellerman <mpe at ellerman.id.au> wrote:
>
> Pingfan Liu <kernelfans at gmail.com> writes:
> > Cc Kexec list. And keep the original content.
> >
> > On Thu, Sep 12, 2019 at 10:50 AM Pingfan Liu <kernelfans at gmail.com> wrote:
> >>
> >> 'mem=" option is an easy way to put high pressure on memory during some
> >> test. Hence in stead of total mem, the effective usable memory size
>                ^                          ^
>                instead                    "actual" would be clearer
>
> I think adding: "after applying the memory limit"
>
> would help here.
>
> >> should be considered when reserving mem for crashkernel. Otherwise
> >> the boot up may experience oom issue.
>                               ^
>                               OOM
> >>
> >> E.g passing
> >> crashkernel="2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G", and
> >> mem=5G on a 256G machine.
>
> Spelling out the behaviour before and after would help here, eg:
>
> .. "would reserve 4G prior to the change and 512M afterward."
>
Thanks for kindly review. I will update the commit based on your suggestion.
>
> >> Signed-off-by: Pingfan Liu <kernelfans at gmail.com>
> >> Cc: Hari Bathini <hbathini at linux.ibm.com>
> >> Cc: Michael Ellerman <mpe at ellerman.id.au>
> >> To: linuxppc-dev at lists.ozlabs.org
> >> ---
> >> v1 -> v2: fix the printk info about the total mem
> >>  arch/powerpc/kernel/machine_kexec.c | 7 ++++---
> >>  1 file changed, 4 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/powerpc/kernel/machine_kexec.c b/arch/powerpc/kernel/machine_kexec.c
> >> index c4ed328..eec96dc 100644
> >> --- a/arch/powerpc/kernel/machine_kexec.c
> >> +++ b/arch/powerpc/kernel/machine_kexec.c
> >> @@ -114,11 +114,12 @@ void machine_kexec(struct kimage *image)
> >>
> >>  void __init reserve_crashkernel(void)
> >>  {
> >> -       unsigned long long crash_size, crash_base;
> >> +       unsigned long long crash_size, crash_base, total_mem_sz;
> >>         int ret;
> >>
> >> +       total_mem_sz = memory_limit ? memory_limit : memblock_phys_mem_size();
> >>         /* use common parsing */
> >> -       ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> >> +       ret = parse_crashkernel(boot_command_line, total_mem_sz,
> >>                         &crash_size, &crash_base);
>
> I think this change makes sense. But we have multiple arches that
> implement similar logic, and I wonder if we should keep them all the
> same.
>
> eg:
>
>   arch/arm/kernel/setup.c:                ret = parse_crashkernel(boot_command_line, total_mem,
>   arch/arm64/mm/init.c:                   ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
>   arch/ia64/kernel/setup.c:               ret = parse_crashkernel(boot_command_line, total,
>   arch/mips/kernel/setup.c:               ret = parse_crashkernel(boot_command_line, total_mem,
>   arch/powerpc/kernel/fadump.c:           ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
>   arch/powerpc/kernel/machine_kexec.c:    ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
>   arch/s390/kernel/setup.c:               rc = parse_crashkernel(boot_command_line, memory_end, &crash_size,
>   arch/sh/kernel/machine_kexec.c:         ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
>   arch/x86/kernel/setup.c:                ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
>
>
> From a quick glance most of them don't seem to take the memory limit
> into account.
>
> So I guess the question is do we want all arches to implement the same
> behaviour or do we think it doesn't matter if they differ in details
> like this?

On powerpc, the current code make fadump/kdump a higher priority than
"mem=" option, as the notes in fadump_reserve_mem() says
"
        /*
         * Calculate the memory boundary.
         * If memory_limit is less than actual memory boundary then reserve
         * the memory for fadump beyond the memory_limit and adjust the
         * memory_limit accordingly, so that the running kernel can run with
         * specified memory_limit.
         */
"

While on other archs, they pack "mem=" info into memblock before
calling memblock_phys_mem_size(). So when parse_crashkernel() calls
memblock_phys_mem_size(), the "mem=" takes effect.

E.g for x86 in arch/x86/kernel/e820.c
static int __init parse_memopt(char *p)
{
...
e820__range_remove(mem_size, ULLONG_MAX - mem_size, E820_TYPE_RAM, 1);
// this pack the "mem=" info into e820, and is finally feed to
memblock
}

Thanks,
Pingfan


More information about the Linuxppc-dev mailing list