[RFC PATCH] powerpc: Make crashing cpu to be discovered first in kdump kernel.

Mahesh J Salgaonkar mahesh at linux.ibm.com
Tue Sep 12 21:05:44 AEST 2023


On 2023-09-11 21:23:42 Mon, Pingfan Liu wrote:
> Hi Mahesh,
> 
> I am not quite sure about fdt, so I skip that part, and leave some
> comments from the kexec view.
> 
> On Thu, Sep 7, 2023 at 1:59 AM Mahesh Salgaonkar <mahesh at linux.ibm.com> wrote:
> >
> > The kernel boot parameter 'nr_cpus=' allows one to specify number of
> > possible cpus in the system. In the normal scenario the first cpu (cpu0)
> > that shows up is the boot cpu and hence it gets covered under nr_cpus
> > limit.
> >
> > But this assumption is broken in kdump scenario where kdump kernel after a
> > crash can boot up on an non-zero boot cpu. The paca structure allocation
> > depends on value of nr_cpus and is indexed using logical cpu ids. The cpu
> > discovery code brings up the cpus as they appear sequentially on device
> > tree and assigns logical cpu ids starting from 0. This definitely becomes
> > an issue if boot cpu id > nr_cpus. When this occurs it results into
> >
> > In past there were proposals to fix this by making changes to cpu discovery
> > code to identify non-zero boot cpu and map it to logical cpu 0. However,
> > the changes were very invasive, making discovery code more complicated and
> > risky.
> >
> > Considering that the non-zero boot cpu scenario is more specific to kdump
> > kernel, limiting the changes in panic/crash kexec path would probably be a
> > best approach to have.
> >
> > Hence proposed change is, in crash kexec path, move the crashing cpu's
> > device node to the first position under '/cpus' node, which will make the
> > crashing cpu to be discovered as part of the first core in kdump kernel.
> >
> > In order to accommodate boot cpu for the case where boot_cpuid > nr_cpu_ids,
> > align up the nr_cpu_ids to SMT threads in early_init_dt_scan_cpus(). This
> > will allow kdump kernel to work with nr_cpus=X where X will be aligned up
> > in multiple of SMT threads per core.
> >
> > Signed-off-by: Mahesh Salgaonkar <mahesh at linux.ibm.com>
> > ---
> >  arch/powerpc/include/asm/kexec.h  |    1
> >  arch/powerpc/kernel/prom.c        |   13 ++++
> >  arch/powerpc/kexec/core_64.c      |  128 +++++++++++++++++++++++++++++++++++++
> >  arch/powerpc/kexec/file_load_64.c |    2 -
> >  4 files changed, 143 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
> > index a1ddba01e7d13..f5a6f4a1b8eb0 100644
> > --- a/arch/powerpc/include/asm/kexec.h
> > +++ b/arch/powerpc/include/asm/kexec.h
> > @@ -144,6 +144,7 @@ unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image);
> >  int setup_new_fdt_ppc64(const struct kimage *image, void *fdt,
> >                         unsigned long initrd_load_addr,
> >                         unsigned long initrd_len, const char *cmdline);
> > +int add_node_props(void *fdt, int node_offset, const struct device_node *dn);
> >  #endif /* CONFIG_PPC64 */
> >
> >  #endif /* CONFIG_KEXEC_FILE */
> > diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> > index 0b5878c3125b1..c2d4f55042d72 100644
> > --- a/arch/powerpc/kernel/prom.c
> > +++ b/arch/powerpc/kernel/prom.c
> > @@ -322,6 +322,9 @@ static void __init check_cpu_feature_properties(unsigned long node)
> >         }
> >  }
> >
> > +/* align addr on a size boundary - adjust address up */
> > +#define _ALIGN_UP(addr, size)   (((addr)+((size)-1))&(~((typeof(addr))(size)-1)))
> > +
> >  static int __init early_init_dt_scan_cpus(unsigned long node,
> >                                           const char *uname, int depth,
> >                                           void *data)
> > @@ -348,6 +351,16 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
> >
> >         nthreads = len / sizeof(int);
> >
> > +       /*
> > +        * Align nr_cpu_ids to correct SMT value. This will help us to allocate
> > +        * pacas correctly to accomodate boot_cpu != 0 scenario e.g. in kdump
> > +        * kernel the boot cpu can be any cpu between 0 through nthreads.
> > +        */
> > +       if (nr_cpu_ids % nthreads) {
> > +               nr_cpu_ids = _ALIGN_UP(nr_cpu_ids, nthreads);
> 
> It is better to use set_nr_cpu_ids(), which can hide the difference of
> nr_cpus_ids under different kernel configuration.

Yup, I have fixed that in RFC v2 at https://lore.kernel.org/linuxppc-dev/169426331903.1523857.16489366285648613217.stgit@jupiter/

> 
> > +               pr_info("Aligned nr_cpus to SMT=%d, nr_cpu_ids = %d\n", nthreads, nr_cpu_ids);
> > +       }
> > +
> >         /*
> >          * Now see if any of these threads match our boot cpu.
> >          * NOTE: This must match the parsing done in smp_setup_cpu_maps.
> > +
[...]
> >  /* too late to fail here */
> >  void default_machine_kexec(struct kimage *image)
> >  {
> > @@ -341,6 +455,20 @@ void default_machine_kexec(struct kimage *image)
> >                 printk("kexec: Unshared all shared pages.\n");
> >         }
> >
> > +       /*
> > +        * Move the crashing cpus FDT node as the first node under /cpus node.
> > +        * This will make the core (where crashing cpu belongs) to
> > +        * automatically become first core to show up in kdump kernel and
> > +        * crashing cpu as boot cpu within first n threads of that core.
> > +        *
> > +        * Currently this will work with kexec_file_load only.
> > +        *
> > +        * XXX: For kexec_load, change is required in kexec tool to exclude FDT
> > +        * segment from purgatory checksum check.
> > +        */
> > +       if (image->type == KEXEC_TYPE_CRASH && image->file_mode)
> > +               move_crashing_cpu(image);
> > +
> 
> Could "kexec -e" have the same logic? Then nr_cpus can work for both
> "kexec -p" and "kexec -e".

I see that for normal kexec path we always migrate to reboot_cpus
(which is by default is logical cpu 0) before kexec'ing into new kernel.
Hence, the 3rd hunk of this patch which aligns up nr_cpu_ids to nthreads
is enough to support nr_cpus.

However, if someone chooses to change reboot_cpu other than 0, either
through reboot_cpu= kernel parameter or by
"echo cpunum > /sys/kernel/reboot/cpu", then kexec -e will need to move
reboot_cpus fdt node to first position for nr_cpus to work.

Simple way to confirm this is:

$ kexec -l <vmlinux> --initrd=<initrd> --command-line="... nr_cpus=8"
$ echo 8 > /sys/kernel/reboot/cpu
$ kexec -e

Kexec kernel fails to boot.

Will work on changes to make kexec -e to work for nr_cpus in v3.

Thanks for your review.
-Mahesh.

> 
> Thanks,
> 
> Pingfan
> 
> >         paca_ptrs[kexec_paca.paca_index] = &kexec_paca;
> >
> >         setup_paca(&kexec_paca);
> > diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
> > index 110d28bede2a7..42d55a19454a7 100644
> > --- a/arch/powerpc/kexec/file_load_64.c
> > +++ b/arch/powerpc/kexec/file_load_64.c
> > @@ -1027,7 +1027,7 @@ unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image)
> >   *
> >   * Returns 0 on success, negative errno on error.
> >   */
> > -static int add_node_props(void *fdt, int node_offset, const struct device_node *dn)
> > +int add_node_props(void *fdt, int node_offset, const struct device_node *dn)
> >  {
> >         int ret = 0;
> >         struct property *pp;
> >
> >

-- 
Mahesh J Salgaonkar


More information about the Linuxppc-dev mailing list