[PATCHv7 4/4] powerpc/setup: alloc extra paca_ptrs to hold boot_cpuid

Pingfan Liu piliu at redhat.com
Sat Oct 7 12:03:28 AEDT 2023


On Wed, Oct 4, 2023 at 2:07 AM Mahesh J Salgaonkar <mahesh at linux.ibm.com> wrote:
>
> On 2023-09-25 15:53:48 Mon, Pingfan Liu wrote:
> > paca_ptrs should be large enough to hold the boot_cpuid, hence, its
> > lower boundary is set to the bigger one between boot_cpuid+1 and
> > nr_cpus.
> >
> > On the other hand, some kernel component: -1. the timer assumes cpu0
> > online since the timer_list->flags subfield 'TIMER_CPUMASK' is zero if
> > not initialized to a proper present cpu.  -2. power9_idle_stop() assumes
> > the primary thread's paca is allocated.
> >
> > Hence lift nr_cpu_ids from one to two to ensure cpu0 is onlined, if the
> > boot cpu is not cpu0.
> >
> > Result:
> > When nr_cpus=1, taskset -c 14 bash -c 'echo c > /proc/sysrq-trigger'
> > the kdump kernel brings up two cpus.
> > While when taskset -c 4 bash -c 'echo c > /proc/sysrq-trigger',
> > the kdump kernel brings up one cpu.
>
> I tried your changes on power9 and power10 systems. However, on power10 lpar I
> see bellow backtrace in kdump kernel bootup with nr_cpus=1.
>

Thanks for the testing. I have only tried this series on Power9 bare
metal.  I think the bug is related with the code snippet in
update_mask_from_threadgroup()
  for (i = first_thread; i < first_thread + threads_per_core; i++) {
    int i_group_start = get_cpu_thread_group_start(i, tg);
                                  ^^^

Here it iterates over each thread in the core, but some of them are not online.

I will try to bring up a remedy.

Thanks,

Pingfan


> $ taskset -c 4 bash -c 'echo c > /proc/sysrq-trigger'
> [...]
> [    0.000000] Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1040.00 (NL1040_005) hv:phyp pSeries
> [    0.000000] printk: bootconsole [udbg0] enabled
> [    0.000000] the round shift between dt seq and the cpu logic number: 8
> [    0.000000] Partition configured for 16 cpus, operating system maximum is 2.
> [    0.000000] CPU maps initialized for 8 threads per core
> [...]
> [    0.002249] BUG: Unable to handle kernel data access at 0x88888888888888c0
> [    0.002260] Faulting instruction address: 0xc00000001201226c
> [    0.002268] Oops: Kernel access of bad area, sig: 11 [#1]
> [    0.002274] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> [    0.002282] Modules linked in:
> [    0.002288] CPU: 4 PID: 1 Comm: swapper/4 Not tainted 6.6.0-rc4 #1
> [    0.002296] Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1040.00 (NL1040_005) hv:phyp pSeries
> [    0.002305] NIP:  c00000001201226c LR: c000000012012234 CTR: 0000000000000004
> [    0.002312] REGS: c0000000167ff8f0 TRAP: 0380   Not tainted  (6.6.0-rc4)
> [    0.002321] MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 24000844  XER: 0000000a
> [    0.002346] CFAR: c00000001201231c IRQMASK: 0
> [    0.002346] GPR00: c000000012012234 c0000000167ffb90 c000000011b61900 0000000000000002
> [    0.002346] GPR04: 0000000000000000 0000000000000001 0000000000000001 c00000004ffeff80
> [    0.002346] GPR08: 0000000000000000 8888888888888888 0000000000000002 0000000000000000
> [    0.002346] GPR12: 0000000000000000 c000000013141000 c000000010011058 0000000000000000
> [    0.002346] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    0.002346] GPR20: 0000000000000028 c000000012170968 c0000000120a3e80 0000000000000016
> [    0.002346] GPR24: c00000004ffdcfd0 0000000000000000 c000000012b82058 0000000000000000
> [    0.002346] GPR28: c00000004fc80a68 c000000012bf0350 c0000000120a3e2c 0000000000000000
> [    0.002426] NIP [c00000001201226c] update_mask_from_threadgroup+0x98/0x174
> [    0.002437] LR [c000000012012234] update_mask_from_threadgroup+0x60/0x174
> [    0.002444] Call Trace:
> [    0.002451] [c0000000167ffb90] [c000000012012234] update_mask_from_threadgroup+0x60/0x174 (unreliable)
> [    0.002464] [c0000000167ffbe0] [c0000000120125f8] init_thread_group_cache_map+0x2b0/0x328
> [    0.002477] [c0000000167ffc50] [c00000001201296c] smp_prepare_cpus+0x2fc/0x4f0
> [    0.002497] [c0000000167ffd10] [c000000012004e40] kernel_init_freeable+0x198/0x3cc
> [    0.002509] [c0000000167ffde0] [c000000010011084] kernel_init+0x34/0x1b0
> [    0.002531] [c0000000167ffe50] [c00000001000dd3c] ret_from_kernel_user_thread+0x14/0x1c
> [    0.002547] --- interrupt: 0 at 0x0
> [    0.002553] NIP:  0000000000000000 LR: 0000000000000000 CTR: 0000000000000000
> [    0.002563] REGS: c0000000167ffe80 TRAP: 0000   Not tainted  (6.6.0-rc4)
> [    0.002569] MSR:  0000000000000000 <>  CR: 00000000  XER: 00000000
> [    0.002576] CFAR: 0000000000000000 IRQMASK: 0
> [    0.002576] GPR00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    0.002576] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    0.002576] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    0.002576] GPR12: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    0.002576] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    0.002576] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    0.002576] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    0.002576] GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    0.002671] NIP [0000000000000000] 0x0
> [    0.002680] LR [0000000000000000] 0x0
> [    0.002689] --- interrupt: 0
> [    0.002697] Code: 7feafb78 813d0000 7d29fa14 7f895000 409d00d4 3ce20102 38e74758 79491f24 e87e0006 39000000 e8e70000 7d27482a <a8890038> 7f834000 79090020 419e005c
> [    0.002727] ---[ end trace 0000000000000000 ]---
> [    0.002739]
> [    1.002749] Kernel panic - not syncing: Fatal exception
> [    1.002795] Rebooting in 10 seconds..
>
> Thanks,
> -Mahesh.
>
> >
> > Signed-off-by: Pingfan Liu <piliu at redhat.com>
> > Cc: Michael Ellerman <mpe at ellerman.id.au>
> > Cc: Nicholas Piggin <npiggin at gmail.com>
> > Cc: Christophe Leroy <christophe.leroy at csgroup.eu>
> > Cc: Mahesh Salgaonkar <mahesh at linux.ibm.com>
> > Cc: Wen Xiong <wenxiong at us.ibm.com>
> > Cc: Baoquan He <bhe at redhat.com>
> > Cc: Ming Lei <ming.lei at redhat.com>
> > Cc: kexec at lists.infradead.org
> > To: linuxppc-dev at lists.ozlabs.org
> > ---
> >  arch/powerpc/kernel/paca.c | 10 ++++++----
> >  arch/powerpc/kernel/prom.c |  9 ++++++---
> >  2 files changed, 12 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
> > index cda4e00b67c1..91e2401de1bd 100644
> > --- a/arch/powerpc/kernel/paca.c
> > +++ b/arch/powerpc/kernel/paca.c
> > @@ -242,9 +242,10 @@ static int __initdata paca_struct_size;
> >
> >  void __init allocate_paca_ptrs(void)
> >  {
> > -     paca_nr_cpu_ids = nr_cpu_ids;
> > +     int n = (boot_cpuid + 1) > nr_cpu_ids ? (boot_cpuid + 1) : nr_cpu_ids;
> >
> > -     paca_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids;
> > +     paca_nr_cpu_ids = n;
> > +     paca_ptrs_size = sizeof(struct paca_struct *) * n;
> >       paca_ptrs = memblock_alloc_raw(paca_ptrs_size, SMP_CACHE_BYTES);
> >       if (!paca_ptrs)
> >               panic("Failed to allocate %d bytes for paca pointers\n",
> > @@ -287,13 +288,14 @@ void __init allocate_paca(int cpu)
> >  void __init free_unused_pacas(void)
> >  {
> >       int new_ptrs_size;
> > +     int n = (boot_cpuid + 1) > nr_cpu_ids ? (boot_cpuid + 1) : nr_cpu_ids;
> >
> > -     new_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids;
> > +     new_ptrs_size = sizeof(struct paca_struct *) * n;
> >       if (new_ptrs_size < paca_ptrs_size)
> >               memblock_phys_free(__pa(paca_ptrs) + new_ptrs_size,
> >                                  paca_ptrs_size - new_ptrs_size);
> >
> > -     paca_nr_cpu_ids = nr_cpu_ids;
> > +     paca_nr_cpu_ids = n;
> >       paca_ptrs_size = new_ptrs_size;
> >
> >  #ifdef CONFIG_PPC_64S_HASH_MMU
> > diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> > index 87272a2d8c10..15c994f54bf9 100644
> > --- a/arch/powerpc/kernel/prom.c
> > +++ b/arch/powerpc/kernel/prom.c
> > @@ -362,9 +362,12 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
> >                        */
> >                       boot_cpuid = i;
> >                       found = true;
> > -                     /* This works around the hole in paca_ptrs[]. */
> > -                     if (nr_cpu_ids < nthreads)
> > -                             set_nr_cpu_ids(nthreads);
> > +                     /*
> > +                      * Ideally, nr_cpus=1 can be achieved if each kernel
> > +                      * component does not assume cpu0 is onlined.
> > +                      */
> > +                     if (boot_cpuid != 0 && nr_cpu_ids < 2)
> > +                             set_nr_cpu_ids(2);
> >               }
> >  #ifdef CONFIG_SMP
> >               /* logical cpu id is always 0 on UP kernels */
> > --
> > 2.31.1
> >
>
> --
> Mahesh J Salgaonkar
>



More information about the Linuxppc-dev mailing list