Very unreliable booting (WARNING: CPU: 0 PID: 1 at kernel/context_tracking.c:215 ct_nmi_exit+0xa0/0xc0) with PPC_EARLY_DEBUG_G5 set on a PowerMac G5, kernel 6.7-rc6
Erhard Furtner
erhard_f at mailbox.org
Mon Jan 1 05:43:29 AEDT 2024
On Thu, 21 Dec 2023 16:30:34 +1100
Michael Ellerman <mpe at ellerman.id.au> wrote:
> > Modules linked in:
> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.7.0-rc6-PMacG5 #4
> > Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac
> > NIP: c000000000bbf320 LR: c0000000000ab040 CTR: 0000000000000000
> > REGS: c000000003103300 TRAP: 0700 Not tainted (6.7.0-rc6-PMacG5)
> > MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI> CR: 24040840 XER: 20000000
> > IRQMASK: 1
> > GPR00: c0000000000ab068 c0000000031035a0 c000000000e43500 0000000000000000
> > GPR04: c00000000fffbeb0 c00000000fffbeb8 0000000000000000 000000047ef66000
> > GPR08: c00000047fe059b0 c000000000e9f9b0 c000000000e9f9c0 0000000000000000
> > GPR12: c000000000149e10 c00000000110a000 c00000000000dc10 0000000000000000
> > GPR16: 0000000000000000 0000000000000000 c00000000108b9c8 0000000000000008
> > GPR20: 000000000000005d 0000000000000000 0000000000000000 0000000000000001
> > GPR24: c0000000010d04c4 c00000000108b900 c0000000010d04c0 000000047ef66000
> > GPR28: c000000003103650 c000000000e9a400 000000000000006e c000000003170000
> > NIP [c000000000bbf320] ct_nmi_exit+0xa0/0xc0
> > LR [c0000000000ab040] irq_exit+0x90/0x110
> > Call Trace:
> > [c0000000031035a0] [c0000000000ab068] irq_exit+0xb8/0x110 (unreliable)
> > [c0000000031035d0] [c0000000000196f0] timer_interrupt+0x1b0/0x370
> > [c000000003103620] [c0000000000089f4] decrementer_common_virt+0x214/0x220
> > --- interrupt: 900 at real_readb+0x44/0x68
> > NIP: c00000000001f24c LR: c000000000074258 CTR: c0000000000087c0
> > REGS: c000000003103650 TRAP: 0900 Not tainted (6.7.0-rc6-PMacG5)
> > MSR: 9000000000009022 <SF,HV,EE,ME,IR,RI> CR: 24040242 XER: 20000000
> > IRQMASK: 0
> > GPR00: 9000000000009022 c0000000031038f0 c000000000e43500 0000000000000071
> > GPR04: 0000000080013030 0000011000000040 0000001000000040 9000000000009032
> > GPR08: 0000000000000000 c0000000010d3500 000000047ef66000 0000000052454753
> > GPR12: 9000000000009032 c00000000110a000 c00000000000dc10 0000000000000000
> > GPR16: 0000000000000000 0000000000000000 c00000000108b9c8 0000000000000008
> > GPR20: 000000000000005d 0000000000000000 0000000000000000 0000000000000001
> > GPR24: c0000000010d04c4 c00000000108b900 c0000000010d04c0 0000000000000001
> > GPR28: 000000007db14ba6 0000000000000000 000000000000006e c0000000010d3b18
> > NIP [c00000000001f24c] real_readb+0x44/0x68
> > LR [c000000000074258] udbg_real_scc_putc+0x38/0x80
>
> Looks like you were spinning waiting for the serial port to become
> ready, long enough that a decrementer came in.
>
> That seems kind of odd, but then the system managed to boot, so I guess
> it resolved itself.
Tried without the serial port card inserted and without serial debug boot parameters but I still get it on 6.7-rc7 on the rare occasions the machine successfully finishes booting:
[...]
smp: Bringing up secondary CPUs ...
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at kernel/context_tracking.c:215 ct_nmi_exit+0xa0/0xc0
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.7.0-rc7-PMacG5-dirty #3
Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac
NIP: c000000000be4d20 LR: c0000000000ab060 CTR: c000000000019290
REGS: c0000000031032f0 TRAP: 0700 Not tainted (6.7.0-rc7-PMacG5-dirty)
MSR: 9000000000021032 <SF,HV,ME,IR,DR,RI> CR: 24040844 XER: 20000000
IRQMASK: 3
GPR00: c0000000000aaff4 c000000003103590 c000000000e73b00 0000000000000000
GPR04: c000000003103570 c000000003103578 0000000000000000 000000047ef35000
GPR08: c00000047fe059b0 c000000000ed09b0 c000000000ed09c0 0000000000000000
GPR12: c000000000019290 c00000000114a000 c00000000000dc10 0000000000000000
GPR16: 0000000000000000 0000000000000000 c0000000010cf9c8 0000000000000008
GPR20: 000000000000005d 0000000000000000 0000000000000000 0000000000000001
GPR24: c0000000011104c4 c0000000010cf900 c0000000011104c0 000000047ef35000
GPR28: c000000003103640 c000000000ecb400 c000000001113b18 c000000003170000
NIP [c000000000be4d20] ct_nmi_exit+0xa0/0xc0
LR [c0000000000ab060] irq_exit+0x90/0x110
Call Trace:
[c000000003103590] [c0000000000aaff4] irq_exit+0x24/0x110 (unreliable)
[c0000000031035c0] [c0000000000196f0] timer_interrupt+0x1b0/0x370
[c000000003103610] [c0000000000089f4] decrementer_common_virt+0x214/0x220
--- interrupt: 900 at real_readb+0x44/0x68
NIP: c00000000001f24c LR: c000000000074270 CTR: 0000000000000000
REGS: c000000003103640 TRAP: 0900 Not tainted (6.7.0-rc7-PMacG5-dirty)
MSR: 9000000000009022 <SF,HV,EE,ME,IR,RI> CR: 24040244 XER: 20000000
IRQMASK: 0
GPR00: 9000000000009022 c0000000031038e0 c000000000e73b00 0000000000000071
GPR04: 0000000080013030 0000011000000040 0000001000000040 9000000000009032
GPR08: 0000000000000000 c000000001113b00 0000000000000004 0000000000000000
GPR12: c00000000006bca0 c00000000114a000 c00000000000dc10 0000000000000000
GPR16: 0000000000000000 0000000000000000 c0000000010cf9c8 0000000000000008
GPR20: 000000000000005d 0000000000000000 0000000000000000 0000000000000001
GPR24: c0000000011104c4 c0000000010cf900 c0000000011104c0 0000000000000001
GPR28: 000000007db14ba6 000000000000006f c000000001113b18 00000000000026e8
NIP [c00000000001f24c] real_readb+0x44/0x68
LR [c000000000074270] udbg_real_scc_putc+0x50/0xa0
--- interrupt: 900
[c0000000031038e0] [c000000000968ba4] of_get_cpu_hwid+0x54/0x120 (unreliable)
[c000000003103920] [c0000000000743e0] udbg_adb_putc+0x30/0x50
[c000000003103940] [c00000000001e7d4] udbg_puts+0x64/0xb0
[c000000003103970] [c000000000e76124] udbg_progress+0x18/0x30
[c000000003103990] [c000000000075a40] smp_core99_kick_cpu+0x140/0x180
[c000000003103a10] [c00000000003371c] __cpu_up+0x12c/0x3c0
[c000000003103ad0] [c0000000000a3088] bringup_cpu+0x68/0x200
[c000000003103b30] [c0000000000a1bf0] cpuhp_invoke_callback+0x170/0x300
[c000000003103b80] [c0000000000a45b8] _cpu_up.constprop.0+0x308/0x730
[c000000003103c10] [c0000000000a4b0c] cpu_up+0x12c/0x180
[c000000003103ca0] [c000000000e8372c] bringup_nonboot_cpus+0x7c/0xec
[c000000003103cf0] [c000000000e8a31c] smp_init+0x40/0xa4
[c000000003103d50] [c000000000e71300] kernel_init_freeable+0x188/0x358
[c000000003103de0] [c00000000000dc38] kernel_init+0x30/0x158
[c000000003103e50] [c00000000000bf94] ret_from_kernel_user_thread+0x14/0x1c
--- interrupt: 0 at 0x0
Code: 39200000 38600004 f9280010 4bfffd4d e92d0128 38210020 81290000 e8010010 7c0803a6 4e800020 60000000 60000000 <0fe00000> 4bffff9c 60000000 60000000
---[ end trace 0000000000000000 ]---
> I don't see any obvious clues to why it's booting sometimes and not
> others. Maybe when it doesn't boot it's getting stuck forever in that
> real_readb() loop? We could try adding a maximum retry count, eg
> something like:
>
> diff --git a/arch/powerpc/platforms/powermac/udbg_scc.c b/arch/powerpc/platforms/powermac/udbg_scc.c
> index 1b7c39e841ee..b28c9a729fbf 100644
> --- a/arch/powerpc/platforms/powermac/udbg_scc.c
> +++ b/arch/powerpc/platforms/powermac/udbg_scc.c
> @@ -165,8 +165,11 @@ void __init udbg_scc_init(int force_scc)
> #ifdef CONFIG_PPC64
> static void udbg_real_scc_putc(char c)
> {
> - while ((real_readb(sccc) & SCC_TXRDY) == 0)
> - ;
> + int i = 0;
> +
> + while (((real_readb(sccc) & SCC_TXRDY) == 0) && i < 10000)
> + i++;
> +
> real_writeb(c, sccd);
> if (c == '\n')
> udbg_real_scc_putc('\r');
>
>
> But that's just a total stab in the dark :)
Thanks for patch! But unfortunately it does not seem to make a difference.
Also I set IRQ_ALL_CPUS=n but that does not change a thing either.
dmesg of v6.7-rc7 w. your patch attached.
Regards,
Erhard
-------------- next part --------------
A non-text attachment was scrubbed...
Name: config_67-rc7_g5-
Type: application/octet-stream
Size: 102548 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20231231/9e4244c8/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmesg_67-rc7_g5
Type: application/octet-stream
Size: 50652 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20231231/9e4244c8/attachment-0003.obj>
More information about the Linuxppc-dev
mailing list