Very unreliable booting (WARNING: CPU: 0 PID: 1 at kernel/context_tracking.c:215 ct_nmi_exit+0xa0/0xc0) with PPC_EARLY_DEBUG_G5 set on a PowerMac G5, kernel 6.7-rc6

Erhard Furtner erhard_f at mailbox.org
Mon Jan 1 05:43:29 AEDT 2024


On Thu, 21 Dec 2023 16:30:34 +1100
Michael Ellerman <mpe at ellerman.id.au> wrote:

> > Modules linked in:
> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.7.0-rc6-PMacG5 #4
> > Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac
> > NIP:  c000000000bbf320 LR: c0000000000ab040 CTR: 0000000000000000
> > REGS: c000000003103300 TRAP: 0700   Not tainted  (6.7.0-rc6-PMacG5)
> > MSR:  9000000000029032 <SF,HV,EE,ME,IR,DR,RI>  CR: 24040840  XER: 20000000
> > IRQMASK: 1 
> > GPR00: c0000000000ab068 c0000000031035a0 c000000000e43500 0000000000000000 
> > GPR04: c00000000fffbeb0 c00000000fffbeb8 0000000000000000 000000047ef66000 
> > GPR08: c00000047fe059b0 c000000000e9f9b0 c000000000e9f9c0 0000000000000000 
> > GPR12: c000000000149e10 c00000000110a000 c00000000000dc10 0000000000000000 
> > GPR16: 0000000000000000 0000000000000000 c00000000108b9c8 0000000000000008 
> > GPR20: 000000000000005d 0000000000000000 0000000000000000 0000000000000001 
> > GPR24: c0000000010d04c4 c00000000108b900 c0000000010d04c0 000000047ef66000 
> > GPR28: c000000003103650 c000000000e9a400 000000000000006e c000000003170000 
> > NIP [c000000000bbf320] ct_nmi_exit+0xa0/0xc0
> > LR [c0000000000ab040] irq_exit+0x90/0x110
> > Call Trace:
> > [c0000000031035a0] [c0000000000ab068] irq_exit+0xb8/0x110 (unreliable)
> > [c0000000031035d0] [c0000000000196f0] timer_interrupt+0x1b0/0x370
> > [c000000003103620] [c0000000000089f4] decrementer_common_virt+0x214/0x220
> > --- interrupt: 900 at real_readb+0x44/0x68
> > NIP:  c00000000001f24c LR: c000000000074258 CTR: c0000000000087c0
> > REGS: c000000003103650 TRAP: 0900   Not tainted  (6.7.0-rc6-PMacG5)
> > MSR:  9000000000009022 <SF,HV,EE,ME,IR,RI>  CR: 24040242  XER: 20000000
> > IRQMASK: 0 
> > GPR00: 9000000000009022 c0000000031038f0 c000000000e43500 0000000000000071 
> > GPR04: 0000000080013030 0000011000000040 0000001000000040 9000000000009032 
> > GPR08: 0000000000000000 c0000000010d3500 000000047ef66000 0000000052454753 
> > GPR12: 9000000000009032 c00000000110a000 c00000000000dc10 0000000000000000 
> > GPR16: 0000000000000000 0000000000000000 c00000000108b9c8 0000000000000008 
> > GPR20: 000000000000005d 0000000000000000 0000000000000000 0000000000000001 
> > GPR24: c0000000010d04c4 c00000000108b900 c0000000010d04c0 0000000000000001 
> > GPR28: 000000007db14ba6 0000000000000000 000000000000006e c0000000010d3b18 
> > NIP [c00000000001f24c] real_readb+0x44/0x68
> > LR [c000000000074258] udbg_real_scc_putc+0x38/0x80  
> 
> Looks like you were spinning waiting for the serial port to become
> ready, long enough that a decrementer came in.
> 
> That seems kind of odd, but then the system managed to boot, so I guess
> it resolved itself.

Tried without the serial port card inserted and without serial debug boot parameters but I still get it on 6.7-rc7 on the rare occasions the machine successfully finishes booting:

[...]
smp: Bringing up secondary CPUs ...
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at kernel/context_tracking.c:215 ct_nmi_exit+0xa0/0xc0
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.7.0-rc7-PMacG5-dirty #3
Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac
NIP:  c000000000be4d20 LR: c0000000000ab060 CTR: c000000000019290
REGS: c0000000031032f0 TRAP: 0700   Not tainted  (6.7.0-rc7-PMacG5-dirty)
MSR:  9000000000021032 <SF,HV,ME,IR,DR,RI>  CR: 24040844  XER: 20000000
IRQMASK: 3 
GPR00: c0000000000aaff4 c000000003103590 c000000000e73b00 0000000000000000 
GPR04: c000000003103570 c000000003103578 0000000000000000 000000047ef35000 
GPR08: c00000047fe059b0 c000000000ed09b0 c000000000ed09c0 0000000000000000 
GPR12: c000000000019290 c00000000114a000 c00000000000dc10 0000000000000000 
GPR16: 0000000000000000 0000000000000000 c0000000010cf9c8 0000000000000008 
GPR20: 000000000000005d 0000000000000000 0000000000000000 0000000000000001 
GPR24: c0000000011104c4 c0000000010cf900 c0000000011104c0 000000047ef35000 
GPR28: c000000003103640 c000000000ecb400 c000000001113b18 c000000003170000 
NIP [c000000000be4d20] ct_nmi_exit+0xa0/0xc0
LR [c0000000000ab060] irq_exit+0x90/0x110
Call Trace:
[c000000003103590] [c0000000000aaff4] irq_exit+0x24/0x110 (unreliable)
[c0000000031035c0] [c0000000000196f0] timer_interrupt+0x1b0/0x370
[c000000003103610] [c0000000000089f4] decrementer_common_virt+0x214/0x220
--- interrupt: 900 at real_readb+0x44/0x68
NIP:  c00000000001f24c LR: c000000000074270 CTR: 0000000000000000
REGS: c000000003103640 TRAP: 0900   Not tainted  (6.7.0-rc7-PMacG5-dirty)
MSR:  9000000000009022 <SF,HV,EE,ME,IR,RI>  CR: 24040244  XER: 20000000
IRQMASK: 0 
GPR00: 9000000000009022 c0000000031038e0 c000000000e73b00 0000000000000071 
GPR04: 0000000080013030 0000011000000040 0000001000000040 9000000000009032 
GPR08: 0000000000000000 c000000001113b00 0000000000000004 0000000000000000 
GPR12: c00000000006bca0 c00000000114a000 c00000000000dc10 0000000000000000 
GPR16: 0000000000000000 0000000000000000 c0000000010cf9c8 0000000000000008 
GPR20: 000000000000005d 0000000000000000 0000000000000000 0000000000000001 
GPR24: c0000000011104c4 c0000000010cf900 c0000000011104c0 0000000000000001 
GPR28: 000000007db14ba6 000000000000006f c000000001113b18 00000000000026e8 
NIP [c00000000001f24c] real_readb+0x44/0x68
LR [c000000000074270] udbg_real_scc_putc+0x50/0xa0
--- interrupt: 900
[c0000000031038e0] [c000000000968ba4] of_get_cpu_hwid+0x54/0x120 (unreliable)
[c000000003103920] [c0000000000743e0] udbg_adb_putc+0x30/0x50
[c000000003103940] [c00000000001e7d4] udbg_puts+0x64/0xb0
[c000000003103970] [c000000000e76124] udbg_progress+0x18/0x30
[c000000003103990] [c000000000075a40] smp_core99_kick_cpu+0x140/0x180
[c000000003103a10] [c00000000003371c] __cpu_up+0x12c/0x3c0
[c000000003103ad0] [c0000000000a3088] bringup_cpu+0x68/0x200
[c000000003103b30] [c0000000000a1bf0] cpuhp_invoke_callback+0x170/0x300
[c000000003103b80] [c0000000000a45b8] _cpu_up.constprop.0+0x308/0x730
[c000000003103c10] [c0000000000a4b0c] cpu_up+0x12c/0x180
[c000000003103ca0] [c000000000e8372c] bringup_nonboot_cpus+0x7c/0xec
[c000000003103cf0] [c000000000e8a31c] smp_init+0x40/0xa4
[c000000003103d50] [c000000000e71300] kernel_init_freeable+0x188/0x358
[c000000003103de0] [c00000000000dc38] kernel_init+0x30/0x158
[c000000003103e50] [c00000000000bf94] ret_from_kernel_user_thread+0x14/0x1c
--- interrupt: 0 at 0x0
Code: 39200000 38600004 f9280010 4bfffd4d e92d0128 38210020 81290000 e8010010 7c0803a6 4e800020 60000000 60000000 <0fe00000> 4bffff9c 60000000 60000000 
---[ end trace 0000000000000000 ]---

> I don't see any obvious clues to why it's booting sometimes and not
> others. Maybe when it doesn't boot it's getting stuck forever in that
> real_readb() loop? We could try adding a maximum retry count, eg
> something like:
> 
> diff --git a/arch/powerpc/platforms/powermac/udbg_scc.c b/arch/powerpc/platforms/powermac/udbg_scc.c
> index 1b7c39e841ee..b28c9a729fbf 100644
> --- a/arch/powerpc/platforms/powermac/udbg_scc.c
> +++ b/arch/powerpc/platforms/powermac/udbg_scc.c
> @@ -165,8 +165,11 @@ void __init udbg_scc_init(int force_scc)
>  #ifdef CONFIG_PPC64
>  static void udbg_real_scc_putc(char c)
>  {
> -	while ((real_readb(sccc) & SCC_TXRDY) == 0)
> -		;
> +	int i = 0;
> +	
> +	while (((real_readb(sccc) & SCC_TXRDY) == 0) && i < 10000)
> +		i++;
> +
>  	real_writeb(c, sccd);
>  	if (c == '\n')
>  		udbg_real_scc_putc('\r');
> 
> 
> But that's just a total stab in the dark :)

Thanks for patch! But unfortunately it does not seem to make a difference.

Also I set IRQ_ALL_CPUS=n but that does not change a thing either.

dmesg of v6.7-rc7 w. your patch attached.

Regards,
Erhard

-------------- next part --------------
A non-text attachment was scrubbed...
Name: config_67-rc7_g5-
Type: application/octet-stream
Size: 102548 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20231231/9e4244c8/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmesg_67-rc7_g5
Type: application/octet-stream
Size: 50652 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20231231/9e4244c8/attachment-0003.obj>


More information about the Linuxppc-dev mailing list