Very unreliable booting (WARNING: CPU: 0 PID: 1 at kernel/context_tracking.c:215 ct_nmi_exit+0xa0/0xc0) with PPC_EARLY_DEBUG_G5 set on a PowerMac G5, kernel 6.7-rc6

Erhard Furtner erhard_f at mailbox.org
Fri Jan 12 02:59:23 AEDT 2024


On Tue, 9 Jan 2024 16:40:13 +1100
Rohan McLure <rmclure at linux.ibm.com> wrote:

> On 12/21/23 09:20, Erhard Furtner wrote:
> > Greetings!
> > 
> > I wanted to check whether there are any changes on issue https://lore.kernel.org/all/20231114003721.4a9bfd37@yea/T/ on kernel 6.7-rc. KCSAN enabled kernels still won't boot on this G5 it seems.  
> 
> Hi Erhard,
> 
> Apologies for the late response on both of your queries. I believe that
> this patch should allow you to boot with KCSAN:
> Link:
> https://lore.kernel.org/all/20231127054648.1205221-4-rmclure@linux.ibm.com/
> 
> I wasn't able to replicate the Floating Point unavailable panic in qemu,
> which calls into question whether this patch addresses the problem that
> is presenting for you, or another probabilistic bug impeding startup.

Hi Rohan,

thanks for looking into this!

The patch applies on top of v6.7, but unfortunately it does not fix the problem. The machine still boots only very rarely with PPC_EARLY_DEBUG_G5 enabled. It does not seem to matter much whether KCSAN, KUEP and KUAP are enabled or disabled.

I got one successful boot with KCSAN, KUEP and KUAP all disabled. Here again, the dmesg of the booted machine shows this "WARNING: CPU: 0 PID: 1 at kernel/context_tracking.c:215 ct_nmi_exit+0xa0/0xc0" and

NIP [c00000000001d7fc] real_readb+0x44/0x68
LR [c000000000070c58] udbg_real_scc_putc+0x38/0x80
--- interrupt: 900
[c0000000031038f0] [c000000002641294] 0xc000000002641294 (unreliable)
[c000000003103920] [c000000000070dc0] udbg_adb_putc+0x30/0x50
[c000000003103940] [c00000000001cd84] udbg_puts+0x64/0xb0
[c000000003103970] [c000000000e6011c] udbg_progress+0x18/0x30
[c000000003103990] [c000000000072420] smp_core99_kick_cpu+0x140/0x180
[c000000003103a10] [c000000000030d1c] __cpu_up+0x12c/0x3c0
[c000000003103ad0] [c00000000009f2d8] bringup_cpu+0x68/0x200
[c000000003103b30] [c00000000009de40] cpuhp_invoke_callback+0x170/0x300
[c000000003103b80] [c0000000000a0808] _cpu_up.constprop.0+0x308/0x730
[c000000003103c10] [c0000000000a0d5c] cpu_up+0x12c/0x180
[c000000003103ca0] [c000000000e6d368] bringup_nonboot_cpus+0x7c/0xec
[c000000003103cf0] [c000000000e73f58] smp_init+0x40/0xa4
[c000000003103d50] [c000000000e5b300] kernel_init_freeable+0x188/0x358
[c000000003103de0] [c00000000000d138] kernel_init+0x30/0x158
[c000000003103e50] [c00000000000b4c4] ret_from_kernel_user_thread+0x14/0x1c
--- interrupt: 0 at 0x0
Code: 39200000 38600004 f9280010 4bfffd4d e92d0128 38210020 81290000 e8010010 7c0803a6 4e800020 60000000 60000000 <0fe00000> 4bffff9c 60000000 60000000


Looks like this is one of the hard-to-track-down bugs... ;)

Kernel .config and full dmesg attached.

Regards,
Erhard
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmesg_670_g5
Type: application/octet-stream
Size: 50217 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20240111/342dd9f7/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: config_670_g5-
Type: application/octet-stream
Size: 102552 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20240111/342dd9f7/attachment-0003.obj>

