KASAN debug kernel fails to boot at early stage when CONFIG_SMP=y is set (kernel 6.5-rc5, PowerMac G4 3,6)

Erhard Furtner erhard_f at mailbox.org
Thu Aug 24 10:00:15 AEST 2023


On Tue, 22 Aug 2023 07:31:54 +0000
Christophe Leroy <christophe.leroy at csgroup.eu> wrote:

> On 18/08/2023 at 18:23, Erhard Furtner wrote:
> > On Fri, 18 Aug 2023 15:47:38 +0000
> > Christophe Leroy <christophe.leroy at csgroup.eu> wrote:
> >   
> >> I'm wondering if the problem is just linked to the kernel being built
> >> with CONFIG_SMP or if it is the actual startup of a secondary CPU that
> >> causes the freeze.
> >>
> >> Please leave the btext_unmap() in place because I think it is important
> >> to keep it, and start the kernel with the following parameter:
> >>
> >> nr_cpus=1  
> > 
> > With btext_unmap() back in place and nr_cpus=1 set, the freeze still happens after the first btext_unmap:129 on cold boots:
> > 
> > [    0.000000] printk: bootconsole [udbg0] enabled
> > [    0.000000] Total memory = 2048MB; using 4096kB for hash table
> > [    0.000000] mapin_ram:125
> > [    0.000000] mmu_mapin_ram:169 0 30000000 1400000 2000000
> > [    0.000000] __mmu_mapin_ram:146 0 1400000
> > [    0.000000] __mmu_mapin_ram:155 1400000
> > [    0.000000] __mmu_mapin_ram:146 1400000 30000000
> > [    0.000000] __mmu_mapin_ram:155 20000000
> > [    0.000000] __mapin_ram_chunk:107 20000000 30000000
> > [    0.000000] __mapin_ram_chunk:117
> > [    0.000000] mapin_ram:134
> > [    0.000000] kasan_mmu_init:129
> > [    0.000000] kasan_mmu_init:132 0
> > [    0.000000] kasan_mmu_init:137
> > [    0.000000] btext_unmap:129
> >   
> 
> Thanks,
> 
> Can you replace the call to btext_unmap() by a call to btext_map() at 
> the end of MMU_init() ?
> 
> If that gives no interesting result, can you leave the call to
> btext_unmap() and add a call to btext_map() at the very beginning of
> function start_kernel() in init/main.c? (You may have to add an include of
> asm/btext.h.)
> 
> With that I hope we can see more stuff.

OK, I tested both methods.

  1.) Replace btext_unmap() with btext_map() at the end of MMU_init().

Warm boot is again unremarkable (dmesg attached). On cold boots I sometimes get:

printk: bootconsole [udbg0] enabled
Total memory = 2048MB; using 4096kB for hash table
mapin_ram:125
mmu_mapin_ram:169 0 30000000 1400000 2000000
__mmu_mapin_ram:146 0 1400000
__mmu_mapin_ram:155 1400000
__mmu_mapin_ram:146 1400000 30000000
__mmu_mapin_ram:155 20000000
__mapin_ram_chunk:107 20000000 30000000
__mapin_ram_chunk:117
mapin_ram:134
kasan_mmu_init:129
kasan_mmu_init:132 0
kasan_mmu_init:137
ioremap() called early from btext_map+0x64/0xdc. Use early_ioremap() instead
Linux version 6.5.0-rc7-PMacG4-dirty (root at T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #4 SMP Wed Aug 23 12:59:11 CEST 2023

which shows one more line (Linux version...) than before. Most of the time, however, I get this more interesting output:

kasan_mmu_init:129
kasan_mmu_init:132 0
kasan_mmu_init:137
Linux version 6.5.0-rc7-PMacG4-dirty (root at T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #4 SMP Wed Aug 23 12:59:11 CEST 2023
KASAN init done
list_add corruption. prev->next should be next (c17100c0), but was 2c030000. (prev=c036ac7c).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:30!
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at arch/powerpc/include/asm/machdep.h:227 die+0xd8/0x39c
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Tainted: G               T  ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎irty⊥q
NIP:  c0012c64 LR: c0012c58 CTR: 00000000
REGS: c1717d10 TRAP: 0700   Tainted: G               T   (∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎irty⊥q)
MSR:  00021032 <ME,IR,DR,RI>  CR: 24000484 XER: 00000000

GPR00: 00000000 c1717dc0 c1551c40 00000000 00000000 00000000 00000000 00000000
GPR08: 00000000 00000000 00000000 00000000 00000000 00000000 00dd6f30 021f6e90
GPR16: 021f69b0 02201994 00dd6f3c efff3190 00000000 c1717f10 c1455dd8 c11ec6c0
GPR24: 00001032 c0fab540 c1717e10 00000005 c1740000 c1740000 c1746380 c1555a20
NIP [c0012c64] die+0xd8/0x39c
LR [c0012c58] die+0xcc/0x39c
Call Trace:
[c1717dc0] [c0012c58] die+0xcc/0x39c (unreliable)
[c1717e00] [c00047f0] ProgramCheck_virt+0x100/0x150
--- interrupt: 700 at __list_add_valid+0xe8/0x120
NIP:  c0854ca0 LR: c0854ca0 CTR: 00000000
REGS: c1717e10 TRAP: 0700   Tainted: G                T   (∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎irty⊥q)
MSR:  00021032 <ME,IR,DR,RI>  CR: 24000488  XER: 00000000

GPR00: 00000000 c1717ec0 c1551c40 0000005d 00000000 00000000 00000000 00000000
GPR08: 00000000 00000000 00000000 00000000 00000000 00000000 00dd6f30 021f6e90
GPR16: 021f69b0 02201994 00dd6f3c efff3190 00000000 c1717f10 c1455dd8 c11ec6c0
GPR24: f82e2fde c11ec680 fefefefe c11ec700 effec7a8 c036ac7c c036ac7c c17100c0
NIP [c0854ca0] __list_add_valid+0xe8/0x120
LR [c0854ca0] __list_add_valid+0xe8/0x120
--- interrupt: 700
[c1717ee8] [c0c18764] of_alias_scan+0x330/0x44c
[c1717f70] [c140b0fc] setup_arch+0x78/0x44c
[c1717fc0] [c14045b0] start_kernel+0x78/0x2d8
[c1717ff0] [000035d0] 0x35d0
Code: 3fa0c174 915f0060 39290001 913e0040 480f602d 38600001 488189f1 387db620 4835654d 813db620 2c090000 40820008 <0fe00000> 80de0080 3fa0c0fb 3ee0c172
---[ end trace 0000000000000000 ]---
Oops: Exception in kernel mode, sig: 5 [#1]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Tainted: G               T  ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎irty⊥q
NIP:  c0854ca0 LR: c0854ca0 CTR: 00000000
REGS: c1717e10 TRAP: 0700   Tainted: G               T   (∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎irty⊥q)
MSR:  00021032 <ME,IR,DR,RI>  CR: 24000488 XER: 00000000

GPR00: 00000000 c1717ec0 c1551c40 0000005d 00000000 00000000 00000000 00000000
GPR08: 00000000 00000000 00000000 00000000 00000000 00000000 00dd6f30 021f6e90
GPR16: 021f69b0 02201994 00dd6f3c efff3190 00000000 c1717f10 c1455dd8 c11ec6c0
GPR24: f82e2fde c11ec680 fefefefe c11ec700 effec7a8 c036ac7c c036ac7c c17100c0
NIP [c0854ca0] __list_add_valid+0xe8/0x120
LR [c0854ca0] __list_add_valid+0xe8/0x120
Call Trace:
[c1717ec0] [c0854ca0] __list_add_valid+0xe8/0x120 (unreliable)
[c1717ee0] [c0c18764] of_alias_scan+0x330/0x44c
[c1717f70] [c140b0fc] setup_arch+0x78/0x44c
[c1717fc0] [c14045b0] start_kernel+0x78/0x2d8
[c1717ff0] [000035d0] 0x35d0
Code: 7fc5f378 38636060 7f84e378 38630200 4b8b9ec9 0fe00000 3c68c110 7fa6eb78 7fe4fb78 38630180 4b8b9ead <0fe00000> 3c60c110 7fe6fb78 7fa5eb78
---[ end trace 0000000000000000 ]---
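
For context on the "list_add corruption" line in the trace: with list debugging enabled, the kernel validates the neighbouring nodes before linking a new entry, and "prev->next should be next" means that validation failed. The following is a minimal user-space sketch of that kind of check, not the kernel's actual lib/list_debug.c code. The names list_add_is_valid() and checked_list_add_tail() are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Minimal doubly linked list node, modelled on the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/*
 * Simplified model of the add-time sanity check (hypothetical name).
 * Returns true if new_entry can safely be linked between prev and next.
 */
static bool list_add_is_valid(const struct list_head *new_entry,
			      const struct list_head *prev,
			      const struct list_head *next)
{
	assert(new_entry != NULL && prev != NULL && next != NULL);

	if (next->prev != prev) {
		fprintf(stderr,
			"list_add corruption. next->prev should be prev (%p), but was %p. (next=%p).\n",
			(const void *)prev, (const void *)next->prev,
			(const void *)next);
		return false;
	}
	if (prev->next != next) {
		fprintf(stderr,
			"list_add corruption. prev->next should be next (%p), but was %p. (prev=%p).\n",
			(const void *)next, (const void *)prev->next,
			(const void *)prev);
		return false;
	}
	if (new_entry == prev || new_entry == next) {
		fprintf(stderr, "list_add double add: new=%p.\n",
			(const void *)new_entry);
		return false;
	}
	return true;
}

/* Append new_entry before head (hypothetical name); refuse if the list looks corrupted. */
static bool checked_list_add_tail(struct list_head *new_entry,
				  struct list_head *head)
{
	struct list_head *prev = head->prev;

	if (!list_add_is_valid(new_entry, prev, head))
		return false;

	head->prev = new_entry;
	new_entry->next = head;
	new_entry->prev = prev;
	prev->next = new_entry;
	return true;
}
```

In this model, a tail insert on a list whose last node has a clobbered next pointer is rejected with the same kind of message as in the trace, and the list is left untouched.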


  2.) Add btext_map() at the very beginning of function start_kernel() in init/main.c:

On cold boots I sometimes get:

printk: bootconsole [udbg0] enabled
Total memory = 2048MB; using 4096kB for hash table
mapin_ram:125
mmu_mapin_ram:169 0 30000000 1400000 2000000
__mmu_mapin_ram:146 0 1400000
__mmu_mapin_ram:155 1400000
__mmu_mapin_ram:146 1400000 30000000
__mmu_mapin_ram:155 20000000
__mapin_ram_chunk:107 20000000 30000000
__mapin_ram_chunk:117
mapin_ram:134
kasan_mmu_init:129
kasan_mmu_init:132 0
kasan_mmu_init:137
btext_unmap:129
btext_unmap:131
Linux version 6.5.0-rc7-PMacG4-dirty (root at T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #5 SMP Wed Aug 23 13:59:00 CEST 2023

which shows one more line (Linux version...) than before. Most of the time, however, I get this more interesting output:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at arch/powerpc/include/asm/machdep.h:227 die+0xd8/0x39c
Modules linked in:
BUG: Kernel NULL pointer dereference on read at 0x00000050
Faulting instruction address: 0xc014e3bc
Thread overran stack, or stack corrupted
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at arch/powerpc/include/asm/machdep.h:227 die+0xd8/0x39c
Modules linked in:
BUG: Kernel NULL pointer dereference on read at 0x00000050
Faulting instruction address: 0xc014e3bc
Thread overran stack, or stack corrupted
[...]

This repeated 10-11 times. In both cases I had to transcribe the dmesg from a photo of the screen using OCR, so I hope the numbers are correct.
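
Because the traces went through OCR, a quick format check on the symbol+offset/size tokens can catch some transcription slips (for example an `@` where a `0` belongs). Below is a minimal sketch in C, assuming tokens of the form symbol+0xOFFSET/0xSIZE. The helper names skip_hex() and is_symbol_offset() are hypothetical, and the check only validates the shape of a token, not whether the numbers themselves are right.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Consume "0x" followed by at least one hex digit.
 * Returns a pointer just past the digits, or NULL if the input does not match.
 */
static const char *skip_hex(const char *p)
{
	if (p[0] != '0' || p[1] != 'x')
		return NULL;
	p += 2;
	if (!isxdigit((unsigned char)*p))
		return NULL;
	while (isxdigit((unsigned char)*p))
		p++;
	return p;
}

/* True if tok has the shape "symbol+0xOFFSET/0xSIZE" with nothing trailing. */
static bool is_symbol_offset(const char *tok)
{
	const char *plus;
	const char *p;

	assert(tok != NULL);

	plus = strchr(tok, '+');
	if (plus == NULL || plus == tok)
		return false;	/* no '+' or empty symbol name */

	p = skip_hex(plus + 1);
	if (p == NULL || *p != '/')
		return false;	/* malformed offset or missing '/' */

	p = skip_hex(p + 1);
	return p != NULL && *p == '\0';
}
```

For instance, is_symbol_offset("die+0xd8/0x39c") accepts the token, while a token whose size field starts with `@x` instead of `0x` is rejected.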

Regards,
Erhard
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmesg_65-rc7_g4_00
Type: application/octet-stream
Size: 51084 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20230824/4dbc09cc/attachment-0001.obj>

