Fail to boot 5.15 on mpc8347 with either debug_pagealloc or nobats

Christophe Leroy christophe.leroy at csgroup.eu
Sat Dec 4 21:01:07 AEDT 2021



On 03/12/2021 at 19:43, Maxime Bizon wrote:
> 
> On Fri, 2021-12-03 at 12:49 +0000, Christophe Leroy wrote:
> 
> Hello,
> 
>> I need to think a bit more about it to find the cleanest solution
>> that works for all platforms.
> 
> Maybe related, when enabling KASAN on that same platform, it oopses early.

Unrelated I think, but thanks for the report.

> 
> I have picked the patch "powerpc/32s: Fix shift-out-of-bounds in KASAN
> init", and that does not fix it
> 
> 
> For some mem= values like 769M, all BATs are used for kernel linear
> mapping, and there are none left to map the KASAN shadow area in
> kasan/book3s_32.c => no oops
> 
> If I don't compile kasan/book3s_32.c and use weak implementation => no
> oops
> 
> 
> But for mem=768M, it oopses
> 
> I added some debugs in kasan init and dumped BATs content (BAT 7 is my
> debug BAT for uart)
> 
> [    0.000000] kasan_init_region: start=0xc0000000 size:0x30000000
> [    0.000000] kasan_init_region: k_start:0xf8000000 k_end:0xfe000000 k_size:0x6000000 k_size_base=0x2000000
> [    0.000000] kasan_init_region: IF{} k_size_more:0x4000000
> [    0.000000] setbat index=3 virt:0xf8000000 phys:0x2a000000 size:0x2000000
> [    0.000000] setbat index=4 virt:0xfa000000 phys:0x2c000000 size:0x4000000
> [    0.000000] kasan_init_region: final k_cur=0xfe000000
> [    0.000000]
> [    0.000000] ---[ Data Block Address Translation ]---
> [    0.000000] 0: 0xc0000000-0xcfffffff 0x00000000       256M Kernel rw      m
> [    0.000000] 1: 0xd0000000-0xdfffffff 0x10000000       256M Kernel rw      m
> [    0.000000] 2: 0xe0000000-0xefffffff 0x20000000       256M Kernel rw      m
> [    0.000000] 3: 0xf8000000-0xf9ffffff 0x2a000000        32M Kernel rw      m
> [    0.000000] 4: 0xfa000000-0xfdffffff 0x2c000000        64M Kernel rw      m
> [    0.000000] 5:         -
> [    0.000000] 6:         -
> [    0.000000] 7: 0xb0000000-0xb00fffff 0xe0000000         1M Kernel rw    i   g
> [    0.000000] BUG: Unable to handle kernel data access on read at 0xfd3fce00
> [    0.000000] Faulting instruction address: 0xc013ed84
> [    0.000000] Oops: Kernel access of bad area, sig: 11 [#1]
> [    0.000000] BE PAGE_SIZE=4K
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0+ #379
> [    0.000000] NIP:  c013ed84 LR: c0140264 CTR: 00000020
> [    0.000000] REGS: c0b07dd0 TRAP: 0300   Not tainted  (5.15.0+)
> [    0.000000] MSR:  00001032 <ME,IR,DR,RI>  CR: 28222448  XER: 00000000
> [    0.000000] DAR: fd3fce00 DSISR: 20000000
> [    0.000000] GPR00: fd3fd000 c0b07e80 c09c8a20 0000003f 00001000 00000001 c08c67a8 e9fe7fff
> [    0.000000] GPR08: e9fe7000 fd3fce00 00000020 fd3fcfff 00000000 00000000 3ff9c5f0 3fffd79c
> [    0.000000] GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [    0.000000] GPR24: 00000000 feffffff 00000591 c0b22000 ffffffff 00000000 e9fe7000 00001000
> [    0.000000] NIP [c013ed84] kasan_check_range+0x98/0x2c0
> [    0.000000] LR [c0140264] memset+0x34/0x80
> [    0.000000] Call Trace:
> [    0.000000] [c0b07e80] [c08c630c] memblock_alloc_internal+0x9c/0x108 (unreliable)
> [    0.000000] [c0b07e90] [feffffff] 0xfeffffff
> [    0.000000] [c0b07eb0] [c08c67a8] memblock_alloc_try_nid+0xf4/0x128
> [    0.000000] [c0b07f30] [c08bb7ac] kasan_init_shadow_page_tables+0x84/0x1cc
> [    0.000000] [c0b07f60] [c08bba40] kasan_init+0xdc/0x184
> [    0.000000] [c0b07f90] [c08b8108] setup_arch+0x18/0x1c4
> [    0.000000] [c0b07fc0] [c08b3fd4] start_kernel+0x5c/0x2d4
> [    0.000000] [c0b07ff0] [000033c0] 0x33c0
> [    0.000000] Instruction dump:
> [    0.000000] 93e1000c 83c90000 83e90004 7fdffb79 83c10008 83e1000c 408201cc 2c030000
> [    0.000000] 39290008 41820034 554af87e 7d4903a6 <80690000> 81490004 7c6a5379 408201a8
> 
> 
> It makes no sense to me that we get that fault with a valid BAT
> covering that area. Aren't BATs supposed to be checked first?
> 


In fact BAT4 is wrong. Both the virtual and the physical address of a
64M BAT must be 64M aligned, and virt 0xfa000000 is not. I think the
display is wrong as well (did you take it from ptdump?): BEPI and BRPN
must be ANDed with the complement of BL.

So here your 64M BAT actually maps 0xf8000000-0xfbffffff. The address
0xfd3fce00 is therefore not mapped by any BAT, hence the Oops.
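
To make the arithmetic concrete, here is a minimal sketch of the masking
described above. It is only a model of the reasoning, not kernel code:
the helper names bat_effective_range() and bat_covers() are hypothetical,
and it assumes the start address is simply ANDed with the complement of
(size - 1), as BEPI is with BL.

```python
# Illustrative model only; these helpers are hypothetical, not kernel APIs.

def bat_effective_range(virt, size):
    """Return the (start, end) range a BAT of this size really covers.

    Assumption: the block start is the programmed address with the
    low (size - 1) bits cleared, i.e. ANDed with the complement of
    the block-length mask.
    """
    if size < 128 * 1024 or size & (size - 1):
        raise ValueError("BAT size must be a power of two >= 128K")
    mask = size - 1
    start = virt & ~mask
    return start, start | mask


def bat_covers(addr, virt, size):
    """True if addr falls inside the effective range of the BAT."""
    start, end = bat_effective_range(virt, size)
    return start <= addr <= end


if __name__ == "__main__":
    # Values taken from the setbat lines and the Oops in the log.
    bats = {3: (0xF8000000, 0x2000000), 4: (0xFA000000, 0x4000000)}
    fault = 0xFD3FCE00
    for index, (virt, size) in bats.items():
        start, end = bat_effective_range(virt, size)
        print(f"BAT{index}: requested 0x{virt:08x}, "
              f"effective 0x{start:08x}-0x{end:08x}, "
              f"covers fault: {bat_covers(fault, virt, size)}")
```

With the values from the log, BAT4 comes out as 0xf8000000-0xfbffffff,
so it overlaps BAT3 and leaves 0xfc000000-0xfdffffff unmapped.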

Christophe

