Fail to boot 5.15 on mpc8347 with either debug_pagealloc or nobats

Christophe Leroy christophe.leroy at csgroup.eu
Fri Dec 3 23:49:06 AEDT 2021


Hi Maxime,


On 03/12/2021 at 01:44, Maxime Bizon wrote:
> 
> Hello Christophe,
> 
> I have a mpc8347 board booting 5.15 fine, but it does not boot with
> CONFIG_DEBUG_PAGEALLOC=y (and enabled) or "nobats".
> 
> Those two options worked fine on my previous kernel (5.4)
> 
> 
> Nothing is output on serial console when I set those options, so I had
> to hack a bit:
> 
> 1) to ease debugging, I used mem=256M, so only a few BATs are used
> 
> 2) I hijacked BAT 7 and the vm space at b000xxxx (modules, unused for
> me) to set a constant mapping so my uart always works (patch attached
> in case)
> 
> 
> I got this:
> 
> mmu_mapin_ram: base:0x0 top:0x10000000 border:0x600000
> mmu_mapin_ram: updated base:0x0 top:0x600000
> setbat index=0 virt=0xc0000000 phys=0x0 size=0x400000
> setbat index=1 virt=0xc0400000 phys=0x400000 size=0x200000
> [    0.000000] kernel tried to execute exec-protected page (c0600944) - exploit attempt? (uid: 0)
> [    0.000000] BUG: Unable to handle kernel instruction fetch
> [    0.000000] Faulting instruction address: 0xc0600944
> [    0.000000] Thread overran stack, or stack corrupted
> [    0.000000] Oops: Kernel access of bad area, sig: 11 [#1]
> [    0.000000] BE PAGE_SIZE=4K DEBUG_PAGEALLOC
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0+ #237
> [    0.000000] NIP:  c0600944 LR: 00003390 CTR: 00000000
> [    0.000000] REGS: c07bbf40 TRAP: 0400   Not tainted  (5.15.0+)
> [    0.000000] MSR:  20001032 <ME,IR,DR,RI>  CR: 48222444  XER: 20000000
> [    0.000000]
> [    0.000000] GPR00: c0003364 c07bbff0 c0707580 c0600944 00001032 00000000 c077dc8c 00000000
> [    0.000000] GPR08: c07d0000 00000001 c07d0000 ffffffff 88222444 00000000 3ff9c5f0 3fffd79c
> [    0.000000] GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [    0.000000] GPR24: 00000000 00000000 40000000 3ff9c5d8 03ffb000 00000000 3fffe4d4 03ffb000
> [    0.000000] NIP [c0600944] start_kernel+0x0/0x600
> [    0.000000] LR [00003390] 0x3390
> [    0.000000] Call Trace:
> [    0.000000] [c07bbff0] [c0003364] start_here+0x4c/0x90 (unreliable)
> [    0.000000] Instruction dump:
> [    0.000000] 83e1000c 7c0803a6 38210010 4e800020 4e800020 4ba05bb8 4e800020 4e800020
> [    0.000000] 4e800020 4e800020 4e800020 4e800020 <9421ffc0> 3c60c070 7c0802a6 38637580
> [    0.000000] random: get_random_bytes called from oops_exit+0x44/0x84 with crng_init=0
> [    0.000000] ---[ end trace 0000000000000000 ]---
> [    0.000000]
> [    0.000000] Kernel panic - not syncing: Fatal exception
> [    0.000000] Rebooting in 30 seconds..
> 
> 
> 
> It gets a page fault when jumping to start_kernel(); it seems the
> init.text section is not mapped, at least not as executable.
> 
> debug_pagealloc_enabled changes the behaviour of mmu_mapin_ram(): it will
> map only up to __init_begin with BATs, and use page mappings for the
> remainder.
> 
> my debug prints confirm that it seems to do a correct job at it, I've
> also verified that __mapin_ram_chunk() was indeed mapping the whole
> init.text area with prot=PAGE_KERNEL_TEXT
> 
> but it's as if those page mappings had no effect at all, and I am not
> familiar with PPC MMU to dig further
> 
> Simply changing mmu_mapin_ram() to not enter the "if
> debug_pagealloc_enabled_or_kfence()" makes the kernel boot fine.
> 
> If you have any guidance that would be appreciated
> 

Thanks for the report.

This problem doesn't happen on powermac on QEMU.
I was able to reproduce this problem on an mpc8321 board.

It happens when CONFIG_MODULES is not defined. In that case the
Instruction TLB miss exception handler doesn't expect such an exception at
all, because all kernel text is expected to be mapped with IBATs.
However, due to DEBUG_PAGEALLOC, only the main text is mapped with BATs,
not inittext. That's a mistake: inittext should still be mapped with BATs.
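
To illustrate the split visible in the log above, here is a minimal
userspace sketch (not the kernel code) of how the linear mapping ends up
cut at the border when debug_pagealloc is enabled. The helper names
(block_size, split_into_bats, covered_by_bats) are hypothetical and only
serve the illustration; the border 0x600000 and the addresses come from
the report, and the 256M maximum block size is the usual BAT limit.

```c
#include <assert.h>

#define PAGE_OFFSET	0xc0000000UL
#define MAX_BAT_SIZE	0x10000000UL	/* 256M, the largest BAT block */

/* Largest power-of-two block that fits in [base, top) and is aligned on base. */
static unsigned long block_size(unsigned long base, unsigned long top)
{
	unsigned long size = MAX_BAT_SIZE;

	assert(base < top);
	while (size > top - base || (base & (size - 1)))
		size >>= 1;
	return size;
}

/* Split [base, top) into BAT-sized blocks; returns how many were needed. */
static int split_into_bats(unsigned long base, unsigned long top,
			   unsigned long *sizes, int max)
{
	int n = 0;

	while (base < top && n < max) {
		sizes[n] = block_size(base, top);
		base += sizes[n];
		n++;
	}
	return n;
}

/* Is a kernel virtual address below the BAT/page-mapping border? */
static int covered_by_bats(unsigned long vaddr, unsigned long border)
{
	return vaddr >= PAGE_OFFSET && vaddr - PAGE_OFFSET < border;
}
```

With base 0x0 and top 0x600000, split_into_bats() yields two blocks of
0x400000 and 0x200000, matching the setbat lines in the log, while
covered_by_bats(0xc0600944, 0x600000) is false: start_kernel sits just
above the border and is therefore only page-mapped.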

When CONFIG_MODULES is set it works.

One way to fix it is to drop the CONFIG_MODULES #ifdefs in the instruction
TLB miss handler (in kernel/head_book3s_32.S), but that would hurt
performance just for the sake of init.
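
As a rough model of why CONFIG_MODULES makes the difference, the sketch
below mimics in plain C the choice the handler makes between page tables.
It is only an illustration, not the assembly from head_book3s_32.S: the
function names are made up, TASK_SIZE = 0xb0000000 is an assumption taken
from the "modules at b000xxxx" remark in the report, and it assumes the
page table of the current task holds no kernel mappings.

```c
#include <assert.h>
#include <stdbool.h>

#define TASK_SIZE	0xb0000000UL	/* assumption, see above */
#define PAGE_OFFSET	0xc0000000UL

enum pgdir { PGDIR_USER, PGDIR_KERNEL };

/* Page table walked by the (modelled) instruction TLB miss handler. */
static enum pgdir itlb_miss_pgdir(unsigned long addr, bool config_modules)
{
	if (config_modules && addr >= TASK_SIZE)
		return PGDIR_KERNEL;	/* swapper_pg_dir */
	return PGDIR_USER;		/* page table of the current task */
}

/*
 * Can a page-mapped (not BAT-mapped) kernel text address be fetched?
 * In this model, only if the handler walks the kernel page table for it.
 */
static bool page_mapped_text_fetch_ok(unsigned long addr, bool config_modules)
{
	assert(addr >= PAGE_OFFSET);	/* model covers kernel text only */
	return itlb_miss_pgdir(addr, config_modules) == PGDIR_KERNEL;
}
```

In this model, page_mapped_text_fetch_ok(0xc0600944, false) is false and
the same call with config_modules set to true succeeds, which matches the
behaviour seen on the board.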

Another way to fix it is to set an IBAT covering up to _einittext. This
IBAT should be removed by mark_initmem_nx() at the end of init ... but
... it looks like we have a problem there as well: as we have not mapped
_sinittext with DBATs, mmu_mark_initmem_nx() is not called.
Also, as we are setting an IBAT, we shouldn't set the pages executable;
at least the X bit should be cleared. But the way it is done, if we call
mark_initmem_nx() then mark_initmem_nx() would call set_memory_attr() to
clear the X bit from the pages. In the end it's not a problem because the
kernel segments are marked NX, but it's not clean.

So in the end it seems to be a mess around DEBUG_PAGEALLOC and
STRICT_KERNEL_RWX. All this is amplified by those 'nobats' and
'noltlbs' options, which are pointless from a functional point of view.

I need to think a bit more about it to find the cleanest solution that 
works for all platforms.

Christophe

