KASAN debug kernel fails to boot at early stage when CONFIG_SMP=y is set (kernel 6.5-rc5, PowerMac G4 3,6)
Christophe Leroy
christophe.leroy at csgroup.eu
Thu Aug 31 15:32:46 AEST 2023
On 28/08/2023 at 01:17, Erhard Furtner wrote:
> On Thu, 24 Aug 2023 21:36:26 +1000
> Michael Ellerman <mpe at ellerman.id.au> wrote:
>
>>> printk: bootconsole [udbg0] enabled
>>> Total memory = 2048MB; using 4096kB for hash table
>>> mapin_ram:125
>>> mmu_mapin_ram:169 0 30000000 1400000 2000000
>>> __mmu_mapin_ram:146 0 1400000
>>> __mmu_mapin_ram:155 1400000
>>> __mmu_mapin_ram:146 1400000 30000000
>>> __mmu_mapin_ram:155 20000000
>>> __mapin_ram_chunk:107 20000000 30000000
>>> __mapin_ram_chunk:117
>>> mapin_ram:134
>>> kasan_mmu_init:129
>>> kasan_mmu_init:132 0
>>> kasan_mmu_init:137
>>> ioremap() called early from btext_map+0x64/0xdc. Use early_ioremap() instead
>>> Linux version 6.5.0-rc7-PMacG4-dirty (root at T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #4 SMP Wed Aug 23 12:59:11 CEST 2023
>>>
>>> which shows one line (Linux version...) more than before. Most of the time I get this more interesting output however:
>>>
>>> kasan_mmu_init:129
>>> kasan_mmu_init:132 0
>>> kasan_mmu_init:137
>>> Linux version 6.5.0-rc7-PMacG4-dirty (root at T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #4 SMP Wed Aug 23 12:59:11 CEST 2023
>>> KASAN init done
>>> list_add corruption. prev->next should be next (c17100c0), but was 2c030000. (prev=c036ac7c).
>>> ------------[ cut here ]------------
>>> kernel BUG at lib/list_debug.c:30!
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 0 PID: 0 at arch/powerpc/include/asm/machdep.h:227 die+0xd8/0x39c
>>
>> This is a WARN hit while handling the original bug.
>>
>> Can you apply this patch to avoid that happening, so we can see the
>> original bug better.
>>
>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>> index eeff136b83d9..341a0635e131 100644
>> --- a/arch/powerpc/kernel/traps.c
>> +++ b/arch/powerpc/kernel/traps.c
>> @@ -198,8 +198,6 @@ static unsigned long oops_begin(struct pt_regs *regs)
>>  	die_owner = cpu;
>>  	console_verbose();
>>  	bust_spinlocks(1);
>> -	if (machine_is(powermac))
>> -		pmac_backlight_unblank();
>>  	return flags;
>>  }
>>  NOKPROBE_SYMBOL(oops_begin);
>>
>>
>> cheers
>
> Ok, so I tested now:
> Replace btext_unmap() with btext_map() at the end of MMU_init() + Michael's patch.
>
> With the patch I get interesting output less often, but when I do it's:
>
> printk: bootconsole [udbg0] enabled
> Total memory = 2048MB; using 4096kB for hash table
> mapin_ram:125
> mmu_mapin_ram:169 0 30000000 1400000 2000000
> __mmu_mapin_ram:146 0 1400000
> __mmu_mapin_ram:155 1400000
> __mmu_mapin_ram:146 1400000 30000000
> __mmu_mapin_ram:155 20000000
> __mapin_ram_chunk:107 20000000 30000000
> __mapin_ram_chunk:117
> mapin_ram:134
> kasan_mmu_init:129
> kasan_mmu_init:132 0
> kasan_mmu_init:137
> Linux version 6.5.0-rc7-PMacG4-dirty (root at T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #4 SMP Wed Aug 23 12:59:11 CEST 2023
> KASAN init done
> BUG: spinlock bad magic on CPU#0, swapper/0
> lock: 0xc16cbc60, .magic: c036ab84, .owner: <none>/-1, .owner_cpu: -1
> CPU: 0 PID: 0 Comm: swapper Tainted: G T xxxxxxxxxxx
> Call Trace:
> [c1717c20] [c0f4e288] dump_stack_lvl+0x60/0xa4 (unreliable)
> [c1717c40] [c01065e8] do_raw_spin_lock+0x15c/0x1a8
> [c1717c70] [c0fa3890] _raw_spin_lock_irqsave+0x20/0x40
> [c1717c90] [c0c140ec] of_find_property+0x3c/0x140
> [c1717cc0] [c0c14204] of_get_property+0x14/0x4c
> [c1717ce0] [c0c22c6c] unflatten_dt_nodes+0x76c/0x894
> [c1717f10] [c0c22e88] __unflatten_device_tree+0xf4/0x244
> [c1717f50] [c1458050] unflatten_device_tree+0x48/0x84
> [c1717f70] [c140b100] setup_arch+0x78/0x44c
> [c1717fc0] [c14045b8] start_kernel+0x78/0x2d8
> [c1717ff0] [000035d0] 0x35d0
Ok so there is some corrupted memory somewhere.

Can you try what happens when you remove the call to kasan_init() at the
start of setup_arch() in arch/powerpc/kernel/setup-common.c?

I'd also be curious to know what happens when CONFIG_DEBUG_SPINLOCK is
disabled.

Another question, which I'm not sure I have asked already: is this a new
problem that appeared with recent kernels, or is it just that you never
tried such a config with older kernels?

Also, when you say you need to start with another SMP kernel first and
then you don't have the problem anymore until the next cold reboot, do
you mean you have some old kernel with KASAN that works, or is it a
kernel without KASAN that you have to start first?

Thanks
Christophe
>
>
> and then the freeze. Or less often I get:
>
> [...]
> Modules linked in: _various ASCII chars_ |(EK) _various ASCII chars_ §=(EKTN)
> BUG: Unable to handle kernel data access on read at 0x813f0200
> Faulting instruction address: 0xc014e444
> Thread overran stack, or stack corrupted
> Oops: Kernel access of bad area, sig: 11 [#3544]
> BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2
> Modules linked in: _various ASCII chars_ §=(EKTN)
> BUG: Unable to handle kernel data access on read at 0x813f0200
> Faulting instruction address: 0xc014e444
> Thread overran stack, or stack corrupted
> Oops: Kernel access of bad area, sig: 11 [#3545]
> BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2
>
>
> The number after "sig: 11" counts up rapidly to #3545, so I can't follow the output on the OF console. The output remaining on screen before the freeze is [#3535] to [#3545], but apart from the numbers, the addresses in this output do not change. The _various ASCII chars_ in the "Modules linked in:" lines stay the same, but they are special characters and hard to transcribe.
>
> Hope that helps.
>
> Regards,
> Erhard