KASAN debug kernel fails to boot at early stage when CONFIG_SMP=y is set (kernel 6.5-rc5, PowerMac G4 3,6)

Erhard Furtner erhard_f at mailbox.org
Thu Feb 29 10:55:46 AEDT 2024


On Thu, 14 Sep 2023 04:54:17 +0000
Christophe Leroy <christophe.leroy at csgroup.eu> wrote:

> On 12/09/2023 at 19:39, Christophe Leroy wrote:
> > 
> > 
> > On 12/09/2023 at 17:59, Erhard Furtner wrote:
> >>
> >> printk: bootconsole [udbg0] enabled
> >> Total memory = 2048MB; using 4096kB for hash table
> >> mapin_ram:125
> >> mmu_mapin_ram:169 0 30000000 1400000 2000000
> >> __mmu_mapin_ram:146 0 1400000
> >> __mmu_mapin_ram:155 1400000
> >> __mmu_mapin_ram:146 1400000 30000000
> >> __mmu_mapin_ram:155 20000000
> >> __mapin_ram_chunk:107 20000000 30000000
> >> __mapin_ram_chunk:117
> >> mapin_ram:134
> >> kasan_mmu_init:129
> >> kasan_mmu_init:132 0
> >> kasan_mmu_init:137
> >> ioremap() called early from btext_map+0x64/0xdc. Use early_ioremap() instead
> >> Linux version 6.6.0-rc1-PMacG4-dirty (root at T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #5 SMP Tue Sep 12 16:50:47 CEST 2023
> >> kasan_init_region: c0000000 30000000 f8000000 fe000000
> >> kasan_init_region: loop f8000000 fe000000
> >>
> >>
> >> So I get no "kasan_init_region: setbat" line and don't reach "KASAN init done".  
> > 
> > Ah ok, maybe your CPU only has 4 BATs and they are all used; the
> > following change would tell us.
> > 
> > diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
> > index 850783cfa9c7..bd26767edce7 100644
> > --- a/arch/powerpc/mm/book3s32/mmu.c
> > +++ b/arch/powerpc/mm/book3s32/mmu.c
> > @@ -86,6 +86,7 @@ int __init find_free_bat(void)
> >    		if (!(bat[1].batu & 3))
> >    			return b;
> >    	}
> > +	pr_err("NO FREE BAT (%d)\n", n);
> >    	return -1;
> >    }
> > 
> > 
> > Or you have 8 BATs, in which case it's an alignment problem: you need to
> > increase CONFIG_DATA_SHIFT to 23, which requires CONFIG_ADVANCED and
> > CONFIG_DATA_SHIFT_BOOL.
> > 
> > But regardless of that there is a problem we need to find out, because
> > it should work without BATs.
> > 
> > As the BAT allocation fails, it falls back to:
> > 
> > 	phys = memblock_phys_alloc_range(k_end - k_start, PAGE_SIZE, 0,
> > 						 MEMBLOCK_ALLOC_ANYWHERE);
> > 		if (!phys)
> > 			return -ENOMEM;
> > 	}
> > 
> > 	ret = kasan_init_shadow_page_tables(k_start, k_end);
> > 	if (ret)
> > 		return ret;
> > 
> > 	for (k_cur = k_start; k_cur < k_end; k_cur += PAGE_SIZE) {
> > 		pmd_t *pmd = pmd_off_k(k_cur);
> > 		pte_t pte = pfn_pte(PHYS_PFN(phys + k_cur - k_start), PAGE_KERNEL);
> > 
> > 		__set_pte_at(&init_mm, k_cur, pte_offset_kernel(pmd, k_cur), pte, 0);
> > 	}
> > 	flush_tlb_kernel_range(k_start, k_end);
> > 	memset(kasan_mem_to_shadow(start), 0, k_end - k_start);
> > 
> > 
> > While the __weak function that you confirmed working is:
> > 
> > 	ret = kasan_init_shadow_page_tables(k_start, k_end);
> > 	if (ret)
> > 		return ret;
> > 
> > 	block = memblock_alloc(k_end - k_start, PAGE_SIZE);
> > 	if (!block)
> > 		return -ENOMEM;
> > 
> > 	for (k_cur = k_start & PAGE_MASK; k_cur < k_end; k_cur += PAGE_SIZE) {
> > 		pmd_t *pmd = pmd_off_k(k_cur);
> > 		void *va = block + k_cur - k_start;
> > 		pte_t pte = pfn_pte(PHYS_PFN(__pa(va)), PAGE_KERNEL);
> > 
> > 		__set_pte_at(&init_mm, k_cur, pte_offset_kernel(pmd, k_cur), pte, 0);
> > 	}
> > 	flush_tlb_kernel_range(k_start, k_end);
> > 
> > 
> > I'm having a hard time understanding what could be wrong in the first place.
> > 
> > Could you try the following change:
> > 
> > diff --git a/arch/powerpc/mm/kasan/book3s_32.c b/arch/powerpc/mm/kasan/book3s_32.c
> > index 9954b7a3b7ae..e04f21908c6a 100644
> > --- a/arch/powerpc/mm/kasan/book3s_32.c
> > +++ b/arch/powerpc/mm/kasan/book3s_32.c
> > @@ -38,7 +38,7 @@ int __init kasan_init_region(void *start, size_t size)
> > 
> >    	if (k_nobat < k_end) {
> >    		phys = memblock_phys_alloc_range(k_end - k_nobat, PAGE_SIZE, 0,
> > -						 MEMBLOCK_ALLOC_ANYWHERE);
> > +						 MEMBLOCK_ALLOC_ACCESSIBLE);
> >    		if (!phys)
> >    			return -ENOMEM;
> >    	}
> > 
> > And also that one:
> > 
> > 
> > diff --git a/arch/powerpc/mm/kasan/init_32.c b/arch/powerpc/mm/kasan/init_32.c
> > index a70828a6d935..bc1c075489f4 100644
> > --- a/arch/powerpc/mm/kasan/init_32.c
> > +++ b/arch/powerpc/mm/kasan/init_32.c
> > @@ -84,6 +84,9 @@ kasan_update_early_region(unsigned long k_start, unsigned long k_end, pte_t pte)
> >    {
> >    	unsigned long k_cur;
> > 
> > +	if (k_start == k_end)
> > +		return;
> > +
> >    	for (k_cur = k_start; k_cur != k_end; k_cur += PAGE_SIZE) {
> >    		pmd_t *pmd = pmd_off_k(k_cur);
> >    		pte_t *ptep = pte_offset_kernel(pmd, k_cur);
> > 
> > 
> >   
> 
> I tested the two vmlinux images you sent me off-list; they both boot
> without problems on QEMU.
> 
> Regarding the use of BATs, a shift of 23 is in fact still not enough to
> get free BATs for KASAN. But at least it allows you to map all linear
> memory with BATs, whereas a shift of 22 would require 9 BATs:
> 
> With shift 22 you have BATs of sizes (MB): 4+4+8+16+32+64+128+256+256
> With shift 23 you have BATs of sizes (MB): 8+8+16+32+64+128+256+256
> 
> So let's forget that for the moment, although you may try with
> CONFIG_STRICT_KERNEL_RWX; in that case you should have enough BATs.
> 
> But let's try to refocus on the real problem.
> 
> In your last mail you say you tried with all patches. Did that include
> the two changes above?
> 
> If not, can you run the tests with those two changes in addition,
> first one at a time, then both together, depending on the result?
> 
> Many thanks for your help and perseverance
> Christophe
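
Note on the BAT arithmetic quoted above: the two size lists (9 blocks for shift 22, 8 blocks for shift 23, both summing to 768MB of linear memory) can be reproduced with a small greedy model. This is only a sketch under stated assumptions, not the kernel's actual mapping code. It assumes that the first block at address 0 is capped at 1 << shift bytes, that every later block is limited by the alignment of its base address and by the memory that remains, and that no block exceeds 256MB. The names bat_blocks(), pow2_floor() and lowest_bit() are hypothetical helpers invented for this illustration.

```c
#include <stddef.h>

#define MAX_BAT_MB 256UL	/* assumed largest block a single BAT can map */

/* Largest power of two that is <= x (x must be > 0). */
static unsigned long pow2_floor(unsigned long x)
{
	unsigned long p = 1;

	while (p <= x / 2)
		p *= 2;
	return p;
}

/* Value of the lowest set bit of x (x must be > 0). */
static unsigned long lowest_bit(unsigned long x)
{
	return x & (~x + 1);
}

/*
 * Hypothetical model: split total_mb of linear memory into power-of-two
 * blocks. Stores up to max block sizes (in MB) in sizes[] and returns the
 * number of blocks needed, or 0 if shift is below 20 (less than 1MB).
 */
size_t bat_blocks(unsigned int shift, unsigned long total_mb,
		  unsigned long *sizes, size_t max)
{
	unsigned long base = 0;
	unsigned long first;
	size_t n = 0;

	if (shift < 20 || shift > 30)
		return 0;
	first = 1UL << (shift - 20);	/* first block size in MB */

	while (base < total_mb) {
		unsigned long size = pow2_floor(total_mb - base);

		if (size > MAX_BAT_MB)
			size = MAX_BAT_MB;
		if (base == 0) {
			/* assumption: first block capped at 1 << shift */
			if (size > first)
				size = first;
		} else if (size > lowest_bit(base)) {
			/* a block must be aligned to its own size */
			size = lowest_bit(base);
		}
		if (n < max)
			sizes[n] = size;
		n++;
		base += size;
	}
	return n;
}
```

Under these assumptions, bat_blocks(22, 768, sizes, 16) yields the nine sizes listed for shift 22 and bat_blocks(23, 768, sizes, 16) yields the eight listed for shift 23, which matches the point made above: with shift 23 the linear memory fits in 8 BATs but leaves none free for KASAN.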

Revisited this issue with kernel v6.8-rc6 on the same machine.

Now this strange KASAN cold boot issue is gone, or at least I can no longer reproduce it. Be it with KASAN_OUTLINE or KASAN_INLINE, SMP boot works just fine on my G4 DP, which is a good thing. :)

Regards,
Erhard

