KASAN debug kernel fails to boot at early stage when CONFIG_SMP=y is set (kernel 6.5-rc5, PowerMac G4 3,6)

Christophe Leroy christophe.leroy at csgroup.eu
Thu Sep 14 14:54:17 AEST 2023



On 12/09/2023 at 19:39, Christophe Leroy wrote:
> 
> 
> On 12/09/2023 at 17:59, Erhard Furtner wrote:
>>
>> printk: bootconsole [udbg0] enabled
>> Total memory = 2048MB; using 4096kB for hash table
>> mapin_ram:125
>> mmu_mapin_ram:169 0 30000000 1400000 2000000
>> __mmu_mapin_ram:146 0 1400000
>> __mmu_mapin_ram:155 1400000
>> __mmu_mapin_ram:146 1400000 30000000
>> __mmu_mapin_ram:155 20000000
>> __mapin_ram_chunk:107 20000000 30000000
>> __mapin_ram_chunk:117
>> mapin_ram:134
>> kasan_mmu_init:129
>> kasan_mmu_init:132 0
>> kasan_mmu_init:137
>> ioremap() called early from btext_map+0x64/0xdc. Use early_ioremap() instead
>> Linux version 6.6.0-rc1-PMacG4-dirty (root at T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #5 SMP Tue Sep 12 16:50:47 CEST 2023
>> kasan_init_region: c0000000 30000000 f8000000 fe000000
>> kasan_init_region: loop f8000000 fe000000
>>
>>
>> So I get no "kasan_init_region: setbat" line and don't reach "KASAN init done".
> 
> Ah OK, maybe your CPU only has 4 BATs and they are all used; the following
> change would tell us:
> 
> diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
> index 850783cfa9c7..bd26767edce7 100644
> --- a/arch/powerpc/mm/book3s32/mmu.c
> +++ b/arch/powerpc/mm/book3s32/mmu.c
> @@ -86,6 +86,7 @@ int __init find_free_bat(void)
>    		if (!(bat[1].batu & 3))
>    			return b;
>    	}
> +	pr_err("NO FREE BAT (%d)\n", n);
>    	return -1;
>    }
> 
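> For reference, the function being instrumented looks roughly like this
> (paraphrased sketch of arch/powerpc/mm/book3s32/mmu.c, check your tree for
> the exact code); the 4-vs-8 question comes from MMU_FTR_USE_HIGH_BATS:
> 
> 	int __init find_free_bat(void)
> 	{
> 		int b;
> 		int n = mmu_has_feature(MMU_FTR_USE_HIGH_BATS) ? 8 : 4;
> 
> 		for (b = 0; b < n; b++) {
> 			struct ppc_bat *bat = BATS[b];
> 
> 			if (!(bat[1].batu & 3))
> 				return b;
> 		}
> 		pr_err("NO FREE BAT (%d)\n", n);	/* the debug print added above */
> 		return -1;
> 	}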
> 
> Or you have 8 BATs, in which case it's an alignment problem: you need to
> increase CONFIG_DATA_SHIFT to 23, and for that you need
> CONFIG_ADVANCED_OPTIONS and CONFIG_DATA_SHIFT_BOOL.
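> 
> Something like this in the .config (assuming the usual symbol names on
> your tree):
> 
> 	CONFIG_ADVANCED_OPTIONS=y
> 	CONFIG_DATA_SHIFT_BOOL=y
> 	CONFIG_DATA_SHIFT=23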
> 
> But regardless of that, there is a problem we need to track down, because
> it should work without BATs.
> 
> As the BAT allocation fails, it falls back to:
> 
> 	phys = memblock_phys_alloc_range(k_end - k_start, PAGE_SIZE, 0,
> 					 MEMBLOCK_ALLOC_ANYWHERE);
> 	if (!phys)
> 		return -ENOMEM;
> 
> 	ret = kasan_init_shadow_page_tables(k_start, k_end);
> 	if (ret)
> 		return ret;
> 
> 	for (k_cur = k_start; k_cur < k_end; k_cur += PAGE_SIZE) {
> 		pmd_t *pmd = pmd_off_k(k_cur);
> 		pte_t pte = pfn_pte(PHYS_PFN(phys + k_cur - k_start), PAGE_KERNEL);
> 
> 		__set_pte_at(&init_mm, k_cur, pte_offset_kernel(pmd, k_cur), pte, 0);
> 	}
> 	flush_tlb_kernel_range(k_start, k_end);
> 	memset(kasan_mem_to_shadow(start), 0, k_end - k_start);
> 
> 
> While the __weak function that you confirmed works is:
> 
> 	ret = kasan_init_shadow_page_tables(k_start, k_end);
> 	if (ret)
> 		return ret;
> 
> 	block = memblock_alloc(k_end - k_start, PAGE_SIZE);
> 	if (!block)
> 		return -ENOMEM;
> 
> 	for (k_cur = k_start & PAGE_MASK; k_cur < k_end; k_cur += PAGE_SIZE) {
> 		pmd_t *pmd = pmd_off_k(k_cur);
> 		void *va = block + k_cur - k_start;
> 		pte_t pte = pfn_pte(PHYS_PFN(__pa(va)), PAGE_KERNEL);
> 
> 		__set_pte_at(&init_mm, k_cur, pte_offset_kernel(pmd, k_cur), pte, 0);
> 	}
> 	flush_tlb_kernel_range(k_start, k_end);
> 
> 
> I'm having a hard time understanding what could be wrong in the first place.
> 
> Could you try the following change:
> 
> diff --git a/arch/powerpc/mm/kasan/book3s_32.c b/arch/powerpc/mm/kasan/book3s_32.c
> index 9954b7a3b7ae..e04f21908c6a 100644
> --- a/arch/powerpc/mm/kasan/book3s_32.c
> +++ b/arch/powerpc/mm/kasan/book3s_32.c
> @@ -38,7 +38,7 @@ int __init kasan_init_region(void *start, size_t size)
> 
>    	if (k_nobat < k_end) {
>    		phys = memblock_phys_alloc_range(k_end - k_nobat, PAGE_SIZE, 0,
> -						 MEMBLOCK_ALLOC_ANYWHERE);
> +						 MEMBLOCK_ALLOC_ACCESSIBLE);
>    		if (!phys)
>    			return -ENOMEM;
>    	}
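> 
> If it helps to see why I suspect this, the two flags are defined in
> include/linux/memblock.h as (quoting from memory, double-check your tree):
> 
> 	/* Flags for memblock allocation APIs */
> 	#define MEMBLOCK_ALLOC_ANYWHERE		(~(phys_addr_t)0)
> 	#define MEMBLOCK_ALLOC_ACCESSIBLE	0
> 
> ANYWHERE puts no upper bound on the physical range, so on your 2GB box the
> shadow backing pages may come from above lowmem, whereas ACCESSIBLE caps
> the allocation at memblock.current_limit.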
> 
> And also this one:
> 
> 
> diff --git a/arch/powerpc/mm/kasan/init_32.c b/arch/powerpc/mm/kasan/init_32.c
> index a70828a6d935..bc1c075489f4 100644
> --- a/arch/powerpc/mm/kasan/init_32.c
> +++ b/arch/powerpc/mm/kasan/init_32.c
> @@ -84,6 +84,9 @@ kasan_update_early_region(unsigned long k_start, unsigned long k_end, pte_t pte)
>    {
>    	unsigned long k_cur;
> 
> +	if (k_start == k_end)
> +		return;
> +
>    	for (k_cur = k_start; k_cur != k_end; k_cur += PAGE_SIZE) {
>    		pmd_t *pmd = pmd_off_k(k_cur);
>    		pte_t *ptep = pte_offset_kernel(pmd, k_cur);
> 
> 
> 

I tested the two vmlinux you sent me off-list; they both boot without 
problem on QEMU.

Regarding the use of BATs: in fact a shift of 23 is still not enough to 
leave free BATs for KASAN. But at least it allows you to map all of the 
linear memory with BATs, whereas a shift of 22 would require 9 BATs:

With shift 22 you get BATs of sizes 4+4+8+16+32+64+128+256+256 (MB, 9 BATs).
With shift 23 you get BATs of sizes 8+8+16+32+64+128+256+256 (MB, 8 BATs).
Both sum to the 768MB (0x30000000) of lowmem seen in your log.

So let's forget that for the moment, although you may try with 
CONFIG_STRICT_KERNEL_RWX; in that case you should have enough BATs.

But let's refocus on the real problem.

In your last mail you said you tried with all the patches. Did that include 
the two changes above?

If not, can you run the tests with those two changes added, first one by 
one and then both together, depending on the results?

Many thanks for your help and perseverance
Christophe

