KASAN debug kernel fails to boot at early stage when CONFIG_SMP=y is set (kernel 6.5-rc5, PowerMac G4 3,6)
Erhard Furtner
erhard_f at mailbox.org
Thu Feb 29 10:55:46 AEDT 2024
On Thu, 14 Sep 2023 04:54:17 +0000
Christophe Leroy <christophe.leroy at csgroup.eu> wrote:
> On 12/09/2023 at 19:39, Christophe Leroy wrote:
> >
> >
> > On 12/09/2023 at 17:59, Erhard Furtner wrote:
> >>
> >> printk: bootconsole [udbg0] enabled
> >> Total memory = 2048MB; using 4096kB for hash table
> >> mapin_ram:125
> >> mmu_mapin_ram:169 0 30000000 1400000 2000000
> >> __mmu_mapin_ram:146 0 1400000
> >> __mmu_mapin_ram:155 1400000
> >> __mmu_mapin_ram:146 1400000 30000000
> >> __mmu_mapin_ram:155 20000000
> >> __mapin_ram_chunk:107 20000000 30000000
> >> __mapin_ram_chunk:117
> >> mapin_ram:134
> >> kasan_mmu_init:129
> >> kasan_mmu_init:132 0
> >> kasan_mmu_init:137
> >> ioremap() called early from btext_map+0x64/0xdc. Use early_ioremap() instead
> >> Linux version 6.6.0-rc1-PMacG4-dirty (root at T1000) (gcc (Gentoo 12.3.1_p20230526 p2) 12.3.1 20230526, GNU ld (Gentoo 2.40 p7) 2.40.0) #5 SMP Tue Sep 12 16:50:47 CEST 2023
> >> kasan_init_region: c0000000 30000000 f8000000 fe000000
> >> kasan_init_region: loop f8000000 fe000000
> >>
> >>
> >> So I get no "kasan_init_region: setbat" line and don't reach "KASAN init done".
> >
> > Ah OK, maybe your CPU only has 4 BATs and they are all in use; the
> > following change would tell us.
> >
> > diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
> > index 850783cfa9c7..bd26767edce7 100644
> > --- a/arch/powerpc/mm/book3s32/mmu.c
> > +++ b/arch/powerpc/mm/book3s32/mmu.c
> > @@ -86,6 +86,7 @@ int __init find_free_bat(void)
> >  		if (!(bat[1].batu & 3))
> >  			return b;
> >  	}
> > +	pr_err("NO FREE BAT (%d)\n", n);
> >  	return -1;
> >  }
> >
> >
> > Or you have 8 BATs, in which case it's an alignment problem and you need
> > to increase CONFIG_DATA_SHIFT to 23. For that you need
> > CONFIG_ADVANCED_OPTIONS and CONFIG_DATA_SHIFT_BOOL.
> >
> > But regardless of that there is a problem we need to find out, because
> > it should work without BATs.
> >
> > As the BAT allocation fails, it falls back to:
> >
> > 		phys = memblock_phys_alloc_range(k_end - k_start, PAGE_SIZE, 0,
> > 						 MEMBLOCK_ALLOC_ANYWHERE);
> > 		if (!phys)
> > 			return -ENOMEM;
> > 	}
> >
> > 	ret = kasan_init_shadow_page_tables(k_start, k_end);
> > 	if (ret)
> > 		return ret;
> >
> > 	for (k_cur = k_start; k_cur < k_end; k_cur += PAGE_SIZE) {
> > 		pmd_t *pmd = pmd_off_k(k_cur);
> > 		pte_t pte = pfn_pte(PHYS_PFN(phys + k_cur - k_start), PAGE_KERNEL);
> >
> > 		__set_pte_at(&init_mm, k_cur, pte_offset_kernel(pmd, k_cur), pte, 0);
> > 	}
> > 	flush_tlb_kernel_range(k_start, k_end);
> > 	memset(kasan_mem_to_shadow(start), 0, k_end - k_start);
> >
> >
> > While the __weak function that you confirmed working is:
> >
> > 	ret = kasan_init_shadow_page_tables(k_start, k_end);
> > 	if (ret)
> > 		return ret;
> >
> > 	block = memblock_alloc(k_end - k_start, PAGE_SIZE);
> > 	if (!block)
> > 		return -ENOMEM;
> >
> > 	for (k_cur = k_start & PAGE_MASK; k_cur < k_end; k_cur += PAGE_SIZE) {
> > 		pmd_t *pmd = pmd_off_k(k_cur);
> > 		void *va = block + k_cur - k_start;
> > 		pte_t pte = pfn_pte(PHYS_PFN(__pa(va)), PAGE_KERNEL);
> >
> > 		__set_pte_at(&init_mm, k_cur, pte_offset_kernel(pmd, k_cur), pte, 0);
> > 	}
> > 	flush_tlb_kernel_range(k_start, k_end);
> >
> >
> > I'm having a hard time understanding what could be wrong in the first place.
> >
> > Could you try the following change:
> >
> > diff --git a/arch/powerpc/mm/kasan/book3s_32.c b/arch/powerpc/mm/kasan/book3s_32.c
> > index 9954b7a3b7ae..e04f21908c6a 100644
> > --- a/arch/powerpc/mm/kasan/book3s_32.c
> > +++ b/arch/powerpc/mm/kasan/book3s_32.c
> > @@ -38,7 +38,7 @@ int __init kasan_init_region(void *start, size_t size)
> >
> >  	if (k_nobat < k_end) {
> >  		phys = memblock_phys_alloc_range(k_end - k_nobat, PAGE_SIZE, 0,
> > -						 MEMBLOCK_ALLOC_ANYWHERE);
> > +						 MEMBLOCK_ALLOC_ACCESSIBLE);
> >  		if (!phys)
> >  			return -ENOMEM;
> >  	}
> >
> >
> > And also that one:
> >
> >
> > diff --git a/arch/powerpc/mm/kasan/init_32.c b/arch/powerpc/mm/kasan/init_32.c
> > index a70828a6d935..bc1c075489f4 100644
> > --- a/arch/powerpc/mm/kasan/init_32.c
> > +++ b/arch/powerpc/mm/kasan/init_32.c
> > @@ -84,6 +84,9 @@ kasan_update_early_region(unsigned long k_start, unsigned long k_end, pte_t pte)
> >  {
> >  	unsigned long k_cur;
> >
> > +	if (k_start == k_end)
> > +		return;
> > +
> >  	for (k_cur = k_start; k_cur != k_end; k_cur += PAGE_SIZE) {
> >  		pmd_t *pmd = pmd_off_k(k_cur);
> >  		pte_t *ptep = pte_offset_kernel(pmd, k_cur);
> >
> >
> >
>
> I tested the two vmlinux images you sent me off-list; they both boot
> without problems on QEMU.
>
> Regarding the use of BATs, in fact a shift of 23 is still not enough to
> get free BATs for KASAN. But at least it allows you to map all linear
> memory with BATs, whereas a shift of 22 would require 9 BATs:
>
> With shift 22 you have BATs of size (MB): 4+4+8+16+32+64+128+256+256
> With shift 23 you have BATs of size (MB): 8+8+16+32+64+128+256+256
>
> So let's forget that for the moment, although you may try with
> CONFIG_STRICT_KERNEL_RWX; in that case you should have enough BATs.
>
> But let's try to refocus on the real problem.
>
> In your last mail you say you tried with all patches. Did that include
> the two changes above?
>
> If not, can you run the tests with those two changes in addition,
> first one at a time, then both together depending on the result?
>
> Many thanks for your help and perseverance
> Christophe
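
As a side note, the BAT counts quoted above can be reproduced with a small
userspace sketch. It is not the kernel code: it only assumes that each BAT
block is a power of two in size, naturally aligned, capped at 256MB, and that
the range below the kernel text/data border is mapped separately from the
rest. The 0x1400000 border and the 0x30000000 top are taken from the
mmu_mapin_ram log line; the 0x1800000 border for shift 23 is an assumption
(the same border rounded up to 8MB). All function names are made up for
illustration.

```c
#include <stdio.h>

#define MB           (1024UL * 1024UL)
#define MAX_BAT_SIZE (256UL * MB)

/* Largest power of two that divides base; address 0 is aligned to anything. */
static unsigned long align_limit(unsigned long base)
{
	if (base == 0)
		return MAX_BAT_SIZE;
	return base & -base;
}

/* Largest power of two that does not exceed len (len must be > 0). */
static unsigned long floor_pow2(unsigned long len)
{
	unsigned long p = 1;

	while (p <= len / 2)
		p *= 2;
	return p;
}

/* Size of the next block: limited by remaining length, alignment and 256MB. */
static unsigned long bat_block_size(unsigned long base, unsigned long top)
{
	unsigned long size = floor_pow2(top - base);
	unsigned long align = align_limit(base);

	if (align < size)
		size = align;
	if (size > MAX_BAT_SIZE)
		size = MAX_BAT_SIZE;
	return size;
}

/* Greedily cover [base, top) and return the number of blocks used. */
static int count_bats(unsigned long base, unsigned long top, int verbose)
{
	int n = 0;

	while (base < top) {
		unsigned long size = bat_block_size(base, top);

		if (verbose)
			printf("  0x%08lx: %lu MB\n", base, size / MB);
		base += size;
		n++;
	}
	return n;
}

/* Cover [0, border) and [border, top) as two independent ranges. */
static int count_split(unsigned long border, unsigned long top, int verbose)
{
	return count_bats(0, border, verbose) + count_bats(border, top, verbose);
}
```

Under those assumptions, count_split(0x1400000UL, 0x30000000UL, 1) walks
through nine blocks and count_split(0x1800000UL, 0x30000000UL, 1) through
eight, which lines up with the two sums above.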
Revisited this issue with kernel v6.8-rc6 on the same machine.
Now this strange KASAN cold boot issue is gone or at least I can no longer reproduce it. Be it with KASAN_OUTLINE or KASAN_INLINE, SMP boot works just fine on my G4 DP. Which is a good thing. :)
Regards,
Erhard