[powerpc][next-20210625] Kernel warning(arch/powerpc/kernel/interrupt.c:518) during boot

Nicholas Piggin npiggin at gmail.com
Mon Jun 28 13:52:00 AEST 2021


Excerpts from Sachin Sant's message of June 27, 2021 9:23 pm:
> 
>> On 27-Jun-2021, at 3:36 PM, Nicholas Piggin <npiggin at gmail.com> wrote:
>>> 
>>> So there's definitely IRQMASK=0 and MSR[EE]=0 in this frame, which is 
>>> what the warning was.
>>> 
>>> I'd say either something hasn't set PACA_IRQ_HARD_DIS properly, so EE 
>>> doesn't get enabled when irqs are restored, or maybe the change to
>>> arch_local_irq_restore(). Less likely that the stack got messed up.
>>> 
>>> Can you try run with CONFIG_PPC_IRQ_SOFT_MASK_DEBUG=y ?
>> 
>> Nevermind, I think I've found the problem. Some code runs in the
>> implicit soft-mask region without expecting to be masked. Working
>> on a fix…
> 
> :-) . I was able to recreate this after a few attempts. It seems the warning isn’t
> always triggered during boot. I had to run a kernel compile operation after
> boot to trigger this warning again.
> 
> In case it's helpful, here is the additional trace with PPC_IRQ_SOFT_MASK_DEBUG.

Thanks. I ended up being able to reproduce as well, quite frequently 
with some extra debug checks that specifically catch more cases.

I've got a few patches under test right now, very stable so far. I'll 
post them out if they survive another hour or two of stress testing.

The problem is that some code (e.g., ret_from_fork) now gets implicitly 
soft-masked where it was not expecting to be. A masked interrupt might 
hit, and then when the code moves out of the implicit soft-mask region it
does not re-enable interrupts. Some types of pending interrupts will 
clear MSR[EE], and that ends up causing this bug on the next interrupt
that happens.
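
To make that sequence concrete, here is a toy userspace model in C. It is
only a sketch: every name in it (struct model_cpu and the model_* helpers)
is hypothetical and none of it is kernel code. It assumes the state can be
reduced to three flags, and the "checked" exit is just one possible shape
of a repair, not the patches under test.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names are hypothetical, not taken from the kernel. */
struct model_cpu {
	bool msr_ee;      /* stands in for MSR[EE] (hard interrupt enable) */
	bool soft_masked; /* stands in for the soft-mask state, explicit or implicit */
	bool irq_pending; /* an interrupt arrived while soft-masked */
};

/*
 * An interrupt hits while soft-masked. It is recorded as pending and,
 * for the interrupt types that do so, MSR[EE] is cleared.
 */
static void model_masked_interrupt(struct model_cpu *cpu)
{
	assert(cpu->soft_masked); /* only meaningful while masked */
	cpu->irq_pending = true;
	cpu->msr_ee = false;
}

/* Buggy exit: code that never expected to be masked just carries on. */
static void model_leave_region_buggy(struct model_cpu *cpu)
{
	cpu->soft_masked = false;
}

/*
 * Assumed repair for illustration: on the way out, deal with anything
 * that became pending and turn MSR[EE] back on.
 */
static void model_leave_region_checked(struct model_cpu *cpu)
{
	cpu->soft_masked = false;
	if (cpu->irq_pending) {
		cpu->irq_pending = false; /* pretend the interrupt was replayed */
		cpu->msr_ee = true;
	}
}

/* The condition behind the warning, in spirit: unmasked implies MSR[EE]=1. */
static bool model_state_consistent(const struct model_cpu *cpu)
{
	return cpu->soft_masked || cpu->msr_ee;
}
```

Starting from a masked CPU with MSR[EE]=1, calling model_masked_interrupt()
and then model_leave_region_buggy() leaves the model unmasked with
MSR[EE]=0, which is the IRQMASK=0 / no-EE combination seen in the frame
above. The checked exit leaves it consistent.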

Not a wonderful escape :\  thanks for finding it. The fixes aren't too
bad, fortunately.

Thanks,
Nick

> 
> [   92.106731] ------------[ cut here ]------------
> [   92.106738] WARNING: CPU: 45 PID: 12757 at arch/powerpc/kernel/irq.c:255 arch_local_irq_restore+0x1d0/0x200
> [   92.106753] Modules linked in: dm_mod bonding nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set rfkill nf_tables libcrc32c nfnetlink sunrpc pseries_rng xts vmx_crypto uio_pdrv_genirq uio sch_fq_codel ip_tables ext4 mbcache jbd2 sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp fuse
> [   92.106828] CPU: 45 PID: 12757 Comm: sh Kdump: loaded Tainted: G        W         5.13.0-rc7-next-20210625 #1
> [   92.106841] NIP:  c0000000000164d0 LR: c000000000cedaa8 CTR: 0000000000000000
> [   92.106849] REGS: c00000008dfeb7e0 TRAP: 0700   Tainted: G        W          (5.13.0-rc7-next-20210625)
> [   92.106859] MSR:  8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 28004222  XER: 00000000
> [   92.106892] CFAR: c00000000001632c IRQMASK: 0 
>                GPR00: c000000000ceda98 c00000008dfeba80 c000000002921e00 0000000000000000 
>                GPR04: 0000000000000000 0000000000000000 0000000000000000 00000000000000ff 
>                GPR08: 0000000000000001 0000000000000000 0000000000000001 0000000000000017 
>                GPR12: 0000000024004822 c000000007fb9200 000000012efd81d4 000000012ee50000 
>                GPR16: 0000000000000001 00000100268a0e00 000001002687ec10 0000000114200c40 
>                GPR20: 00003fffa93f8000 0000000000000000 00003fffa93f9300 000000012efb1988 
>                GPR24: 000000012ee7fe7c 000000012efccba0 000000012ee50000 c00000008d5d7600 
>                GPR28: c0000000314c0bc0 c000000040d9f100 c0000008beb5861c 4b72201a3063fe13 
> [   92.107024] NIP [c0000000000164d0] arch_local_irq_restore+0x1d0/0x200
> [   92.107035] LR [c000000000cedaa8] _raw_spin_unlock_irqrestore+0x88/0xb0
> [   92.107047] Call Trace:
> [   92.107052] [c00000008dfeba80] [c00000008dfebb50] 0xc00000008dfebb50 (unreliable)
> [   92.107065] [c00000008dfebab0] [238c5bf052df0858] 0x238c5bf052df0858
> [   92.107076] [c00000008dfebae0] [c0000000008178e8] get_random_u64+0x88/0x100
> [   92.107090] [c00000008dfebb20] [c000000000020134] arch_randomize_brk+0xb4/0xd8
> [   92.107105] [c00000008dfebb50] [c0000000005430b0] load_elf_binary+0xe70/0x1220
> [   92.107119] [c00000008dfebc40] [c00000000047ded0] bprm_execve+0x410/0x800
> [   92.107132] [c00000008dfebd10] [c00000000047e8ec] do_execveat_common.isra.44+0x21c/0x240
> [   92.107145] [c00000008dfebd80] [c00000000047e964] sys_execve+0x54/0x70
> [   92.107157] [c00000008dfebdb0] [c000000000032334] system_call_exception+0x164/0x2e0
> [   92.107169] [c00000008dfebe10] [c00000000000c464] system_call_common+0xf4/0x258
> [   92.107185] --- interrupt: c00 at 0x3fff9bb6b8a8
> [   92.107193] NIP:  00003fff9bb6b8a8 LR: 00003fff9bb6c240 CTR: 0000000000000000
> [   92.107202] REGS: c00000008dfebe80 TRAP: 0c00   Tainted: G        W          (5.13.0-rc7-next-20210625)
> [   92.107213] MSR:  800000000000f033 <SF,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 28004224  XER: 00000000
> [   92.107243] IRQMASK: 0 
>                GPR00: 000000000000000b 00003fffc36a1440 00003fff9bc87300 00000100268a67d0 
>                GPR04: 0000010026887e50 0000010026882c50 fefefefefefefeff 7f7f7f7f7f7f7f7f 
>                GPR08: 00000100268a67d0 0000000000000000 0000000000000000 0000000000000000 
>                GPR12: 0000000000000000 00003fff9bce3780 0000000114200db4 0000000000000000 
>                GPR16: 0000000000000001 00000100268a0e00 000001002687ec10 0000000114200c40 
>                GPR20: 00000001141dd820 0000000000000000 00000001141dd740 0000000114204358 
>                GPR24: 0000000114203948 0000010026876454 0000000000000001 0000010026882c50 
>                GPR28: 0000010026887e50 0000010026882c50 00000100268a67d0 00003fffc36a1440 
> [   92.107369] NIP [00003fff9bb6b8a8] 0x3fff9bb6b8a8
> [   92.107378] LR [00003fff9bb6c240] 0x3fff9bb6c240
> [   92.107386] --- interrupt: c00
> [   92.107393] Instruction dump:
> [   92.107400] 7d2000a6 71298000 40820048 39200000 992d0152 39400000 992d0153 614a8002 
> [   92.107427] 7d410164 4bfffe6c 60000000 60000000 <0fe00000> 4bfffe5c 60000000 60000000 
> [   92.107451] ---[ end trace 5f1d49fb99f3613d ]---
> 
> Complete dmesg log attached.
> 
> Thanks
> -Sachin
> 
> 


More information about the Linuxppc-dev mailing list