[powerpc][next-20210625] Kernel warning(arch/powerpc/kernel/interrupt.c:518) during boot
Nicholas Piggin
npiggin at gmail.com
Mon Jun 28 13:52:00 AEST 2021
Excerpts from Sachin Sant's message of June 27, 2021 9:23 pm:
>
>> On 27-Jun-2021, at 3:36 PM, Nicholas Piggin <npiggin at gmail.com> wrote:
>>>
>>> So there's definitely IRQMASK=0 and no MSR[EE]=0 in this frame, which is
>>> what the warning was.
>>>
>>> I'd say either something hasn't set PACA_IRQ_HARD_DIS properly, so EE
>>> doesn't get enabled when irqs are restored, or maybe the change to
>>> arch_local_irq_restore(). Less likely that the stack got messed up.
>>>
>>> Can you try run with CONFIG_PPC_IRQ_SOFT_MASK_DEBUG=y ?
>>
>> Nevermind, I think I've found the problem. Some code runs in the
>> implicit soft-mask region without expecting to be masked. Working
>> on a fix…
>
> :-) . I was able to recreate this after few attempts. It seem the warning isn’t
> always triggered during boot. I had to run a kernel compile operation after
> boot to trigger this warning again.
>
> In case its helpful here is the additional trace with PPC_IRQ_SOFT_MASK_DEBUG.
Thanks. I ended up being able to reproduce as well, quite frequently
with some extra debug checks that specifically catch more cases.
I've got a few patches under test right now, very stable so far. I'll
post them out if they survive a nother hour or two stress testing.
The problem is some code (e.g., ret_from_fork) now gets implicitly
soft-masked where that was not expecting to be. A masked interrupt might
hit, and then when it moves out of the implicit soft-mask region it
does not re-enable interrupts. Some types of pending interrupts will
clear MSR[EE], and that ends up causing this bug on the next interrupt
that happens.
Not a wonderful escape :\ thanks for finding it. The fixes aren't too
bad, fortunately.
Thanks,
Nick
>
> [ 92.106731] ------------[ cut here ]------------
> [ 92.106738] WARNING: CPU: 45 PID: 12757 at arch/powerpc/kernel/irq.c:255 arch_local_irq_restore+0x1d0/0x200
> [ 92.106753] Modules linked in: dm_mod bonding nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set rfkill nf_tables libcrc32c nfnetlink sunrpc pseries_rng xts vmx_crypto uio_pdrv_genirq uio sch_fq_codel ip_tables ext4 mbcache jbd2 sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp fuse
> [ 92.106828] CPU: 45 PID: 12757 Comm: sh Kdump: loaded Tainted: G W 5.13.0-rc7-next-20210625 #1
> [ 92.106841] NIP: c0000000000164d0 LR: c000000000cedaa8 CTR: 0000000000000000
> [ 92.106849] REGS: c00000008dfeb7e0 TRAP: 0700 Tainted: G W (5.13.0-rc7-next-20210625)
> [ 92.106859] MSR: 8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 28004222 XER: 00000000
> [ 92.106892] CFAR: c00000000001632c IRQMASK: 0
> GPR00: c000000000ceda98 c00000008dfeba80 c000000002921e00 0000000000000000
> GPR04: 0000000000000000 0000000000000000 0000000000000000 00000000000000ff
> GPR08: 0000000000000001 0000000000000000 0000000000000001 0000000000000017
> GPR12: 0000000024004822 c000000007fb9200 000000012efd81d4 000000012ee50000
> GPR16: 0000000000000001 00000100268a0e00 000001002687ec10 0000000114200c40
> GPR20: 00003fffa93f8000 0000000000000000 00003fffa93f9300 000000012efb1988
> GPR24: 000000012ee7fe7c 000000012efccba0 000000012ee50000 c00000008d5d7600
> GPR28: c0000000314c0bc0 c000000040d9f100 c0000008beb5861c 4b72201a3063fe13
> [ 92.107024] NIP [c0000000000164d0] arch_local_irq_restore+0x1d0/0x200
> [ 92.107035] LR [c000000000cedaa8] _raw_spin_unlock_irqrestore+0x88/0xb0
> [ 92.107047] Call Trace:
> [ 92.107052] [c00000008dfeba80] [c00000008dfebb50] 0xc00000008dfebb50 (unreliable)
> [ 92.107065] [c00000008dfebab0] [238c5bf052df0858] 0x238c5bf052df0858
> [ 92.107076] [c00000008dfebae0] [c0000000008178e8] get_random_u64+0x88/0x100
> [ 92.107090] [c00000008dfebb20] [c000000000020134] arch_randomize_brk+0xb4/0xd8
> [ 92.107105] [c00000008dfebb50] [c0000000005430b0] load_elf_binary+0xe70/0x1220
> [ 92.107119] [c00000008dfebc40] [c00000000047ded0] bprm_execve+0x410/0x800
> [ 92.107132] [c00000008dfebd10] [c00000000047e8ec] do_execveat_common.isra.44+0x21c/0x240
> [ 92.107145] [c00000008dfebd80] [c00000000047e964] sys_execve+0x54/0x70
> [ 92.107157] [c00000008dfebdb0] [c000000000032334] system_call_exception+0x164/0x2e0
> [ 92.107169] [c00000008dfebe10] [c00000000000c464] system_call_common+0xf4/0x258
> [ 92.107185] --- interrupt: c00 at 0x3fff9bb6b8a8
> [ 92.107193] NIP: 00003fff9bb6b8a8 LR: 00003fff9bb6c240 CTR: 0000000000000000
> [ 92.107202] REGS: c00000008dfebe80 TRAP: 0c00 Tainted: G W (5.13.0-rc7-next-20210625)
> [ 92.107213] MSR: 800000000000f033 <SF,EE,PR,FP,ME,IR,DR,RI,LE> CR: 28004224 XER: 00000000
> [ 92.107243] IRQMASK: 0
> GPR00: 000000000000000b 00003fffc36a1440 00003fff9bc87300 00000100268a67d0
> GPR04: 0000010026887e50 0000010026882c50 fefefefefefefeff 7f7f7f7f7f7f7f7f
> GPR08: 00000100268a67d0 0000000000000000 0000000000000000 0000000000000000
> GPR12: 0000000000000000 00003fff9bce3780 0000000114200db4 0000000000000000
> GPR16: 0000000000000001 00000100268a0e00 000001002687ec10 0000000114200c40
> GPR20: 00000001141dd820 0000000000000000 00000001141dd740 0000000114204358
> GPR24: 0000000114203948 0000010026876454 0000000000000001 0000010026882c50
> GPR28: 0000010026887e50 0000010026882c50 00000100268a67d0 00003fffc36a1440
> [ 92.107369] NIP [00003fff9bb6b8a8] 0x3fff9bb6b8a8
> [ 92.107378] LR [00003fff9bb6c240] 0x3fff9bb6c240
> [ 92.107386] --- interrupt: c00
> [ 92.107393] Instruction dump:
> [ 92.107400] 7d2000a6 71298000 40820048 39200000 992d0152 39400000 992d0153 614a8002
> [ 92.107427] 7d410164 4bfffe6c 60000000 60000000 <0fe00000> 4bfffe5c 60000000 60000000
> [ 92.107451] ---[ end trace 5f1d49fb99f3613d ]—
>
> Complete dmesg log attached.
>
> Thanks
> -Sachin
>
>
More information about the Linuxppc-dev
mailing list