[PATCH 06/10] powerpc: nmi_enter for system reset
Michael Ellerman
mpe at ellerman.id.au
Tue Feb 7 15:06:58 AEDT 2017
Nicholas Piggin <npiggin at gmail.com> writes:
> System reset is a non-maskable interrupt from Linux's point of view
> (occurs under local_irq_disable()), so it should use nmi_enter/exit.
...
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 802aa6bbe97b..c65c88fb6482 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -278,6 +278,14 @@ void _exception(int signr, struct pt_regs *regs, int code, unsigned long addr)
>
> void system_reset_exception(struct pt_regs *regs)
> {
> + /*
> + * Avoid crashes in case of nested NMI exceptions. Recoverability
> + * is determined by RI and in_nmi
> + */
> + bool nested = in_nmi();
> + if (!nested)
> + nmi_enter();
> +
> /* See if any machine dependent calls */
> if (ppc_md.system_reset_exception) {
> if (ppc_md.system_reset_exception(regs))
This breaks my QS22 (Cell blade): I get lots of RCU stalls such as:
INFO: rcu_sched self-detected stall on CPU
0-...: (5249 ticks this GP) idle=ad6/1/1 softirq=3/3 fqs=3
(t=5250 jiffies g=-298 c=-299 q=1289)
rcu_sched kthread starved for 5234 jiffies! g18446744073709551318 c18446744073709551317 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
rcu_sched S 0 8 2 0x00000800
Call Trace:
[c0000003fb9d7950] [c000000000014730] .__switch_to+0x218/0x2b0
[c0000003fb9d7a00] [c0000000006a0668] .__schedule+0x268/0x778
[c0000003fb9d7ae0] [c0000000006a0bb0] .schedule+0x38/0xb0
[c0000003fb9d7b60] [c0000000006a7ba4] .schedule_timeout+0x184/0x2f0
[c0000003fb9d7c50] [c000000000106c5c] .rcu_gp_kthread+0x5ec/0xa60
[c0000003fb9d7d70] [c0000000000c69d0] .kthread+0x148/0x188
[c0000003fb9d7e30] [c00000000000ba70] .ret_from_kernel_thread+0x58/0x68
And I never get to userspace.
This is because cbe_system_reset_exception() doesn't like being called
after nmi_enter(), though I don't know exactly what the problem is.
Moving the nmi_enter() after the ppc_md hook (and fixing up the goto
etc.) fixes it, but that's not really a great solution.
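For illustration only, here is a minimal user-space model of the two orderings. in_nmi(), nmi_enter() and nmi_exit() below are hypothetical stand-ins built on a plain counter (the kernel tracks this in preempt_count), and platform_hook() is a hypothetical substitute for ppc_md.system_reset_exception. The model covers only the control flow. With the patch's order the hook always runs with in_nmi() true. With the hook called first, the goto can skip nmi_enter(), so the exit path has to track whether nmi_enter() actually ran. The RI check and panic after the out: label are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for the kernel's NMI accounting.
 * The kernel keeps this in preempt_count; here it is a plain counter.
 * The asserts roughly mirror the BUG_ON() checks in nmi_enter()/nmi_exit(),
 * which is why a nested reset must not call nmi_enter() again. */
static int nmi_depth;
static bool in_nmi(void) { return nmi_depth > 0; }
static void nmi_enter(void) { assert(!in_nmi()); nmi_depth++; }
static void nmi_exit(void)  { assert(in_nmi());  nmi_depth--; }

/* Hypothetical platform hook standing in for ppc_md.system_reset_exception
 * (e.g. cbe_system_reset_exception()). It records whether it ran in NMI
 * context and returns hook_ret (nonzero means "handled"). */
static int hook_ret = 1;
static bool hook_saw_nmi;
static int platform_hook(void)
{
	hook_saw_nmi = in_nmi();
	return hook_ret;
}

/* Order used by the patch: enter NMI context first, then call the hook. */
static void reset_patch_order(void)
{
	bool nested = in_nmi();

	if (!nested)
		nmi_enter();
	if (platform_hook())
		goto out;
	/* die("System Reset", ...) would run here */
out:
	if (!nested)
		nmi_exit();
}

/* Workaround order: call the hook before nmi_enter(). The goto now skips
 * nmi_enter(), so the exit path must only undo what was actually done. */
static void reset_hook_first(void)
{
	bool entered = false;

	if (platform_hook())
		goto out;
	if (!in_nmi()) {
		nmi_enter();
		entered = true;
	}
	/* die("System Reset", ...) would run here, in NMI context */
out:
	if (entered)
		nmi_exit();
}
```

The model says nothing about what cbe_system_reset_exception() actually trips over once nmi_enter() has run; it only shows why moving the hook forces the goto and exit path to be reworked.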
I suspect it will also break pasemi, because it does something similar.
I'm not clear on how best to fix it ATM.
cheers