[PATCH 06/10] powerpc: nmi_enter for system reset

Michael Ellerman mpe at ellerman.id.au
Tue Feb 7 15:06:58 AEDT 2017


Nicholas Piggin <npiggin at gmail.com> writes:

> System reset is a non-maskable interrupt from Linux's point of view
> (occurs under local_irq_disable()), so it should use nmi_enter/exit.
...
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 802aa6bbe97b..c65c88fb6482 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -278,6 +278,14 @@ void _exception(int signr, struct pt_regs *regs, int code, unsigned long addr)
>  
>  void system_reset_exception(struct pt_regs *regs)
>  {
> +	/*
> +	 * Avoid crashes in case of nested NMI exceptions. Recoverability
> +	 * is determined by RI and in_nmi
> +	 */
> +	bool nested = in_nmi();
> +	if (!nested)
> +		nmi_enter();
> +
>  	/* See if any machine dependent calls */
>  	if (ppc_md.system_reset_exception) {
>  		if (ppc_md.system_reset_exception(regs))
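
To make the nesting logic concrete, here is a minimal user-space model of the guard in the quoted hunk. nmi_depth, system_reset_model() and the stub nmi_enter()/nmi_exit()/in_nmi() are hypothetical stand-ins, not the kernel's implementations; the real ones also deal with RCU, lockdep and tracing. The stub nmi_enter() treats re-entry as a bug, which models the crash the patch comment says it avoids. The matching nmi_exit() on the way out is an assumption, since the quoted hunk is cut off before it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for the kernel's NMI accounting.
 * Only the nesting depth is modeled here. */
static int nmi_depth;

static bool in_nmi(void)
{
	return nmi_depth > 0;
}

static void nmi_enter(void)
{
	/* Model assumption: entering NMI context twice is a bug. */
	assert(!in_nmi());
	nmi_depth++;
}

static void nmi_exit(void)
{
	assert(in_nmi());
	nmi_depth--;
}

/* Sketch of the guard from the patch: only the outermost system reset
 * enters and exits NMI context. The nmi_exit() at the end is assumed,
 * because the quoted hunk is truncated. */
static void system_reset_model(int nesting_left)
{
	bool nested = in_nmi();

	if (!nested)
		nmi_enter();

	/* Simulate another system reset arriving while this one is handled. */
	if (nesting_left > 0)
		system_reset_model(nesting_left - 1);

	if (!nested)
		nmi_exit();
}
```

Calling system_reset_model(2) simulates a reset arriving twice while one is already being handled. Only the outermost call enters and exits NMI context, so nmi_depth is back to 0 afterwards.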


This breaks my QS22 (Cell blade): I get lots of RCU stalls, such as:

  INFO: rcu_sched self-detected stall on CPU
  	0-...: (5249 ticks this GP) idle=ad6/1/1 softirq=3/3 fqs=3 
  	 (t=5250 jiffies g=-298 c=-299 q=1289)
  rcu_sched kthread starved for 5234 jiffies! g18446744073709551318 c18446744073709551317 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
  rcu_sched       S    0     8      2 0x00000800
  Call Trace:
  [c0000003fb9d7950] [c000000000014730] .__switch_to+0x218/0x2b0
  [c0000003fb9d7a00] [c0000000006a0668] .__schedule+0x268/0x778
  [c0000003fb9d7ae0] [c0000000006a0bb0] .schedule+0x38/0xb0
  [c0000003fb9d7b60] [c0000000006a7ba4] .schedule_timeout+0x184/0x2f0
  [c0000003fb9d7c50] [c000000000106c5c] .rcu_gp_kthread+0x5ec/0xa60
  [c0000003fb9d7d70] [c0000000000c69d0] .kthread+0x148/0x188
  [c0000003fb9d7e30] [c00000000000ba70] .ret_from_kernel_thread+0x58/0x68

And I never get to userspace.

This is because cbe_system_reset_exception() doesn't like being called
after nmi_enter(), though I don't know exactly what the problem is.

Moving the nmi_enter() after the ppc_md hook (and fixing up the goto
etc.) fixes it, but that's not really a great solution.
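
The workaround described above can be modeled the same way. In this self-contained sketch, md_system_reset_hook stands in for ppc_md.system_reset_exception and fake_platform_hook for a platform handler such as cbe_system_reset_exception(); both, like the NMI stubs, are hypothetical. The point is the ordering: the hook runs before nmi_enter(), so when it handles the reset the function can simply return, with no NMI accounting to unwind through a goto.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; only the NMI nesting depth is modeled. */
static int nmi_depth;

static bool in_nmi(void) { return nmi_depth > 0; }
static void nmi_enter(void) { assert(!in_nmi()); nmi_depth++; }
static void nmi_exit(void) { assert(in_nmi()); nmi_depth--; }

/* Hypothetical stand-in for the ppc_md.system_reset_exception hook:
 * returns nonzero if the platform handled the reset. */
static int (*md_system_reset_hook)(void *regs);

/* Records whether the hook ran inside NMI context. */
static bool hook_saw_nmi;

static int fake_platform_hook(void *regs)
{
	(void)regs;
	hook_saw_nmi = in_nmi();
	return 1; /* pretend the platform handled the reset */
}

/* Sketch of the workaround: call the machine hook before nmi_enter(). */
static void system_reset_reordered(void *regs)
{
	bool nested;

	/* Nothing has been entered yet, so a plain return is balanced. */
	if (md_system_reset_hook && md_system_reset_hook(regs))
		return;

	nested = in_nmi();
	if (!nested)
		nmi_enter();

	/* die()/panic() handling for an unhandled reset would go here. */

	if (!nested)
		nmi_exit();
}
```

With md_system_reset_hook set to fake_platform_hook, system_reset_reordered(NULL) leaves hook_saw_nmi false and nmi_depth at 0. This only changes when the NMI bookkeeping starts; the hook itself still runs in response to a non-maskable interrupt.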

I suspect the patch will also break pasemi, because its ppc_md hook does something similar.

I'm not clear on how best to fix it ATM.

cheers


More information about the Linuxppc-dev mailing list