[PATCH v6 14/39] powerpc/perf: move perf irq/nmi handling details into traps.c
Nicholas Piggin
npiggin at gmail.com
Wed Jan 20 15:21:45 AEDT 2021
Excerpts from Nicholas Piggin's message of January 20, 2021 1:09 pm:
> Excerpts from Athira Rajeev's message of January 19, 2021 8:24 pm:
>>
>> [ 883.900762] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]
>> [ 883.901381] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G OE 5.11.0-rc3+ #34
>> --
>> [ 883.901999] NIP [c0000000000168d0] replay_soft_interrupts+0x70/0x2f0
>> [ 883.902032] LR [c00000000003b2b8] interrupt_exit_kernel_prepare+0x1e8/0x240
>> [ 883.902063] Call Trace:
>> [ 883.902085] [c000000001c96f50] [c00000000003b2b8] interrupt_exit_kernel_prepare+0x1e8/0x240 (unreliable)
>> [ 883.902139] [c000000001c96fb0] [c00000000000fd88] interrupt_return+0x158/0x200
>> [ 883.902185] --- interrupt: ea0 at __rb_reserve_next+0xc0/0x5b0
>> [ 883.902224] NIP: c0000000002d8980 LR: c0000000002d897c CTR: c0000000001aad90
>> [ 883.902262] REGS: c000000001c97020 TRAP: 0ea0 Tainted: G OE (5.11.0-rc3+)
>> [ 883.902301] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28000484 XER: 20040000
>> [ 883.902387] CFAR: c00000000000fe00 IRQMASK: 0
>> --
>> [ 883.902757] NIP [c0000000002d8980] __rb_reserve_next+0xc0/0x5b0
>> [ 883.902786] LR [c0000000002d897c] __rb_reserve_next+0xbc/0x5b0
>> [ 883.902824] --- interrupt: ea0
>> [ 883.902848] [c000000001c97360] [c0000000002d8fcc] ring_buffer_lock_reserve+0x15c/0x580
>> [ 883.902894] [c000000001c973f0] [c0000000002e82fc] trace_function+0x4c/0x1c0
>> [ 883.902930] [c000000001c97440] [c0000000002f6f50] function_trace_call+0x140/0x190
>> [ 883.902976] [c000000001c97470] [c00000000007d6f8] ftrace_call+0x4/0x44
>> [ 883.903021] [c000000001c97660] [c000000000dcf70c] __do_softirq+0x15c/0x3d4
>> [ 883.903066] [c000000001c97750] [c00000000015fc68] irq_exit+0x198/0x1b0
>> [ 883.903102] [c000000001c97780] [c000000000dc1790] timer_interrupt+0x170/0x3b0
>> [ 883.903148] [c000000001c977e0] [c000000000016994] replay_soft_interrupts+0x134/0x2f0
>> [ 883.903193] [c000000001c979d0] [c00000000003b2b8] interrupt_exit_kernel_prepare+0x1e8/0x240
>> [ 883.903240] [c000000001c97a30] [c00000000000fd88] interrupt_return+0x158/0x200
>> [ 883.903276] --- interrupt: ea0 at arch_local_irq_restore+0x70/0xc0
>
> You got a 0xea0 interrupt in the ftrace code. I wonder where it is
> looping. Do you see more soft lockup messages?
We should probably fix this recursion too. I was vaguely aware of it and
thought it might have existed with the old interrupt exit and replay
code as well and was pretty well bounded, but I'm not entirely sure it's
okay. And now that I've thought about it a bit harder, I think there is
actually a simple way to fix it:
[PATCH] powerpc/64: prevent replayed interrupt handlers from running softirqs
Running softirqs enables interrupts, which can then recurse into the
irq soft-mask code we're trying to adjust. That includes replaying
interrupts again, and this recursion may not be bounded. This abridged
trace shows how this can occur:
NIP replay_soft_interrupts
LR interrupt_exit_kernel_prepare
Call Trace:
interrupt_exit_kernel_prepare (unreliable)
interrupt_return
--- interrupt: ea0 at __rb_reserve_next
NIP __rb_reserve_next
LR __rb_reserve_next
Call Trace:
ring_buffer_lock_reserve
trace_function
function_trace_call
ftrace_call
__do_softirq
irq_exit
timer_interrupt
replay_soft_interrupts
interrupt_exit_kernel_prepare
interrupt_return
--- interrupt: ea0 at arch_local_irq_restore
Fix this by disabling bhs (softirqs) around the interrupt replay.
Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
---
arch/powerpc/kernel/irq.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 681abb7c0507..bb0d4fc8df89 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -189,6 +189,18 @@ void replay_soft_interrupts(void)
unsigned char happened = local_paca->irq_happened;
struct pt_regs regs;
+ /*
+ * Prevent softirqs from being run when an interrupt handler returns
+ * and calls irq_exit(), because softirq processing enables interrupts.
+ * If an interrupt is taken, it may then call replay_soft_interrupts
+ * on its way out, which gets messy and recursive.
+ *
+ * softirqs created by replayed interrupts will be run at the end of
+ * this function when bhs are enabled (if they were enabled in our
+ * caller).
+ */
+ local_bh_disable();
+
ppc_save_regs(&regs);
regs.softe = IRQS_ENABLED;
@@ -264,6 +276,8 @@ void replay_soft_interrupts(void)
trace_hardirqs_off();
goto again;
}
+
+ local_bh_enable();
}
notrace void arch_local_irq_restore(unsigned long mask)
--
2.23.0
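
For anyone following along, here is a rough userspace sketch of why the
local_bh_disable()/local_bh_enable() pair bounds the recursion. It is a
toy single-threaded model, not kernel code: every name in it (struct
model, model_replay(), and so on) is made up for illustration, and it
only assumes what the patch comment states, namely that irq_exit() will
not run softirqs while bhs are disabled and that they are run when bhs
are re-enabled at the outermost level.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names are real kernel interfaces. */
struct model {
	int bh_disable_depth;    /* models the softirq-disable nesting count */
	bool softirq_pending;    /* a softirq was raised and has not run yet */
	bool in_replay;          /* currently replaying pending interrupts */
	int softirq_runs;        /* total number of softirq runs */
	int runs_inside_replay;  /* runs that happened in the middle of replay */
};

static void model_run_softirqs(struct model *m)
{
	while (m->softirq_pending) {
		m->softirq_pending = false;
		m->softirq_runs++;
		/* In the real kernel this is where interrupts get enabled. */
		if (m->in_replay)
			m->runs_inside_replay++;
	}
}

/* Models irq_exit(): softirqs run only if bhs are not disabled. */
static void model_irq_exit(struct model *m)
{
	if (m->bh_disable_depth == 0)
		model_run_softirqs(m);
}

static void model_bh_disable(struct model *m)
{
	m->bh_disable_depth++;
}

/* Models local_bh_enable(): the outermost enable runs pending softirqs. */
static void model_bh_enable(struct model *m)
{
	assert(m->bh_disable_depth > 0);
	if (--m->bh_disable_depth == 0)
		model_run_softirqs(m);
}

/* A replayed handler that raises a softirq and then exits. */
static void model_timer_interrupt(struct model *m)
{
	m->softirq_pending = true;
	model_irq_exit(m);
}

/* Replays nr_pending interrupts, with or without the bh-disable fix. */
static void model_replay(struct model *m, int nr_pending, bool use_fix)
{
	if (use_fix)
		model_bh_disable(m);

	m->in_replay = true;
	for (int i = 0; i < nr_pending; i++)
		model_timer_interrupt(m);
	m->in_replay = false;

	if (use_fix)
		model_bh_enable(m);
}
```

Without the fix, each replayed handler runs softirqs in the middle of
the replay loop, which is the window where the trace above recursed.
With the fix, the model defers them to a single run after the loop, or
leaves them pending if the caller already had bhs disabled.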