[PATCH 2/2] powerpc: Add ppc64 hard lockup detector support
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Tue Aug 12 09:42:19 EST 2014
On Tue, Aug 12, 2014 at 09:31:37AM +1000, Anton Blanchard wrote:
> The hard lockup detector uses a PMU event as a periodic NMI to
> detect if we are stuck (where stuck means no timer interrupts have
> occurred).
>
> Ben's rework of the ppc64 soft disable code has made ppc64 PMU
> exceptions partial NMIs. They can get disabled if an external interrupt
> comes in, but otherwise PMU interrupts will fire in interrupt-disabled
> regions.
>
> I wrote a kernel module to test this patch and noticed we sometimes
> missed hard lockup warnings. The RCU code detected the stall first and
> issued an IPI to backtrace all CPUs. Unfortunately an IPI is an external
> interrupt, and that hard-disables interrupts, preventing the hard
> lockup detector from firing.
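For reference, the detection logic Anton describes lives in the generic
watchdog code: the periodic perf NMI checks whether the per-CPU count of
timer (hrtimer) interrupts has advanced since the previous NMI. The
sketch below is a simplified paraphrase of kernel/watchdog.c from around
this era, not the exact source; names and details may differ.

/* Simplified sketch: each watchdog timer tick increments
 * hrtimer_interrupts on its CPU.  The periodic perf NMI calls
 * is_hardlockup(); if the count has not moved since the last NMI,
 * no timer interrupts have run and the CPU is reported as hard
 * locked up.
 */
static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);

static int is_hardlockup(void)
{
	unsigned long hrint = __this_cpu_read(hrtimer_interrupts);

	if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
		return 1;	/* no timer tick since last NMI: CPU looks stuck */

	__this_cpu_write(hrtimer_interrupts_saved, hrint);
	return 0;
}

On ppc64 the perf exception driving this check is only a partial NMI, so
a hard-disabled region (such as the external-interrupt/IPI case above)
can keep the check from ever running.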
If it helps, commit bc1dce514e9b (rcu: Don't use NMIs to dump other
CPUs' stacks) makes RCU avoid this behavior: with it applied, the CPU
that detects the stall reads the other CPUs' stacks out remotely rather
than sending NMIs. It is in -tip and should make mainline this merge
window. The corresponding patch is below.
Thanx, Paul
------------------------------------------------------------------------
rcu: Don't use NMIs to dump other CPUs' stacks
Although NMI-based stack dumps are in principle more accurate, they are
also more likely to trigger deadlocks. This commit therefore replaces
all uses of trigger_all_cpu_backtrace() with rcu_dump_cpu_stacks(), so
that the CPU detecting an RCU CPU stall does the stack dumping.
Signed-off-by: Paul E. McKenney <paulmck at linux.vnet.ibm.com>
Reviewed-by: Lai Jiangshan <laijs at cn.fujitsu.com>
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 3f93033d3c61..8f3e4d43d736 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1013,10 +1013,7 @@ static void record_gp_stall_check_time(struct rcu_state *rsp)
 }
 
 /*
- * Dump stacks of all tasks running on stalled CPUs. This is a fallback
- * for architectures that do not implement trigger_all_cpu_backtrace().
- * The NMI-triggered stack traces are more accurate because they are
- * printed by the target CPU.
+ * Dump stacks of all tasks running on stalled CPUs.
  */
 static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
 {
@@ -1094,7 +1091,7 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
 	       (long)rsp->gpnum, (long)rsp->completed, totqlen);
 	if (ndetected == 0)
 		pr_err("INFO: Stall ended before state dump start\n");
-	else if (!trigger_all_cpu_backtrace())
+	else
 		rcu_dump_cpu_stacks(rsp);
 
 	/* Complain about tasks blocking the grace period. */
@@ -1125,8 +1122,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
 	pr_cont(" (t=%lu jiffies g=%ld c=%ld q=%lu)\n",
 		jiffies - rsp->gp_start,
 		(long)rsp->gpnum, (long)rsp->completed, totqlen);
-	if (!trigger_all_cpu_backtrace())
-		dump_stack();
+	rcu_dump_cpu_stacks(rsp);
 
 	raw_spin_lock_irqsave(&rnp->lock, flags);
 	if (ULONG_CMP_GE(jiffies, ACCESS_ONCE(rsp->jiffies_stall)))
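For context on "reads the stacks out remotely": rcu_dump_cpu_stacks()
never interrupts the stalled CPUs. The CPU that detects the stall walks
the rcu_node tree and prints the stack of whatever task is currently
running on each CPU that has not yet reported a quiescent state. The
following is a rough sketch of the helpers involved, paraphrased from
kernel/rcu/tree.c and kernel/sched/core.c of this period; treat the
exact names and locking as approximate rather than verbatim.

/* Rough sketch: for every leaf rcu_node, dump the current task of each
 * CPU whose quiescent-state bit is still set, i.e. each CPU holding up
 * the grace period.  The stack is read by this CPU, not by the target.
 */
static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
{
	int cpu;
	unsigned long flags;
	struct rcu_node *rnp;

	rcu_for_each_leaf_node(rsp, rnp) {
		raw_spin_lock_irqsave(&rnp->lock, flags);
		if (rnp->qsmask != 0) {
			for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++)
				if (rnp->qsmask & (1UL << cpu))
					dump_cpu_task(rnp->grplo + cpu);
		}
		raw_spin_unlock_irqrestore(&rnp->lock, flags);
	}
}

/* dump_cpu_task() lives in the scheduler and simply prints the remote
 * CPU's current task from the detecting CPU, with no IPI or NMI.
 */
void dump_cpu_task(int cpu)
{
	pr_info("Task dump for CPU %d:\n", cpu);
	sched_show_task(cpu_curr(cpu));
}

Because the dump is driven entirely by the detecting CPU, it still works
when the stalled CPU has interrupts hard-disabled, which is exactly the
case Anton hit on ppc64.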