[PATCH] powerpc/watchdog: provide more data in watchdog messages

Nicholas Piggin npiggin at gmail.com
Wed May 2 18:38:46 AEST 2018


On Tue, 01 May 2018 23:07:28 +1000
Balbir Singh <bsingharora at gmail.com> wrote:

> On Tue, 2018-05-01 at 12:22 +1000, Nicholas Piggin wrote:
> > Provide timebase and timebase of last heartbeat in watchdog lockup
> > messages. Also provide a stack trace of when a CPU becomes un-stuck,
> > which can be useful -- it could be where irqs are re-enabled, so it
> > may be the end of the critical section which is responsible for the
> > latency.
> > 
> > Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
> > ---
> > 
> > A lockup + unstuck event now looks like this (with irqtrace enabled):
> > 
> > watchdog: CPU 1 self-detected hard LOCKUP @ udelay+0x40/0x60
> > watchdog: CPU 1 TB:82611697355, last heartbeat TB:75431975757  
> 
> Can we divide TB with tb_ticks_per_sec, TB itself is not very useful, the
> delta maybe, but it needs more work on behalf of the person looking
> at the output.

I kind of prefer being able to examine register values and compare
directly with these logs, e.g., in mambo or xmon.

But maybe end user prefers something friendlier. What about like

watchdog: CPU 1 self-detected hard LOCKUP @ udelay+0x40/0x60
watchdog: CPU 1 no heartbeat for 14.02s (TB:82611697355, last TB:75431975757)  

?

> > @@ -245,8 +260,6 @@ void soft_nmi_interrupt(struct pt_regs *regs)
> >  
> >  	tb = get_tb();
> >  	if (tb - per_cpu(wd_timer_tb, cpu) >= wd_panic_timeout_tb) {
> > -		per_cpu(wd_timer_tb, cpu) = tb;
> > -  
> 
> Is this related to the print improvements? It looks like you don't want
> to reset the tb, but I would split it out


Yeah there isn't any real reason to reset it since we get marked as
stuck which prevents further messages, and it clobbrs our last heartbeat
value. I'll put it into its own change.

Thanks,
Nick


More information about the Linuxppc-dev mailing list