[PATCH] powerpc/watchdog: provide more data in watchdog messages

Balbir Singh bsingharora at gmail.com
Wed May 2 18:55:15 AEST 2018


On Wed, May 2, 2018 at 6:38 PM, Nicholas Piggin <npiggin at gmail.com> wrote:
> On Tue, 01 May 2018 23:07:28 +1000
> Balbir Singh <bsingharora at gmail.com> wrote:
>
>> On Tue, 2018-05-01 at 12:22 +1000, Nicholas Piggin wrote:
>> > Provide timebase and timebase of last heartbeat in watchdog lockup
>> > messages. Also provide a stack trace of when a CPU becomes un-stuck,
>> > which can be useful -- it could be where irqs are re-enabled, so it
>> > may be the end of the critical section which is responsible for the
>> > latency.
>> >
>> > Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
>> > ---
>> >
>> > A lockup + unstuck event now looks like this (with irqtrace enabled):
>> >
>> > watchdog: CPU 1 self-detected hard LOCKUP @ udelay+0x40/0x60
>> > watchdog: CPU 1 TB:82611697355, last heartbeat TB:75431975757
>>
>> Can we divide TB with tb_ticks_per_sec, TB itself is not very useful, the
>> delta maybe, but it needs more work on behalf of the person looking
>> at the output.
>
> I kind of prefer being able to examine register values and compare
> directly with these logs, e.g., in mambo or xmon.
>
> But maybe end user prefers something friendlier. What about like
>
> watchdog: CPU 1 self-detected hard LOCKUP @ udelay+0x40/0x60
> watchdog: CPU 1 no heartbeat for 14.02s (TB:82611697355, last TB:75431975757)
>
> ?

That's better, if you really need the TB values, assuming we've
handled overflows, which should practically never happen right? :)

>
>> > @@ -245,8 +260,6 @@ void soft_nmi_interrupt(struct pt_regs *regs)
>> >
>> >     tb = get_tb();
>> >     if (tb - per_cpu(wd_timer_tb, cpu) >= wd_panic_timeout_tb) {
>> > -           per_cpu(wd_timer_tb, cpu) = tb;
>> > -
>>
>> Is this related to the print improvements? It looks like you don't want
>> to reset the tb, but I would split it out
>
>
> Yeah there isn't any real reason to reset it since we get marked as
> stuck which prevents further messages, and it clobbrs our last heartbeat
> value. I'll put it into its own change.
>

Sounds good!
Balbir Singh.


More information about the Linuxppc-dev mailing list