[PATCH v1] powerpc/watchdog: help remote CPUs to flush NMI printk output
Nicholas Piggin
npiggin at gmail.com
Fri Nov 19 20:15:59 AEDT 2021
Excerpts from Daniel Axtens's message of November 12, 2021 4:08 pm:
> Hi,
>
>> The printk layer at the moment does not seem to have a good way to force
>> flush printk messages that are created in NMI context, except in the
>> panic path.
>>
>> NMI-context printk messages normally get to the console with irq_work,
>> but that won't help if the CPU is stuck with irqs disabled, as can be
>> the case for hard lockup watchdog messages.
>>
>> The watchdog currently flushes the printk buffers after detecting a
>> lockup on remote CPUs, but they may not have processed their NMI IPI
>> yet by that stage, or they may have self-detected a lockup in which
>> case they won't go via this NMI IPI path.
>>
>> Improve the situation by having NMI-context mark a flag if it called
>> printk, and have watchdog timer interrupts check if that flag was set
>> and try to flush if it was. Latency is not a big problem because we
>> were already stuck for a while, just need to try to make sure the
>> messages eventually make it out.
Sorry just coming back to this now the printk fix was merged upstream.
> Initially I was surprised that this doesn't affect the printk code
> itself, just the powerpc code...
Yeah I'm actually not sure how other watchdogs handle this. If they
use nmi_trigger_cpumask_backtrace() then that does have a flush.
>> Cc: Laurent Dufour <ldufour at linux.ibm.com>
>> Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
>> ---
>> This patch is actually based on top of this one which is planned to go
>> upstream in rc1/2. https://marc.info/?l=linux-kernel&m=163626070312052&w=2
>>
>> Prior to commit 93d102f094be that is fixed by the above, we had a printk
>> flush function with a different name but basically does the same job, so
>> this patch can be backported, just needs some care. I'm posting it for
>> review now for feedback so it's ready to go when the printk patch is
>> upstream.
>>
>> Thanks,
>> Nick
>>
>> arch/powerpc/kernel/watchdog.c | 29 +++++++++++++++++++++++------
>> 1 file changed, 23 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
>> index b6533539386b..a7b6b0691203 100644
>> --- a/arch/powerpc/kernel/watchdog.c
>> +++ b/arch/powerpc/kernel/watchdog.c
>> @@ -86,6 +86,7 @@ static DEFINE_PER_CPU(u64, wd_timer_tb);
>> /* SMP checker bits */
>> static unsigned long __wd_smp_lock;
>> static unsigned long __wd_reporting;
>> +static unsigned long __wd_nmi_output;
>> static cpumask_t wd_smp_cpus_pending;
>> static cpumask_t wd_smp_cpus_stuck;
>> static u64 wd_smp_last_reset_tb;
>> @@ -154,6 +155,18 @@ static void wd_lockup_ipi(struct pt_regs *regs)
>> else
>> dump_stack();
>>
>> + /*
>> + * We printk()ed from NMI context, the contents may not get flushed
>> + * if we return to a context with interrupts disabled because
>> + * printk uses irq_work to schedule flushes of NMI output.
>> + * __wd_nmi_output says there has been output from NMI context, so
>> + * other CPUs are recruited to help flush it.
>> + *
>> + * xchg is not needed here (it could be a simple atomic store), but
>> + * it gives the memory ordering and atomicity required.
>> + */
>> + xchg(&__wd_nmi_output, 1);
>> +
>> /* Do not panic from here because that can recurse into NMI IPI layer */
>> }
>
> I think, looking at this and the other site where __wd_nmi_output is
> set, that this works because you set the flag only when you are done
> printing from the non-panic lockup context on this CPU. I was initially
> worried that you set this flag part way through printing, and then it
> might get cleared by another CPU while you're still trying to print.
> However, in this function it's right at the end - there's nothing else
> left to do, and ...
>
>> DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
>> @@ -386,6 +401,8 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
>> print_irqtrace_events(current);
>> show_regs(regs);
>>
>> + xchg(&__wd_nmi_output, 1); // see wd_lockup_ipi
>> +
>> if (sysctl_hardlockup_all_cpu_backtrace)
>> trigger_allbutself_cpu_backtrace();
>
> in this one, the only things that can happen afterwards are
> - a panic, which does its own flushing, and
>
> - trigger_allbutself_cpu_backtrace(), which seems to just send IPIs, not
> do any printing of its own.
Yeah, on powerpc that actually ends up not using NMI IPIs because that's
dangerous and core code uses it for less severe things than a hard
lockup so we disabled it, so if you can take a regular IPI then you can
run irq work when it returns and print things. If we changed it to use
NMI IPIs I guess we might want to do something, maybe.
> All of which is fine, but I wonder if we need a more descriptive
> variable name or if the comment needs to specify that the flag should
> only be set at certain times.
I did try to add some explanation in that comment. I'll see if that can
be made a bit more readable and expand on it a bit.
Thanks,
Nick
More information about the Linuxppc-dev
mailing list