[Qemu-ppc] pseries on qemu-system-ppc64le crashes in doorbell_core_ipi()
Nicholas Piggin
npiggin at gmail.com
Sat Mar 30 14:10:28 AEDT 2019
Steven Rostedt's on March 30, 2019 1:31 am:
> [ Added Peter ]
>
> On Fri, 29 Mar 2019 19:13:55 +1000
> Nicholas Piggin <npiggin at gmail.com> wrote:
>
>> Suraj Jitindar Singh's on March 29, 2019 3:20 pm:
>> > On Wed, 2019-03-27 at 17:51 +0100, Cédric Le Goater wrote:
>> >> On 3/27/19 5:37 PM, Cédric Le Goater wrote:
>> >> > On 3/27/19 1:36 PM, Sebastian Andrzej Siewior wrote:
>> >> > > With qemu-system-ppc64le -machine pseries -smp 4 I get:
>> >> > >
>> >> > > > # chrt 1 hackbench
>> >> > > > Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
>> >> > > > Each sender will pass 100 messages of 100 bytes
>> >> > > > Oops: Exception in kernel mode, sig: 4 [#1]
>> >> > > > LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=2048 NUMA pSeries
>> >> > > > Modules linked in:
>> >> > > > CPU: 0 PID: 629 Comm: hackbench Not tainted 5.1.0-rc2 #71
>> >> > > > NIP: c000000000046978 LR: c000000000046a38 CTR: c0000000000b0150
>> >> > > > REGS: c0000001fffeb8e0 TRAP: 0700 Not tainted (5.1.0-rc2)
>> >> > > > MSR: 8000000000089033 <SF,EE,ME,IR,DR,RI,LE> CR: 42000874 XER: 00000000
>> >> > > > CFAR: c000000000046a34 IRQMASK: 1
>> >> > > > GPR00: c0000000000b0170 c0000001fffebb70 c000000000a6ba00 0000000028000000
>> >> > >
>> >> > > …
>> >> > > > NIP [c000000000046978] doorbell_core_ipi+0x28/0x30
>> >> > > > LR [c000000000046a38] doorbell_try_core_ipi+0xb8/0xf0
>> >> > > > Call Trace:
>> >> > > > [c0000001fffebb70] [c0000001fffebba0] 0xc0000001fffebba0 (unreliable)
>> >> > > > [c0000001fffebba0] [c0000000000b0170] smp_pseries_cause_ipi+0x20/0x70
>> >> > > > [c0000001fffebbd0] [c00000000004b02c] arch_send_call_function_single_ipi+0x8c/0xa0
>> >> > > > [c0000001fffebbf0] [c0000000001de600] irq_work_queue_on+0xe0/0x130
>> >> > > > [c0000001fffebc30] [c0000000001340c8] rto_push_irq_work_func+0xc8/0x120
>> >> > >
>> >> > > …
>> >> > > > Instruction dump:
>> >> > > > 60000000 60000000 3c4c00a2 384250b0 3d220009 392949c8 81290000 3929ffff
>> >> > > > 7d231838 7c0004ac 5463017e 64632800 <7c00191c> 4e800020 3c4c00a2 38425080
>> >> > > > ---[ end trace eb842b544538cbdf ]---
>>
>> This is unusual, and it causes powerpc code to crash, because the rt
>> scheduler is telling irq_work_queue_on to queue work on the current CPU.
>> Is that allowed? There are no warnings in there, but it must be a
>> rarely tested path; would it be better to ban it?
>>
>> Steven, is this irq_work_queue_on to self by design?
>
> I just figured that irq_work_queue_on() would default to just
> irq_work_queue() if cpu == smp_processor_id().
Yeah, it actually doesn't: it raises a CALL_FUNCTION IPI on the
current CPU. Maybe it should do that instead.
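For reference, the pre-patch SMP leg in kernel/irq_work.c takes the
remote path unconditionally, even when @cpu is the CPU we are running
on (trimmed from the lines the patch below removes; the comment above
the llist_add is mine):

bool irq_work_queue_on(struct irq_work *work, int cpu)
{
	/* All work should have been flushed before going offline */
	WARN_ON_ONCE(cpu_is_offline(cpu));

	/* Arch remote IPI send/receive backend aren't NMI safe */
	WARN_ON_ONCE(in_nmi());

	/* Only queue if not already pending */
	if (!irq_work_claim(work))
		return false;

	/* Remote path is taken even for cpu == smp_processor_id() */
	if (llist_add(&work->llnode, &per_cpu(raised_list, cpu)))
		arch_send_call_function_single_ipi(cpu);

	return true;
}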
>
> I'm sure this case has been tested (on x86), as I stressed this quite a
> bit. All it takes is for an RT task to be queued on the CPU that is
> processing the current "push" logic while it's not holding the rq locks,
> and for it to read that the current CPU now has an overloaded RT task.
>
> If calling irq_work_queue_on() on the same CPU is really a bad thing to
> do, then we can work on changing the logic to do something different when
> the CPU returned from rto_next_cpu() is the current CPU.
I don't actually know that it is bad. It does seem to violate the
principle of least surprise for arch code though.
From this bug report, I'm guessing it's a pretty rare case to hit, so
I would worry about subtle bugs. And I just cc'ed you in case your
code was actually not intending to do this.
> Or would it be possible to just change irq_work_queue_on() to call
> irq_work_queue() if cpu == smp_processor_id()?
Something like this?
kernel/irq_work: Do not raise an IPI when queueing work on the local CPU
The QEMU powerpc/pseries machine model was not expecting a self-IPI,
and it may be a surprising thing to do in general, so have
irq_work_queue_on() do local queueing when the target is the current CPU.
Suggested-by: Steven Rostedt <rostedt at goodmis.org>
Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
---
kernel/irq_work.c | 78 ++++++++++++++++++++++++++---------------------
1 file changed, 43 insertions(+), 35 deletions(-)
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 6b7cdf17ccf8..f0e539d0f879 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -56,61 +56,69 @@ void __weak arch_irq_work_raise(void)
 	 */
 }
 
-/*
- * Enqueue the irq_work @work on @cpu unless it's already pending
- * somewhere.
- *
- * Can be re-enqueued while the callback is still in progress.
- */
-bool irq_work_queue_on(struct irq_work *work, int cpu)
+/* Enqueue on current CPU, work must already be claimed and preempt disabled */
+static void __irq_work_queue(struct irq_work *work)
 {
-	/* All work should have been flushed before going offline */
-	WARN_ON_ONCE(cpu_is_offline(cpu));
-
-#ifdef CONFIG_SMP
-
-	/* Arch remote IPI send/receive backend aren't NMI safe */
-	WARN_ON_ONCE(in_nmi());
+	/* If the work is "lazy", handle it from next tick if any */
+	if (work->flags & IRQ_WORK_LAZY) {
+		if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
+		    tick_nohz_tick_stopped())
+			arch_irq_work_raise();
+	} else {
+		if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
+			arch_irq_work_raise();
+	}
+}
 
+/* Enqueue the irq work @work on the current CPU */
+bool irq_work_queue(struct irq_work *work)
+{
 	/* Only queue if not already pending */
 	if (!irq_work_claim(work))
 		return false;
 
-	if (llist_add(&work->llnode, &per_cpu(raised_list, cpu)))
-		arch_send_call_function_single_ipi(cpu);
-
-#else /* #ifdef CONFIG_SMP */
-	irq_work_queue(work);
-#endif /* #else #ifdef CONFIG_SMP */
+	/* Queue the entry and raise the IPI if needed. */
+	preempt_disable();
+	__irq_work_queue(work);
+	preempt_enable();
 
 	return true;
 }
+EXPORT_SYMBOL_GPL(irq_work_queue);
 
-/* Enqueue the irq work @work on the current CPU */
-bool irq_work_queue(struct irq_work *work)
+/*
+ * Enqueue the irq_work @work on @cpu unless it's already pending
+ * somewhere.
+ *
+ * Can be re-enqueued while the callback is still in progress.
+ */
+bool irq_work_queue_on(struct irq_work *work, int cpu)
 {
+#ifndef CONFIG_SMP
+	return irq_work_queue(work);
+
+#else /* #ifndef CONFIG_SMP */
+	/* All work should have been flushed before going offline */
+	WARN_ON_ONCE(cpu_is_offline(cpu));
+
 	/* Only queue if not already pending */
 	if (!irq_work_claim(work))
 		return false;
 
-	/* Queue the entry and raise the IPI if needed. */
 	preempt_disable();
-
-	/* If the work is "lazy", handle it from next tick if any */
-	if (work->flags & IRQ_WORK_LAZY) {
-		if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)) &&
-		    tick_nohz_tick_stopped())
-			arch_irq_work_raise();
-	} else {
-		if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
-			arch_irq_work_raise();
-	}
-
+	if (cpu != smp_processor_id()) {
+		/* Arch remote IPI send/receive backend aren't NMI safe */
+		WARN_ON_ONCE(in_nmi());
+		if (llist_add(&work->llnode, &per_cpu(raised_list, cpu)))
+			arch_send_call_function_single_ipi(cpu);
+	} else
+		__irq_work_queue(work);
 	preempt_enable();
 
 	return true;
+#endif /* #else #ifndef CONFIG_SMP */
 }
-EXPORT_SYMBOL_GPL(irq_work_queue);
+
 bool irq_work_needs_cpu(void)
 {
--
2.20.1
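
As a quick illustration of the behaviour change, here is a minimal
(hypothetical) test module that queues irq_work on the CPU it is
running on. Before this patch, irq_work_queue_on() raises a
CALL_FUNCTION self-IPI for this case; with it, the work is queued
locally, the same as irq_work_queue():

#include <linux/init.h>
#include <linux/irq_work.h>
#include <linux/module.h>
#include <linux/printk.h>
#include <linux/smp.h>

static void self_work_fn(struct irq_work *work)
{
	/* Runs from the irq_work path (hard interrupt context) */
	pr_info("irq_work ran on CPU %d\n", smp_processor_id());
}

static struct irq_work self_work;

static int __init self_work_init(void)
{
	int cpu;

	init_irq_work(&self_work, self_work_fn);

	cpu = get_cpu();			/* disable preemption, pin to this CPU */
	irq_work_queue_on(&self_work, cpu);	/* target == current CPU */
	put_cpu();

	return 0;
}
module_init(self_work_init);

static void __exit self_work_exit(void)
{
	irq_work_sync(&self_work);	/* wait for any pending callback */
}
module_exit(self_work_exit);

MODULE_LICENSE("GPL");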