[Qemu-ppc] pseries on qemu-system-ppc64le crashes in doorbell_core_ipi()

Michael Ellerman mpe at ellerman.id.au
Thu Dec 19 23:45:26 AEDT 2019


"Jason A. Donenfeld" <Jason at zx2c4.com> writes:
> Hi folks,
>
> I'm actually still experiencing this sporadically in the WireGuard test 
> suite, which you can see being run on https://build.wireguard.com/ . 

Fancy dashboard you got there :)

> About 50% of the time the powerpc64 build will fail at a place like this:
>
> [   65.147823] Oops: Exception in kernel mode, sig: 4 [#1]
> [   65.149198] LE PAGE_SIZE=4K MMU=Hash PREEMPT SMP NR_CPUS=4 NUMA pSeries
> [   65.149595] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-rc1+ #1
> [   65.149745] NIP:  c000000000033330 LR: c00000000007eda0 CTR: c00000000007ed80
> [   65.149934] REGS: c000000000a47970 TRAP: 0700   Not tainted  (5.5.0-rc1+)
> [   65.150032] MSR:  800000000288b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> > CR: 48008288  XER: 00000000
> [   65.150352] CFAR: c0000000000332bc IRQMASK: 1
> [   65.150352] GPR00: c000000000036508 c000000000a47c00 c000000000a4c100 0000000000000001
> [   65.150352] GPR04: c000000000a50998 0000000000000000 c000000000a50908 000000000f509000
> [   65.150352] GPR08: 0000000028000000 0000000000000000 0000000000000000 c00000000ff24f00
> [   65.150352] GPR12: c00000000007ed80 c000000000ad9000 0000000000000000 0000000000000000
> [   65.150352] GPR16: 00000000008c9190 00000000008c94a8 00000000008c92f8 00000000008c98b0
> [   65.150352] GPR20: 00000000008f2f88 fffffffffffffffd 0000000000000014 0000000000e6c100
> [   65.150352] GPR24: 0000000000e6c100 0000000000000001 0000000000000000 c000000000a50998
> [   65.150352] GPR28: c000000000a9e280 c000000000a50aa4 0000000000000002 0000000000000000
> [   65.151591] NIP [c000000000033330] doorbell_try_core_ipi+0xd0/0xf0
> [   65.151750] LR [c00000000007eda0] smp_pseries_cause_ipi+0x20/0x70
> [   65.151913] Call Trace:
> [   65.152109] [c000000000a47c00] [c0000000000c7c9c] _nohz_idle_balance+0xbc/0x300 (unreliable)
> [   65.152370] [c000000000a47c30] [c000000000036508] smp_send_reschedule+0x98/0xb0
> [   65.152711] [c000000000a47c50] [c0000000000c1634] kick_ilb+0x114/0x140
> [   65.152962] [c000000000a47ca0] [c0000000000c86d8] newidle_balance+0x4e8/0x500
> [   65.153213] [c000000000a47d20] [c0000000000c8788] pick_next_task_fair+0x48/0x3a0
> [   65.153424] [c000000000a47d80] [c000000000466620] __schedule+0xf0/0x430
> [   65.153612] [c000000000a47de0] [c000000000466b04] schedule_idle+0x34/0x70
> [   65.153786] [c000000000a47e10] [c0000000000c0bc8] do_idle+0x1a8/0x220
> [   65.154121] [c000000000a47e70] [c0000000000c0e94] cpu_startup_entry+0x34/0x40
> [   65.154313] [c000000000a47ea0] [c00000000000ef1c] rest_init+0x10c/0x124
> [   65.154414] [c000000000a47ee0] [c000000000500004] start_kernel+0x568/0x594
> [   65.154585] [c000000000a47f90] [c00000000000a7cc] start_here_common+0x1c/0x330
> [   65.154854] Instruction dump:
> [   65.155191] 38210030 e8010010 7c0803a6 4e800020 3d220004 39295228 81290000 3929ffff
> [   65.155498] 7d284038 7c0004ac 5508017e 65082800 <7c00411c> e94d0178 812a0000 3929ffff
                                                      ^
Again the faulting instruction there is "msgsndp r8"

> [   65.156155] ---[ end trace 6180d12e268ffdaf ]---
> [   65.185452]
> [   66.187490] Kernel panic - not syncing: Fatal exception
>
> This is with "qemu-system-ppc64 -smp 4 -machine pseries" on QEMU 4.0.0.
>
> I'm not totally sure what's going on here. I'm emulating a pseries, and 
> using that with qemu's pseries model, and I see that selecting the 
> pseries forces the selection of 'config PPC_DOORBELL' (twice in the same 
> section, actually).

Noted.

> Then inside the kernel there appears to be some runtime CPU check for
> doorbell support.

Not really. The kernel looks at the CPU revision (PVR) and decides that
it has doorbell support.

> Is this a case in which QEMU is advertising doorbell support that TCG
> doesn't have? Or is something else happening here?

It's a gap in the emulation I guess. qemu doesn't emulate msgsndp, but
it really should because that's a supported instruction since Power8.

I suspect msgsndp wasn't implemented for TCG because TCG doesn't support
more than one thread per core, and you can only send doorbells to other
threads in the same core, and therefore there is no reason to ever use
msgsndp.

That's the message Suraj mentioned up thread, eg:

  $ qemu-system-ppc64 -nographic -vga none -M pseries -smp 2,threads=2 -cpu POWER8 -kernel build~/vmlinux
  qemu-system-ppc64: TCG cannot support more than 1 thread/core on a pseries machine


But I guess we've hit another case of a CPU sending itself an IPI, and
the way the sibling masks are done, CPUs are siblings of themselves, so
the sibling test passes, eg:

int doorbell_try_core_ipi(int cpu)
{
	int this_cpu = get_cpu();
	int ret = 0;

	if (cpumask_test_cpu(cpu, cpu_sibling_mask(this_cpu))) {
		doorbell_core_ipi(cpu);



In which case this patch should fix it.

diff --git a/arch/powerpc/kernel/dbell.c b/arch/powerpc/kernel/dbell.c
index f17ff1200eaa..e45cb9bba193 100644
--- a/arch/powerpc/kernel/dbell.c
+++ b/arch/powerpc/kernel/dbell.c
@@ -63,7 +63,7 @@ int doorbell_try_core_ipi(int cpu)
 	int this_cpu = get_cpu();
 	int ret = 0;
 
-	if (cpumask_test_cpu(cpu, cpu_sibling_mask(this_cpu))) {
+	if (cpu != this_cpu && cpumask_test_cpu(cpu, cpu_sibling_mask(this_cpu))) {
 		doorbell_core_ipi(cpu);
 		ret = 1;
 	}


The other option would be we disable CPU_FTR_DBELL if we detect we're
running under TCG. But I'm not sure we have a particularly clean way to
detect that.

cheers


More information about the Linuxppc-dev mailing list