[Qemu-ppc] pseries on qemu-system-ppc64le crashes in doorbell_core_ipi()

David Gibson david at gibson.dropbear.id.au
Fri Dec 20 11:22:07 AEDT 2019


On Thu, Dec 19, 2019 at 02:08:29PM +0100, Cédric Le Goater wrote:
> On 19/12/2019 13:45, Michael Ellerman wrote:
> > "Jason A. Donenfeld" <Jason at zx2c4.com> writes:
> >> Hi folks,
> >>
> >> I'm actually still experiencing this sporadically in the WireGuard test 
> >> suite, which you can see being run on https://build.wireguard.com/ . 
> > 
> > Fancy dashboard you got there :)
> > 
> >> About 50% of the time the powerpc64 build will fail at a place like this:
> >>
> >> [   65.147823] Oops: Exception in kernel mode, sig: 4 [#1]
> >> [   65.149198] LE PAGE_SIZE=4K MMU=Hash PREEMPT SMP NR_CPUS=4 NUMA pSeries
> >> [   65.149595] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-rc1+ #1
> >> [   65.149745] NIP:  c000000000033330 LR: c00000000007eda0 CTR: c00000000007ed80
> >> [   65.149934] REGS: c000000000a47970 TRAP: 0700   Not tainted  (5.5.0-rc1+)
> >> [   65.150032] MSR:  800000000288b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 48008288  XER: 00000000
> >> [   65.150352] CFAR: c0000000000332bc IRQMASK: 1
> >> [   65.150352] GPR00: c000000000036508 c000000000a47c00 c000000000a4c100 0000000000000001
> >> [   65.150352] GPR04: c000000000a50998 0000000000000000 c000000000a50908 000000000f509000
> >> [   65.150352] GPR08: 0000000028000000 0000000000000000 0000000000000000 c00000000ff24f00
> >> [   65.150352] GPR12: c00000000007ed80 c000000000ad9000 0000000000000000 0000000000000000
> >> [   65.150352] GPR16: 00000000008c9190 00000000008c94a8 00000000008c92f8 00000000008c98b0
> >> [   65.150352] GPR20: 00000000008f2f88 fffffffffffffffd 0000000000000014 0000000000e6c100
> >> [   65.150352] GPR24: 0000000000e6c100 0000000000000001 0000000000000000 c000000000a50998
> >> [   65.150352] GPR28: c000000000a9e280 c000000000a50aa4 0000000000000002 0000000000000000
> >> [   65.151591] NIP [c000000000033330] doorbell_try_core_ipi+0xd0/0xf0
> >> [   65.151750] LR [c00000000007eda0] smp_pseries_cause_ipi+0x20/0x70
> >> [   65.151913] Call Trace:
> >> [   65.152109] [c000000000a47c00] [c0000000000c7c9c] _nohz_idle_balance+0xbc/0x300 (unreliable)
> >> [   65.152370] [c000000000a47c30] [c000000000036508] smp_send_reschedule+0x98/0xb0
> >> [   65.152711] [c000000000a47c50] [c0000000000c1634] kick_ilb+0x114/0x140
> >> [   65.152962] [c000000000a47ca0] [c0000000000c86d8] newidle_balance+0x4e8/0x500
> >> [   65.153213] [c000000000a47d20] [c0000000000c8788] pick_next_task_fair+0x48/0x3a0
> >> [   65.153424] [c000000000a47d80] [c000000000466620] __schedule+0xf0/0x430
> >> [   65.153612] [c000000000a47de0] [c000000000466b04] schedule_idle+0x34/0x70
> >> [   65.153786] [c000000000a47e10] [c0000000000c0bc8] do_idle+0x1a8/0x220
> >> [   65.154121] [c000000000a47e70] [c0000000000c0e94] cpu_startup_entry+0x34/0x40
> >> [   65.154313] [c000000000a47ea0] [c00000000000ef1c] rest_init+0x10c/0x124
> >> [   65.154414] [c000000000a47ee0] [c000000000500004] start_kernel+0x568/0x594
> >> [   65.154585] [c000000000a47f90] [c00000000000a7cc] start_here_common+0x1c/0x330
> >> [   65.154854] Instruction dump:
> >> [   65.155191] 38210030 e8010010 7c0803a6 4e800020 3d220004 39295228 81290000 3929ffff
> >> [   65.155498] 7d284038 7c0004ac 5508017e 65082800 <7c00411c> e94d0178 812a0000 3929ffff
> >                                                       ^
> > Again the faulting instruction there is "msgsndp r8"
> > 
> >> [   65.156155] ---[ end trace 6180d12e268ffdaf ]---
> >> [   65.185452]
> >> [   66.187490] Kernel panic - not syncing: Fatal exception
> >>
> >> This is with "qemu-system-ppc64 -smp 4 -machine pseries" on QEMU 4.0.0.
> >>
> >> I'm not totally sure what's going on here. I'm emulating a pseries, and 
> >> using that with qemu's pseries model, and I see that selecting the 
> >> pseries forces the selection of 'config PPC_DOORBELL' (twice in the same 
> >> section, actually).
> > 
> > Noted.
> > 
> >> Then inside the kernel there appears to be some runtime CPU check for
> >> doorbell support.
> > 
> > Not really. The kernel looks at the CPU revision (PVR) and decides that
> > it has doorbell support.
> > 
> >> Is this a case in which QEMU is advertising doorbell support that TCG
> >> doesn't have? Or is something else happening here?
> > 
> > It's a gap in the emulation I guess. qemu doesn't emulate msgsndp, but
> > it really should because that's a supported instruction since Power8.
> 
> There is a patch for msgsndp in my tree you could try : 
> 
>   https://github.com/legoater/qemu/tree/powernv-5.0
> 
> Currently being reviewed. I have to address some remarks from David before
> it can be merged.

Right.  It needs some polish, but I expect we'll have this merged in
the not too distant future.

> > I suspect msgsndp wasn't implemented for TCG because TCG doesn't support
> > more than one thread per core, and you can only send doorbells to other
> > threads in the same core, and therefore there is no reason to ever use
> > msgsndp.
> 
> There is a need now with KVM emulation under TCG, but, yes, QEMU still lacks
> SMT support.
> 
> > That's the message Suraj mentioned up thread, eg:
> > 
> >   $ qemu-system-ppc64 -nographic -vga none -M pseries -smp 2,threads=2 -cpu POWER8 -kernel build~/vmlinux
> >   qemu-system-ppc64: TCG cannot support more than 1 thread/core on a pseries machine
> > 
> > 
> > But I guess we've hit another case of a CPU sending itself an IPI, and
> > the way the sibling masks are done, CPUs are siblings of themselves, so
> > the sibling test passes, eg:
> > 
> > int doorbell_try_core_ipi(int cpu)
> > {
> > 	int this_cpu = get_cpu();
> > 	int ret = 0;
> > 
> > 	if (cpumask_test_cpu(cpu, cpu_sibling_mask(this_cpu))) {
> > 		doorbell_core_ipi(cpu);
> > 
> > 
> > 
> > In which case this patch should fix it.
> > 
> > diff --git a/arch/powerpc/kernel/dbell.c b/arch/powerpc/kernel/dbell.c
> > index f17ff1200eaa..e45cb9bba193 100644
> > --- a/arch/powerpc/kernel/dbell.c
> > +++ b/arch/powerpc/kernel/dbell.c
> > @@ -63,7 +63,7 @@ int doorbell_try_core_ipi(int cpu)
> >  	int this_cpu = get_cpu();
> >  	int ret = 0;
> >  
> > -	if (cpumask_test_cpu(cpu, cpu_sibling_mask(this_cpu))) {
> > +	if (cpu != this_cpu && cpumask_test_cpu(cpu, cpu_sibling_mask(this_cpu))) {
> >  		doorbell_core_ipi(cpu);
> >  		ret = 1;
> >  	}
> > 
> > 
> > The other option would be we disable CPU_FTR_DBELL if we detect we're
> > running under TCG. But I'm not sure we have a particularly clean way to
> > detect that.
> 
> Does the pseries kernel support cpufeatures in the DT?
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson