[powerpc]WARN : arch/powerpc/platforms/powernv/smp.c:160

Michael Ellerman mpe at ellerman.id.au
Mon Aug 26 13:29:30 AEST 2019


Sachin Sant <sachinp at linux.vnet.ibm.com> writes:
> linux-next is currently broken on POWER8 non virtualized. Kernel
> fails to reach login prompt with following kernel warning
> repeatedly shown during boot.

I don't see it on my test systems.

The backtrace makes it look like you're doing CPU hot_un_plug during
boot, which seems a bit odd.

Or possibly it's just that the cpu_is_offline() test in do_idle() is
returning true due to some bug.
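
To make the two possibilities concrete, here is a minimal user-space sketch of the branch in question. It is a toy model under stated assumptions, not the kernel's do_idle(): every model_* name is hypothetical, and only the offline test and the call into the "dead" path seen in the backtrace are represented.

```c
/* User-space model only; none of this is kernel code. */
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4                    /* hypothetical size for the model */

static bool model_online[MODEL_NR_CPUS];   /* stand-in for the CPU online mask */
static int  model_dead_calls;              /* how often the dead path was taken */

/* Stand-in for cpu_is_offline(): true when the CPU's online bit is clear. */
static bool model_cpu_is_offline(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    return !model_online[cpu];
}

/* Stand-in for arch_cpu_idle_dead() -> cpu_die() -> pnv_smp_cpu_kill_self(). */
static void model_idle_dead(void)
{
    model_dead_calls++;
}

/* One pass of a simplified idle loop; only the offline test is modelled. */
static void model_do_idle(int cpu)
{
    if (model_cpu_is_offline(cpu))
        model_idle_dead();
}
```

In this model the dead path is reached whenever the online bit is clear, whether that is because the CPU really was unplugged or because the bit was never set or was wrongly cleared. The backtrace alone cannot distinguish the two cases.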

> The problem dates back at least to next-20190816.

A bisect would be helpful obviously :)

> [   40.285606] WARNING: CPU: 1 PID: 0 at arch/powerpc/platforms/powernv/smp.c:160 pnv_smp_cpu_kill_self+0x50/0x2d0
> [   40.285609] Modules linked in: kvm_hv kvm sunrpc dm_mirror dm_region_hash dm_log dm_mod ses enclosure scsi_transport_sas sg ipmi_powernv ipmi_devintf powernv_rng uio_pdrv_genirq uio leds_powernv ipmi_msghandler powernv_op_panel ibmpowernv ip_tables ext4 mbcache jbd2 sd_mod ipr tg3 libata ptp pps_core
> [   40.285643] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.3.0-rc5-next-20190823-autotest-autotest #1
> [   40.285644] NIP:  c0000000000b5f40 LR: c000000000055498 CTR: c0000000000b5ef0
> [   40.285646] REGS: c0000007f5527980 TRAP: 0700   Not tainted  (5.3.0-rc5-next-20190823-autotest-autotest)
> [   40.285646] MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24004028  XER: 00000000
> [   40.285650] CFAR: c000000000055494 IRQMASK: 1 
> [   40.285650] GPR00: c000000000055498 c0000007f5527c10 c00000000148b200 0000000000000000 
> [   40.285650] GPR04: 0000000000000000 c0000007fa897d80 c0000007fa90c800 00000007f9980000 
> [   40.285650] GPR08: 0000000000000000 0000000000000001 0000000000000000 c0000007fa90c800 
> [   40.285650] GPR12: c0000000000b5ef0 c0000007ffffee00 0000000000000800 c000000ffffc11d0 
> [   40.285650] GPR16: 0000000000000001 c000000001035280 0000000000000000 c0000000015303c0 
> [   40.285650] GPR20: c000000000052d60 0000000000000001 c0000007f54cd800 c0000007f54cd880 
> [   40.285650] GPR24: 0000000000080000 c0000007f54cd800 c0000000014bdf78 c0000000014c20d8 
> [   40.285650] GPR28: 0000000000000002 c0000000014c2538 0000000000000001 c0000007f54cd800 
> [   40.285662] NIP [c0000000000b5f40] pnv_smp_cpu_kill_self+0x50/0x2d0
> [   40.285664] LR [c000000000055498] cpu_die+0x48/0x64
> [   40.285665] Call Trace:
> [   40.285667] [c0000007f5527c10] [c000000000f85f10] ppc64_tlb_batch+0x0/0x1220 (unreliable)
> [   40.285669] [c0000007f5527df0] [c000000000055498] cpu_die+0x48/0x64
> [   40.285672] [c0000007f5527e10] [c0000000000226a0] arch_cpu_idle_dead+0x20/0x40
> [   40.285674] [c0000007f5527e30] [c00000000016bd2c] do_idle+0x37c/0x3f0
> [   40.285676] [c0000007f5527ed0] [c00000000016bfac] cpu_startup_entry+0x3c/0x50
> [   40.285678] [c0000007f5527f00] [c000000000055198] start_secondary+0x638/0x680
> [   40.285680] [c0000007f5527f90] [c00000000000ac5c] start_secondary_prolog+0x10/0x14
> [   40.285680] Instruction dump:
> [   40.285681] fb61ffd8 fb81ffe0 fba1ffe8 fbc1fff0 fbe1fff8 f8010010 f821fe21 e90d1178 
> [   40.285684] f9010198 39000000 892d0988 792907e0 <0b090000> 39200002 7d210164 39200003 
> [   40.285687] ---[ end trace 72c90a064122d9e4 ]---

That WARN shouldn't really kill the boot. Do you see anything else?

> Relevant code snippet :
> 156         /*
> 157          * This hard disables local interrupts, ensuring we have no lazy
> 158          * irqs pending.
> 159          */
> 160         WARN_ON(irqs_disabled());  <<===
> 161         hard_irq_disable();
> 162         WARN_ON(lazy_irq_pending());

Even via the path shown above I think we should have IRQs enabled, but I
guess not.
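
For illustration, the three quoted lines can be modelled in user space as below. This is a sketch under assumptions, not the powerpc implementation: the model_* names are hypothetical, MODEL_WARN_ON only counts hits instead of printing a backtrace, and the "soft mask" flag is a simplified stand-in for the state the register dump reports as IRQMASK: 1.

```c
/* User-space model only; none of this is kernel code. */
#include <assert.h>
#include <stdbool.h>

static bool model_soft_masked;    /* simplified stand-in for the soft IRQ mask */
static bool model_hard_disabled;  /* set once interrupts are "hard" disabled */
static bool model_lazy_pending;   /* stand-in for a lazily recorded interrupt */
static int  model_warnings;       /* number of MODEL_WARN_ON hits */

/* Counts the hit and carries on, like a WARN that does not stop execution. */
#define MODEL_WARN_ON(cond) do { if (cond) model_warnings++; } while (0)

static bool model_irqs_disabled(void)    { return model_soft_masked; }
static bool model_lazy_irq_pending(void) { return model_lazy_pending; }
static void model_hard_irq_disable(void) { model_hard_disabled = true; }

/*
 * Mirrors the order of the quoted snippet (lines 160-162).
 * Returns how many warnings fired for the given entry state.
 */
static int model_kill_self(bool enter_soft_masked)
{
    model_soft_masked = enter_soft_masked;
    model_hard_disabled = false;
    model_warnings = 0;

    MODEL_WARN_ON(model_irqs_disabled());    /* line 160 in the snippet */
    model_hard_irq_disable();                /* line 161 */
    assert(model_hard_disabled);
    MODEL_WARN_ON(model_lazy_irq_pending()); /* line 162 */
    return model_warnings;
}
```

Entering with the mask clear gives no warning. Entering with it set trips only the first check, which matches the single WARN at smp.c:160 in the report.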

cheers

