[powerpc]WARN : arch/powerpc/platforms/powernv/smp.c:160

Sachin Sant sachinp at linux.vnet.ibm.com
Mon Aug 26 20:12:16 AEST 2019



> On 26-Aug-2019, at 8:59 AM, Michael Ellerman <mpe at ellerman.id.au> wrote:
> 
> Sachin Sant <sachinp at linux.vnet.ibm.com> writes:
>> linux-next is currently broken on POWER8 non virtualized. Kernel
>> fails to reach login prompt with following kernel warning
>> repeatedly shown during boot.
> 
> I don't see it on my test systems.
> 
> The backtrace makes it look like you're doing CPU hot_un_plug during
> boot, which seems a bit odd.
> 
No explicit hot-unplug operation is being done. This happens
during boot.

For some reason CPUs are being offlined. I had earlier reported
that the kernel does not boot to the login prompt. I was wrong; the
kernel does boot. I am not sure if it's a side effect of these warnings,
but SMT is off after boot.

# lscpu
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                80
On-line CPU(s) list:   0,8,16,24,32,40,48,56,64,72
Off-line CPU(s) list:  1-7,9-15,17-23,25-31,33-39,41-47,49-55,57-63,65-71,73-79
Thread(s) per core:    1
Core(s) per socket:    5
……..
NUMA node0 CPU(s):     0,8,16,24,32
NUMA node1 CPU(s):     40,48,56,64,72
#
# ppc64_cpu --smt
SMT is off
#

I can turn SMT back on manually.
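
To cross-check the lscpu numbers against what the kernel itself reports, something like the following could be used. This is only a minimal sketch: it assumes the standard sysfs files /sys/devices/system/cpu/online and /sys/devices/system/cpu/offline are present, and the helper name count_cpus is made up for illustration.

```shell
#!/bin/sh
# count_cpus (hypothetical helper): count the CPUs named in a kernel
# cpulist string such as "0,8,16" or "1-7,9-15".
count_cpus() {
    list=$1
    total=0
    old_ifs=$IFS
    IFS=,
    for range in $list; do
        case $range in
            *-*)
                # A range "first-last" contributes last - first + 1 CPUs.
                first=${range%-*}
                last=${range#*-}
                total=$((total + last - first + 1))
                ;;
            ?*)
                # A single CPU number.
                total=$((total + 1))
                ;;
        esac
    done
    IFS=$old_ifs
    echo "$total"
}

# Print the kernel's view of online/offline CPUs, if sysfs exposes it.
for state in online offline; do
    f=/sys/devices/system/cpu/$state
    if [ -r "$f" ]; then
        cpus=$(cat "$f")
        echo "$state: $cpus ($(count_cpus "$cpus") CPUs)"
    fi
done
```

For the lists in the lscpu output above, count_cpus gives 10 for the on-line list and 70 for the off-line list, which adds up to the 80 CPUs reported.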

> Or possibly it's just that the cpu_is_offline() test in do_idle() is
> returning true due to some bug.
> 
>> The problem dates back atleast till next-20190816. 
> 
> A bisect would be helpful obviously :)

The last successful kernel boot was with next-20190808; it started
failing with the 9th Aug tree. I will attempt a bisect.

> 
>> [   40.285606] WARNING: CPU: 1 PID: 0 at arch/powerpc/platforms/powernv/smp.c:160 pnv_smp_cpu_kill_self+0x50/0x2d0
>> [   40.285609] Modules linked in: kvm_hv kvm sunrpc dm_mirror dm_region_hash dm_log dm_mod ses enclosure scsi_transport_sas sg ipmi_powernv ipmi_devintf powernv_rng uio_pdrv_genirq uio leds_powernv ipmi_msghandler powernv_op_panel ibmpowernv ip_tables ext4 mbcache jbd2 sd_mod ipr tg3 libata ptp pps_core
>> [   40.285643] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.3.0-rc5-next-20190823-autotest-autotest #1
>> [   40.285644] NIP:  c0000000000b5f40 LR: c000000000055498 CTR: c0000000000b5ef0
>> [   40.285646] REGS: c0000007f5527980 TRAP: 0700   Not tainted  (5.3.0-rc5-next-20190823-autotest-autotest)
>> [   40.285646] MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24004028  XER: 00000000
>> [   40.285650] CFAR: c000000000055494 IRQMASK: 1 
>> [   40.285650] GPR00: c000000000055498 c0000007f5527c10 c00000000148b200 0000000000000000 
>> [   40.285650] GPR04: 0000000000000000 c0000007fa897d80 c0000007fa90c800 00000007f9980000 
>> [   40.285650] GPR08: 0000000000000000 0000000000000001 0000000000000000 c0000007fa90c800 
>> [   40.285650] GPR12: c0000000000b5ef0 c0000007ffffee00 0000000000000800 c000000ffffc11d0 
>> [   40.285650] GPR16: 0000000000000001 c000000001035280 0000000000000000 c0000000015303c0 
>> [   40.285650] GPR20: c000000000052d60 0000000000000001 c0000007f54cd800 c0000007f54cd880 
>> [   40.285650] GPR24: 0000000000080000 c0000007f54cd800 c0000000014bdf78 c0000000014c20d8 
>> [   40.285650] GPR28: 0000000000000002 c0000000014c2538 0000000000000001 c0000007f54cd800 
>> [   40.285662] NIP [c0000000000b5f40] pnv_smp_cpu_kill_self+0x50/0x2d0
>> [   40.285664] LR [c000000000055498] cpu_die+0x48/0x64
>> [   40.285665] Call Trace:
>> [   40.285667] [c0000007f5527c10] [c000000000f85f10] ppc64_tlb_batch+0x0/0x1220 (unreliable)
>> [   40.285669] [c0000007f5527df0] [c000000000055498] cpu_die+0x48/0x64
>> [   40.285672] [c0000007f5527e10] [c0000000000226a0] arch_cpu_idle_dead+0x20/0x40
>> [   40.285674] [c0000007f5527e30] [c00000000016bd2c] do_idle+0x37c/0x3f0
>> [   40.285676] [c0000007f5527ed0] [c00000000016bfac] cpu_startup_entry+0x3c/0x50
>> [   40.285678] [c0000007f5527f00] [c000000000055198] start_secondary+0x638/0x680
>> [   40.285680] [c0000007f5527f90] [c00000000000ac5c] start_secondary_prolog+0x10/0x14
>> [   40.285680] Instruction dump:
>> [   40.285681] fb61ffd8 fb81ffe0 fba1ffe8 fbc1fff0 fbe1fff8 f8010010 f821fe21 e90d1178 
>> [   40.285684] f9010198 39000000 892d0988 792907e0 <0b090000> 39200002 7d210164 39200003 
>> [   40.285687] ---[ end trace 72c90a064122d9e4 ]---
> 
> That WARN shouldn't really kill the boot, do you see anything else?

The machine actually boots to the login prompt.
I have attached the boot log (5.3.0-rc4-next-20190814).

Thanks
-Sachin

-------------- next part --------------
A non-text attachment was scrubbed...
Name: boot.log
Type: application/octet-stream
Size: 263564 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20190826/c47f661c/attachment-0001.obj>

