[PATCH] powerpc/pseries: Fix scv instruction crash with kexec

Sourabh Jain sourabhjain at linux.ibm.com
Wed Jun 26 19:46:40 AEST 2024


Hello Michael,

On 26/06/24 14:57, Michael Ellerman wrote:
> Nicholas Piggin <npiggin at gmail.com> writes:
>> kexec on pseries disables AIL (reloc_on_exc), required for scv
>> instruction support, before other CPUs have been shut down. This means
>> they can execute scv instructions after AIL is disabled, which causes an
>> interrupt at an unexpected entry location that crashes the kernel.
>>
>> Change the kexec sequence to disable AIL after other CPUs have been
>> brought down.
>>
>> As a refresher, the real-mode scv interrupt vector is 0x17000, and the
>> fixed-location head code probably couldn't easily deal with implementing
>> such high addresses so it was just decided not to support that interrupt
>> at all.
>>
>> Reported-by: Sourabh Jain <sourabhjain at linux.ibm.com>
>   
> Was this reported publicly? I don't remember it.

No, I didn't report this issue publicly.

While debugging a kexec issue, git bisect pointed to the commit
mentioned in the patch description, so I contacted Nick directly.

When `kexec -e` is run with --smt=off, the first kernel hits an
exception when wake_offline_cpus() -> add_cpu() is called to bring up
the offline CPUs.

Console log:

[   68.824514] restraintd[899]: * Parsing recipe
[   68.825546] restraintd[899]: * Running recipe
[   68.825591] restraintd[899]: ** Continuing task: 20291 [/mnt/tests/distribution/reservesys]
[   68.834095] restraintd[899]: ** Preparing metadata
[   68.872927] restraintd[899]: ** Refreshing peer role hostnames: Retries 0
[   68.911107] restraintd[899]: ** Updating env vars
[   68.911737] restraintd[899]: *** Current Time: Tue May 21 09:09:42 2024  Localwatchdog at:  * Disabled! *
[   68.922803] restraintd[899]: ** Running task: 20291 [/distribution/reservesys]
[   78.027943] Removing IBM Power 842 compression device
[   78.093777] XFS (sda2): Block device removal (0x20) detected at xfs_fs_shutdown+0x34/0x50 [xfs] (fs/xfs/xfs_super.c:1179). Shutting down filesystem.
[   78.093894] XFS (sda2): Please unmount the filesystem and rectify the problem(s)
[   83.450854] dm-0: writeback error on inode 17086756, offset 569344, sector 11026136
[   83.450910] dm-0: writeback error on inode 36421601, offset 0, sector 20772504
[   84.021819] dm-0: writeback error on inode 36382045, offset 0, sector 20772536
[   84.094348] dm-0: writeback error on inode 18703102, offset 0, sector 11021000
[   84.601228] dm-0: writeback error on inode 51268015, offset 0, sector 27663152
[   84.601468] dm-0: writeback error on inode 58225471, offset 0, sector 34636080
[   85.370996] kexec_core: Starting new kernel
[   85.391013] kexec: Waking offline cpu 1.
[   85.391038] ------------[ cut here ]------------
[   85.391042] kernel BUG at arch/powerpc/kernel/exceptions-64s.S:501!
[   85.391047] Oops: Exception in kernel mode, sig: 5 [#1]
[   85.391051] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[   85.391056] Modules linked in: bonding tls rfkill pseries_rng vmx_crypto drm fuse drm_panel_orientation_quirks xfs libcrc32c sr_mod sd_mod cdrom t10_pi sg ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
[   85.391086] CPU: 0 PID: 565 Comm: systemd-journal Kdump: loaded Not tainted 6.9.0+ #1
[   85.391092] Hardware name: IBM,9008-22L POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.A0 (VL950_144) hv:phyp pSeries
[   85.391096] NIP:  c0000000000089a4 LR: 000000000001703c CTR: c000000000008980
[   85.391101] REGS: c00000000f76fd60 TRAP: 0700   Not tainted (6.9.0+)
[   85.391106] MSR:  8000000000021031 <SF,ME,IR,DR,LE>  CR: 240022d4  XER: 00000000
[   85.391116] CFAR: c00000000000899c IRQMASK: 0
[   85.391116] GPR00: 0000000000000003 00007fffc4f783a0 00007fff9f0a7200 0000010014331bb8
[   85.391116] GPR04: 00007fffc4f7b078 000000000000c4f6 00007fffc4f7b1d0 00000100143469a0
[   85.391116] GPR08: 00007fff9f489268 00000000440022d4 00007fffc4f78670 00000000000ac588
[   85.391116] GPR12: 8000000000009003 c000000002f50000 0000000000000000 0000000000000000
[   85.391116] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   85.391116] GPR20: 0000000000000000 0000000000000000 0000000127117b48 00000001271185b8
[   85.391116] GPR24: 0000000127117b90 00007fffc4f7b070 0000010014331540 00007fffc4f7b078
[   85.391116] GPR28: 0000000000000000 00007fffc4f78f80 000000000000c4f6 0000010014331ba0
[   85.391173] NIP [c0000000000089a4] data_access_common_virt+0x14/0x220
[   85.391181] LR [000000000001703c] 0x1703c
[   85.391186] Call Trace:
[   85.391189] Code: 48024df9 48000000 60000000 e94d0020 694a0002 7d400164 60000000 718a4000 7c2a0b78 3821fd30 41c20008 e82d0910 <0981fd30> f9210160 f9610130 f9810138
[   85.391208] ---[ end trace 0000000000000000 ]---
[   85.394302] pstore: backend (nvram) writing error (-1)
[   85.394306]
[   86.394309] Kernel panic - not syncing: Fatal exception
[   86.399970] Rebooting in 10 seconds..

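As a quick cross-check of the log against the patch description, the
small Python sketch below compares the LR value from the oops (0x1703c)
with the real-mode scv interrupt vector quoted above (0x17000). It only
does the arithmetic. The WINDOW constant and the helper names are
hypothetical and chosen for illustration, and reading a small offset as
"execution reached the scv vector with AIL disabled" is an assumption,
not something the log proves on its own.

```python
# Values copied from the oops above and from the quoted patch description.
SCV_REAL_MODE_VECTOR = 0x17000  # real-mode scv interrupt vector
LR_FROM_OOPS = 0x1703C          # LR value in the register dump

# Hypothetical window size, picked only for this illustration.
WINDOW = 0x100


def offset_from_scv_vector(addr: int) -> int:
    """Return the byte offset of addr from the real-mode scv vector."""
    return addr - SCV_REAL_MODE_VECTOR


def near_scv_vector(addr: int, window: int = WINDOW) -> bool:
    """True if addr lies within `window` bytes at or after the vector."""
    return 0 <= offset_from_scv_vector(addr) < window


offset = offset_from_scv_vector(LR_FROM_OOPS)
print(hex(offset), near_scv_vector(LR_FROM_OOPS))
```

Under these assumptions LR sits 0x3c bytes past 0x17000, which fits the
description of an scv interrupt being taken at the real-mode vector
after AIL was disabled.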

Thanks,
Sourabh Jain

