[PATCH] powerpc/pseries: Fix scv instruction crash with kexec
Sourabh Jain
sourabhjain at linux.ibm.com
Wed Jun 26 19:46:40 AEST 2024
Hello Michael,
On 26/06/24 14:57, Michael Ellerman wrote:
> Nicholas Piggin <npiggin at gmail.com> writes:
>> kexec on pseries disables AIL (reloc_on_exc), required for scv
>> instruction support, before other CPUs have been shut down. This means
>> they can execute scv instructions after AIL is disabled, which causes an
>> interrupt at an unexpected entry location that crashes the kernel.
>>
>> Change the kexec sequence to disable AIL after other CPUs have been
>> brought down.
>>
>> As a refresher, the real-mode scv interrupt vector is 0x17000, and the
>> fixed-location head code probably couldn't easily deal with implementing
>> such high addresses so it was just decided not to support that interrupt
>> at all.
>>
>> Reported-by: Sourabh Jain <sourabhjain at linux.ibm.com>
>
> Was this reported publicly? I don't remember it.
No, I didn't report this issue publicly.
While debugging a kexec issue, git bisect pointed to the commit mentioned
in the patch description, so I contacted Nick directly.
With `kexec -e` and --smt=off, the first kernel hits an exception when
wake_offline_cpus() -> add_cpu() is called to bring up the offline CPUs.
Console log:
[ 68.824514] restraintd[899]: * Parsing recipe
[ 68.825546] restraintd[899]: * Running recipe
[ 68.825591] restraintd[899]: ** Continuing task: 20291 [/mnt/tests/distribution/reservesys]
[ 68.834095] restraintd[899]: ** Preparing metadata
[ 68.872927] restraintd[899]: ** Refreshing peer role hostnames: Retries 0
[ 68.911107] restraintd[899]: ** Updating env vars
[ 68.911737] restraintd[899]: *** Current Time: Tue May 21 09:09:42 2024 Localwatchdog at: * Disabled! *
[ 68.922803] restraintd[899]: ** Running task: 20291 [/distribution/reservesys]
[ 78.027943] Removing IBM Power 842 compression device
[ 78.093777] XFS (sda2): Block device removal (0x20) detected at xfs_fs_shutdown+0x34/0x50 [xfs] (fs/xfs/xfs_super.c:1179). Shutting down filesystem.
[ 78.093894] XFS (sda2): Please unmount the filesystem and rectify the problem(s)
[ 83.450854] dm-0: writeback error on inode 17086756, offset 569344, sector 11026136
[ 83.450910] dm-0: writeback error on inode 36421601, offset 0, sector 20772504
[ 84.021819] dm-0: writeback error on inode 36382045, offset 0, sector 20772536
[ 84.094348] dm-0: writeback error on inode 18703102, offset 0, sector 11021000
[ 84.601228] dm-0: writeback error on inode 51268015, offset 0, sector 27663152
[ 84.601468] dm-0: writeback error on inode 58225471, offset 0, sector 34636080
[ 85.370996] kexec_core: Starting new kernel
[ 85.391013] kexec: Waking offline cpu 1.
[ 85.391038] ------------[ cut here ]------------
[ 85.391042] kernel BUG at arch/powerpc/kernel/exceptions-64s.S:501!
[ 85.391047] Oops: Exception in kernel mode, sig: 5 [#1]
[ 85.391051] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 85.391056] Modules linked in: bonding tls rfkill pseries_rng vmx_crypto drm fuse drm_panel_orientation_quirks xfs libcrc32c sr_mod sd_mod cdrom t10_pi sg ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
[ 85.391086] CPU: 0 PID: 565 Comm: systemd-journal Kdump: loaded Not tainted 6.9.0+ #1
[ 85.391092] Hardware name: IBM,9008-22L POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.A0 (VL950_144) hv:phyp pSeries
[ 85.391096] NIP: c0000000000089a4 LR: 000000000001703c CTR: c000000000008980
[ 85.391101] REGS: c00000000f76fd60 TRAP: 0700 Not tainted (6.9.0+)
[ 85.391106] MSR: 8000000000021031 <SF,ME,IR,DR,LE> CR: 240022d4 XER: 00000000
[ 85.391116] CFAR: c00000000000899c IRQMASK: 0
[ 85.391116] GPR00: 0000000000000003 00007fffc4f783a0 00007fff9f0a7200 0000010014331bb8
[ 85.391116] GPR04: 00007fffc4f7b078 000000000000c4f6 00007fffc4f7b1d0 00000100143469a0
[ 85.391116] GPR08: 00007fff9f489268 00000000440022d4 00007fffc4f78670 00000000000ac588
[ 85.391116] GPR12: 8000000000009003 c000000002f50000 0000000000000000 0000000000000000
[ 85.391116] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 85.391116] GPR20: 0000000000000000 0000000000000000 0000000127117b48 00000001271185b8
[ 85.391116] GPR24: 0000000127117b90 00007fffc4f7b070 0000010014331540 00007fffc4f7b078
[ 85.391116] GPR28: 0000000000000000 00007fffc4f78f80 000000000000c4f6 0000010014331ba0
[ 85.391173] NIP [c0000000000089a4] data_access_common_virt+0x14/0x220
[ 85.391181] LR [000000000001703c] 0x1703c
[ 85.391186] Call Trace:
[ 85.391189] Code: 48024df9 48000000 60000000 e94d0020 694a0002 7d400164 60000000 718a4000 7c2a0b78 3821fd30 41c20008 e82d0910 <0981fd30> f9210160 f9610130 f9810138
[ 85.391208] ---[ end trace 0000000000000000 ]---
[ 85.394302] pstore: backend (nvram) writing error (-1)
[ 85.394306]
[ 86.394309] Kernel panic - not syncing: Fatal exception
[ 86.399970] Rebooting in 10 seconds..
Thanks,
Sourabh Jain