[powerpc/merge] Possible stack corruption while running selftests
Sachin Sant
sachinp at linux.ibm.com
Thu Mar 24 17:54:19 AEDT 2022
I am seeing random crashes (random at least to me) while running the powerpc
selftests on a P10 LPAR booted with powerpc/merge branch code. The
mitigation-patching.sh test was running in both instances.
The latest instance looks like possible stack corruption:
[ 711.005150] count-cache-flush: hardware flush enabled.
[ 711.005153] link-stack-flush: software flush enabled.
[ 711.015306] barrier-nospec: using ORI speculation barrier
[ 711.030889] kernel tried to execute exec-protected page (c00000000a70fc80) - exploit attempt? (uid: 0)
[ 711.030902] BUG: Unable to handle kernel instruction fetch
[ 711.030905] Faulting instruction address: 0xc00000000a70fc80
[ 711.030909] Thread overran stack, or stack corrupted
[ 711.030913] Oops: Kernel access of bad area, sig: 11 [#1]
[ 711.030917] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[ 711.030924] Modules linked in: dm_mod nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set rfkill nf_tables bonding libcrc32c nfnetlink sunrpc pseries_rng xts vmx_crypto sch_fq_codel ext4 mbcache jbd2 sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp fuse
[ 711.030960] CPU: 31 PID: 165 Comm: migration/31 Not tainted 5.17.0-ge8833c5edc59 #1
[ 711.030965] Stopper: multi_cpu_stop+0x0/0x230 <- stop_machine_cpuslocked+0x188/0x1e0
[ 711.030977] NIP: c00000000a70fc80 LR: c00000000a70fc80 CTR: c000000000293f90
[ 711.030981] REGS: c00000000a70f9a0 TRAP: 0400 Not tainted (5.17.0-ge8833c5edc59)
[ 711.030986] MSR: 800000001280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 48002822 XER: 00000000
[ 711.031001] CFAR: c000000000216628 IRQMASK: 0
[ 711.031001] GPR00: c00000000a70fc80 c00000000a70fc40 c000000002a1fe00 0000000000c57415
[ 711.031001] GPR04: 0000000000000000 c000000efa36ab80 c000000efa36ab70 c00000000001e688
[ 711.031001] GPR08: 0000000000000000 c000000efa3ef480 0000000000000000 c000000efa3ee600
[ 711.031001] GPR12: 0000000000000000 c000000effbe5a80 c00000000018fc98 c0000000072a5f80
[ 711.031001] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 711.031001] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 711.031001] GPR24: 0000000000000001 0000000000000002 0000000000000003 c000000002a62138
[ 711.031001] GPR28: c00000024224fb08 0000000000000001 c00000024224fb2c 0000000000000001
[ 711.031054] NIP [c00000000a70fc80] 0xc00000000a70fc80
[ 711.031058] LR [c00000000a70fc80] 0xc00000000a70fc80
[ 711.031062] Call Trace:
[ 711.031065] [c00000000a70fc40] [c00000000a70fc80] 0xc00000000a70fc80 (unreliable)
[ 711.031071] [c00000000a70fcb0] [c000000000293ce4] cpu_stopper_thread+0xe4/0x240
[ 711.031077] [c00000000a70fd60] [0000000119a59724] 0x119a59724
[ 711.031083] BUG: Unable to handle kernel data access on read at 0xc0000014ffffc000
[ 711.031088] Faulting instruction address: 0xc00000000001ccfc
[ 711.031091] Thread overran stack, or stack corrupted
[ 711.031093] Oops: Kernel access of bad area, sig: 11 [#2]
[ 711.031097] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[ 711.031101] Modules linked in: dm_mod nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set rfkill nf_tables bonding libcrc32c nfnetlink sunrpc pseries_rng xts vmx_crypto sch_fq_codel ext4 mbcache jbd2 sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp fuse
[ 711.031128] CPU: 31 PID: 165 Comm: Not tainted 5.17.0-ge8833c5edc59 #1
[ 711.031134] BUG: Unable to handle kernel data access at 0xc10000000214ab60
[ 711.031138] Faulting instruction address: 0xc000000000293e70
[ 711.031141] Thread overran stack, or stack corrupted
[ 711.031144] Oops: Kernel access of bad area, sig: 11 [#3]
………..
………..
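One way to sanity-check the stack-corruption reading of the first oops is to compare the faulting NIP against the stack pointer (GPR01) from the register dump. The sketch below is illustrative only: the register values are copied from the log above, while the 16 KiB stack size (THREAD_SIZE) and the helper name on_same_stack are assumptions and should be adjusted to the actual kernel config.

```python
# Register values copied from the first oops in this report.
NIP = 0xc00000000a70fc80  # faulting instruction address (LR holds the same value)
R1 = 0xc00000000a70fc40   # GPR01, the kernel stack pointer

# Assumption: 16 KiB kernel stacks (THREAD_SHIFT = 14). Check the real config.
THREAD_SIZE = 1 << 14


def on_same_stack(addr, sp, thread_size=THREAD_SIZE):
    """Return True if addr lies in the thread_size-aligned region containing sp."""
    base = sp & ~(thread_size - 1)
    return base <= addr < base + thread_size


if __name__ == "__main__":
    # If this prints True, the CPU tried to fetch instructions from the
    # thread's own stack, which fits the "exec-protected page" message.
    print(on_same_stack(NIP, R1))
    print(hex(NIP - R1))
```

Under that assumed stack size, NIP and LR both land 0x40 bytes above the stack pointer, which suggests a return address slot held a stack address rather than a text address.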
In another instance I saw the following crash in ibmveth:
[ 714.823524] count-cache-flush: hardware flush enabled.
[ 714.823528] link-stack-flush: software flush enabled.
[ 714.828529] barrier-nospec: using ORI speculation barrier
[ 715.181552] ------------[ cut here ]------------
[ 715.181558] kernel BUG at drivers/net/ethernet/ibm/ibmveth.c:402!
[ 715.181563] Oops: Exception in kernel mode, sig: 5 [#1]
[ 715.181568] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[ 715.181572] Modules linked in: dm_mod nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set rfkill nf_tables bonding libcrc32c nfnetlink sunrpc pseries_rng xts vmx_crypto sch_fq_codel ext4 mbcache jbd2 sd_mod t10_pi sg ibmvscsi ibmveth scsi_transport_srp fuse
[ 715.181604] CPU: 0 PID: 12 Comm: migration/0 Not tainted 5.17.0-ge8833c5edc59 #1
[ 715.181609] Stopper: multi_cpu_stop+0x0/0x230 <- stop_machine_cpuslocked+0x188/0x1e0
[ 715.181620] NIP: c008000000a91fdc LR: c000000000aca5d4 CTR: c008000000a91e48
[ 715.181624] REGS: c00000000772f300 TRAP: 0700 Not tainted (5.17.0-ge8833c5edc59)
[ 715.181628] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 42004422 XER: 00000000
[ 715.181640] CFAR: c008000000a91f14 IRQMASK: 0
[ 715.181640] GPR00: c000000000aca5d4 c00000000772f5a0 c008000000ac8000 c00000003a4c0a10
[ 715.181640] GPR04: 0000000000000010 000000002d890000 000000012d890000 0000000000000001
[ 715.181640] GPR08: c00000003a4c0a90 c00000005f4135a4 0000000000000000 c008000000a94858
[ 715.181640] GPR12: 0000000000004000 c000000002d20000 c00000000018fc98 c00000003a4c0a10
[ 715.181640] GPR16: 0000000000000101 0000000000000000 00000000000086dd 0000000000000004
[ 715.181640] GPR20: 000000000000dd86 0000000000000000 0000000000000080 000000000000003c
[ 715.181640] GPR24: 000000000000003c 0000000000000080 c00000003a4c0a00 0000000000000010
[ 715.181640] GPR28: 000000000000003c 0000000000000000 0000000000000000 c00000003a4c0000
[ 715.181695] NIP [c008000000a91fdc] ibmveth_poll+0x194/0x860 [ibmveth]
[ 715.181703] LR [c000000000aca5d4] __napi_poll+0x64/0x300
[ 715.181709] Call Trace:
[ 715.181711] [c00000000772f5a0] [c00000000772f5e0] 0xc00000000772f5e0 (unreliable)
[ 715.181718] [c00000000772f6a0] [c000000000aca5d4] __napi_poll+0x64/0x300
[ 715.181723] [c00000000772f720] [c000000000acadfc] net_rx_action+0x33c/0x3f0
[ 715.181729] [c00000000772f7e0] [c000000000d21a9c] __do_softirq+0x15c/0x3d0
[ 715.181737] [c00000000772f8d0] [c00000000015ecf8] irq_exit+0x178/0x1c0
[ 715.181743] [c00000000772f900] [c0000000000168fc] do_IRQ+0xfc/0x280
[ 715.181749] [c00000000772f930] [c0000000000090e8] hardware_interrupt_common_virt+0x218/0x220
[ 715.181757] --- interrupt: 500 at stop_machine_yield+0x8/0x10
[ 715.181762] NIP: c000000000293f88 LR: c0000000002940d8 CTR: c000000000293f90
[ 715.181766] REGS: c00000000772f9a0 TRAP: 0500 Not tainted (5.17.0-ge8833c5edc59)
[ 715.181770] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 48004422 XER: 00000000
[ 715.181783] CFAR: 0000000000000000 IRQMASK: 0
[ 715.181783] GPR00: c0000000002940fc c00000000772fc40 c000000002a1fe00 c000000002a62138
[ 715.181783] GPR04: 0000000000000000 c000000ef900ab80 c000000ef900ab70 c00000000001e688
[ 715.181783] GPR08: 0000000000000000 c000000ef908f480 0000000000000000 000000000098967f
[ 715.181783] GPR12: 0000000000000000 c000000002d20000 c00000000018fc98 c0000000072a0f80
[ 715.181783] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 715.181783] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 715.181783] GPR24: 0000000000000001 0000000000000002 0000000000000003 c000000002a62138
[ 715.181783] GPR28: c00000024119faf8 0000000000000001 c00000024119fb1c 0000000000000001
[ 715.181836] NIP [c000000000293f88] stop_machine_yield+0x8/0x10
[ 715.181841] LR [c0000000002940d8] multi_cpu_stop+0x148/0x230
[ 715.181845] --- interrupt: 500
[ 715.181847] [c00000000772fc40] [c0000000002940fc] multi_cpu_stop+0x16c/0x230 (unreliable)
[ 715.181854] [c00000000772fcb0] [c000000000293ce4] cpu_stopper_thread+0xe4/0x240
[ 715.181859] [c00000000772fd60] [c000000000196114] smpboot_thread_fn+0x1e4/0x250
[ 715.181866] [c00000000772fdc0] [c00000000018fdb4] kthread+0x124/0x130
[ 715.181871] [c00000000772fe10] [c00000000000cf04] ret_from_kernel_thread+0x5c/0x64
[ 715.181877] Instruction dump:
[ 715.181880] 7ce89850 7b980020 7f9707b4 78e70fe0 0b070000 79083e24 78c50020 7d0f4214
[ 715.181890] 80e801b8 7ce72850 78e70fe0 68e70001 <0b070000> 2e2a0000 e94801e8 78c61f48
[ 715.181901] ---[ end trace 0000000000000000 ]---
The kernel eventually panics.
I have not been able to reliably recreate these crashes.
I have attached the relevant dmesg and crash logs from both instances
(merge-crash-1.txt and merge-crash-2.txt).
- Sachin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: merge-crash-1.txt.gz
Type: application/x-gzip
Size: 11450 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20220324/950839b5/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: merge-crash-2.txt.gz
Type: application/x-gzip
Size: 9087 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20220324/950839b5/attachment-0003.bin>