Kernel panic from malloc() on SUSE 15.1?

Michael Ellerman mpe at ellerman.id.au
Thu Nov 5 21:19:22 AEDT 2020


Carl Jacobsen <cjacobsen at storix.com> writes:
> The panic (on a call to malloc from static linked libcrypto) looks like
> this:

Thanks.

This doesn't make a lot of sense.

> Bad kernel stack pointer 7fffffffeac0 at 700

"at 700" is the regs->nip value, and suggests we're trying to handle a
program check, which is either a trap or BUG or WARN, or illegal
instruction or several other things.

> Oops: Bad kernel stack pointer, sig: 6 [#1]
> SMP NR_CPUS=2048 NUMA pSeries
> Modules linked in: scsi_transport_iscsi af_packet xt_tcpudp ip6t_rpfilter
> ip6t_REJECT ipt_REJECT xt_conntrack ip_set nfnetlink ebtable_nat
> ebtable_broute br_netfilter bridge stp llc ip6table_nat nf_conntrack_ipv6
> nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security
> iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
> nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security
> ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables
> x_tables ibmveth(X) vmx_crypto gf128mul crct10dif_vpmsum rtc_generic btrfs
> xor zstd_decompress zstd_compress xxhash raid6_pq sr_mod cdrom sd_mod
> ibmvscsi(X) scsi_transport_srp crc32c_vpmsum sg dm_multipath dm_mod
> scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
> Supported: Yes, External
> CPU: 0 PID: 14144 Comm: rand_test_no_pt Tainted: G 4.12.14-197.18-default #1 SLE15-SP1
> task: c00000002fa23b80 task.stack: c000000032824000
> NIP: 0000000000000700 LR: 0000000010004ad0 CTR: 0000000000000000
> REGS: c00000001ec2fd40 TRAP: 0300   Tainted: G (4.12.14-197.18-default)

But then here it says TRAP = 0x300, which is != 0x700.

The trap number is hardcoded in the bad stack handling code, and I don't
see how we can end up with nip == 0x700 but the trap value == 0x300.

> MSR: 8000000000001000 <SF,ME> CR: 44000844  XER: 20000000

And here the MSR says you were in big endian mode, but you said before
your machine was ppc64le.

> CFAR: 00000000000010f0 DAR: ffffffffffffb27a DSISR: 40000000 SOFTE: 0
> GPR00: 0000000020000000 00007fffffffeac0 00000000102af788 fffffffffffffffd
> GPR04: 0000000000000020 0000000000000030 00000000102b0550 0000000000000001
> GPR08: 0000000000000000 00007fffb7dacc00 00000000102b0520 800000010280f033
> GPR12: 0000000000004000 00007fffb7ffa100 0000000000000000 0000000000000000
> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR24: 0000000000000000 0000000000000000 0000000000000000 00007fffb7fef4b8
> GPR28: 00007fffb7ff0000 0000000000000000 0000000000000000 00007fffffffeac0

The rest of the regs look like user space values, not kernel.

> NIP [0000000000000700] 0x700
> LR [0000000010004ad0] 0x10004ad0
> Call Trace:
> Instruction dump:
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000 7db243a6 7db142a6 f92d0080 7d20e2a6
> ---[ end trace cc04515f274cfbf6 ]---


What hardware is this on?

Can you try booting with ppc_tm=off on the kernel command line, and see
if that changes anything?

Can you put your compiled test program up somewhere we can download it
and look at? Or post the disassembly?

cheers


More information about the Linuxppc-dev mailing list