Kernel panic from malloc() on SUSE 15.1?

Carl Jacobsen cjacobsen at storix.com
Fri Nov 6 06:44:06 AEDT 2020


On Thu, Nov 5, 2020 at 2:19 AM Michael Ellerman <mpe at ellerman.id.au> wrote:

> Carl Jacobsen <cjacobsen at storix.com> writes:
> > The panic (on a call to malloc from static linked libcrypto) looks like
> > this:
>
> What hardware is this on?
>

Thank you for looking into this.

The system that's panicking identifies itself as follows:
    # uname -a
    Linux sl151pwr8 4.12.14-197.18-default #1 SMP Tue Sep 17 14:26:49 UTC 2019 (d75059b) ppc64le ppc64le ppc64le GNU/Linux
    #
    # cat /etc/os-release
    NAME="SLES"
    VERSION="15-SP1"
    VERSION_ID="15.1"
    PRETTY_NAME="SUSE Linux Enterprise Server 15 SP1"
    ID="sles"
    ID_LIKE="suse"
    ANSI_COLOR="0;32"
    CPE_NAME="cpe:/o:suse:sles:15:sp1"

The system is an LPAR running under PowerVM (VIOS version 2.2.3.4).
The underlying hardware is machine type-model 8284-22A.


> Can you try booting with ppc_tm=off on the kernel command line, and see
> if that changes anything?
>

Yes, the output is below. It doesn't appear to change much, but I don't have
the background to interpret the registers.


> Can you put your compiled test program up somewhere we can download it
> and look at? Or post the disassembly?
>

Here's the source file:
    https://www.storix.com/download/support/misc/rand_test.c

Here's the resulting executable:
    https://www.storix.com/download/support/misc/rand_test

The executable is statically linked against libcrypto from openssl-1.1.1g, configured with:
    ./config no-shared no-dso no-threads -fPIC -ggdb3 -debug -static

The executable is built (on SUSE 12) with:
    gcc -ggdb3 -o rand_test rand_test.c libcrypto.a


Running the executable (on SUSE 15.1) under gdb goes like this:

    # gdb --args ./rand_test
    GNU gdb (GDB; SUSE Linux Enterprise 15) 8.3.1
    << snip intro text >>
    Reading symbols from ./rand_test...
    (gdb) b main
    Breakpoint 1 at 0x1000288c: file rand_test.c, line 6.
    (gdb) r
    Starting program: /tmp/ossl/rand_test

    Breakpoint 1, main (argc=1, argv=0x7ffffffff798) at rand_test.c:6
    6           int has_enough_data = RAND_status();
    (gdb) s
    RAND_status () at crypto/rand/rand_lib.c:958
    958         const RAND_METHOD *meth = RAND_get_rand_method();
    (gdb)
    RAND_get_rand_method () at crypto/rand/rand_lib.c:844
    844         const RAND_METHOD *tmp_meth = NULL;
    (gdb)
    846         if (!RUN_ONCE(&rand_init, do_rand_init))
    (gdb)
    CRYPTO_THREAD_run_once (once=0x102a7d88 <rand_init>, init=0x10002f30 <do_rand_init_ossl_>) at crypto/threads_none.c:67
    67          if (*once != 0)
    (gdb)
    70          init();
    (gdb)
    do_rand_init_ossl_ () at crypto/rand/rand_lib.c:306
    306     DEFINE_RUN_ONCE_STATIC(do_rand_init)
    (gdb)
    do_rand_init () at crypto/rand/rand_lib.c:309
    309         rand_engine_lock = CRYPTO_THREAD_lock_new();
    (gdb)
    CRYPTO_THREAD_lock_new () at crypto/threads_none.c:24
    24          if ((lock = OPENSSL_zalloc(sizeof(unsigned int))) == NULL) {
    (gdb)
    CRYPTO_zalloc (num=4, file=0x1023a500 "crypto/threads_none.c", line=24) at crypto/mem.c:230
    230         void *ret = CRYPTO_malloc(num, file, line);
    (gdb)
    CRYPTO_malloc (num=4, file=0x1023a500 "crypto/threads_none.c", line=24) at crypto/mem.c:194
    194         void *ret = NULL;
    (gdb)
    197         if (malloc_impl != NULL && malloc_impl != CRYPTO_malloc)
    (gdb)
    200         if (num == 0)
    (gdb)
    204         if (allow_customize) {
    (gdb)
    210             allow_customize = 0;
    (gdb)
    222         ret = malloc(num);
    (gdb)
    Bad kernel stack pointer 7fffffffef20 at 700
    Oops: Bad kernel stack pointer, sig: 6 [#1]
    SMP NR_CPUS=2048
    NUMA
    pSeries
    Modules linked in: scsi_transport_iscsi af_packet xt_tcpudp
      ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ip_set nfnetlink
      ebtable_nat ebtable_broute br_netfilter bridge stp llc ip6table_nat
      nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw
      ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
      nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security
      ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables
      x_tables ibmveth(X) vmx_crypto gf128mul crct10dif_vpmsum rtc_generic btrfs
      xor zstd_decompress zstd_compress xxhash raid6_pq sr_mod cdrom sd_mod
      ibmvscsi(X) scsi_transport_srp crc32c_vpmsum sg dm_multipath dm_mod
      scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
    Supported: Yes, External
    CPU: 4 PID: 3082 Comm: rand_test Tainted: G 4.12.14-197.18-default #1 SLE15-SP1
    task: c00000002e226100 task.stack: c0000000387c8000
    NIP: 0000000000000700 LR: 0000000010004acc CTR: 0000000000000000
    REGS: c00000001ebffd40 TRAP: 0300   Tainted: G (4.12.14-197.18-default)
    MSR: 8000000000001000 <SF,ME>
      CR: 44000844  XER: 20000000
    CFAR: 00000000000010f0 DAR: ffffffffffffb27a DSISR: 40000000 SOFTE: 0
    GPR00: 0000000020000000 00007fffffffef20 00000000102af788 fffffffffffffffd
    GPR04: 0000000000000020 0000000000000030 00000000102b0760 0000000000000001
    GPR08: 0000000000000000 00007fffb7dacc00 00000000102b0730 800000010280f033
    GPR12: 0000000000004000 00007fffb7ffa100 0000000000000000 0000000000000000
    GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    GPR24: 0000000000000000 0000000000000000 0000000000000000 00007fffb7fef4b8
    GPR28: 00007fffb7ff0000 0000000000000000 0000000000000000 00007fffffffef20
    NIP [0000000000000700] 0x700
    LR [0000000010004acc] 0x10004acc
    Call Trace:
    Instruction dump:
    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    00000000 00000000 00000000 00000000 7db243a6 7db142a6 f92d0080 7d20e2a6
    ---[ end trace 167d5d3b2e8a06e9 ]---

    Sending IPI to other CPUs
    IPI complete
    kexec: Starting switchover sequence.
    I'm in purgatory
     -> smp_release_cpus()
    spinning_secondaries = 0
     <- smp_release_cpus()
    Kernel panic - not syncing: Out of memory and no killable processes...

    CPU: 4 PID: 1 Comm: swapper/4 Not tainted 4.12.14-197.18-default #1 SLE15-SP1
    Call Trace:
    [c000000012457210] [c000000008a20140] dump_stack+0xb0/0xf0 (unreliable)
    [c000000012457250] [c000000008a1ccd4] panic+0x144/0x31c
    [c0000000124572e0] [c0000000082efcc0] out_of_memory+0x3f0/0x700
    [c000000012457380] [c0000000082f7ed4] __alloc_pages_nodemask+0x1004/0x10b0
    [c000000012457570] [c00000000837f4d8] alloc_page_interleave+0x58/0x110
    [c0000000124575b0] [c0000000083800bc] alloc_pages_current+0x16c/0x1d0
    [c000000012457610] [c0000000082e8398] __page_cache_alloc+0xd8/0x150
    [c000000012457650] [c0000000082e8574] pagecache_get_page+0x164/0x440
    [c0000000124576b0] [c0000000082e8884] grab_cache_page_write_begin+0x34/0x70
    [c0000000124576e0] [c00000000840ede8] simple_write_begin+0x48/0x190
    [c000000012457720] [c0000000082e7c7c] generic_perform_write+0xec/0x270
    [c0000000124577b0] [c0000000082ea2e0] __generic_file_write_iter+0x250/0x2a0
    [c000000012457810] [c0000000082ea53c] generic_file_write_iter+0x20c/0x2e0
    [c000000012457850] [c0000000083cc0e0] __vfs_write+0x120/0x1e0
    [c0000000124578e0] [c0000000083cdfc8] vfs_write+0xd8/0x220
    [c000000012457930] [c0000000083cfeec] SyS_write+0x6c/0x110
    [c000000012457980] [c000000008d154c4] xwrite+0x54/0xb8
    [c0000000124579c0] [c000000008d15574] do_copy+0x4c/0x17c
    [c0000000124579f0] [c000000008d15140] write_buffer+0x64/0x90
    [c000000012457a20] [c000000008d151d4] flush_buffer+0x68/0xf4
    [c000000012457a70] [c000000008d62268] unxz+0x210/0x398
    [c000000012457b10] [c000000008d15efc] unpack_to_rootfs+0x1f0/0x360
    [c000000012457bc0] [c000000008d16108] populate_rootfs+0x9c/0x188
    [c000000012457c40] [c00000000800f5d4] do_one_initcall+0x64/0x1d0
    [c000000012457d00] [c000000008d14474] kernel_init_freeable+0x294/0x388
    [c000000012457dc0] [c00000000801026c] kernel_init+0x2c/0x160
    [c000000012457e30] [c00000000800b560] ret_from_kernel_thread+0x5c/0x7c
    ------------[ cut here ]------------


Doing the same thing, but booted with ppc_tm=off:
    # cat /proc/cmdline
    BOOT_IMAGE=/boot/vmlinux-4.12.14-197.18-default
      root=UUID=0e795e37-3692-465a-a037-c2935a9fde7a mitigations=auto quiet
      crashkernel=197M ppc_tm=off


This results in a panic at the same point, with only a few values changed:

    << snip down to panic at malloc >>
    (gdb)
    Bad kernel stack pointer 7fffffffef20 at 700
    Oops: Bad kernel stack pointer, sig: 6 [#1]
    SMP NR_CPUS=2048
    NUMA
    pSeries
    Modules linked in: scsi_transport_iscsi af_packet xt_tcpudp
      ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ip_set nfnetlink
      ebtable_nat ebtable_broute br_netfilter bridge stp llc ip6table_nat
      nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw
      ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
      nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security
      ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables
      x_tables ibmveth(X) vmx_crypto gf128mul crct10dif_vpmsum rtc_generic btrfs
      xor zstd_decompress zstd_compress xxhash raid6_pq sr_mod cdrom sd_mod
      ibmvscsi(X) scsi_transport_srp crc32c_vpmsum sg dm_multipath dm_mod
      scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
    Supported: Yes, External
    CPU: 2 PID: 3079 Comm: rand_test Tainted: G 4.12.14-197.18-default #1 SLE15-SP1
    task: c00000002f6bcc00 task.stack: c0000000321fc000
    NIP: 0000000000000700 LR: 0000000010004acc CTR: 0000000000000000
    REGS: c00000001ec17d40 TRAP: 0300   Tainted: G (4.12.14-197.18-default)
    MSR: 8000000000001000 <SF,ME>
      CR: 44000844  XER: 20000000
    CFAR: 00000000000010f0 DAR: ffffffffffffb27a DSISR: 40000000 SOFTE: 0
    GPR00: 0000000020000000 00007fffffffef20 00000000102af788 fffffffffffffffd
    GPR04: 0000000000000020 0000000000000030 00000000102b0760 0000000000000001
    GPR08: 0000000000000000 00007fffb7dacc00 00000000102b0730 800000000280f033
    GPR12: 0000000000004000 00007fffb7ffa100 0000000000000000 0000000000000000
    GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    GPR24: 0000000000000000 0000000000000000 0000000000000000 00007fffb7fef4b8
    GPR28: 00007fffb7ff0000 0000000000000000 0000000000000000 00007fffffffef20
    NIP [0000000000000700] 0x700
    LR [0000000010004acc] 0x10004acc
    Call Trace:
    Instruction dump:
    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    00000000 00000000 00000000 00000000 7db243a6 7db142a6 f92d0080 7d20e2a6
    ---[ end trace 436f626dd098548c ]---

    Sending IPI to other CPUs
    IPI complete
    kexec: Starting switchover sequence.
    I'm in purgatory
     -> smp_release_cpus()
    spinning_secondaries = 0
     <- smp_release_cpus()
    Kernel panic - not syncing: Out of memory and no killable processes...

    CPU: 2 PID: 1 Comm: swapper/2 Not tainted 4.12.14-197.18-default #1 SLE15-SP1
    Call Trace:
    [c000000012457210] [c000000008a20140] dump_stack+0xb0/0xf0 (unreliable)
    [c000000012457250] [c000000008a1ccd4] panic+0x144/0x31c
    [c0000000124572e0] [c0000000082efcc0] out_of_memory+0x3f0/0x700
    [c000000012457380] [c0000000082f7ed4] __alloc_pages_nodemask+0x1004/0x10b0
    [c000000012457570] [c00000000837f4d8] alloc_page_interleave+0x58/0x110
    [c0000000124575b0] [c0000000083800bc] alloc_pages_current+0x16c/0x1d0
    [c000000012457610] [c0000000082e8398] __page_cache_alloc+0xd8/0x150
    [c000000012457650] [c0000000082e8574] pagecache_get_page+0x164/0x440
    [c0000000124576b0] [c0000000082e8884] grab_cache_page_write_begin+0x34/0x70
    [c0000000124576e0] [c00000000840ede8] simple_write_begin+0x48/0x190
    [c000000012457720] [c0000000082e7c7c] generic_perform_write+0xec/0x270
    [c0000000124577b0] [c0000000082ea2e0] __generic_file_write_iter+0x250/0x2a0
    [c000000012457810] [c0000000082ea53c] generic_file_write_iter+0x20c/0x2e0
    [c000000012457850] [c0000000083cc0e0] __vfs_write+0x120/0x1e0
    [c0000000124578e0] [c0000000083cdfc8] vfs_write+0xd8/0x220
    [c000000012457930] [c0000000083cfeec] SyS_write+0x6c/0x110
    [c000000012457980] [c000000008d154c4] xwrite+0x54/0xb8
    [c0000000124579c0] [c000000008d15574] do_copy+0x4c/0x17c
    [c0000000124579f0] [c000000008d15140] write_buffer+0x64/0x90
    [c000000012457a20] [c000000008d151d4] flush_buffer+0x68/0xf4
    [c000000012457a70] [c000000008d62268] unxz+0x210/0x398
    [c000000012457b10] [c000000008d15efc] unpack_to_rootfs+0x1f0/0x360
    [c000000012457bc0] [c000000008d16108] populate_rootfs+0x9c/0x188
    [c000000012457c40] [c00000000800f5d4] do_one_initcall+0x64/0x1d0
    [c000000012457d00] [c000000008d14474] kernel_init_freeable+0x294/0x388
    [c000000012457dc0] [c00000000801026c] kernel_init+0x2c/0x160
    [c000000012457e30] [c00000000800b560] ret_from_kernel_thread+0x5c/0x7c
    ------------[ cut here ]------------


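Diffing the two panic outputs gives the following. Apart from the CPU, PID,
task, REGS and trace identifiers, the only register value that differs is
r11 (the last value on the GPR08 line):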

    74,75c79,80
    < CPU: 4 PID: 3082 Comm: rand_test Tainted: G 4.12.14-197.18-default #1 SLE15-SP1
    < task: c00000002e226100 task.stack: c0000000387c8000
    ---
    > CPU: 2 PID: 3079 Comm: rand_test Tainted: G 4.12.14-197.18-default #1 SLE15-SP1
    > task: c00000002f6bcc00 task.stack: c0000000321fc000
    77c82
    < REGS: c00000001ebffd40 TRAP: 0300   Tainted: G (4.12.14-197.18-default)
    ---
    > REGS: c00000001ec17d40 TRAP: 0300   Tainted: G (4.12.14-197.18-default)
    83c88
    < GPR08: 0000000000000000 00007fffb7dacc00 00000000102b0730 800000010280f033
    ---
    > GPR08: 0000000000000000 00007fffb7dacc00 00000000102b0730 800000000280f033
    95c100
    < ---[ end trace 167d5d3b2e8a06e9 ]---
    ---
    > ---[ end trace 436f626dd098548c ]---
    106c111
    < CPU: 4 PID: 1 Comm: swapper/4 Not tainted 4.12.14-197.18-default #1 SLE15-SP1
    ---
    > CPU: 2 PID: 1 Comm: swapper/2 Not tainted 4.12.14-197.18-default #1 SLE15-SP1

-- 
Carl Jacobsen
Storix, Inc.


More information about the Linuxppc-dev mailing list