Kernel panic from malloc() on SUSE 15.1?
Michael Ellerman
mpe at ellerman.id.au
Fri Nov 6 23:25:38 AEDT 2020
Carl Jacobsen <cjacobsen at storix.com> writes:
> On Thu, Nov 5, 2020 at 2:19 AM Michael Ellerman <mpe at ellerman.id.au> wrote:
>
>> Carl Jacobsen <cjacobsen at storix.com> writes:
>> > The panic (on a call to malloc from static linked libcrypto) looks like
>> > this:
>>
>> What hardware is this on?
>>
>
> Thank you for looking into this.
>
> The system that's panicking identifies like this:
> # uname -a
> Linux sl151pwr8 4.12.14-197.18-default #1 SMP Tue Sep 17 14:26:49 UTC 2019 (d75059b) ppc64le ppc64le ppc64le GNU/Linux
> #
> # cat /etc/os-release
> NAME="SLES"
> VERSION="15-SP1"
> VERSION_ID="15.1"
> PRETTY_NAME="SUSE Linux Enterprise Server 15 SP1"
> ID="sles"
> ID_LIKE="suse"
> ANSI_COLOR="0;32"
> CPE_NAME="cpe:/o:suse:sles:15:sp1"
>
> The system is an LPAR running under PowerVM vios version 2.2.3.4.
> The underlying hardware is machine type-model 8284-22A.
OK thanks. That's a Power8.
>> Can you try booting with ppc_tm=off on the kernel command line, and see
>> if that changes anything?
>
> Yes. Output is down below. Doesn't appear to change much, but I don't have
> the background to interpret the registers.
Yeah looks like that's not the problem.
>> Can you put your compiled test program up somewhere we can download it
>> and look at? Or post the disassembly?
>>
>
> Here's the source file:
> https://www.storix.com/download/support/misc/rand_test.c
>
> Here's the resulting executable:
> https://www.storix.com/download/support/misc/rand_test
Thanks.
So something seems to have gone wrong when linking this. I see, e.g.:
0000000010004a8c <syscall_random>:
10004a8c: 2b 10 40 3c lis r2,4139
10004a90: 88 f7 42 38 addi r2,r2,-2168
10004a94: a6 02 08 7c mflr r0
10004a98: 10 00 01 f8 std r0,16(r1)
10004a9c: f8 ff e1 fb std r31,-8(r1)
10004aa0: 81 ff 21 f8 stdu r1,-128(r1)
10004aa4: 78 0b 3f 7c mr r31,r1
10004aa8: 60 00 7f f8 std r3,96(r31)
10004aac: 68 00 9f f8 std r4,104(r31)
10004ab0: 00 00 00 60 nop
10004ab4: 30 80 22 e9 ld r9,-32720(r2)
10004ab8: 00 00 a9 2f cmpdi cr7,r9,0
10004abc: 30 00 9e 41 beq cr7,10004aec <syscall_random+0x60>
10004ac0: 60 00 7f e8 ld r3,96(r31)
10004ac4: 68 00 9f e8 ld r4,104(r31)
10004ac8: 39 b5 ff 4b bl 10000000 <_init-0x1f00>
Notice that the last instruction is a bl (branch and link) to 0x10000000.
But there's no text at 0x10000000; that's the start of the page, which
happens to hold the ELF magic.
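
As a cross-check on that branch target, here is a minimal Python sketch that decodes the four bytes shown at 10004ac8 by hand. It assumes the bytes are a little-endian instruction word (the machine is ppc64le) and relies on the Power ISA I-form branch layout: primary opcode 18, a 24-bit LI field that is shifted left by two and sign-extended, then the AA and LK bits. The function name is made up for illustration.

```python
import struct

def decode_branch_target(addr, insn_bytes):
    """Decode a Power ISA I-form branch (b/ba/bl/bla).

    addr is the address of the instruction, insn_bytes the 4 bytes as
    they sit in memory on a little-endian system.
    Returns (target_address, sets_link_register).
    """
    (insn,) = struct.unpack("<I", insn_bytes)
    if insn >> 26 != 18:
        raise ValueError("not an I-form branch (primary opcode != 18)")
    absolute = (insn >> 1) & 1   # AA bit
    link = insn & 1              # LK bit: 1 for bl/bla
    # LI field together with its two implied low zero bits (26 bits total)
    disp = insn & 0x03FFFFFC
    if disp & 0x02000000:        # sign bit of the 26-bit displacement
        disp -= 0x04000000
    target = disp if absolute else addr + disp
    return target & 0xFFFFFFFFFFFFFFFF, bool(link)

# Bytes and address taken from the disassembly line at 10004ac8.
target, links = decode_branch_target(0x10004AC8, bytes.fromhex("39b5ff4b"))
print(hex(target), links)
```

The displacement works out to -0x4ac8, so the target is exactly 0x10000000, which matches what objdump printed.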
I've seen something like this before, but I can't remember when/where so
I haven't been able to track down what the problem was.
Anyway hopefully someone on the list will know.
That still doesn't explain the kernel crash though.
> Executable is linked to libcrypto from openssl-1.1.1g, configured with:
> ./config no-shared no-dso no-threads -fPIC -ggdb3 -debug -static
>
> Executable is built (on SUSE 12) with:
> gcc -ggdb3 -o rand_test rand_test.c libcrypto.a
> And running the executable (on SUSE 15.1) through gdb goes like this:
>
> # gdb --args ./rand_test
> GNU gdb (GDB; SUSE Linux Enterprise 15) 8.3.1
> << snip intro text >>
> Reading symbols from ./rand_test...
> (gdb) b main
> Breakpoint 1 at 0x1000288c: file rand_test.c, line 6.
> (gdb) r
> Starting program: /tmp/ossl/rand_test
>
> Breakpoint 1, main (argc=1, argv=0x7ffffffff798) at rand_test.c:6
> 6 int has_enough_data = RAND_status();
> (gdb) s
> RAND_status () at crypto/rand/rand_lib.c:958
> 958 const RAND_METHOD *meth = RAND_get_rand_method();
> (gdb)
> RAND_get_rand_method () at crypto/rand/rand_lib.c:844
> 844 const RAND_METHOD *tmp_meth = NULL;
> (gdb)
> 846 if (!RUN_ONCE(&rand_init, do_rand_init))
> (gdb)
> CRYPTO_THREAD_run_once (once=0x102a7d88 <rand_init>,
>     init=0x10002f30 <do_rand_init_ossl_>) at crypto/threads_none.c:67
> 67 if (*once != 0)
> (gdb)
> 70 init();
> (gdb)
> do_rand_init_ossl_ () at crypto/rand/rand_lib.c:306
> 306 DEFINE_RUN_ONCE_STATIC(do_rand_init)
> (gdb)
> do_rand_init () at crypto/rand/rand_lib.c:309
> 309 rand_engine_lock = CRYPTO_THREAD_lock_new();
> (gdb)
> CRYPTO_THREAD_lock_new () at crypto/threads_none.c:24
> 24 if ((lock = OPENSSL_zalloc(sizeof(unsigned int))) == NULL) {
> (gdb)
> CRYPTO_zalloc (num=4, file=0x1023a500 "crypto/threads_none.c", line=24)
>     at crypto/mem.c:230
> 230 void *ret = CRYPTO_malloc(num, file, line);
> (gdb)
> CRYPTO_malloc (num=4, file=0x1023a500 "crypto/threads_none.c", line=24)
>     at crypto/mem.c:194
> 194 void *ret = NULL;
> (gdb)
> 197 if (malloc_impl != NULL && malloc_impl != CRYPTO_malloc)
> (gdb)
> 200 if (num == 0)
> (gdb)
> 204 if (allow_customize) {
> (gdb)
> 210 allow_customize = 0;
> (gdb)
> 222 ret = malloc(num);
> (gdb)
> Bad kernel stack pointer 7fffffffef20 at 700
On my machine it doesn't crash the kernel, so I can catch it later. For
me it's here:
Program received signal SIGILL, Illegal instruction.
0x0000000010000004 in ?? ()
(gdb) bt
#0 0x0000000010000004 in ?? ()
#1 0x0000000010004acc in syscall_random (buf=0x102b0730, buflen=32)
at crypto/rand/rand_unix.c:371
#2 0x00000000100053fc in rand_pool_acquire_entropy (pool=0x102b06e0)
at crypto/rand/rand_unix.c:636
#3 0x0000000010002b58 in rand_drbg_get_entropy (drbg=0x102b02e0,
pout=0x7ffffffff3f0, entropy=256, min_len=32, max_len=2147483647,
prediction_resistance=0) at crypto/rand/rand_lib.c:198
#4 0x000000001001ed9c in RAND_DRBG_instantiate (drbg=0x102b02e0,
pers=0x10248d00 <ossl_pers_string> "OpenSSL NIST SP 800-90A DRBG",
perslen=28) at crypto/rand/drbg_lib.c:338
#5 0x0000000010020300 in drbg_setup (parent=0x0) at crypto/rand/drbg_lib.c:895
#6 0x0000000010020414 in do_rand_drbg_init () at crypto/rand/drbg_lib.c:924
#7 0x000000001002034c in do_rand_drbg_init_ossl_ ()
at crypto/rand/drbg_lib.c:909
#8 0x0000000010005d1c in CRYPTO_THREAD_run_once (
once=0x102ab4d8 <rand_drbg_init>,
init=0x1002032c <do_rand_drbg_init_ossl_>) at crypto/threads_none.c:70
#9 0x00000000100209c4 in RAND_DRBG_get0_master ()
at crypto/rand/drbg_lib.c:1102
#10 0x0000000010020914 in drbg_status () at crypto/rand/drbg_lib.c:1084
#11 0x0000000010004a58 in RAND_status () at crypto/rand/rand_lib.c:961
#12 0x0000000010002890 in main (argc=1, argv=0x7ffffffffa68) at rand_test.c:6
(gdb)
i.e. in the syscall_random() that I mentioned above.
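
The addresses in that backtrace can be tied back to the disassembly with a little arithmetic. The Python sketch below only uses values that appear in this mail; the variable names are made up, and the 4-byte instruction size is the fixed instruction width on this architecture.

```python
INSN_SIZE = 4                 # fixed-width instructions

bl_addr       = 0x10004AC8    # the bl in syscall_random (from the disassembly)
branch_target = 0x10000000    # where that bl goes
frame1_return = 0x10004ACC    # frame #1 address in the backtrace
fault_pc      = 0x10000004    # frame #0, where SIGILL was reported

# The return address of frame #1 should be the instruction after the bl.
return_matches_bl = (frame1_return == bl_addr + INSN_SIZE)

# How far past the branch target the reported PC is, in instructions.
insns_past_target = (fault_pc - branch_target) // INSN_SIZE

print(return_matches_bl, insns_past_target)
```

So frame #1 returns to the instruction right after the bl, and the SIGILL is reported one instruction past the branch target. This doesn't say what happened with the word at 0x10000000 itself, only that the two views are consistent.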
You should be able to catch it there too if you do:
(gdb) b *0x10000000
(gdb) r
Hopefully it will stop without crashing the kernel, and then a `bt` will
show that you're in the same place as me.
If you can get that to work, then while you're stopped there, can you do
an `info registers` and send us the output?
cheers