Re: rcutorture’s init segfaults in ppc64le VM
Paul Menzel
pmenzel at molgen.mpg.de
Tue Feb 8 23:12:40 AEDT 2022
Dear Michael,
Thank you for looking into this.
On 08.02.22 at 11:09, Michael Ellerman wrote:
> Paul Menzel writes:
[…]
>> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
>> 5.17-rc2+ with rcutorture tests
>
> I'm not sure if that's the host kernel version or the version you're
> using of rcutorture? Can you tell us the sha1 of your host kernel and of
> the tree you're running rcutorture from?
The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately,
I am unable to find the exact sha1.
$ more /proc/version
Linux version 5.17.0-rc1+ (pmenzel at flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (Ubuntu clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022
The Linux tree I run rcutorture from is at commit dfd42facf1e4
(Linux 5.17-rc3) with four patches on top:
$ git log --oneline -6
207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems with rcutorture on ppc64le: allmodconfig(2) and other failures
8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
a447541d925f ata: libata-sata: remove debounce delay by default
afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3
>> $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
>>
>> the built init
>>
>> $ file tools/testing/selftests/rcutorture/initrd/init
>> tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for GNU/Linux 3.10.0, stripped
>
> Mine looks pretty much identical:
>
> $ file tools/testing/selftests/rcutorture/initrd/init
> tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for GNU/Linux 3.10.0, stripped
>
>> segfaults in QEMU. From one of the log files
>
> But mine doesn't segfault, it runs fine and the test completes.
>
> What qemu version are you using?
>
> I tried 4.2.1 and 6.2.0, both worked.
$ qemu-system-ppc64le --version
QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
>> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
Sorry, that was the wrong path/test. The correct one for the excerpt
below is:
/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log
(For TREE03, QEMU does not start the Linux kernel at all; there is no
output after:

Booting Linux via __start() @ 0x0000000000400000 ...
)
>> [ 1.119803][ T1] Run /init as init process
>> [ 1.122011][ T1] init[1]: segfault (11) at f0656d90 nip 10000a18 lr 0 code 1 in init[10000000+d0000]
>> [ 1.124863][ T1] init[1]: code: 2c2903e7 f9210030 4081ff84 4bffff58 00000000 01000000 00000580 3c40100f
>> [ 1.128823][ T1] init[1]: code: 38427c00 7c290b78 782106e4 38000000 <f821ff81> 7c0803a6 f8010000 e9028010
>
> The disassembly from 3c40100f is:
> lis r2,4111
> addi r2,r2,31744
> mr r9,r1
> rldicr r1,r1,0,59
> li r0,0
> stdu r1,-128(r1) <- fault
> mtlr r0
> std r0,0(r1)
> ld r8,-32752(r2)
>
>
> I think you'll find that's the code at the ELF entry point. You can
> check with:
>
> $ readelf -e tools/testing/selftests/rcutorture/initrd/init | grep Entry
> Entry point address: 0x10000c0c
>
> $ objdump -d tools/testing/selftests/rcutorture/initrd/init | grep -m 1 -A 8 10000c0c
> 10000c0c: 0e 10 40 3c lis r2,4110
> 10000c10: 00 7b 42 38 addi r2,r2,31488
> 10000c14: 78 0b 29 7c mr r9,r1
> 10000c18: e4 06 21 78 rldicr r1,r1,0,59
> 10000c1c: 00 00 00 38 li r0,0
> 10000c20: 81 ff 21 f8 stdu r1,-128(r1)
> 10000c24: a6 03 08 7c mtlr r0
> 10000c28: 00 00 01 f8 std r0,0(r1)
> 10000c2c: 10 80 02 e9 ld r8,-32752(r2)
>
> The fault you're seeing is the first store using the stack pointer (r1),
> which is set up by the kernel.
>
> The fault address f0656d90 is weirdly low, the stack should be up near 128TB.
>
> I'm not sure how we end up with a bad r1.
>
> Can you dump some info about the kernel that was built, something like:
>
> $ file /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/vmlinux
>
> And maybe paste/attach the full log, maybe there's a clue somewhere.
You can now download the content of
`/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01`
[1, 65 MB].
Can you reproduce the segmentation fault with the line below?
$ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=8 \
    -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi \
    -serial stdio -m 512 \
    -kernel /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/vmlinux \
    -append "debug_boot_weak_hash panic=-1 console=ttyS0 torture.disable_onoff_at_boot locktorture.onoff_interval=3 locktorture.onoff_holdoff=30 locktorture.stat_interval=15 locktorture.shutdown_secs=60 locktorture.verbose=1"
Kind regards,
Paul
[1]: https://owww.molgen.mpg.de/~pmenzel/rcutorture-2022.02.01-21.52.37-torture-locktorture-kasan-lock01.tar.xz