rcutorture’s init segfaults in ppc64le VM
Michael Ellerman
mpe at ellerman.id.au
Fri Feb 11 12:48:23 AEDT 2022
Paul Menzel <pmenzel at molgen.mpg.de> writes:
> Am 08.02.22 um 11:09 schrieb Michael Ellerman:
>> Paul Menzel writes:
>
> […]
>
>>> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
>>> 5.17-rc2+ with rcutorture tests
>>
>> I'm not sure if that's the host kernel version or the version you're
>> using of rcutorture? Can you tell us the sha1 of your host kernel and of
>> the tree you're running rcutorture from?
>
> The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately,
> I am unable to find the exact sha1.
>
> $ more /proc/version
> Linux version 5.17.0-rc1+
> (pmenzel at flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (Ubuntu
> clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28
> 17:13:04 CET 2022

OK. In general rc1 kernels can have issues, so it might be worth
rebooting the host into v5.17-rc3, a distro kernel, or a stable kernel,
just to rule out any issues on the host.

> The Linux tree I run rcutorture from is at commit
> dfd42facf1e4 (Linux 5.17-rc3) with four patches on top:
>
> $ git log --oneline -6
> 207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems
> with rcutorture on ppc64le: allmodconfig(2) and other failures
> 8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
> a447541d925f ata: libata-sata: remove debounce delay by default
> afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
> f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
> dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3
>
>>> $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
>>>
>>> the built init
>>>
>>> $ file tools/testing/selftests/rcutorture/initrd/init
>>> tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for GNU/Linux 3.10.0, stripped
>>
>> Mine looks pretty much identical:
>>
>> $ file tools/testing/selftests/rcutorture/initrd/init
>> tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for GNU/Linux 3.10.0, stripped
>>
>>> segfaults in QEMU. From one of the log files
>>
>> But mine doesn't segfault, it runs fine and the test completes.
>>
>> What qemu version are you using?
>>
>> I tried 4.2.1 and 6.2.0, both worked.
>
> $ qemu-system-ppc64le --version
> QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
> Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers

OK, that's one difference between our setups. I'd be surprised if it
explains this bug, but I guess anything's possible.

>>> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
>
> Sorry, that was the wrong path/test. The correct one for the excerpt
> below is:
>
>
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log
>
> (For TREE03, QEMU does not start the Linux kernel at all, that means no
> output after:
>
> Booting Linux via __start() @ 0x0000000000400000 ...

OK, yeah, I see that too.

Removing "threadirqs" from tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot
seems to fix it.
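A minimal sketch of that edit as a POSIX shell helper, for anyone who wants to try the same thing. The function name `drop_boot_param` is hypothetical (it is not part of rcutorture), and it assumes the .boot file holds only whitespace-separated kernel parameters; lines that end up empty, including originally blank ones, are dropped.

```shell
#!/bin/sh
# Hypothetical helper (not part of rcutorture): remove every occurrence of
# one whitespace-separated boot parameter from a .boot file, in place.
# Assumes the file contains only kernel parameters separated by blanks;
# lines that end up empty, including originally blank ones, are dropped.
drop_boot_param() {
    param=$1
    file=$2
    tmp="$file.tmp.$$"
    awk -v p="$param" '{
        out = ""
        for (i = 1; i <= NF; i++)
            if ($i != p)
                out = (out == "" ? $i : out " " $i)
        if (out != "")
            print out
    }' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For example, run `drop_boot_param threadirqs tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot`, then rerun just that scenario with `tools/testing/selftests/rcutorture/bin/kvm.sh --configs TREE03`; `git checkout` on the .boot file restores the original.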

I still see some preempt-related warnings; we clearly have some bugs
with preempt enabled.

> You can now download the content of
> `/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01`
> [1, 65 MB].
>
> Can you reproduce the segmentation fault with the line below?
>
> $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=8
> -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial
> stdio -m 512 -kernel
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/vmlinux
> -append "debug_boot_weak_hash panic=-1 console=ttyS0
> torture.disable_onoff_at_boot locktorture.onoff_interval=3
> locktorture.onoff_holdoff=30 locktorture.stat_interval=15
> locktorture.shutdown_secs=60 locktorture.verbose=1"

That works fine for me: it boots, runs the test, then shuts down.

I assume you see the segfault on every boot, not intermittently?
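One way to answer that is to boot the same kernel several times, save each boot's console output (which the command above sends to stdout via `-serial stdio`) to its own file, and count how many of those files show the fault. A minimal sketch; the helper name `count_segfault_logs` is hypothetical, and it assumes the fault appears as the word "segfault" in the console output.

```shell
#!/bin/sh
# Hypothetical helper: given console logs from repeated boots, print how
# many of them mention a segfault. Assumes the fault shows up as the word
# "segfault" in the kernel's console output; unreadable files are not counted.
count_segfault_logs() {
    n=0
    for log in "$@"; do
        if grep -q 'segfault' "$log"; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

For example, `count_segfault_logs boot-*.log` printing the total number of logs means the fault is deterministic. Because `panic=-1` makes the guest reboot after a panic rather than exit, you may need to bound each run with timeout(1).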
So the differences between our setups are the host kernel and the qemu
version. Can you try a different host kernel easily?
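To compare the two setups side by side, a small sketch that prints both details in one go. The function name `report_setup` is hypothetical, and it assumes the binary of interest is `qemu-system-ppc64` on PATH (it prints a note if it is not found).

```shell
#!/bin/sh
# Hypothetical helper: print the two setup details under comparison,
# the running host kernel and the QEMU binary's version.
report_setup() {
    # Kernel release plus build version string (build number and date).
    echo "host kernel: $(uname -rv)"
    if command -v qemu-system-ppc64 >/dev/null 2>&1; then
        echo "qemu: $(qemu-system-ppc64 --version | head -n 1)"
    else
        echo "qemu: qemu-system-ppc64 not found in PATH"
    fi
}
```

Running `report_setup` on each host gives two lines that can be pasted into a reply.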

The other thing would be to try a different qemu version. You might need
to build it from source, but it's not that hard :)

cheers
More information about the Linuxppc-dev mailing list