rcutorture’s init segfaults in ppc64le VM

Michael Ellerman mpe at ellerman.id.au
Fri Feb 11 12:48:23 AEDT 2022


Paul Menzel <pmenzel at molgen.mpg.de> writes:
> Am 08.02.22 um 11:09 schrieb Michael Ellerman:
>> Paul Menzel writes:
>
> […]
>
>>> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
>>> 5.17-rc2+ with rcutorture tests
>> 
>> I'm not sure if that's the host kernel version or the version you're
>> using of rcutorture? Can you tell us the sha1 of your host kernel and of
>> the tree you're running rcutorture from?
>
> The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately, 
> I am unable to find the exact sha1.
>
>      $ more /proc/version
>      Linux version 5.17.0-rc1+ 
> (pmenzel at flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (Ubuntu 
> clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28
> 17:13:04 CET 2022

OK. In general rc1 kernels can have issues, so it might be worth
rebooting the host into either v5.17-rc3 or a distro or stable kernel.
Just to rule out any issues on the host.

> The Linux tree, from where I run rcutorture from, is at commit 
> dfd42facf1e4 (Linux 5.17-rc3) with four patches on top:
>
>      $ git log --oneline -6
>      207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems 
> with rcutorture on ppc64le: allmodconfig(2) and other failures
>      8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
>      a447541d925f ata: libata-sata: remove debounce delay by default
>      afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
>      f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
>      dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3
>
>>>       $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
>>>
>>> the built init
>>>
>>>       $ file tools/testing/selftests/rcutorture/initrd/init
>>>       tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for GNU/Linux 3.10.0, stripped
>> 
>> Mine looks pretty much identical:
>> 
>>    $ file tools/testing/selftests/rcutorture/initrd/init
>>    tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for GNU/Linux 3.10.0, stripped
>> 
>>> segfaults in QEMU. From one of the log files
>> 
>> But mine doesn't segfault, it runs fine and the test completes.
>> 
>> What qemu version are you using?
>> 
>> I tried 4.2.1 and 6.2.0, both worked.
>
>      $ qemu-system-ppc64le --version
>      QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
>      Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers

OK, that's one difference between our setups, but I'd be surprised if it
explains this bug, but I guess anything's possible.


>>> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
>
> Sorry, that was the wrong path/test. The correct one for the excerpt 
> below is:
>
>  
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log
>
> (For TREE03, QEMU does not start the Linux kernel at all, that means no 
> output after:
>
>      Booting Linux via __start() @ 0x0000000000400000 ...

OK yeah I see that too.

Removing "threadirqs" from tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot
seems to fix it.

I still see some preempt related warnings, we clearly have some bugs
with preempt enabled.

> You can now download the content of 
> `/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01` 
> [1, 65 MB].
>
> Can you reproduce the segmentation fault with the line below?
>
>      $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=8 
> -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial 
> stdio -m 512 -kernel 
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/vmlinux 
> -append "debug_boot_weak_hash panic=-1 console=ttyS0 
> torture.disable_onoff_at_boot locktorture.onoff_interval=3 
> locktorture.onoff_holdoff=30 locktorture.stat_interval=15 
> locktorture.shutdown_secs=60 locktorture.verbose=1"

That works fine for me, boots and runs the test, then shuts down.

I assume you see the segfault on every boot, not intermittently?

So the differences between our setups are the host kernel and the qemu
version. Can you try a different host kernel easily?

The other thing would be to try a different qemu version, you might need
to build from source, but it's not that hard :)

cheers


More information about the Linuxppc-dev mailing list