Re: rcutorture’s init segfaults in ppc64le VM

Paul Menzel pmenzel at molgen.mpg.de
Sat Feb 12 01:19:42 AEDT 2022


Dear Michael,


On 11.02.22 at 02:48, Michael Ellerman wrote:
> Paul Menzel writes:
>> On 08.02.22 at 11:09, Michael Ellerman wrote:
>>> Paul Menzel writes:
>>
>> […]
>>
>>>> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
>>>> 5.17-rc2+ with rcutorture tests
>>>
>>> I'm not sure if that's the host kernel version or the version you're
>>> using of rcutorture? Can you tell us the sha1 of your host kernel and of
>>> the tree you're running rcutorture from?
>>
>> The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately,
>> I am unable to find the exact sha1.
>>
>>       $ more /proc/version
>>       Linux version 5.17.0-rc1+ (x at eddb.molgen.mpg.de) (Ubuntu clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022
> 
> OK. In general rc1 kernels can have issues, so it might be worth
> rebooting the host into either v5.17-rc3 or a distro or stable kernel.
> Just to rule out any issues on the host.

Yes, that was a good test. It works with Ubuntu’s 5.13 Linux kernel.

     $ more /proc/version
     Linux version 5.13.0-28-generic (buildd at bos02-ppc64el-013) (gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0, GNU ld (GNU Binutils for Ubuntu) 2.37) #31-Ubuntu SMP Thu Jan 13 17:40:19 UTC 2022

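I have to do more tests, but it could be LLVM/clang related: the failing
5.17-rc1+ host kernel was built with clang 13, while the working Ubuntu
kernel was built with GCC 11.2.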
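
To keep track of which toolchain built the host kernel in each test run,
something like the following could classify a /proc/version string. This
is only a sketch; the function name kernel_compiler is made up for
illustration and it merely looks for the compiler name in the string.

```shell
# Hypothetical helper: guess the compiler from a /proc/version string.
# It only does substring matching, so it is a rough heuristic.
kernel_compiler() {
    case "$1" in
        *clang*) echo clang ;;
        *gcc*)   echo gcc ;;
        *)       echo unknown ;;
    esac
}

# Usage on the running system:
#   kernel_compiler "$(cat /proc/version)"
```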

>> The Linux tree from which I run rcutorture is at commit
>> dfd42facf1e4 (Linux 5.17-rc3) with four patches on top:
>>
>>       $ git log --oneline -6
>>       207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems with rcutorture on ppc64le: allmodconfig(2) and other failures
>>       8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
>>       a447541d925f ata: libata-sata: remove debounce delay by default
>>       afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
>>       f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
>>       dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3
>>
>>>>        $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
>>>>
>>>> the built init
>>>>
>>>>        $ file tools/testing/selftests/rcutorture/initrd/init
>>>>        tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for GNU/Linux 3.10.0, stripped
>>>
>>> Mine looks pretty much identical:
>>>
>>>     $ file tools/testing/selftests/rcutorture/initrd/init
>>>     tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for GNU/Linux 3.10.0, stripped
>>>
>>>> segfaults in QEMU. From one of the log files
>>>
>>> But mine doesn't segfault, it runs fine and the test completes.
>>>
>>> What qemu version are you using?
>>>
>>> I tried 4.2.1 and 6.2.0, both worked.
>>
>>       $ qemu-system-ppc64le --version
>>       QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
>>       Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
> 
> OK, that's one difference between our setups. I'd be surprised if it
> explains this bug, but I guess anything's possible.
> 
>>>> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
>>
>> Sorry, that was the wrong path/test. The correct one for the excerpt
>> below is:
>>
>>       /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log
>>
>> For TREE03, QEMU does not start the Linux kernel at all; there is no
>> output after:
>>
>>       Booting Linux via __start() @ 0x0000000000400000 ...
> 
> OK yeah I see that too.
> 
> Removing "threadirqs" from tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot
> seems to fix it.

Nice find. I have no idea what that means, though.
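
For anyone wanting to repeat that experiment without editing the file by
hand, a sketch like the one below could drop a single parameter from a
.boot file. The helper name strip_boot_param is made up for
illustration, and it assumes the file holds whitespace-separated kernel
parameters.

```shell
# Hypothetical helper: remove one exact parameter token from a boot file.
# Assumes whitespace-separated parameters; lines left empty are dropped.
strip_boot_param() {
    param=$1
    file=$2
    tmp=$(mktemp) || return 1
    awk -v p="$param" '{
        out = ""
        for (i = 1; i <= NF; i++)
            if ($i != p)
                out = out (out == "" ? "" : " ") $i
        if (out != "")
            print out
    }' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Usage, from the top of the Linux tree:
#   strip_boot_param threadirqs \
#       tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot
```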

> I still see some preempt related warnings, we clearly have some bugs
> with preempt enabled.
> 
>> You can now download the content of
>> `/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01`
>> [1, 65 MB].
>>
>> Can you reproduce the segmentation fault with the line below?
>>
>>       $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=8 \
>>       -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial stdio -m 512 \
>>       -kernel /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/vmlinux \
>>       -append "debug_boot_weak_hash panic=-1 console=ttyS0 \
>>       torture.disable_onoff_at_boot locktorture.onoff_interval=3 \
>>       locktorture.onoff_holdoff=30 locktorture.stat_interval=15 \
>>       locktorture.shutdown_secs=60 locktorture.verbose=1"
> 
> That works fine for me, boots and runs the test, then shuts down.
> 
> I assume you see the segfault on every boot, not intermittently?
> 
> So the differences between our setups are the host kernel and the qemu
> version. Can you try a different host kernel easily?
> 
> The other thing would be to try a different qemu version, you might need
> to build from source, but it's not that hard :)

Indeed. I needed to find a current Meson, but then it didn’t make a
difference; as found out above, the problem is related to the host Linux
kernel.


Kind regards,

Paul

