Re: rcutorture’s init segfaults in ppc64le VM
Paul Menzel
pmenzel at molgen.mpg.de
Sat Feb 12 01:19:42 AEDT 2022
Dear Michael,
On 11.02.22 at 02:48, Michael Ellerman wrote:
> Paul Menzel writes:
>> On 08.02.22 at 11:09, Michael Ellerman wrote:
>>> Paul Menzel writes:
>>
>> […]
>>
>>>> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
>>>> 5.17-rc2+ with rcutorture tests
>>>
>>> I'm not sure if that's the host kernel version or the version you're
>>> using of rcutorture? Can you tell us the sha1 of your host kernel and of
>>> the tree you're running rcutorture from?
>>
>> The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately,
>> I am unable to find the exact sha1.
>>
>> $ more /proc/version
>> Linux version 5.17.0-rc1+ (x at eddb.molgen.mpg.de) (Ubuntu clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022
>
> OK. In general rc1 kernels can have issues, so it might be worth
> rebooting the host into either v5.17-rc3 or a distro or stable kernel.
> Just to rule out any issues on the host.
Yes, that was a good test. It works with Ubuntu’s 5.13 Linux kernel.
$ more /proc/version
Linux version 5.13.0-28-generic (buildd at bos02-ppc64el-013) (gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0, GNU ld (GNU Binutils for Ubuntu) 2.37) #31-Ubuntu SMP Thu Jan 13 17:40:19 UTC 2022
I have to do more tests, but it could be LLVM/clang related.
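Since the two host kernels above were built with different toolchains (clang 13.0.0 versus gcc 11.2.0), it may help to record the compiler alongside each test run. A minimal sketch, assuming a hypothetical helper named `kernel_compiler` that only looks for the substrings `clang` or `gcc` in the `/proc/version` banner:

```shell
# Hypothetical helper (not part of rcutorture): classify the compiler
# named in a /proc/version banner by simple substring matching.
kernel_compiler() {
    case "$1" in
        *clang*) echo clang ;;
        *gcc*)   echo gcc ;;
        *)       echo unknown ;;
    esac
}

# Report the compiler of the running host kernel, if /proc/version is readable.
if [ -r /proc/version ]; then
    kernel_compiler "$(cat /proc/version)"
fi
```

This only tells which compiler built the host kernel; it does not show that the compiler is the cause of the segfault.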
>> The Linux tree from which I run rcutorture is at commit
>> dfd42facf1e4 (Linux 5.17-rc3) with four patches on top:
>>
>> $ git log --oneline -6
>> 207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems with rcutorture on ppc64le: allmodconfig(2) and other failures
>> 8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
>> a447541d925f ata: libata-sata: remove debounce delay by default
>> afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
>> f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
>> dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3
>>
>>>> $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
>>>>
>>>> the built init
>>>>
>>>> $ file tools/testing/selftests/rcutorture/initrd/init
>>>> tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for GNU/Linux 3.10.0, stripped
>>>
>>> Mine looks pretty much identical:
>>>
>>> $ file tools/testing/selftests/rcutorture/initrd/init
>>> tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for GNU/Linux 3.10.0, stripped
>>>
>>>> segfaults in QEMU. From one of the log files
>>>
>>> But mine doesn't segfault, it runs fine and the test completes.
>>>
>>> What qemu version are you using?
>>>
>>> I tried 4.2.1 and 6.2.0, both worked.
>>
>> $ qemu-system-ppc64le --version
>> QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
>> Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
>
> OK, that's one difference between our setups. I'd be surprised if it
> explains this bug, but I guess anything's possible.
>
>>>> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
>>
>> Sorry, that was the wrong path/test. The correct one for the excerpt
>> below is:
>>
>>
>> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log
>>
>> (For TREE03, QEMU does not start the Linux kernel at all, that means no
>> output after:
>>
>> Booting Linux via __start() @ 0x0000000000400000 ...
>
> OK yeah I see that too.
>
> Removing "threadirqs" from tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot
> seems to fix it.
Nice find. I have no idea what that means, though.
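For anyone wanting to repeat the experiment, the removal can be sketched as a small shell function. The helper name `strip_boot_param` is hypothetical and not part of the rcutorture scripts. It assumes the `.boot` file holds whitespace-separated kernel parameters, drops every token that exactly matches the given name, and keeps a backup next to the file:

```shell
# Hypothetical helper (not part of rcutorture): remove one kernel boot
# parameter from a .boot file, keeping the original as FILE.orig.
strip_boot_param() {
    param="$1"
    file="$2"
    cp "$file" "$file.orig" || return 1
    # Rebuild each line from the tokens that do not match the parameter;
    # lines that end up empty are dropped.
    awk -v p="$param" '{
        out = ""; sep = ""
        for (i = 1; i <= NF; i++) {
            if ($i != p) { out = out sep $i; sep = " " }
        }
        if (out != "") print out
    }' "$file.orig" > "$file"
}
```

From the top of the kernel tree it would be called as `strip_boot_param threadirqs tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot`, and the untouched file remains available as `TREE03.boot.orig`.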
> I still see some preempt-related warnings; we clearly have some bugs
> with preempt enabled.
>
>> You can now download the content of
>> `/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01`
>> [1, 65 MB].
>>
>> Can you reproduce the segmentation fault with the line below?
>>
>> $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=8 \
>> -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial stdio -m 512 \
>> -kernel /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/vmlinux \
>> -append "debug_boot_weak_hash panic=-1 console=ttyS0 \
>> torture.disable_onoff_at_boot locktorture.onoff_interval=3 \
>> locktorture.onoff_holdoff=30 locktorture.stat_interval=15 \
>> locktorture.shutdown_secs=60 locktorture.verbose=1"
>
> That works fine for me, boots and runs the test, then shuts down.
>
> I assume you see the segfault on every boot, not intermittently?
>
> So the differences between our setups are the host kernel and the qemu
> version. Can you try a different host kernel easily?
>
> The other thing would be to try a different qemu version, you might need
> to build from source, but it's not that hard :)
Indeed. I needed to find a current Meson, but the QEMU built from source
made no difference. As found out above, the problem is related to the
host Linux kernel.
Kind regards,
Paul