Re: rcutorture’s init segfaults in ppc64le VM

Paul Menzel pmenzel at molgen.mpg.de
Tue Feb 8 23:27:52 AEDT 2022


[Correcting the sha1 for the 2022.02.01-21.52.37 test run]


On 08.02.22 at 13:12, Paul Menzel wrote:
> Dear Michael,
> 
> 
> Thank you for looking into this.
> 
> On 08.02.22 at 11:09, Michael Ellerman wrote:
>> Paul Menzel writes:
> 
> […]
> 
>>> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
>>> 5.17-rc2+ with rcutorture tests
>>
>> I'm not sure if that's the host kernel version or the version you're
>> using of rcutorture? Can you tell us the sha1 of your host kernel and of
>> the tree you're running rcutorture from?
> 
> The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately, 
> I am unable to find the exact sha1.
> 
>      $ more /proc/version
>      Linux version 5.17.0-rc1+ (pmenzel at flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (Ubuntu clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022
> 
> The Linux tree from which I run rcutorture is at commit 
> dfd42facf1e4 (Linux 5.17-rc3) with four patches on top:
> 
>      $ git log --oneline -6
>      207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems with rcutorture on ppc64le: allmodconfig(2) and other failures
>      8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
>      a447541d925f ata: libata-sata: remove debounce delay by default
>      afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
>      f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
>      dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3

I was able to reproduce this with the above, but the report and the 
attached logs at the end are from:

     $ git log --oneline -6 b37a34a8cf5a
     b37a34a8cf5a Problems with rcutorture on ppc64le: allmodconfig(2) and other failures
     9a78ddead89a ata: libata-sata: improve sata_link_debounce()
     567da2eaf099 ata: libata-sata: remove debounce delay by default
     70ae61851660 ata: libata-sata: introduce struct sata_deb_timing
     9ebb6433d9c3 ata: libata-sata: Simplify sata_link_resume() interface
     26291c54e111 (tag: v5.17-rc2) Linux 5.17-rc2

>>>       $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
>>>
>>> the built init
>>>
>>>       $ file tools/testing/selftests/rcutorture/initrd/init
>>>       tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for GNU/Linux 3.10.0, stripped
>>
>> Mine looks pretty much identical:
>>
>>    $ file tools/testing/selftests/rcutorture/initrd/init
>>    tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for GNU/Linux 3.10.0, stripped
>>
>>> segfaults in QEMU. From one of the log files
>>
>> But mine doesn't segfault, it runs fine and the test completes.
>>
>> What qemu version are you using?
>>
>> I tried 4.2.1 and 6.2.0, both worked.
> 
>      $ qemu-system-ppc64le --version
>      QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
>      Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
> 
>>> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log 
>>>
> 
> Sorry, that was the wrong path/test. The correct one for the excerpt 
> below is:
> 
> 
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log 
> 
> 
> (For TREE03, QEMU does not start the Linux kernel at all; there is no 
> output after:
> 
>      Booting Linux via __start() @ 0x0000000000400000 ...
> )
> 
>>>       [    1.119803][    T1] Run /init as init process
>>>       [    1.122011][    T1] init[1]: segfault (11) at f0656d90 nip 10000a18 lr 0 code 1 in init[10000000+d0000]
>>>       [    1.124863][    T1] init[1]: code: 2c2903e7 f9210030 4081ff84 4bffff58 00000000 01000000 00000580 3c40100f
>>>       [    1.128823][    T1] init[1]: code: 38427c00 7c290b78 782106e4 38000000 <f821ff81> 7c0803a6 f8010000 e9028010
>>
>> The disassembly from 3c40100f is:
>>    lis     r2,4111
>>    addi    r2,r2,31744
>>    mr      r9,r1
>>    rldicr  r1,r1,0,59
>>    li      r0,0
>>    stdu    r1,-128(r1)        <- fault
>>    mtlr    r0
>>    std     r0,0(r1)
>>    ld      r8,-32752(r2)
>>
>>
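
As a cross-check of the disassembly above, the marked word can be decoded by hand. The following is only a minimal sketch, assuming the usual POWER DS-form layout (primary opcode 62, extended opcode 0 for std and 1 for stdu); the function name decode_ds_store is made up for illustration and is not part of any tool mentioned in this thread.

```python
# Sketch: decode POWER DS-form store words (primary opcode 62) by hand.
# Only std/stdu are handled; every other word is rejected.

DS_FORM_STORES = {0: "std", 1: "stdu"}  # extended opcode -> mnemonic


def decode_ds_store(word: int) -> str:
    """Return objdump-style text for a std/stdu instruction word."""
    opcode = (word >> 26) & 0x3F  # primary opcode, top 6 bits
    rs = (word >> 21) & 0x1F      # source register
    ra = (word >> 16) & 0x1F      # base register
    xo = word & 0x3               # extended opcode, low 2 bits
    disp = word & 0xFFFC          # DS field, already scaled by 4
    if disp & 0x8000:             # sign-extend the 16-bit displacement
        disp -= 0x10000
    if opcode != 62 or xo not in DS_FORM_STORES:
        raise ValueError(f"not a std/stdu word: {word:#010x}")
    return f"{DS_FORM_STORES[xo]} r{rs},{disp}(r{ra})"


print(decode_ds_store(0xF821FF81))  # the word marked <...> in the log
print(decode_ds_store(0xF8010000))  # two words after the marked one
```

The first call should agree with the line flagged as the fault in the disassembly, and the second with the std two instructions later.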
>> I think you'll find that's the code at the ELF entry point. You can
>> check with:
>>
>>   $ readelf -e tools/testing/selftests/rcutorture/initrd/init | grep Entry
>>     Entry point address:               0x10000c0c
>>
>>   $ objdump -d tools/testing/selftests/rcutorture/initrd/init | grep -m 1 -A 8 10000c0c
>>      10000c0c:   0e 10 40 3c     lis     r2,4110
>>      10000c10:   00 7b 42 38     addi    r2,r2,31488
>>      10000c14:   78 0b 29 7c     mr      r9,r1
>>      10000c18:   e4 06 21 78     rldicr  r1,r1,0,59
>>      10000c1c:   00 00 00 38     li      r0,0
>>      10000c20:   81 ff 21 f8     stdu    r1,-128(r1)
>>      10000c24:   a6 03 08 7c     mtlr    r0
>>      10000c28:   00 00 01 f8     std     r0,0(r1)
>>      10000c2c:   10 80 02 e9     ld      r8,-32752(r2)
>>
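
The same check can be approximated from the log alone. This is a rough sketch, assuming the two "code:" lines are contiguous 4-byte instruction words and that the word in angle brackets sits at the reported nip; the variable names are illustrative only. If the faulting code really is the entry sequence, the address computed for the lis word is what readelf should report as the entry point of the segfaulting binary.

```python
# Sketch: map the words in the "code:" lines to addresses, using the nip
# from the segfault message as the address of the <marked> word.

CODE_LINES = [
    "2c2903e7 f9210030 4081ff84 4bffff58 00000000 01000000 00000580 3c40100f",
    "38427c00 7c290b78 782106e4 38000000 <f821ff81> 7c0803a6 f8010000 e9028010",
]
NIP = 0x10000A18   # "nip 10000a18" in the segfault line
INSN_SIZE = 4      # fixed-width instructions (assumption: no prefixed insns)

words = " ".join(CODE_LINES).split()
fault_index = next(i for i, w in enumerate(words) if w.startswith("<"))


def address_of(index: int) -> int:
    """Address of the word at `index`, relative to the marked word."""
    return NIP + (index - fault_index) * INSN_SIZE


lis_addr = address_of(words.index("3c40100f"))
print(hex(lis_addr))        # 0x10000a04
print(NIP - lis_addr)       # distance from lis to the faulting stdu, in bytes

# objdump on a little-endian target prints instruction bytes in memory order,
# so "81 ff 21 f8" is the same word as f821ff81 in the kernel log.
objdump_word = int.from_bytes(bytes.fromhex("81 ff 21 f8"), "little")
print(hex(objdump_word))    # 0xf821ff81
```

In the objdump output above, the stdu is likewise 0x14 bytes past the entry point (0x10000c20 versus 0x10000c0c), which is consistent with the faulting code being the entry sequence.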
>> The fault you're seeing is the first store using the stack pointer (r1),
>> which is set up by the kernel.
>>
>> The fault address f0656d90 is weirdly low; the stack should be up near 
>> 128TB.
>>
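
To put numbers on "weirdly low", here is a small arithmetic sketch. It assumes the fault address is exactly the effective address of the stdu, that is, the 16-byte-aligned r1 minus 128, and it takes 128TB as 2**47; both are readings of the text above, not measured values.

```python
# Sketch: reconstruct the stack pointer from the fault address and compare
# it with the region where the initial user stack is expected.

FAULT_ADDR = 0xF0656D90    # "segfault (11) at f0656d90"
FRAME_SIZE = 128           # stdu r1,-128(r1)
TB = 1 << 40

# stdu stores at r1 - 128, so the aligned r1 is the fault address plus 128.
r1_aligned = FAULT_ADDR + FRAME_SIZE
print(hex(r1_aligned))               # 0xf0656e10

# rldicr r1,r1,0,59 clears the low four bits, so r1 must be 16-byte aligned.
print(r1_aligned % 16 == 0)          # True

expected_top = 128 * TB
print(hex(expected_top))             # 0x800000000000
print(FAULT_ADDR < (1 << 32))        # True: the address fits in 32 bits
print(expected_top // r1_aligned)    # roughly how many times too low r1 is
```

The reconstructed r1 is below 4GB, several orders of magnitude under the expected stack region.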
>> I'm not sure how we end up with a bad r1.
>>
>> Can you dump some info about the kernel that was built, something like:
>>
>> $ file /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/vmlinux
>>
>> And maybe paste/attach the full log, maybe there's a clue somewhere.
> 
> You can now download the content of 
> `/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01` 
> [1, 65 MB].
> 
> Can you reproduce the segmentation fault with the line below?
> 
>      $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=8 \
>          -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi \
>          -serial stdio -m 512 \
>          -kernel /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/vmlinux \
>          -append "debug_boot_weak_hash panic=-1 console=ttyS0 torture.disable_onoff_at_boot locktorture.onoff_interval=3 locktorture.onoff_holdoff=30 locktorture.stat_interval=15 locktorture.shutdown_secs=60 locktorture.verbose=1"
> 
> 
> Kind regards,
> 
> Paul
> 
> 
> [1]: https://owww.molgen.mpg.de/~pmenzel/rcutorture-2022.02.01-21.52.37-torture-locktorture-kasan-lock01.tar.xz

