Re: rcutorture’s init segfaults in ppc64le VM

Zhouyi Zhou zhouzhouyi at gmail.com
Thu Mar 10 13:37:12 AEDT 2022


Dear Paul,

I tried to reproduce the bug in a ppc64 VM at Oregon State University,
using the vmlinux extracted from
https://owww.molgen.mpg.de/~pmenzel/rcutorture-2022.02.01-21.52.37-torture-locktorture-kasan-lock01.tar.xz

The ppc64 VM, in which I run QEMU without hardware acceleration, runs:
Linux version 5.4.0-100-generic (buildd at bos02-ppc64el-021) (gcc
version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #113-Ubuntu SMP Thu Feb
3 18:43:11 UTC 2022 (Ubuntu 5.4.0-100.113-generic 5.4.166)


The QEMU command I used for the test:
$ cd /tmp/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01
$ qemu-system-ppc64 -nographic -smp cores=2,threads=1 -net none -M
pseries -nodefaults -device spapr-vscsi -serial file:/tmp/console.log
-m 512 -kernel ./vmlinux -append "debug_boot_weak_hash panic=-1
console=ttyS0 rcutorture.onoff_interval=200
rcutorture.onoff_holdoff=30 rcutree.gp_preinit_delay=12
rcutree.gp_init_delay=3 rcutree.gp_cleanup_delay=3
rcutree.kthread_prio=2 threadirqs tree.use_softirq=0
rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15
rcutorture.shutdown_secs=1800 rcutorture.test_no_idle_hz=1
rcutorture.verbose=1"
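
Because the command above is long and gets wrapped by mail clients, it can
help to keep it as an argument list. The following is only a sketch in
Python: the function name qemu_argv and its two parameters are hypothetical,
while every QEMU option and kernel parameter is copied from the command
above. It only builds and prints the command; it does not start QEMU.

```python
import shlex

# Kernel command line from the mail, one parameter per entry.
KERNEL_ARGS = [
    "debug_boot_weak_hash", "panic=-1", "console=ttyS0",
    "rcutorture.onoff_interval=200", "rcutorture.onoff_holdoff=30",
    "rcutree.gp_preinit_delay=12", "rcutree.gp_init_delay=3",
    "rcutree.gp_cleanup_delay=3", "rcutree.kthread_prio=2",
    "threadirqs", "tree.use_softirq=0",
    "rcutorture.n_barrier_cbs=4", "rcutorture.stat_interval=15",
    "rcutorture.shutdown_secs=1800", "rcutorture.test_no_idle_hz=1",
    "rcutorture.verbose=1",
]

def qemu_argv(kernel="./vmlinux", console_log="/tmp/console.log"):
    """Build the argument vector for the QEMU invocation quoted above."""
    return [
        "qemu-system-ppc64", "-nographic",
        "-smp", "cores=2,threads=1",
        "-net", "none",
        "-M", "pseries",
        "-nodefaults",
        "-device", "spapr-vscsi",
        "-serial", f"file:{console_log}",
        "-m", "512",
        "-kernel", kernel,
        "-append", " ".join(KERNEL_ARGS),
    ]

argv = qemu_argv()
# Print a copy-pasteable single-line command; the same list could be
# handed to subprocess.run() to actually launch the VM.
print(shlex.join(argv))
```

Keeping the kernel parameters in a list makes it easy to see that no
-enable-kvm option is present, which is what "without hardware acceleration"
means here.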

I uploaded the console.log to:
http://154.223.142.244/logs/20220310/console.paul.log
The log shows that an illegal instruction causes the trouble:
[    4.246387][    T1] init[1]: illegal instruction (4) at 1002c308
nip 1002c308 lr 10001684 code 1 in init[10000000+d0000]
[    4.251400][    T1] init[1]: code: f90d88c0 f92a0008 f9480008
7c2004ac 2c2d0000 f9490000 386d88d0 380000e8
[    4.253416][    T1] init[1]: code: 41820098 e92d8f98 75290010
4182008c <44000001> 2c2d0000 60000000 8902f438
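
The faulting instruction word can be pulled out of such a log mechanically.
The following is a minimal Python sketch, not part of the rcutorture
scripts: the helper names faulting_word and primary_opcode are made up for
illustration, and the log text is the two "code:" lines from above with the
mail line wrapping undone. It relies on the kernel marking the word at nip
with angle brackets and on the primary opcode of a Power instruction being
its top six bits.

```python
import re

# The two "code:" lines from the console log above, re-joined after mail wrapping.
LOG = """\
[    4.251400][    T1] init[1]: code: f90d88c0 f92a0008 f9480008 7c2004ac 2c2d0000 f9490000 386d88d0 380000e8
[    4.253416][    T1] init[1]: code: 41820098 e92d8f98 75290010 4182008c <44000001> 2c2d0000 60000000 8902f438
"""

def faulting_word(log: str) -> int:
    """Return the instruction word that the kernel marked with <...>."""
    for line in log.splitlines():
        _, sep, words = line.partition("code:")
        if not sep:
            continue  # not a code dump line
        match = re.search(r"<([0-9a-fA-F]{8})>", words)
        if match:
            return int(match.group(1), 16)
    raise ValueError("no marked instruction word found")

def primary_opcode(word: int) -> int:
    """Top six bits of a 32-bit Power instruction word."""
    return (word >> 26) & 0x3F

word = faulting_word(LOG)
print(f"faulting word: {word:#010x}, primary opcode: {primary_opcode(word)}")
```

For the log above this reports the word 0x44000001 with primary opcode 17,
the opcode used by the system call instructions. A disassembler such as
objdump is still needed to name the exact instruction.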


Meanwhile, a vmlinux that I compiled myself runs without problems.

I then modified mkinitrd.sh to make the init crash deliberately:
http://154.223.142.244/logs/20220310/mkinitrd.sh
The resulting log shows a segfault (instead of an illegal instruction):
http://154.223.142.244/logs/20220310/console.zhouyi.log

Then I used gdb to debug the init on the host:
ubuntu at zhouzhouyi-1:~/newkernel/linux-next$ gdb
tools/testing/selftests/rcutorture/initrd/init
(gdb) run
Starting program:
/home/ubuntu/newkernel/linux-next/tools/testing/selftests/rcutorture/initrd/init

Program received signal SIGSEGV, Segmentation fault.
0x0000000010000b2c in ?? ()
(gdb) x/10i $pc
=> 0x10000b2c:    stw     r9,0(r9)
   0x10000b30:    trap
   0x10000b34:    .long 0x0
   0x10000b38:    .long 0x0
   0x10000b3c:    .long 0x0
   0x10000b40:    lis     r2,4110
   0x10000b44:    addi    r2,r2,31488
   0x10000b48:    mr      r9,r1
   0x10000b4c:    rldicr  r1,r1,0,59
   0x10000b50:    li      r0,0
(gdb) p $r9
$1 = 0
(gdb) x/30x $pc - 0x30
0x10000afc:    0x38840040    0x387f0040    0xf8010040    0x48026919
0x10000b0c:    0x60000000    0xe8010040    0x7c0803a6    0x4bffff24
0x10000b1c:    0x00000000    0x01000000    0x00000180    0x39200000
0x10000b2c:    0x91290000    0x7fe00008    0x00000000    0x00000000
This matches the code words reported in
http://154.223.142.244/logs/20220310/console.zhouyi.log:
[    5.077431][    T1] init[1]: segfault (11) at 0 nip 10000b2c lr
10001024 code 1 in init[10000000+d0000]
[    5.087167][    T1] init[1]: code: 38840040 387f0040 f8010040
48026919 60000000 e8010040 7c0803a6 4bffff24
[    5.093987][    T1] init[1]: code: 00000000 01000000 00000180
39200000 <91290000> 7fe00008 00000000 00000000
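
The comparison between the gdb dump and the kernel log can also be done
mechanically. Below is a small Python sketch with hypothetical helper names
(parse_gdb, parse_log, decode_d_form). It takes the gdb output and the
"code:" words quoted above, checks that the words agree, and splits the word
at nip into its D-form fields (primary opcode 36 is stw).

```python
# gdb "x/30x $pc - 0x30" output as quoted above (only the lines shown in the mail).
GDB_DUMP = """\
0x10000afc:    0x38840040    0x387f0040    0xf8010040    0x48026919
0x10000b0c:    0x60000000    0xe8010040    0x7c0803a6    0x4bffff24
0x10000b1c:    0x00000000    0x01000000    0x00000180    0x39200000
0x10000b2c:    0x91290000    0x7fe00008    0x00000000    0x00000000
"""

# Words from the two "code:" lines of console.zhouyi.log, mail wrapping undone.
LOG_CODE = (
    "38840040 387f0040 f8010040 48026919 60000000 e8010040 7c0803a6 4bffff24 "
    "00000000 01000000 00000180 39200000 <91290000> 7fe00008 00000000 00000000"
)
NIP = 0x10000B2C  # nip value reported in the segfault line

def parse_gdb(dump):
    """Map each address in the dump to its 32-bit word."""
    memory = {}
    for line in dump.splitlines():
        addr_text, _, rest = line.partition(":")
        addr = int(addr_text, 16)
        for i, word in enumerate(rest.split()):
            memory[addr + 4 * i] = int(word, 16)
    return memory

def parse_log(code):
    """Return (list of words, index of the word marked with <...>)."""
    tokens = code.split()
    marked = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return [int(t.strip("<>"), 16) for t in tokens], marked

def decode_d_form(word):
    """Split a D-form word into (opcode, RS/RT, RA, signed displacement)."""
    d = word & 0xFFFF
    if d & 0x8000:
        d -= 0x10000
    return (word >> 26) & 0x3F, (word >> 21) & 0x1F, (word >> 16) & 0x1F, d

memory = parse_gdb(GDB_DUMP)
words, marked = parse_log(LOG_CODE)
start = NIP - 4 * marked  # address of the first word the kernel printed
same = all(memory.get(start + 4 * i) == w for i, w in enumerate(words))
print("dump and log agree:", same)
print("word at nip:", hex(memory[NIP]), decode_d_form(memory[NIP]))
```

Opcode 36 with RS = 9, RA = 9 and displacement 0 is the "stw r9,0(r9)" that
gdb shows. With r9 = 0 the store targets address 0, which is the fault
address in the log.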


Conclusion: something may be going wrong when the init is packed into
vmlinux in your environment.

I will continue to investigate this interesting problem with you.

Thanks
Kind Regards
Zhouyi



On Tue, Feb 8, 2022 at 8:12 PM Paul Menzel <pmenzel at molgen.mpg.de> wrote:
>
> Dear Michael,
>
>
> Thank you for looking into this.
>
> On 08.02.22 at 11:09, Michael Ellerman wrote:
> > Paul Menzel writes:
>
> […]
>
> >> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
> >> 5.17-rc2+ with rcutorture tests
> >
> > I'm not sure if that's the host kernel version or the version you're
> > using of rcutorture? Can you tell us the sha1 of your host kernel and of
> > the tree you're running rcutorture from?
>
> The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately,
> I am unable to find the exact sha1.
>
>      $ more /proc/version
>      Linux version 5.17.0-rc1+
> (pmenzel at flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (Ubuntu
> clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28
> 17:13:04 CET 2022
>
> The Linux tree from which I run rcutorture is at commit
> dfd42facf1e4 (Linux 5.17-rc3) with four patches on top:
>
>      $ git log --oneline -6
>      207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems
> with rcutorture on ppc64le: allmodconfig(2) and other failures
>      8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
>      a447541d925f ata: libata-sata: remove debounce delay by default
>      afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
>      f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
>      dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3
>
> >>       $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
> >>
> >> the built init
> >>
> >>       $ file tools/testing/selftests/rcutorture/initrd/init
> >>       tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for GNU/Linux 3.10.0, stripped
> >
> > Mine looks pretty much identical:
> >
> >    $ file tools/testing/selftests/rcutorture/initrd/init
> >    tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for GNU/Linux 3.10.0, stripped
> >
> >> segfaults in QEMU. From one of the log files
> >
> > But mine doesn't segfault, it runs fine and the test completes.
> >
> > What qemu version are you using?
> >
> > I tried 4.2.1 and 6.2.0, both worked.
>
>      $ qemu-system-ppc64le --version
>      QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
>      Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
>
> >> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
>
> Sorry, that was the wrong path/test. The correct one for the excerpt
> below is:
>
>
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log
>
> (For TREE03, QEMU does not start the Linux kernel at all, that means no
> output after:
>
>      Booting Linux via __start() @ 0x0000000000400000 ...
> )
>
> >>       [    1.119803][    T1] Run /init as init process
> >>       [    1.122011][    T1] init[1]: segfault (11) at f0656d90 nip 10000a18 lr 0 code 1 in init[10000000+d0000]
> >>       [    1.124863][    T1] init[1]: code: 2c2903e7 f9210030 4081ff84 4bffff58 00000000 01000000 00000580 3c40100f
> >>       [    1.128823][    T1] init[1]: code: 38427c00 7c290b78 782106e4 38000000 <f821ff81> 7c0803a6 f8010000 e9028010
> >
> > The disassembly from 3c40100f is:
> >    lis     r2,4111
> >    addi    r2,r2,31744
> >    mr      r9,r1
> >    rldicr  r1,r1,0,59
> >    li      r0,0
> >    stdu    r1,-128(r1)                <- fault
> >    mtlr    r0
> >    std     r0,0(r1)
> >    ld      r8,-32752(r2)
> >
> >
> > I think you'll find that's the code at the ELF entry point. You can
> > check with:
> >
> >   $ readelf -e tools/testing/selftests/rcutorture/initrd/init | grep Entry
> >     Entry point address:               0x10000c0c
> >
> >   $ objdump -d tools/testing/selftests/rcutorture/initrd/init | grep -m 1 -A 8 10000c0c
> >      10000c0c:   0e 10 40 3c     lis     r2,4110
> >      10000c10:   00 7b 42 38     addi    r2,r2,31488
> >      10000c14:   78 0b 29 7c     mr      r9,r1
> >      10000c18:   e4 06 21 78     rldicr  r1,r1,0,59
> >      10000c1c:   00 00 00 38     li      r0,0
> >      10000c20:   81 ff 21 f8     stdu    r1,-128(r1)
> >      10000c24:   a6 03 08 7c     mtlr    r0
> >      10000c28:   00 00 01 f8     std     r0,0(r1)
> >      10000c2c:   10 80 02 e9     ld      r8,-32752(r2)
> >
> > The fault you're seeing is the first store using the stack pointer (r1),
> > which is set up by the kernel.
> >
> > The fault address f0656d90 is weirdly low, the stack should be up near 128TB.
> >
> > I'm not sure how we end up with a bad r1.
> >
> > Can you dump some info about the kernel that was built, something like:
> >
> > $ file /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/vmlinux
> >
> > And maybe paste/attach the full log, maybe there's a clue somewhere.
>
> You can now download the content of
> `/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01`
> [1, 65 MB].
>
> Can you reproduce the segmentation fault with the line below?
>
>      $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=8
> -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial
> stdio -m 512 -kernel
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/vmlinux
> -append "debug_boot_weak_hash panic=-1 console=ttyS0
> torture.disable_onoff_at_boot locktorture.onoff_interval=3
> locktorture.onoff_holdoff=30 locktorture.stat_interval=15
> locktorture.shutdown_secs=60 locktorture.verbose=1"
>
>
> Kind regards,
>
> Paul
>
>
> [1]:
> https://owww.molgen.mpg.de/~pmenzel/rcutorture-2022.02.01-21.52.37-torture-locktorture-kasan-lock01.tar.xz

