❌ FAIL: Test report for kernel 5.3.13-3b5f971.cki (stable-queue)

Michael Ellerman mpe at ellerman.id.au
Mon Dec 2 16:46:40 AEDT 2019


Hi Jan,

Jan Stancek <jstancek at redhat.com> writes:
> ----- Original Message -----
>> 
>> Hello,
>> 
>> We ran automated tests on a recent commit from this kernel tree:
>> 
>>        Kernel repo:
>>        git://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git
>>             Commit: 3b5f97139acc - KVM: PPC: Book3S HV: Flush link stack on
>>             guest exit to host kernel

I can't find this commit; I assume it's roughly the same as:

  https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/commit/?h=linux-5.3.y&id=0815f75f90178bc7e1933cf0d0c818b5f3f5a20c

>> The results of these automated tests are provided below.
>> 
>>     Overall result: FAILED (see details below)
>>              Merge: OK
>>            Compile: OK
>>              Tests: FAILED
>> 
>> All kernel binaries, config files, and logs are available for download here:
>> 
>>   https://artifacts.cki-project.org/pipelines/314344
>> 
>> One or more kernel tests failed:
>> 
>>     ppc64le:
>>      ❌ LTP
>
> I suspect a kernel bug.

Looks that way, but I can't reproduce it on a machine here.

I have the same CPU revision and am booting the exact kernel binary &
modules linked above.

> There were a couple of 'math' runtest related failures in the last couple of days.
> In all cases, some data file used by the test was missing, presumably because
> the binary that generates it crashed.
>
> I managed to reproduce one failure with this CKI build, which I believe
> is the same problem.
>
> We crash early during load, before any LTP code runs:
>
> (gdb) r
> Starting program: /mnt/testarea/ltp/testcases/bin/genasin

What is this /mnt/testarea? Looks like it's set up by some of the beaker
scripts or something?

I'm running LTP out of /home, which is ext4 directly on disk.

I tried getting the tests-beaker stuff working on my machine, but I
couldn't find all the libraries and so on it requires.


> Program received signal SIGBUS, Bus error.
> dl_main (phdr=0x10000040, phnum=<optimized out>, user_entry=0x7fffffffe760, auxv=<optimized out>) at rtld.c:1362
> 1362        switch (ph->p_type)
> (gdb) bt
> #0  dl_main (phdr=0x10000040, phnum=<optimized out>, user_entry=0x7fffffffe760, auxv=<optimized out>) at rtld.c:1362
> #1  0x00007ffff7fcf3c8 in _dl_sysdep_start (start_argptr=<optimized out>, dl_main=0x7ffff7fb37b0 <dl_main>) at ../elf/dl-sysdep.c:253
> #2  0x00007ffff7fb1d1c in _dl_start_final (arg=arg@entry=0x7fffffffee20, info=info@entry=0x7fffffffe870) at rtld.c:445
> #3  0x00007ffff7fb2f5c in _dl_start (arg=0x7fffffffee20) at rtld.c:537
> #4  0x00007ffff7fb14d8 in _start () from /lib64/ld64.so.2
> (gdb) f 0
> #0  dl_main (phdr=0x10000040, phnum=<optimized out>, user_entry=0x7fffffffe760, auxv=<optimized out>) at rtld.c:1362
> 1362        switch (ph->p_type)
> (gdb) l
> 1357      /* And it was opened directly.  */
> 1358      ++main_map->l_direct_opencount;
> 1359
> 1360      /* Scan the program header table for the dynamic section.  */
> 1361      for (ph = phdr; ph < &phdr[phnum]; ++ph)
> 1362        switch (ph->p_type)
> 1363          {
> 1364          case PT_PHDR:
> 1365            /* Find out the load address.  */
> 1366            main_map->l_addr = (ElfW(Addr)) phdr - ph->p_vaddr;
>
> (gdb) p ph
> $1 = (const Elf64_Phdr *) 0x10000040
>
> (gdb) p *ph
> Cannot access memory at address 0x10000040
>
> (gdb) info proc map
> process 1110670
> Mapped address spaces:
>
>           Start Addr           End Addr       Size     Offset objfile
>           0x10000000         0x10010000    0x10000        0x0 /mnt/testarea/ltp/testcases/bin/genasin
>           0x10010000         0x10030000    0x20000        0x0 /mnt/testarea/ltp/testcases/bin/genasin
>       0x7ffff7f90000     0x7ffff7fb0000    0x20000        0x0 [vdso]
>       0x7ffff7fb0000     0x7ffff7fe0000    0x30000        0x0 /usr/lib64/ld-2.30.so
>       0x7ffff7fe0000     0x7ffff8000000    0x20000    0x20000 /usr/lib64/ld-2.30.so
>       0x7ffffffd0000     0x800000000000    0x30000        0x0 [stack]
>
> (gdb) x/1x 0x10000040
> 0x10000040:     Cannot access memory at address 0x10000040

Yeah that's weird.
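
It might be worth checking whether the same bytes can be read via plain
read() rather than through the mapping. A rough sketch, assuming the
program header table starts at the usual file offset 64 (0x40) for a
64-bit ELF, which would match phdr=0x10000040 given the 0x10000000
mapping at offset 0. dump_first_phdr is just a helper name made up for
illustration:

```shell
# Hex-dump the first Elf64_Phdr of a binary using read()-based I/O.
# Assumption: e_phoff is 64, i.e. the table directly follows the
# 64-byte ELF header. An Elf64_Phdr is 56 bytes long.
dump_first_phdr() {
    od -A x -t x1 -j 64 -N 56 -- "$1"
}

bin=${1:-/mnt/testarea/ltp/testcases/bin/genasin}
if [ -r "$bin" ]; then
    dump_first_phdr "$bin"
else
    echo "$bin: not readable" >&2
fi
```

If that prints sane-looking data while the mapped access still faults,
it would suggest the file contents are fine and only the mapping is
broken.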

> # /mnt/testarea/ltp/testcases/bin/genasin
> Bus error (core dumped)
>
> However, as soon as I copy that binary somewhere else, it works fine:
>
> # cp /mnt/testarea/ltp/testcases/bin/genasin /tmp
> # /tmp/genasin
> # echo $?
> 0

Is /tmp a real disk or tmpfs?
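
Something like the following should answer that. It's only a quick
sketch: it assumes GNU coreutils stat, and fstype_of is a helper name
made up here.

```shell
# Print the filesystem type backing a path.
# Note: stat reports ext4 as "ext2/ext3" because they share a magic number.
fstype_of() {
    stat -f -c %T -- "$1"
}

for path in /tmp /mnt/testarea/ltp/testcases/bin; do
    if [ -e "$path" ]; then
        printf '%s: %s\n' "$path" "$(fstype_of "$path")"
    else
        printf '%s: not present\n' "$path"
    fi
done
```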

cheers

> # cp /mnt/testarea/ltp/testcases/bin/genasin /mnt/testarea/ltp/testcases/bin/genasin2
> # /mnt/testarea/ltp/testcases/bin/genasin2
> # echo $?
> 0
>
> # /mnt/testarea/ltp/testcases/bin/genasin
> Bus error (core dumped)
>
> # diff /mnt/testarea/ltp/testcases/bin/genasin /mnt/testarea/ltp/testcases/bin/genasin2; echo $?
> 0
>
> # lscpu
> Architecture:                    ppc64le
> Byte Order:                      Little Endian
> CPU(s):                          160
> On-line CPU(s) list:             0-159
> Thread(s) per core:              4
> Core(s) per socket:              20
> Socket(s):                       2
> NUMA node(s):                    2
> Model:                           2.2 (pvr 004e 1202)
> Model name:                      POWER9, altivec supported
> Frequency boost:                 enabled
> CPU max MHz:                     3800.0000
> CPU min MHz:                     2166.0000
> L1d cache:                       1.3 MiB
> L1i cache:                       1.3 MiB
> L2 cache:                        10 MiB
> L3 cache:                        200 MiB
> NUMA node0 CPU(s):               0-79
> NUMA node8 CPU(s):               80-159
> Vulnerability Itlb multihit:     Not affected
> Vulnerability L1tf:              Not affected
> Vulnerability Mds:               Not affected
> Vulnerability Meltdown:          Mitigation; RFI Flush, L1D private per thread
> Vulnerability Spec store bypass: Mitigation; Kernel entry/exit barrier (eieio)
> Vulnerability Spectre v1:        Mitigation; __user pointer sanitization, ori31 speculation barrier enabled
> Vulnerability Spectre v2:        Mitigation; Indirect branch cache disabled, Software link stack flush
> Vulnerability Tsx async abort:   Not affected

