[Bug 206669] Little-endian kernel crashing on POWER8 on heavy big-endian PowerKVM load

bugzilla-daemon at bugzilla.kernel.org
Wed Feb 26 18:26:31 AEDT 2020


https://bugzilla.kernel.org/show_bug.cgi?id=206669

--- Comment #2 from John Paul Adrian Glaubitz (glaubitz at physik.fu-berlin.de) ---
(In reply to npiggin from comment #1)
> Thanks for the report, we need to get more data about the first BUG if 
> we can. What function in your vmlinux contains address 
> 0xc00000000017a778? (use nm or objdump etc)

That seems to be select_task_rq_fair (symbol type t):

root@watson:/boot# nm vmlinux-5.4.0-0.bpo.3-powerpc64le |grep -C5 c00000000017a
c000000000448550 T select_estimate_accuracy
c000000000170d20 t select_fallback_rq
c000000000e4c940 D select_idle_mask
c000000000179f10 t select_idle_sibling
c00000000018fd80 t select_task_rq_dl
c00000000017a640 t select_task_rq_fair
c000000000177f50 t select_task_rq_idle
c00000000018c9e0 t select_task_rq_rt
c00000000019c800 t select_task_rq_stop
c000000000927710 t selem_alloc.isra.6
c000000000926e50 t selem_link_map
root@watson:/boot#

> Is that the first message you get,
> No warnings or anything else earlier in the dmesg?

Correct. You can see the login prompt of the host VM watson directly after
booting up.

> Also 0xc0000000002659a0 would be interesting.

Looks like that's ring_buffer_record_off:

root@watson:/boot# nm vmlinux-5.4.0-0.bpo.3-powerpc64le |grep -C5 c0000000002659
c0000000002667e0 T ring_buffer_read_finish
c00000000026b4b0 T ring_buffer_read_page
c000000000265e10 T ring_buffer_read_prepare
c000000000265ef0 T ring_buffer_read_prepare_sync
c000000000269ae0 T ring_buffer_read_start
c000000000265950 T ring_buffer_record_disable
c000000000266070 T ring_buffer_record_disable_cpu
c000000000265970 T ring_buffer_record_enable
c0000000002660c0 T ring_buffer_record_enable_cpu
c00000000026d470 T ring_buffer_record_is_on
c00000000026d480 T ring_buffer_record_is_set_on
c000000000265990 T ring_buffer_record_off
c000000000265a10 T ring_buffer_record_on
c000000000266da0 T ring_buffer_reset
c000000000266a90 T ring_buffer_reset_cpu
c000000000267cd0 T ring_buffer_resize
c00000000026d400 T ring_buffer_set_clock
root@watson:/boot#
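
Since plain nm output is sorted by symbol name, the grep context above does not show neighbouring addresses. A more robust lookup sorts the symbols by address and picks the nearest one at or below the target (nm -n gives the numeric ordering directly). Here is a minimal sketch of that idea in Python; the helper names parse_nm and resolve are made up for illustration, and SAMPLE contains only a few of the lines quoted in this comment, so a full symbol table could in principle place another symbol in between:

```python
import bisect

# A few lines copied from the nm output quoted in this comment.
# Assumption: this is only an excerpt, not the complete symbol table.
SAMPLE = """\
c000000000179f10 t select_idle_sibling
c00000000017a640 t select_task_rq_fair
c000000000177f50 t select_task_rq_idle
c000000000265970 T ring_buffer_record_enable
c000000000265990 T ring_buffer_record_off
c000000000265a10 T ring_buffer_record_on
"""


def parse_nm(text):
    """Parse nm lines of the form '<hex addr> <type> <name>', sorted by address."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # e.g. undefined symbols, which have no address column
        addr, _kind, name = parts
        try:
            symbols.append((int(addr, 16), name))
        except ValueError:
            continue  # not a symbol line (shell prompt, stray text)
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return (name, offset) for the nearest symbol at or below address."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, address) - 1
    if i < 0:
        return None  # address lies below every known symbol
    base, name = symbols[i]
    return name, address - base


if __name__ == "__main__":
    table = parse_nm(SAMPLE)
    for target in (0xc00000000017a778, 0xc0000000002659a0):
        hit = resolve(table, target)
        if hit is None:
            print(f"{target:#x}: no symbol found")
        else:
            print(f"{target:#x}: {hit[0]}+{hit[1]:#x}")
```

To run it against the real image, the output of nm on the vmlinux file would be passed to parse_nm in place of SAMPLE.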

FWIW, the kernel image comes from this Debian package:

http://snapshot.debian.org/archive/debian/20200211T210433Z/pool/main/l/linux/linux-image-5.4.0-0.bpo.3-powerpc64le_5.4.13-1%7Ebpo10%2B1_ppc64el.deb

> When reproducing, do you ever get a clean trace of the first bug?

I have logged everything that showed up on the console during and after the crash.
After that, the machine no longer responds and has to be hard-reset.

> Could you try setting /proc/sys/kernel/panic_on_oops and reproducing?

I will try that.

Is there anything to consider for the kernel running inside the big-endian VM?
