kernel panic with "Unrecoverable FP Unavailable Exception 800 at c00000000009e308"

Ryan Wong colorfulshark at gmail.com
Wed Jun 23 19:43:08 AEST 2021


Hi,

Recently I encountered a kernel panic reporting "Unrecoverable FP
Unavailable Exception 800 at c00000000009e308". The panic log is attached
at the end of this mail.

As far as I know, this exception occurs when a hardware floating-point
instruction is executed while the FPU is disabled (MSR[FP] = 0). If the
instruction comes from kernel space, the kernel treats the exception as
unrecoverable and panics.

*Here is the investigation I have done.*

First I checked the MSR: MSR[PR] = 0 and MSR[FP] = 0, so the system did
match the panic condition.

Because MSR[PR] = 0, the instruction appears to come from the kernel. But
the kernel does not normally do floating-point calculations, so I was quite
curious about the code that triggered the exception. According to the
backtrace, it is in the update_min_vruntime function.
Unfortunately, I didn't see any floating-point operation in that function.
So I disassembled vmlinux, located the disassembly of that function, and
matched it against the instruction dump:

    c00000000009e2b8 <.update_min_vruntime>:
    ...
    c00000000009e2d8:       e9 1f 00 20     ld      r8,32(r31)
    c00000000009e2dc:       2f a9 00 00     cmpdi   cr7,r9,0
    c00000000009e2e0:       41 9e 00 68     beq     cr7,c00000000009e348 <.update_min_vruntime+0x90>
    c00000000009e2e4:       e9 5f 00 30     ld      r10,48(r31)
    c00000000009e2e8:       e9 29 00 50     ld      r9,80(r9)
    c00000000009e2ec:       2f aa 00 00     cmpdi   cr7,r10,0
    c00000000009e2f0:       41 9e 00 10     beq     cr7,c00000000009e300 <.update_min_vruntime+0x48>
    c00000000009e2f4:       e9 4a 00 40     ld      r10,64(r10)
    c00000000009e2f8:       7c e9 50 51     subf.   r7,r9,r10
    c00000000009e2fc:       41 80 00 24     blt     c00000000009e320 <.update_min_vruntime+0x68>
    c00000000009e300:       7c e8 48 51     subf.   r7,r8,r9
    c00000000009e304:       40 81 00 28     ble     c00000000009e32c <.update_min_vruntime+0x74>
    c00000000009e308:       f9 3f 00 20     std     r9,32(r31)
    c00000000009e30c:       38 21 00 80     addi    r1,r1,128
    c00000000009e310:       e8 01 00 10     ld      r0,16(r1)
    c00000000009e314:       eb e1 ff f8     ld      r31,-8(r1)

The faulting instruction is:

    c00000000009e308:       f9 3f 00 20     std     r9,32(r31)

This has nothing to do with floating point, so I cannot imagine why it
would trigger the exception.

Do you have any idea what could cause this? I would appreciate any reply.

*Panic log*
...
Linux version 4.1.21 (ryan@ubuntu) (gcc version 5.2.0) #22 SMP PREEMPT Wed Oct 28 10:04:32 CST 2020
...
<1>Kernel command line: ramdisk_size=0x700000 root=/dev/ram rw init=/init mem=3840M reserve=256M@3840M console=ttyS0,115200 crashkernel=128M@32M bportals=s1 qportals=s1
...
<0>linux-kernel-bde (16258): Allocating DMA memory using method dmaalloc=0
<0>linux-kernel-bde (16258): _use_dma_mapping:1 _dma_vbase:c000000060000000 _dma_pbase:60000000 _cpu_pbase:60000000 allocated:2000000 dmaalloc:0
<0>linux-kernel-bde (16247): _interrupt_connect d 0
<0>linux-kernel-bde (16247): connect primary isr
<0>linux-kernel-bde (16247): _interrupt_connect(3514):device# = 0, irq_flags = 128, irq = 41
<1>device eth0.4092 entered promiscuous mode
<1>Unrecoverable FP Unavailable Exception 800 at c00000000009e308
<0>Oops: Unrecoverable FP Unavailable Exception, sig: 6 [#1]
<0>PREEMPT SMP NR_CPUS=4 CoreNet Generic
<0>Modules linked in: linux_user_bde(PO) linux_kernel_bde(PO) dma2(O) dma(O) watchdog(O) ttyVS(O) gpiodev(O) lbdev(O) spid(O) block2mtd mpc85xx_edac edac_core sch_fq_codel uio_seville(O) loop [last unloaded: linux_kernel_bde]
<1>CPU: 1 PID: 7 Comm: rcu_preempt Tainted: P           O    4.1.21 #22
<1>task: c0000000e11a4680 ti: c0000000e11d8000 task.ti: c0000000e11d8000
<0>NIP: c00000000009e308 LR: c00000000009eda4 CTR: c0000000000a2de8
<0>REGS: c0000000e11db4d0 TRAP: 0800   Tainted: P           O     (4.1.21)
<0>MSR: 0000000080029000 <CE,EE,ME>  CR: 44a44242  XER: 00000000
<0>SOFTE: 0
<0>GPR00: c00000000009eda4 c0000000e11db750 c000000001763800 c0000000efe476a0
<0>GPR04: c0000000e11a4680 c0000000efe4fea0 c0000000efe47fa0 c000000001643800
<0>GPR08: 000006b94a32fd58 000006b949bb61f8 0000000000000000 c0000000e11f0000
<0>GPR12: 0000000044a44244 c00000000fffe6c0 0000000000000000 0000000000000000
<0>GPR16: c0000000016a9fa0 c0000000016aa108 00000000000000fa 0000000000000001
<0>GPR20: c00000000176d578 0000000000000000 0000000000000001 0000000000000000
<0>GPR24: 0000000000000001 c000000000b08a18 0000000000000000 c0000000efe47640
<0>NIP [c00000000009e308] .update_min_vruntime+0x50/0xa4
<0>LR [c00000000009eda4] .update_curr+0x80/0x1ec
<0>Call Trace:
<0>[c0000000e11db750] [c0000000e1004560] 0xc0000000e1004560 (unreliable)
<0>[c0000000e11db7d0] [c00000000009eda4] .update_curr+0x80/0x1ec
<0>[c0000000e11db870] [c0000000000a2e80] .dequeue_task_fair+0x98/0xaf0
<0>[c0000000e11db960] [c00000000009376c] .dequeue_task+0x68/0x88
<0>[c0000000e11db9f0] [c000000000ae8f88] .__schedule+0x2f4/0x7b4
<0>[c0000000e11dbaa0] [c000000000ae9484] .schedule+0x3c/0xa8
<0>[c0000000e11dbb20] [c000000000aecc98] .schedule_timeout+0x150/0x2d0
<0>[c0000000e11dbc00] [c0000000000cdbb0] .rcu_gp_kthread+0x6c4/0xad4
<0>[c0000000e11dbd30] [c000000000088aac] .kthread+0x10c/0x12c
<0>[c0000000e11dbe30] [c0000000000009b0] .ret_from_kernel_thread+0x58/0xa8
<0>Instruction dump:
<0>e91f0020 2fa90000 419e0068 e95f0030 e9290050 2faa0000 419e0010 e94a0040
<0>7ce95051 41800024 7ce84851 40810028 <f93f0020> 38210080 e8010010 ebe1fff8
<1>---[ end trace bc398b62ecbb6901 ]---
<0>
<1>note: rcu_preempt[7] exited with preempt_count 2

Thanks,
Ryan


More information about the Linuxppc-dev mailing list