[PATCH v1] powerpc/64s: Fix unrecoverable MCE crash

Ganesh ganeshgr at linux.ibm.com
Fri Sep 24 04:22:16 AEST 2021


On 9/22/21 7:32 AM, Nicholas Piggin wrote:

> The machine check handler is not considered NMI on 64s. The early
> handler is the true NMI handler, and then it schedules the
> machine_check_exception handler to run when interrupts are enabled.
>
> This works fine except the case of an unrecoverable MCE, where the true
> NMI is taken when MSR[RI] is clear, it can not recover to schedule the
> next handler, so it calls machine_check_exception directly so something
> might be done about it.
>
> Calling an async handler from NMI context can result in irq state and
> other things getting corrupted. This can also trigger the BUG at
> arch/powerpc/include/asm/interrupt.h:168.
>
> Fix this by just making the 64s machine_check_exception handler an NMI
> like it is on other subarchs.
>
> Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
> ---

Hi Nick,

If I inject a control memory access error in an LPAR, with this patch
applied on top of
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210906084303.183921-1-ganeshgr@linux.ibm.com/

I see the following warning trace:

WARNING: CPU: 130 PID: 7122 at arch/powerpc/include/asm/interrupt.h:319 machine_check_exception+0x310/0x340
  Modules linked in:
  CPU: 130 PID: 7122 Comm: inj_access_err Kdump: loaded Tainted: G   M              5.15.0-rc2-cma-00054-g4a0d59fbaf71-dirty #22
  NIP:  c00000000002f980 LR: c00000000002f7e8 CTR: c000000000a31860
  REGS: c0000039fe51bb20 TRAP: 0700   Tainted: G   M               (5.15.0-rc2-cma-00054-g4a0d59fbaf71-dirty)
  MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 88000222  XER: 20040000
  CFAR: c00000000002f844 IRQMASK: 0
  GPR00: c00000000002f798 c0000039fe51bdc0 c0000000020d0000 0000000000000001
  GPR04: 0000000000000000 4000000000000002 4000000000000000 00000000000019af
  GPR08: 00000077e5ad0000 0000000000000000 c0000077ee16c700 0000000000000080
  GPR12: 0000000088000222 c0000077ee16c700 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24: 0000000000000000 0000000000000000 c0000000020fecd8 0000000000000000
  GPR28: 0000000000000000 0000000000000001 0000000000000001 c0000039fe51be80
  NIP [c00000000002f980] machine_check_exception+0x310/0x340
  LR [c00000000002f7e8] machine_check_exception+0x178/0x340
  Call Trace:
  [c0000039fe51bdc0] [c00000000002f798] machine_check_exception+0x128/0x340 (unreliable)
  [c0000039fe51be10] [c0000000000086ec] machine_check_common+0x1ac/0x1b0
  --- interrupt: 200 at 0x10000968
  NIP:  0000000010000968 LR: 0000000010000958 CTR: 0000000000000000
  REGS: c0000039fe51be80 TRAP: 0200   Tainted: G   M               (5.15.0-rc2-cma-00054-g4a0d59fbaf71-dirty)
  MSR:  8000000002a0f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 22000824  XER: 00000000
  CFAR: 000000000000021c DAR: 00007fffb00c0000 DSISR: 02000008 IRQMASK: 0
  GPR00: 0000000022000824 00007fffc9647770 0000000010027f00 00007fffb00c0000
  GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR08: 0000000000000000 00007fffb00c0000 0000000000000001 0000000000000000
  GPR12: 0000000000000000 00007fffb015a330 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24: 0000000000000000 0000000000000000 0000000000000000 000000001000085c
  GPR28: 00007fffc9647d18 0000000000000001 00000000100009b0 00007fffc9647770
  NIP [0000000010000968] 0x10000968
  LR [0000000010000958] 0x10000958
  --- interrupt: 200
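
For anyone reading the two MSR values in the trace by hand, here is a
minimal decoding sketch. The bit positions in the table are an assumption:
an illustrative subset of the 64-bit PowerPC MSR layout, not something taken
from this thread or from the patch. The helper name decode_msr is
hypothetical. It only reproduces the flag lists the kernel already printed,
which makes it easy to see that RI is set in both the kernel frame and the
interrupted user frame.

```python
# Assumed subset of 64-bit PowerPC MSR bit positions (illustrative only,
# not exhaustive). Listed from the highest bit to the lowest.
MSR_BITS = [
    (63, "SF"), (60, "HV"), (25, "VEC"), (23, "VSX"),
    (15, "EE"), (14, "PR"), (13, "FP"), (12, "ME"),
    (5, "IR"), (4, "DR"), (1, "RI"), (0, "LE"),
]


def decode_msr(msr):
    """Return the names of the known MSR bits that are set in msr."""
    return [name for bit, name in MSR_BITS if msr & (1 << bit)]


# MSR values copied from the two register dumps in the trace above.
kernel_msr = 0x8000000000029033  # frame with TRAP: 0700
user_msr = 0x8000000002A0F033    # frame with TRAP: 0200

print(",".join(decode_msr(kernel_msr)))
print(",".join(decode_msr(user_msr)))
```

Bits that are not in the table are ignored, so the output should be read as
a partial decode only.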
