[PATCH v3 0/4] Fixes for 3 separate NMI reentrancy bugs

Satheesh Rajendran sathnaga at linux.vnet.ibm.com
Tue Feb 26 17:51:10 AEDT 2019


On Tue, Feb 26, 2019 at 04:08:57PM +1000, Nicholas Piggin wrote:
> This series fixes several similar but unrelated bugs with NMIs
> clobbering live registers without noticing it, because MSR[RI] is set.
> Pretty rare bugs, but serious silent corruption consequences.
> 
> For the most part these can be observed and tested quite easily
> with the mambo simulator, except that it does not seem to follow
> the architecture wrt leaving MSR[RI] unchanged for HV interrupts.
> Mambo clears MSR[RI], so you have to account for that manually.
> 
> Since v1:
> - Fixed several build bugs.
> 
> Since v2:
> - Improved changelog and comments.
> - Fixed the NIA test for virt mode interrupts.

Hit the crash below on a Power8 box, with the series applied on top of the linuxppc merge branch and built with `ppc64le_defconfig`:

UnknownStateTransition: Something happened system state="8" and we transitioned to UNKNOWN state.  Review the following for more details
Message="OpTestSystem in run_IPLing and Exception="Kernel OOPS (machine in state '5'): Oops: Kernel access of bad area, sig: 11 [#1]
[    0.000000] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc7-gf46b87021 #1
[    0.000000] NIP:  c000000000c1306c LR: c000000000c12f64 CTR: c00000000033d860
[    0.000000] REGS: c0000000014878b0 TRAP: 0380   Not tainted  (5.0.0-rc7-gf46b87021)
[    0.000000] MSR:  9000000000001033 <SF,HV,ME,IR,DR,RI,LE>  CR: 28002224  XER: 00000000
[    0.000000] CFAR: c000000000c12f7c IRQMASK: 1 
[    0.000000] GPR00: c000000000c12f64 c000000001487b40 c000000001488400 f000000000000000 
[    0.000000] GPR04: c000000001487b18 c000000001487b20 0000000000000000 c000000001388400 
[    0.000000] GPR08: f000000000000000 f000000000000008 0000000000000000 0000000800000000 
[    0.000000] GPR12: c0000000015e1ed0 c000000001670000 0000000000000000 0000000000000000 
[    0.000000] GPR16: 0000000000000000 0000000000000000 c0000000015e0d40 0000000000000001 
[    0.000000] GPR20: ffffffffffffffff ffffffffffffffff 0000000008000000 c000000001413b90 
[    0.000000] GPR24: c000000001413b98 007ffff000000000 0000000000080000 0000000000000000 
[    0.000000] GPR28: 0000000000000000 0000000000000000 007ffff000001000 0000000000000000 
[    0.000000] NIP [c000000000c1306c] memmap_init_zone+0x258/0x308
[    0.000000] LR [c000000000c12f64] memmap_init_zone+0x150/0x308
[    0.000000] Call Trace:
[    0.000000] [c000000001487b40] [c000000000c12f64] memmap_init_zone+0x150/0x308 (unreliable)
[    0.000000] [c000000001487be0] [c000000000f87acc] free_area_init_node+0x480/0x518
[    0.000000] [c000000001487cf0] [c000000000f88630] free_area_init_nodes+0x838/0x940
[    0.000000] [c000000001487e10] [c000000000f6340c] paging_init+0x8c/0xa8
[    0.000000] [c000000001487e80] [c000000000f5bc00] setup_arch+0x3b4/0x3f0
[    0.000000] [c000000001487ef0] [c000000000f53b68] start_kernel+0x94/0x630
[    0.000000] [c000000001487f90] [c00000000000b37c] start_here_common+0x1c/0x520
[    0.000000] Instruction dump:
[    0.000000] 71290002 41820014 ebea0008 7cc6fa14 78df8402 48000070 3d22000c 7bea3664 
[    0.000000] 39299d20 e9090000 7c685214 39230008 <fa290010> fa290018 fa290020 fa290030 
[    0.000000] random: get_random_bytes called from print_oops_end_marker+0x40/0x80 with crng_init=0
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] 
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] Rebooting in 10 seconds" caused the system to go to UNKNOWN_BAD and the system will be stopping."
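
Since the series is about NMIs arriving while MSR[RI] is set, it can help to confirm what the MSR value in the oops above encodes. The following is only a minimal sketch: `MSR_BITS` and `decode_msr` are hypothetical names used for illustration, and the table is a subset of the 64-bit bit positions (the `MSR_*_LG` constants in arch/powerpc/include/asm/reg.h), not a complete decoder.

```python
# Subset of 64-bit powerpc MSR bit positions, highest bit first.
# Positions follow the MSR_*_LG constants in arch/powerpc/include/asm/reg.h.
# MSR_BITS and decode_msr are illustrative names, not kernel or tool APIs.
MSR_BITS = [
    (63, "SF"),  # 64-bit mode
    (60, "HV"),  # hypervisor state
    (15, "EE"),  # external interrupts enabled
    (14, "PR"),  # problem (user) state
    (13, "FP"),  # floating point available
    (12, "ME"),  # machine check enabled
    (5, "IR"),   # instruction relocation on
    (4, "DR"),   # data relocation on
    (1, "RI"),   # recoverable interrupt
    (0, "LE"),   # little endian
]


def decode_msr(value):
    """Return the names of the known MSR bits that are set in value."""
    return [name for bit, name in MSR_BITS if value & (1 << bit)]


# MSR value copied from the "MSR:" line of the oops.
msr = 0x9000000000001033
flags = decode_msr(msr)
print(",".join(flags))            # SF,HV,ME,IR,DR,RI,LE
print("RI set:", "RI" in flags)   # RI set: True
```

The decoded list matches the `<SF,HV,ME,IR,DR,RI,LE>` annotation the kernel printed, so RI was set and EE was clear at the time of the fault.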

Regards,
-Satheesh.
> 
> Nicholas Piggin (4):
>   powerpc/64s: Fix HV NMI vs HV interrupt recoverability test
>   powerpc/64s: system reset interrupt preserve HSRRs
>   powerpc/64s: Prepare to handle data interrupts vs d-side MCE
>     reentrancy
>   powerpc/64s: Fix data interrupts vs d-side MCE reentrancy
> 
>  arch/powerpc/include/asm/asm-prototypes.h |  8 ++
>  arch/powerpc/include/asm/nmi.h            |  2 +
>  arch/powerpc/kernel/exceptions-64s.S      | 92 +++++++++++++++++++----
>  arch/powerpc/kernel/mce.c                 |  3 +
>  arch/powerpc/kernel/traps.c               | 91 +++++++++++++++++++++-
>  5 files changed, 179 insertions(+), 17 deletions(-)
> 
> -- 
> 2.18.0
> 


