[PATCH] powerpc/64s/exception: Fix machine check early corrupting AMR
Michael Ellerman
mpe at ellerman.id.au
Sat Jun 29 22:25:18 AEST 2019
Nicholas Piggin <npiggin at gmail.com> writes:
> The early machine check runs in real mode, so locking is unnecessary.
> Worse, the windup does not restore AMR, so this can result in a false
> KUAP fault after a recoverable machine check hits inside a user copy
> operation.
>
> Fix this similarly to HMI by just avoiding the kuap lock in the
> early machine check handler (it will be set by the late handler that
> runs in virtual mode if that runs). If the virtual mode handler is
> reached, it will lock and restore the AMR.
For the archives, this is how I tested this.
Build with KUAP enabled, disassemble load_elf_binary(), in there is a
call to __copy_tofrom_user(), preceded by a write to AMR, eg:
c00000000045eec8: a6 03 3d 7d mtspr 29,r9
c00000000045eecc: 2c 01 00 4c isync
c00000000045eed0: 78 93 44 7e mr r4,r18
c00000000045eed4: 78 e3 83 7f mr r3,r28
c00000000045eed8: b1 c1 c3 4b bl c00000000009b088 <__copy_tofrom_user+0x8>
Boot mambo using skiboot.tcl, break into the mambo shell. Add a
breakpoint at the branch to __copy_tofrom_user():
systemsim % b 0xc00000000045eed8
breakpoint set at [0:0:0]: 0xc00000000045eed8 (0xC00000000045EED8) Enc:0x00000000 : INVALID
Continue, run `ls` in the system shell and it should break at your breakpoint:
systemsim % c
4439260000000: [0:0]: (PC:0x00007FFFB43B2F00) : 2.1 Mega-Inst/Sec : 2.1 Mega-Cycles/Sec [1 Zaps 0 PA-Zaps] *ON* [0:0] pri=4 extra=0
4440009381609: (7208208132): # ls
[0:0:0]: 0xC00000000045EED8 (0x000000000045EED8) Enc:0xB1C1C34B : bl $-0x3C3E50
INFO: 4440936223969: (8135050536): ** Execution stopped: user (tcl), **
4440936223969: ** finished running 8135050536 instructions **
Print the AMR, it has been cleared:
systemsim % p amr
0x0000000000000000
Then inject a machine check exception, and continue:
systemsim % exc_mce
systemsim % c
4440936231861: (8135058428): [ 8673.510176] Disabling lock debugging due to kernel taint
4440936246871: (8135073438): [ 8673.510205] MCE: CPU0: machine check (Warning) Host TLB Multihit [Recovered]
4440936266680: (8135093247): [ 8673.510244] MCE: CPU0: NIP: [c00000000045eed8] load_elf_binary+0xef8/0x1970
4440936282657: (8135109224): [ 8673.510275] MCE: CPU0: Probable Software error (some chance of hardware cause)
[0:0:0]: 0xC00000000045EED8 (0x000000000045EED8) Enc:0xB1C1C34B : bl $-0x3C3E50
INFO: 4440936296116: (8135122683): ** Execution stopped: user (tcl), **
4440936296116: ** finished running 8135122683 instructions **
Now we're back at our breakpoint. Continue again and we should get an
oops due to a bad AMR fault:
systemsim % c
4440936301692: (8135128259): [ 8673.510312] ------------[ cut here ]------------
4440936321016: (8135147583): [ 8673.510336] Bug: Write fault blocked by AMR!
4440936331347: (8135157914): [ 8673.510350] WARNING: CPU: 0 PID: 95 at arch/powerpc/include/asm/book3s/64/kup-radix.h:102 __do_page_fault+0x604/0xe60
4440936352510: (8135179077): [ 8673.510410] Modules linked in:
4440936365222: (8135191789): [ 8673.510436] CPU: 0 PID: 95 Comm: ls Tainted: G M 5.2.0-rc2-gcc-8.2.0 #273
4440936383775: (8135210342): [ 8673.510473] NIP: c0000000000716b4 LR: c0000000000716b0 CTR: c000000000ca88b0
4440936401995: (8135228562): [ 8673.510508] REGS: c0000000ec883530 TRAP: 0700 Tainted: G M (5.2.0-rc2-gcc-8.2.0)
4440936430641: (8135257208): [ 8673.510545] MSR: 9000000000021033 <SF,HV,ME,IR,DR,RI,LE> CR: 28002422 XER: 20040000
4440936498754: (8135325321): [ 8673.510597] CFAR: c00000000011b8e4 IRQMASK: 1
4440936505159: (8135331726): [ 8673.510597] GPR00: c0000000000716b0 c0000000ec8837c0 c0000000015f4900 0000000000000020
4440936515814: (8135342381): [ 8673.510597] GPR04: c000000001824550 0000000000000000 746c756166206574 64656b636f6c6220
4440936528594: (8135355161): [ 8673.510597] GPR08: 00000000fed30000 c000000001130de8 0000000000000000 9000000030001033
4440936541374: (8135367941): [ 8673.510597] GPR12: 0000000000002000 c0000000018e0000 0000000080000000 00007fffe2e3de09
4440936554154: (8135380721): [ 8673.510597] GPR16: c000000000ed2c50 0000000010000000 c000000000ed2c50 00000000100d3648
4440936564809: (8135391376): [ 8673.510597] GPR20: c0000000f0968b00 00000000100e3648 00007fff930a0000 0000000002000000
4440936577589: (8135404156): [ 8673.510597] GPR24: 0000000002000000 c0000000ee830600 0000000000000301 00007fffe2e3de09
4440936590369: (8135416936): [ 8673.510597] GPR28: 0000000000000000 000000000a000000 0000000000000000 c0000000ec883900
4440936611699: (8135438266): [ 8673.510918] NIP [c0000000000716b4] __do_page_fault+0x604/0xe60
4440936628747: (8135455314): [ 8673.510951] LR [c0000000000716b0] __do_page_fault+0x600/0xe60
4440936642325: (8135468892): [ 8673.510978] Call Trace:
4440936655614: (8135482181): [ 8673.511000] [c0000000ec8837c0] [c0000000000716b0] __do_page_fault+0x600/0xe60 (unreliable)
4440936677874: (8135504441): [ 8673.511045] [c0000000ec883890] [c00000000000b0d4] handle_page_fault+0x18/0x38
4440936700658: (8135527225): [ 8673.511091] --- interrupt: 301 at __copy_tofrom_user_power7+0x230/0x7ac
4440936709188: (8135535755): [ 8673.511091] LR = load_elf_binary+0xefc/0x1970
4440936728082: (8135554649): [ 8673.511142] [c0000000ec883b90] [c00000000045ee80] load_elf_binary+0xea0/0x1970 (unreliable)
4440936750368: (8135576935): [ 8673.511187] [c0000000ec883c90] [c0000000003d2f88] search_binary_handler.part.12+0xb8/0x2b0
4440936772446: (8135599013): [ 8673.511230] [c0000000ec883d20] [c0000000003d3934] __do_execve_file.isra.14+0x684/0xa10
4440936793891: (8135620458): [ 8673.511272] [c0000000ec883df0] [c0000000003d41b8] sys_execve+0x38/0x50
4440936813829: (8135640396): [ 8673.511311] [c0000000ec883e20] [c00000000000bdf4] system_call+0x5c/0x70
4440936828817: (8135655384): [ 8673.511340] Instruction dump:
4440936848134: (8135674701): [ 8673.511361] 60000000 2fb70000 e93f0168 419e0620 2fa90000 409cfba4 3c82ff8e 38846b88
4440936874244: (8135700811): [ 8673.511412] 3c62ff8e 38636c98 480aa1d1 60000000 <0fe00000> e80100e0 3b80000b eae10088
4440936891327: (8135717894): [ 8673.511464] ---[ end trace 0698ac8ff1068918 ]---
4440938377906: (8137204473): Segmentation fault
Apply the fix, retest, and no oops is seen.
cheers
More information about the Linuxppc-dev
mailing list