[v2 09/12] powerpc/mce: Enable MCE notifiers in external modules

Nicholas Piggin npiggin at gmail.com
Fri Jul 5 15:29:39 AEST 2019

Reza Arbab's on July 5, 2019 12:50 pm:
> On Thu, Jul 04, 2019 at 12:36:18PM +1000, Nicholas Piggin wrote:
>>Reza Arbab's on July 4, 2019 3:20 am:
>>> Since the notifier chain is actually part of the decision between (1)
>>> and (2), it's a hard limitation then that callbacks be in real address
>>> space. Is there any way to structure things so that's not the case?
>>If we tested for KVM guest first, and went through and marked (maybe
>>in a paca flag) everywhere else that put the MMU into a bad / non-host
>>state, and had the notifiers use the machine check stack, then it
>>would be possible to enable MMU here.
>>
>>Hmm, testing for IR|DR after testing for KVM guest might actually be
>>enough without requiring changes outside the machine check handler...
>>
>>Actually no that may not quite work because the handler could take a
>>SLB miss and it might have been triggered inside the SLB miss handler.
>>
>>All in all I'm pretty against turning on MMU in the MCE handler
> Hey, fair enough. Just making sure there really isn't any room to make
> things work the way I was trying.


>>> Luckily this patch isn't really necessary for memcpy_mcsafe(), but we
>>> have a couple of other potential users of the notifier from external
>>> modules (so their callbacks would require virtual mode).
>>What users are there? Do they do any significant amount of logic that
>>can not be moved to vmlinux?
> One I had in mind was the NVIDIA driver. When taking a UE from defective 
> GPU memory, it could use the notifier to save the bad address to a 
> blacklist in their nvram. Not so much recovering the machine check, just 
> logging before the system reboots.
> The other user is a prototype driver for the IBM Research project we had 
> a talk about offline a while back.

Okay. It might be possible to save the address in the kernel and
then notify the driver afterward. For user mode and any non-atomic
user copy, AFAIK the irq_work should run practically synchronously
after the machine check returns, so it might be enough to have a
notifier in the irq_work processing.
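
To make the "save now, notify later" split concrete, here is a rough
userspace model in C. It is only a sketch under stated assumptions, not
kernel code: every name in it (mce_record_addr, mce_process_deferred,
mce_driver_cb, the queue depth of 8) is hypothetical, and the deferred
function merely stands in for whatever the irq_work processing would do.
The early stage only stores the address in a fixed-size buffer, and the
driver callback runs only from the later stage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MCE_ADDR_QUEUE_LEN 8	/* hypothetical queue depth */

typedef void (*mce_addr_cb)(uint64_t addr);

/* Addresses recorded by the early handler, waiting for the deferred stage. */
static uint64_t mce_addr_queue[MCE_ADDR_QUEUE_LEN];
static size_t mce_addr_count;

/* Hypothetical driver callback; only ever called from the deferred stage. */
static mce_addr_cb mce_driver_cb;

/*
 * Early stage (models the real-mode handler): no allocation, no callbacks
 * into modules, just record the bad address. Returns 0 on success or -1 if
 * the queue is full and the address was dropped.
 */
static int mce_record_addr(uint64_t addr)
{
	if (mce_addr_count >= MCE_ADDR_QUEUE_LEN)
		return -1;
	mce_addr_queue[mce_addr_count++] = addr;
	return 0;
}

/*
 * Deferred stage (models the irq_work processing): hand every recorded
 * address to the driver callback, then empty the queue. Returns the number
 * of addresses that were drained.
 */
static size_t mce_process_deferred(void)
{
	size_t i, n = mce_addr_count;

	assert(n <= MCE_ADDR_QUEUE_LEN);
	for (i = 0; i < n; i++) {
		if (mce_driver_cb)
			mce_driver_cb(mce_addr_queue[i]);
	}
	mce_addr_count = 0;
	return n;
}

/* Example driver-side callback: remembers what it was told, for illustration. */
static uint64_t example_last_addr;
static size_t example_logged;

static void example_driver_cb(uint64_t addr)
{
	example_last_addr = addr;
	example_logged++;
}
```

In this model a driver would set mce_driver_cb = example_driver_cb once;
the callback then sees nothing at the time mce_record_addr() runs and
receives the address only when mce_process_deferred() is called.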

> We can make this patchset work for memcpy_mcsafe(), but I think it's
> back to the drawing board for the others.

For the first stage that would be preferable.


