[PATCH] powerpc/mce: Add MCE notification chain

Mahesh J Salgaonkar mahesh at linux.ibm.com
Tue Apr 7 03:17:37 AEST 2020


On 2020-04-06 12:17:22 Mon, Nicholas Piggin wrote:
> Ganesh's on April 4, 2020 11:05 pm:
> > On 4/3/20 7:38 AM, Nicholas Piggin wrote:
> > 
> >> Ganesh Goudar's on March 30, 2020 5:12 pm:
> >>> From: Santosh S <santosh at fossix.org>
> >>>
> > >>> Introduce a notification chain that reports uncorrected memory
> > >>> errors (UE). This would help prospective users in the pmem or nvdimm
> > >>> subsystems track bad blocks for better handling of persistent memory allocations.
> >>>
> >>> Signed-off-by: Santosh S <santosh at fossix.org>
> >>> Signed-off-by: Ganesh Goudar <ganeshgr at linux.ibm.com>
> >> Do you have any such users yet? It would be good to refer to an example
> >> user and give a brief description of what it does in its notifier.
> > 
> > Santosh has sent a patch which uses this notification.
> > https://patchwork.ozlabs.org/patch/1265062/
> 
> Okay. So these things are asynchronous after the machine check. I guess
> that's the design of it and memory offlining does something similar by
> the looks, but how do you prevent the memory being allocated for 
> something else before the notifiers run?

We can't. This race exists even today when we call memory_failure(). If
the same memory is allocated again, touching it may trigger another MCE
on the same address until the subsystem that registered for
notification has finished handling the notified address.
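
For readers less familiar with notification chains, a minimal userspace model may help. It is only a sketch under stated assumptions: struct notifier, struct notifier_chain, chain_register(), chain_call(), record_bad_block() and the paddr field of struct ue_event are hypothetical stand-ins, not the kernel's notifier API (the patch itself calls blocking_notifier_call_chain() on mce_notifier_list). It shows a subscriber, such as a pmem driver tracking bad blocks, learning about the bad address only when the chain is called, after the machine check has already been handled; memory touched in that window is not protected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event record: only the field this sketch needs. */
struct ue_event {
	uint64_t paddr;	/* physical address of the uncorrected error */
};

/* Hypothetical userspace stand-in for a kernel notifier block. */
struct notifier {
	int (*call)(struct notifier *nb, unsigned long action, void *data);
	struct notifier *next;
};

/* Hypothetical stand-in for a notifier chain head. */
struct notifier_chain {
	struct notifier *head;
};

/* Push at the head; the real kernel API also orders entries by priority. */
static void chain_register(struct notifier_chain *c, struct notifier *nb)
{
	assert(c != NULL && nb != NULL);
	nb->next = c->head;
	c->head = nb;
}

/* Hand the same event to every subscriber; return values are ignored here. */
static void chain_call(struct notifier_chain *c, unsigned long action,
		       void *data)
{
	for (struct notifier *nb = c->head; nb != NULL; nb = nb->next)
		nb->call(nb, action, data);
}

#define MAX_BAD_ADDRS 16	/* hypothetical capacity */

static uint64_t bad_addrs[MAX_BAD_ADDRS];
static size_t nr_bad;

/* Hypothetical subscriber, e.g. a pmem driver remembering bad blocks. */
static int record_bad_block(struct notifier *nb, unsigned long action,
			    void *data)
{
	const struct ue_event *evt = data;

	(void)nb;
	(void)action;
	if (nr_bad < MAX_BAD_ADDRS)
		bad_addrs[nr_bad++] = evt->paddr;
	return 0;
}
```

A subscriber registers once with chain_register(), and each later chain_call() delivers the event to it; nothing in this scheme stops the page from being reused before that call happens.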

Thanks,
-Mahesh.

> 
> >>> @@ -263,6 +277,7 @@ static void machine_process_ue_event(struct work_struct *work)
> >>>   	while (__this_cpu_read(mce_ue_count) > 0) {
> >>>   		index = __this_cpu_read(mce_ue_count) - 1;
> >>>   		evt = this_cpu_ptr(&mce_ue_event_queue[index]);
> >>> +		blocking_notifier_call_chain(&mce_notifier_list, 0, evt);
> >> Can we really use a blocking notifier here? I'm not sure that we can.
> > 
> > I think we can. Do you see any problem?
> 
> No, it looks okay after a better look. Sorry for the noise.
> 
> Thanks,
> Nick

-- 
Mahesh J Salgaonkar



More information about the Linuxppc-dev mailing list