[PATCH] powerpc/64s: Fix unrecoverable SLB crashes due to preemption check

Michael Ellerman mpe at ellerman.id.au
Mon May 4 20:53:47 AEST 2020


Hugh Dickins <hughd at google.com> writes:
> On Sun, 3 May 2020, Michael Ellerman wrote:
>
>> Hugh reported that his trusty G5 crashed after a few hours under load
>> with an "Unrecoverable exception 380".
>> 
>> The crash is in interrupt_return() where we check lazy_irq_pending(),
>> which calls get_paca() and with CONFIG_DEBUG_PREEMPT=y that goes to
>> check_preemption_disabled() via debug_smp_processor_id().
>> 
>> As Nick explained on the list:
>> 
>>   Problem is MSR[RI] is cleared here, ready to do the last few things
>>   for interrupt return where we're not allowed to take any other
>>   interrupts.
>> 
>>   SLB interrupts can happen just about anywhere aside from kernel
>>   text, global variables, and stack. When that hits, it appears to be
>>   unrecoverable due to RI=0.
>> 
>> The problematic access is in preempt_count() which is:
>> 
>> 	return READ_ONCE(current_thread_info()->preempt_count);
>> 
>> Because of THREAD_INFO_IN_TASK, current_thread_info() just points to
>> current, so the access is to somewhere in kernel memory, but not on
>> the stack or in .data, which means it can cause an SLB miss. If we
>> take an SLB miss with RI=0 it is fatal.
>> 
>> The easiest solution is to add a version of lazy_irq_pending() that
>> doesn't do the preemption check and call it from the interrupt return
>> path.
>> 
>> Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in C")
>> Reported-by: Hugh Dickins <hughd at google.com>
>> Signed-off-by: Michael Ellerman <mpe at ellerman.id.au>
>
> Thank you, Michael and Nick: this has been running fine all day for me.

Thanks Hugh.

cheers


More information about the Linuxppc-dev mailing list