[PATCH] powerpc/eeh: Avoid use after free in eeh_handle_special_event()

Mon Mar 6 10:22:46 AEDT 2017

On Fri, Mar 03, 2017 at 04:59:11PM +1100, Alexey Kardashevskiy wrote:
>On 03/03/17 15:47, Russell Currey wrote:
>> eeh_handle_special_event() is called when an EEH event is detected but
>> can't be narrowed down to a specific PE.  This function looks through
>> every PE to find one in an erroneous state, then calls the regular event
>> handler eeh_handle_normal_event() once it knows which PE has an error.
>> 
>> However, if eeh_handle_normal_event() found that the PE cannot possibly
>> be recovered, it will remove the PE and associated devices.  This leads
>> to a use after free in eeh_handle_special_event() as it attempts to clear
>> the "recovering" state on the PE after eeh_handle_normal_event() returns.
>> 
>> Thus, make sure the PE is valid when attempting to clear state in
>> eeh_handle_special_event().
>> 
>> Cc: <stable at vger.kernel.org> #3.10+
>> Reported-by: Alexey Kardashevskiy <aik at ozlabs.ru>
>> Signed-off-by: Russell Currey <ruscur at russell.cc>
>> ---
>>  arch/powerpc/kernel/eeh_driver.c | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>> 
>> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
>> index b94887165a10..492397298a2a 100644
>> --- a/arch/powerpc/kernel/eeh_driver.c
>> +++ b/arch/powerpc/kernel/eeh_driver.c
>> @@ -983,6 +983,19 @@ static void eeh_handle_special_event(void)
>>  		if (rc == EEH_NEXT_ERR_FROZEN_PE ||
>>  		    rc == EEH_NEXT_ERR_FENCED_PHB) {
>>  			eeh_handle_normal_event(pe);
>> +
>> +			/*
>> +			 * eeh_handle_normal_event() can free the PE if it
>> +			 * determines that the PE cannot possibly be recovered.
>> +			 * Make sure the PE still exists before changing its
>> +			 * state.
>> +			 */
>> +			if (!pe || (pe->type & EEH_PE_INVALID)
>> +			    || (pe->state & EEH_PE_REMOVED)) {
>
>
>The bug is that pe becomes stale after eeh_handle_normal_event() returned
>and dereferencing it afterwards is broken.
>

Correct. It won't cause a kernel crash, though, as dereferencing @pe accesses
the linear-mapped area, whose address is always valid. I think the proper fix
would be to have eeh_handle_normal_event() indicate that @pe has been
released, so that the caller doesn't access it any more.
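
Roughly, the idea could look like the sketch below. This is a small userspace
model, not kernel code: struct pe_model, PE_RECOVERING, new_pe_model() and the
*_model() functions are hypothetical stand-ins for struct eeh_pe,
EEH_PE_RECOVERING and the two handlers, and the bool return value is only an
assumption about how the release could be reported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define PE_RECOVERING 0x1u	/* hypothetical stand-in for EEH_PE_RECOVERING */

struct pe_model {		/* hypothetical stand-in for struct eeh_pe */
	unsigned int state;
	bool recoverable;
};

static struct pe_model *new_pe_model(bool recoverable)
{
	struct pe_model *pe = calloc(1, sizeof(*pe));

	assert(pe != NULL);
	pe->recoverable = recoverable;
	return pe;
}

/* Returns true if @pe was released; the caller must not touch it afterwards. */
static bool handle_normal_event_model(struct pe_model *pe)
{
	if (!pe->recoverable) {
		free(pe);	/* models removal of the PE and its devices */
		return true;
	}
	return false;
}

/* Returns true if the recovering state was cleared on a still-valid PE. */
static bool handle_special_event_model(struct pe_model *pe)
{
	assert(pe != NULL);
	pe->state |= PE_RECOVERING;

	/* Trust the callee's answer instead of inspecting a stale pointer. */
	if (handle_normal_event_model(pe))
		return false;

	pe->state &= ~PE_RECOVERING;
	return true;
}
```

With this shape the caller never reads pe->type or pe->state after the PE
could have been freed, which the check in the patch still does.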

>
>
>> +				pr_warn("EEH: not clearing state on bad PE\n");

A message like this isn't meaningful, so there is no need for it. Messages
with the "EEH:" prefix are informational, and we definitely don't need one
here. In any case, the message might not be needed in the next revision.

>> +				continue;
>> +			}
>> +
>>  			eeh_pe_state_clear(pe, EEH_PE_RECOVERING);
>>  		} else {
>>  			pci_lock_rescan_remove();
>> 

Thanks,
Gavin