[PATCH v2 2/8] powerpc/eeh: More relaxed hotplug criterion

Gavin Shan gwshan at linux.vnet.ibm.com
Tue Oct 13 10:25:10 AEDT 2015


On Tue, Oct 13, 2015 at 09:55:53AM +1100, Daniel Axtens wrote:
>> Currently, we rely on the existence of struct pci_driver::err_handler
>> to judge if the corresponding PCI device should be unplugged during
>> EEH recovery (partial hotplug case). However, that check is too coarse:
>> some device drivers implement only part of the EEH error handlers
>> to collect diag-data. That means the driver still expects a hotplug
>> to recover from the EEH error.
>
>
>> This makes the hotplug criterion more relaxed: if the device driver
>> doesn't provide all necessary EEH error handlers, it will experience
>> hotplug during EEH recovery.
>
>Interesting.
>
>My understanding of Documentation/PCI/pci-error-recovery.txt is that a
>driver should be able to just supply an error_detected() callback. If
>the driver just wants to collect diag-data and wants to be hotplugged,
>it should return PCI_ERS_RESULT_NONE.
>
>What drivers did you have in mind?
>

Daniel, the issue is tracked in IBM's bugzilla 127612, reported against NVIDIA's
private GPU drivers. I tried to find the source code in the upstream kernel,
but failed.

As an example, suppose one PE has two different devices, A and B. A's driver
provides error_detected()/slot_reset()/resume() and returns NEED_RESET.
B's driver provides only error_detected(), which returns NONE as you said.
The EEH core receives NEED_RESET, and because B's driver has an err_handler,
B won't be hotplugged during recovery. The error won't be recovered on B.
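
The A/B scenario above can be modelled in a small user-space sketch. This is
not kernel code: the types, enum values and function names below are
hypothetical stand-ins, and only the two predicates mirror the old and new
checks in eeh_rmv_device() from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only what the check needs. */
enum ers_result { ERS_NONE, ERS_NEED_RESET, ERS_RECOVERED };

struct err_handlers {
	enum ers_result (*error_detected)(void);
	enum ers_result (*slot_reset)(void);
	void (*resume)(void);
};

struct driver {
	const struct err_handlers *err_handler;
};

/* Device A: full set of handlers, asks for a reset. */
static enum ers_result a_detected(void) { return ERS_NEED_RESET; }
static enum ers_result a_slot_reset(void) { return ERS_RECOVERED; }
static void a_resume(void) { }

/* Device B: only error_detected(), used to collect diag-data. */
static enum ers_result b_detected(void) { return ERS_NONE; }

static const struct err_handlers a_handlers = {
	.error_detected = a_detected,
	.slot_reset = a_slot_reset,
	.resume = a_resume,
};
static const struct err_handlers b_handlers = {
	.error_detected = b_detected,
};

static const struct driver drv_a = { .err_handler = &a_handlers };
static const struct driver drv_b = { .err_handler = &b_handlers };

/* Old criterion: any err_handler at all exempts the device from unplug. */
bool old_keeps_device(const struct driver *drv)
{
	return drv && drv->err_handler;
}

/* Patched criterion: all three callbacks must be present. */
bool new_keeps_device(const struct driver *drv)
{
	return drv && drv->err_handler &&
	       drv->err_handler->error_detected &&
	       drv->err_handler->slot_reset &&
	       drv->err_handler->resume;
}

/* One PE holding A and B, as in the example above. */
void demo_pe_with_two_devices(void)
{
	assert(old_keeps_device(&drv_a) && new_keeps_device(&drv_a));
	assert(old_keeps_device(&drv_b));   /* old check: B is not unplugged */
	assert(!new_keeps_device(&drv_b));  /* new check: B gets hotplugged */
}
```

Under these assumptions, A is kept in place by both checks, while B is exempt
from unplug only under the old check; with the patch it is removed and
re-added during recovery.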

Thanks,
Gavin

>>
>> Signed-off-by: Gavin Shan <gwshan at linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/kernel/eeh_driver.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
>> index 3a626ed..32178a4 100644
>> --- a/arch/powerpc/kernel/eeh_driver.c
>> +++ b/arch/powerpc/kernel/eeh_driver.c
>> @@ -416,7 +416,10 @@ static void *eeh_rmv_device(void *data, void *userdata)
>>  	driver = eeh_pcid_get(dev);
>>  	if (driver) {
>>  		eeh_pcid_put(dev);
>> -		if (driver->err_handler)
>> +		if (driver->err_handler &&
>> +		    driver->err_handler->error_detected &&
>> +		    driver->err_handler->slot_reset &&
>> +		    driver->err_handler->resume)
>>  			return NULL;
>>  	}
>>  
>> -- 
>> 2.1.0
>>



