[PATCH] powerpc/eeh: Enable IO path on permanent error
Gavin Shan
gwshan at linux.vnet.ibm.com
Fri Jan 6 12:56:41 AEDT 2017
On Fri, Jan 06, 2017 at 10:46:21AM +1100, Russell Currey wrote:
>On Fri, 2017-01-06 at 10:39 +1100, Gavin Shan wrote:
>> We give up recovery on permanent error and simply shut down the
>> affected devices and remove them. If the devices can't be put into a
>> quiet state, they spew more traffic that is likely to cause another
>> unexpected EEH error. This was observed on a "p8dtu2u" machine:
>>
>> 0002:00:00.0 PCI bridge: IBM Device 03dc
>> 0002:01:00.0 Ethernet controller: Intel Corporation \
>> Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)
>> 0002:01:00.1 Ethernet controller: Intel Corporation \
>> Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)
>> 0002:01:00.2 Ethernet controller: Intel Corporation \
>> Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)
>> 0002:01:00.3 Ethernet controller: Intel Corporation \
>> Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)
>>
>> On the P8 PowerNV platform, the IO path is frozen when shutting down
>> the devices, meaning the memory registers are inaccessible. This is
>> why the devices can't be put into a quiet state before removing them.
>> This fixes the issue by enabling the IO path prior to putting the
>> devices into a quiet state.
>>
>> Link: https://github.com/open-power/supermicro-openpower/issues/419
>
>FYI this link isn't publicly accessible.
>
Yeah, I know. I put it there because it has more details that you or I
can refer to.
>> Reported-by: Pridhiviraj Paidipeddi <ppaidipe at linux.vnet.ibm.com>
>> Signed-off-by: Gavin Shan <gwshan at linux.vnet.ibm.com>
>> ---
>> arch/powerpc/kernel/eeh.c | 10 +++++++++-
>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
>> index 8180bfd..9de7f79 100644
>> --- a/arch/powerpc/kernel/eeh.c
>> +++ b/arch/powerpc/kernel/eeh.c
>> @@ -298,9 +298,17 @@ void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
>> *
>> * For pHyp, we have to enable IO for log retrieval. Otherwise,
>> * 0xFF's is always returned from PCI config space.
>> + *
>> + * When the @severity is EEH_LOG_PERM, the PE is going to be
>> + * removed. Prior to that, the drivers for devices included in
>> + * the PE will be closed. The drivers rely on working IO path
>> + * to bring the devices to quiet state. Otherwise, PCI traffic
>> + * from those devices after they are removed is likely to cause
>> + * another unexpected EEH error.
>> */
>> if (!(pe->type & EEH_PE_PHB)) {
>> - if (eeh_has_flag(EEH_ENABLE_IO_FOR_LOG))
>> + if (eeh_has_flag(EEH_ENABLE_IO_FOR_LOG) ||
>> + severity == EEH_LOG_PERM)
>> eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
>>
>> /*
>
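For anyone following along, the condition changed by the hunk above can be modelled outside the kernel. This is only a rough sketch: should_thaw_mmio() is a hypothetical helper that does not exist in eeh.c, and the constants are placeholder values chosen for illustration, not the kernel's definitions. It returns true in the cases where the patched code would call eeh_pci_enable(pe, EEH_OPT_THAW_MMIO).

```c
#include <stdbool.h>

/*
 * Placeholder values for illustration only. The real definitions
 * (EEH_LOG_PERM, EEH_PE_PHB, EEH_ENABLE_IO_FOR_LOG) live in the
 * kernel headers and are not reproduced here.
 */
enum { LOG_TEMP = 1, LOG_PERM = 2 };
#define PE_PHB          (1 << 1)
#define PE_BUS          (1 << 3)
#define FLAG_IO_FOR_LOG 0x20

/*
 * Hypothetical helper mirroring the patched condition:
 *  - PHB PEs are never thawed here;
 *  - otherwise thaw MMIO if the platform needs IO enabled for log
 *    retrieval, or if the error is permanent, so that drivers can
 *    still reach the device registers while quiescing the devices.
 */
static bool should_thaw_mmio(int pe_type, int flags, int severity)
{
    if (pe_type & PE_PHB)
        return false;

    return (flags & FLAG_IO_FOR_LOG) || severity == LOG_PERM;
}
```

Before the patch, only the flag decided the outcome, so a non-PHB PE on a platform without the flag stayed frozen even on a permanent error. With the extra severity check, that case now thaws MMIO as well.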