EEH regression: PE <-> device binding lost after reset

Daniel Axtens dja at axtens.net
Mon Aug 10 09:23:27 AEST 2015


Hi,

I'm experiencing a regression in EEH that was introduced somewhere
between 4.0 and 4.1.

I have been reproducing this with a CAPI (CXL) card, but the behaviour
isn't CAPI-specific and the triggering code hasn't changed. CAPI cards
are reprogrammed by PERSTing the slot they sit in, so the cxl driver
exposes a 'reset' file in sysfs that calls
"pci_set_pcie_reset_state(dev, pcie_warm_reset)" and then relies on EEH
noticing the PERST to finish resetting the card.

In 4.0 and earlier, this worked: the slot would be PERSTed, EEH would
notice, and the card would be hotplugged. You could do this as many
times as you liked.

In 4.1 and later, you can do one successful reset, but every subsequent
reset causes the following to be printed in dmesg:

[  225.118656] cxl-pci 0006:01:00.0: CXL reset
[  225.118663] pcibios_set_pcie_reset_state: No PE found on PCI device 0006:01:00.0
[  225.118672] cxl-pci 0006:01:00.0: cxl: pcie_warm_reset failed

I'm digging through the commits between 4.0 and 4.1 at the moment, but I
thought I'd post this here in the hope that someone has an idea of the
root cause.


-- 
Regards,
Daniel