[PATCH] Revert "powerpc/eeh: Don't unfreeze PHB PE after reset"

Daniel Axtens dja at axtens.net
Fri Dec 4 07:46:52 AEDT 2015


Hi Andrew and Gavin,

To flesh out Andrew's commit message, and add some surrounding detail to
help in debugging:

Before 527d10ef3a315, all PEs were unfrozen when we called
eeh_reset_device(). That patch changed the behaviour to skip the PE
associated with or reserved for the PHB. In theory, this shouldn't have
made a functional change, because no resources required for the device
should be associated with the PHB's PE.

In practice, however, it clearly does change behaviour.

We found that not only does it break the cxlflash driver, but we can
also test for the breakage by running cat on the relevant vpd file in
/sys/. Before a reset, you can cat the file without issue. After a
reset, catting the file fails with -ENODEV. Curiously, lspci succeeds
both before and after. However, lspci doesn't try to read the file from
start to end; it just reads certain bytes. So there must be some bytes
in the file that trigger -ENODEV after a reset.

We know that it is failing to read bytes in regular config space (not
CAPI's magic MMIO fake config space for AFUs: we're still in regular PCI
land at this point). What we don't know is precisely what those bytes
are, and why those bytes are (seemingly) being associated with the PHB's
PE.
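
One way to narrow down which bytes are at fault would be to read the
vpd file one byte at a time and note the first offset that fails. The
following is only a rough sketch, not something from the patch: the
function name first_failing_offset() is made up for illustration, the
path is whichever vpd file you were catting, and it assumes that
single-byte pread() calls at arbitrary offsets work on that sysfs file.
I believe VPD is fetched in 4-byte units underneath, so treat the
reported offset as accurate only to the containing dword.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the kernel or of this patch).
 *
 * Read `path` one byte at a time, for at most `max_len` bytes.
 * Returns:
 *   the offset of the first byte whose read fails (errno is preserved
 *   from the failing pread(), e.g. ENODEV after an EEH reset),
 *   -1 if every byte up to EOF or max_len was read successfully,
 *   -2 if the file could not be opened at all.
 */
long first_failing_offset(const char *path, long max_len)
{
	long result = -1;
	int saved_errno = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -2;

	for (long off = 0; off < max_len; off++) {
		unsigned char byte;
		ssize_t n = pread(fd, &byte, 1, (off_t)off);

		if (n == 0)		/* end of file: nothing failed */
			break;
		if (n < 0) {		/* this offset is the first bad one */
			saved_errno = errno;
			result = off;
			break;
		}
	}

	close(fd);
	errno = saved_errno;	/* close() must not clobber the error */
	return result;
}
```

Calling this on the vpd file before and after a reset, with max_len set
to something generous such as 32768, and printing both the returned
offset and strerror(errno), should show where the reads start failing.
Comparing that offset with what lspci reads may help explain why lspci
is unaffected.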

I had some theories, and maybe Andrew can update this list:
    - CAPI code is doing something wrong.
    - There's a bug in PCI resource allocation or the mapping of
      resources to PEs such that something is hitting space assigned to
      PE#0.
    - CAPI is 'special' and requires PE#0 to be unfrozen.

This patch treats the symptom, not the problem; we need to find the
root cause before we can fix it properly. However, if we are no closer
to figuring it out soon, we should probably take this patch so as not
to release a 4.4 that breaks CAPI.

Congrats to Andrew, btw, for apparently being the only CAPI developer
who is testing CAPI stuff against mainline at the moment.

Regards,
Daniel

> This reverts commit 527d10ef3a315d3cb9dc098dacd61889a6c26439.
>
> The reverted commit breaks cxlflash devices following an EEH reset.
> Attempting to load the cxlflash driver after a reset results in a call to
> pci_read_vpd() returning -ENODEV, causing driver initialisation to fail.
>
> At this stage, we don't fully understand why this is happening, and we
> also haven't tested whether this occurs for other cxl devices. In the
> meantime, though, revert the commit, especially as it was intended to be a
> non-functional change.
>
> Signed-off-by: Andrew Donnellan <andrew.donnellan at au1.ibm.com>
>
> ---
>
> This issue was identified by bisection following breakage observed in
> 4.4-rc1. I'm continuing to investigate the root cause (and testing on cxl
> devices other than cxlflash), as the commit in question shouldn't have
> caused problems.
> ---
>  arch/powerpc/kernel/eeh_driver.c | 14 ++++----------
>  1 file changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> index 80dfe89..8d14feb 100644
> --- a/arch/powerpc/kernel/eeh_driver.c
> +++ b/arch/powerpc/kernel/eeh_driver.c
> @@ -590,16 +590,10 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
>  	eeh_ops->configure_bridge(pe);
>  	eeh_pe_restore_bars(pe);
>  
> -	/*
> -	 * If it's PHB PE, the frozen state on all available PEs should have
> -	 * been cleared by the PHB reset. Otherwise, we unfreeze the PE and its
> -	 * child PEs because they might be in frozen state.
> -	 */
> -	if (!(pe->type & EEH_PE_PHB)) {
> -		rc = eeh_clear_pe_frozen_state(pe, false);
> -		if (rc)
> -			return rc;
> -	}
> +	/* Clear frozen state */
> +	rc = eeh_clear_pe_frozen_state(pe, false);
> +	if (rc)
> +		return rc;
>  
>  	/* Give the system 5 seconds to finish running the user-space
>  	 * hotplug shutdown scripts, e.g. ifdown for ethernet.  Yes,
> -- 
> Andrew Donnellan              Software Engineer, OzLabs
> andrew.donnellan at au1.ibm.com  Australia Development Lab, Canberra
> +61 2 6201 8874 (work)        IBM Australia Limited