cxl: Remove racy attempt to force EEH invocation in reset
Michael Ellerman
mpe at ellerman.id.au
Fri Aug 28 07:58:33 AEST 2015
On Fri, 2015-08-21 at 07:25:15 UTC, Daniel Axtens wrote:
> cxl_reset currently PERSTs the slot, and then repeatedly tries to
> read MMIO space in order to kick off EEH.
>
> There are 2 problems with this: it's unnecessary, and it's racy.
>
> It's unnecessary because the PERST will bring down the PHB link.
> That will be picked up by the CAPP, which will send out an HMI.
> Skiboot, noticing an HMI from the CAPP, will send an OPAL
> notification to the kernel, which will trigger EEH recovery.
>
> It's also racy: the EEH recovery triggered by the CAPP will
> eventually cause the MMIO space to have its mapping invalidated
> and the pointer NULLed out. This races with our attempt to read
> the MMIO space. This is causing OOPSes in testing.
>
> Simply drop all the attempts to force EEH detection, and trust
> that Skiboot will send the notification and that we'll act on it.
> The Skiboot code to send the EEH notification has been in Skiboot
> for as long as CAPP recovery has been supported, so we don't need
> to worry about breaking obscure setups with ancient firmware.
>
> Cc: Ryan Grimm <grimm at linux.vnet.ibm.com>
> Cc: stable at vger.kernel.org
> Fixes: 62fa19d4b4fd ("cxl: Add ability to reset the card")
> Signed-off-by: Daniel Axtens <dja at axtens.net>
> Acked-by: Ian Munsie <imunsie at au1.ibm.com>
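
For readers following along, the race described in the commit message can be modelled roughly as below. This is only a userspace sketch, not the cxl driver code: struct fake_adapter and every fake_* function are hypothetical names invented for illustration, and the asynchronous EEH recovery is simulated by a fixed loop iteration rather than a real concurrent thread. Instead of actually dereferencing a NULL pointer, the sketch reports which poll iteration would have oopsed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the adapter state; not the real struct cxl. */
struct fake_adapter {
    volatile uint64_t *mmio; /* NULLed once recovery invalidates the mapping */
    uint64_t regs[1];        /* pretend register backing store */
};

static void fake_adapter_init(struct fake_adapter *a)
{
    assert(a != NULL);
    a->regs[0] = ~0ULL;
    a->mmio = a->regs;
}

/* Models the EEH recovery path tearing down the MMIO mapping. */
static void fake_eeh_recovery(struct fake_adapter *a)
{
    a->mmio = NULL;
}

/*
 * Models one iteration of the polling loop the patch removes. The real
 * loop would read through the pointer unconditionally; here we only
 * report that the read would hit a NULL pointer.
 */
static bool fake_poll_would_oops(const struct fake_adapter *a)
{
    return a->mmio == NULL;
}

/*
 * Simulates a reset that polls MMIO `polls` times while recovery fires
 * just before iteration `recovery_at` (asynchronous in reality).
 * Returns the first iteration that would oops, or -1 if none would.
 */
static int fake_reset_with_polling(struct fake_adapter *a, int polls,
                                   int recovery_at)
{
    for (int i = 0; i < polls; i++) {
        if (i == recovery_at)
            fake_eeh_recovery(a);
        if (fake_poll_would_oops(a))
            return i;
    }
    return -1;
}
```

In this model, calling fake_reset_with_polling(&a, 5, 2) on a freshly initialised adapter reports iteration 2 as the faulting read, while a recovery that lands after the loop has finished reports no fault. Adding a NULL check before the read would not close the window in real code, since recovery could still run between the check and the read, which is why dropping the polling entirely is the simpler fix.
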
Applied to powerpc next, thanks.
https://git.kernel.org/powerpc/c/9d8e27673c45927fee9e7d89
cheers