[PATCH] powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

Thu Oct 9 13:18:10 EST 2014

> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Thursday, October 09, 2014 7:48 AM
> To: Jia Hongtao-B38951
> Cc: Guenter Roeck; Benjamin Herrenschmidt; Paul Mackerras; Michael
> Ellerman; linuxppc-dev at lists.ozlabs.org; linux-kernel at vger.kernel.org;
> Jojy G Varghese; Guenter Roeck
> Subject: Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check
> exception on E500MC / E5500
> 
> On Tue, 2014-10-07 at 22:08 -0500, Jia Hongtao-B38951 wrote:
> >
> > > -----Original Message-----
> > > From: Wood Scott-B07421
> > > Sent: Tuesday, September 30, 2014 2:36 AM
> > > To: Guenter Roeck
> > > Cc: Benjamin Herrenschmidt; Paul Mackerras; Michael Ellerman;
> > > linuxppc- dev at lists.ozlabs.org; linux-kernel at vger.kernel.org; Jojy G
> > > Varghese; Guenter Roeck; Jia Hongtao-B38951
> > > Subject: Re: [PATCH] powerpc/fsl: Add support for pci(e) machine
> > > check exception on E500MC / E5500
> > >
> > > On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
> > > > From: Jojy G Varghese <jojyv at juniper.net>
> > > >
> > > > For E500MC and E5500, a machine check exception in pci(e) memory
> > > > space crashes the kernel.
> > > >
> > > > Testing shows that the MCAR(U) register is zero on a MC exception
> > > > for the
> > > > E5500 core. At the same time, DEAR register has been found to have
> > > > the address of the faulty load address during an MC exception for
> this core.
> > > >
> > > > This fix changes the current behavior to fixup the result register
> > > > and instruction pointers in the case of a load operation on a
> > > > faulty PCI address.
> > > >
> > > > The changes are:
> > > > - Added the hook to pci machine check handing to the e500mc
> > > > machine
> > > check
> > > >   exception handler.
> > > > - For the E5500 core, load faulting address from SPRN_DEAR register.
> > > >   As mentioned above, this is necessary because the E5500 core does
> not
> > > >   report the fault address in the MCAR register.
> > > >
> > > > Cc: Scott Wood <scottwood at freescale.com>
> > > > Signed-off-by: Jojy G Varghese <jojyv at juniper.net> [Guenter Roeck:
> > > > updated description]
> > > > Signed-off-by: Guenter Roeck <groeck at juniper.net>
> > > > Signed-off-by: Guenter Roeck <linux at roeck-us.net>
> > > > ---
> > > >  arch/powerpc/kernel/traps.c   | 3 ++-
> > > >  arch/powerpc/sysdev/fsl_pci.c | 5 +++++
> > > >  2 files changed, 7 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/powerpc/kernel/traps.c
> > > > b/arch/powerpc/kernel/traps.c index 0dc43f9..ecb709b 100644
> > > > --- a/arch/powerpc/kernel/traps.c
> > > > +++ b/arch/powerpc/kernel/traps.c
> > > > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
> > > >  	int recoverable = 1;
> > > >
> > > >  	if (reason & MCSR_LD) {
> > > > -		recoverable = fsl_rio_mcheck_exception(regs);
> > > > +		recoverable = fsl_rio_mcheck_exception(regs) ||
> > > > +			fsl_pci_mcheck_exception(regs);
> > > >  		if (recoverable == 1)
> > > >  			goto silent_out;
> > > >  	}
> > > > diff --git a/arch/powerpc/sysdev/fsl_pci.c
> > > > b/arch/powerpc/sysdev/fsl_pci.c index c507767..bdb956b 100644
> > > > --- a/arch/powerpc/sysdev/fsl_pci.c
> > > > +++ b/arch/powerpc/sysdev/fsl_pci.c
> > > > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs
> > > > *regs)  #endif
> > > >  	addr += mfspr(SPRN_MCAR);
> > > >
> > > > +#ifdef CONFIG_E5500_CPU
> > > > +	if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
> > > > +		addr = PFN_PHYS(vmalloc_to_pfn((void
> *)mfspr(SPRN_DEAR)));
> > > #endif
> > >
> > > Kconfig tells you what hardware is supported, not what hardware
> > > you're actually running on.
> > >
> > > Jia Hongtao, do you know anything about this issue?  Is there an
> erratum?
> >
> > Sorry for the late response, I just return from my vacation.
> > I don't know this issue.
> >
> > > What chips are affected by the the erratum covered by
> > > <http://patchwork.ozlabs.org/patch/240239/>?
> >
> > MPC8544, MPC8548, MPC8572 are affected by this erratum.
> 
> What is the erratum number?

The number of this erratum for each chip is not consistent.
MPC8544: PCIe 4
MPC8548: PCI-Ex 39
MPC8572: PCI-Ex 3

> 
> > I checked P4080 which using e500mc and no such erratum is found.
> 
> What is the erratum behavior, and how does it differ from the problem
> that Jojy and Guenter are trying to solve?

Here is the description of the erratum:

"When its link goes down, the PCI Express controller clears all outstanding transactions with an
error indicator and sends a link down exception to the interrupt controller if
PEX_PME_MES_DISR[LDDD] = 0. If, however, any transactions are sent to the controller
after the link down event, they will be accepted by the controller and wait for the link to come
back up before starting any timeout counters (e.g. completion timeout). There is no mechanism
to cancel the new transactions short of a device HRESET."

For e500mc as Jojy and Guenter described it's like the same erratum on e500, not 100% sure.

For e5500 I don't quite understand yet.

> 
> -Scott
>