[PATCH V4] powerpc/85xx: Add machine check handler to fix PCIe erratum on mpc85xx

Jia Hongtao-B38951 B38951 at freescale.com
Wed Mar 6 19:28:52 EST 2013



> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Wednesday, March 06, 2013 2:48 AM
> To: Jia Hongtao-B38951
> Cc: Wood Scott-B07421; Stuart Yoder; linuxppc-dev at lists.ozlabs.org; Kumar
> Gala
> Subject: Re: [PATCH V4] powerpc/85xx: Add machine check handler to fix
> PCIe erratum on mpc85xx
> 
> On 03/05/2013 04:12:30 AM, Jia Hongtao-B38951 wrote:
> >
> >
> > > -----Original Message-----
> > > From: Wood Scott-B07421
> > > Sent: Tuesday, March 05, 2013 7:46 AM
> > > To: Stuart Yoder
> > > Cc: Jia Hongtao-B38951; linuxppc-dev at lists.ozlabs.org; Kumar Gala
> > > Subject: Re: [PATCH V4] powerpc/85xx: Add machine check handler to fix
> > > PCIe erratum on mpc85xx
> > >
> > > On 03/04/2013 10:16:10 AM, Stuart Yoder wrote:
> > > > On Mon, Mar 4, 2013 at 2:40 AM, Jia Hongtao <B38951 at freescale.com>
> > > > wrote:
> > > > > A PCIe erratum on mpc85xx may cause a core hang when a PCIe link
> > > > > goes down. When the link goes down, non-posted transactions issued
> > > > > via the ATMU that require completion result in an instruction stall.
> > > > > At the same time a machine-check exception is generated to the core
> > > > > to allow further processing by the handler. We implement the handler,
> > > > > which skips the instruction that caused the stall.
> > > >
> > > > Can you explain at a high level how just skipping an instruction
> > > > solves anything? If you just skip a load/store and continue like
> > > > nothing is wrong, isn't your system possibly in a really bad state?
> > >
> > > If the instruction was a load, we probably at least want to fill the
> > > destination register with 0xffffffff or similar.
> >
> > You discussed this with Liu Shuo about a year ago.
> > Here is the log:
> >
> > "
> > On 02/01/2012 02:18 AM, shuo.liu at freescale.com wrote:
> > > v3 : Skip the instruction only. Don't access the user space memory
> > >      in machine check.
> >
> > It may be the least bad option for now, but be aware that there's a
> > small chance that this will cause a leak of sensitive information
> > (such as a piece of a crypto key that happened to be sitting in the
> > register to be loaded into).
> 
> Yes, that's (one reason) why you'd want to fill in a known value.  Note
> the "for now". :-)
> 
> -Scott

I think there is no overwhelming reason to fill the destination register
with 0xffffffff. 

There's a small chance that 0xffffffff would be treated as regular data
rather than as an error indication.

Also, setting this register may affect user space under certain
circumstances.

So I think just skipping the instruction is an acceptable option for
this fix.
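
For reference, the two options being compared can be sketched in plain C
as below. This is only an illustration under stated assumptions, not the
code in the patch: struct fake_regs, insn_rt(), is_dform_load(),
skip_only() and skip_and_fill() are hypothetical names, and only a few
D-form integer loads are recognized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct pt_regs (32-bit view). */
struct fake_regs {
	uint32_t gpr[32];	/* general purpose registers */
	uint32_t nip;		/* address of the stalled instruction */
};

/* RT field of a D-form instruction: the 5 bits below the primary opcode. */
unsigned int insn_rt(uint32_t insn)
{
	return (insn >> 21) & 0x1f;
}

/* Recognize a few D-form integer loads by primary opcode (top 6 bits). */
int is_dform_load(uint32_t insn)
{
	switch (insn >> 26) {
	case 32: /* lwz  */
	case 33: /* lwzu */
	case 34: /* lbz  */
	case 35: /* lbzu */
	case 40: /* lhz  */
	case 41: /* lhzu */
	case 42: /* lha  */
	case 43: /* lhau */
		return 1;
	default:
		return 0;
	}
}

/* Option 1: only step over the instruction; RT keeps its old value. */
void skip_only(struct fake_regs *regs)
{
	assert(regs != NULL);
	regs->nip += 4;
}

/*
 * Option 2: for a recognized load, put a known value in RT, then step
 * over the instruction. The RA update of the "u" forms is ignored here.
 */
void skip_and_fill(struct fake_regs *regs, uint32_t insn)
{
	assert(regs != NULL);
	if (is_dform_load(insn))
		regs->gpr[insn_rt(insn)] = 0xffffffffu;
	regs->nip += 4;
}
```

With insn = 0x80a30000 (lwz r5,0(r3)), skip_and_fill() leaves r5 holding
0xffffffff, while skip_only() leaves whatever was in r5 before the load.
In both cases nip advances by 4.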

-Hongtao.
