[PATCH V4] powerpc/85xx: Add machine check handler to fix PCIe erratum on mpc85xx

Jia Hongtao-B38951 B38951 at freescale.com
Thu Mar 7 19:06:05 EST 2013



> -----Original Message-----
> From: David Laight [mailto:David.Laight at ACULAB.COM]
> Sent: Wednesday, March 06, 2013 6:24 PM
> To: Jia Hongtao-B38951; Wood Scott-B07421
> Cc: linuxppc-dev at lists.ozlabs.org; Stuart Yoder
> Subject: RE: [PATCH V4] powerpc/85xx: Add machine check handler to fix
> PCIe erratum on mpc85xx
> 
> > > Yes, that's (one reason) why you'd want to fill in a known value.
> > > Note the "for now". :-)
> > >
> > > -Scott
> >
> > I think there is no overwhelming reason to fill the destination
> > register with 0xffffffff.
> >
> > There's a small chance that 0xffffffff is treated as regular data
> > rather than an error sign.
> >
> > > Also, setting this register may affect user space under certain
> > > circumstances.
> > >
> > > So I think simply ignoring the skipped instruction is an acceptable
> > > option for this fix.
> 
> The 'random' value is just as likely to affect the reader, but only for
> some values - so you'll get almost impossible to repeat bugs.
> If a fixed value (0 or ~0) has an adverse effect, at least it will have
> the same every time.
> 
> Read errors are also likely to affect device drivers reading status bits.
> Since these are very likely 'write to clear', any driver would have to be
> willing to process the 'dummy' value in a manner that won't loop forever
> (especially in an ISR).
> 
> You don't need every access to be via a function that explicitly
> (somehow) detects that the fault happened, but knowing that a specific
> value might be caused by a dead PCIe bus, and being able to find out
> whether that is true (to avoid looping forever) is probably useful.
> 
> This is probably similar to what a driver needs to recover from an
> external PCIe link being unplugged.
> 
> 	David
> 
> 
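
David's point about status polling can be sketched roughly as follows. This
is only an illustration, assuming the handler fills the destination register
with all ones; PCIE_DEAD_READ, STATUS_PENDING, read_reg_fn, wait_not_pending()
and the two stand-in readers are hypothetical names, not taken from the patch
or from any real driver.

```c
#include <stdint.h>

/* Value a failed load is assumed to return (all ones). */
#define PCIE_DEAD_READ 0xffffffffu

/* Hypothetical status bit meaning "work still pending". */
#define STATUS_PENDING 0x1u

/* Hypothetical accessor type: reads one 32-bit device register. */
typedef uint32_t (*read_reg_fn)(void *ctx);

/*
 * Poll until the pending bit clears. Returns 0 on success, -1 if the
 * device looks dead (all ones) or the retry budget is exhausted.
 */
static int wait_not_pending(read_reg_fn read_status, void *ctx, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		uint32_t status = read_status(ctx);

		if (status == PCIE_DEAD_READ)
			return -1;	/* probable dead link: stop spinning */
		if (!(status & STATUS_PENDING))
			return 0;
	}
	return -1;	/* bounded loop: give up instead of hanging */
}

/* Stand-in reader that behaves like a dead PCIe link. */
static uint32_t read_dead(void *ctx)
{
	(void)ctx;
	return PCIE_DEAD_READ;
}

/* Stand-in reader that reports "pending" a fixed number of times. */
static uint32_t read_countdown(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return STATUS_PENDING;
	}
	return 0;
}
```

The retry limit matters as much as the all-ones check: a register that
legitimately reads as all ones would be misreported as dead, but the loop
still terminates either way.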

In my understanding, filling the register could, in some cases, warn the
executing process that an error occurred. But it cannot repair the wrong
behavior caused by the lost instruction. So filling the register offers
only a small benefit.
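
For reference, filling the register would look something like the sketch
below. It is a user-space illustration only: struct fake_regs is a cut-down
stand-in for pt_regs, fill_load_target() is a hypothetical name, only three
D-form loads (lwz, lbz, lhz) are decoded, and matching the fill width to the
load width is my assumption rather than something the patch does.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_GPRS 32

/* Cut-down stand-in for the kernel's struct pt_regs (assumption). */
struct fake_regs {
	unsigned long gpr[NUM_GPRS];
	unsigned long nip;
};

/* Primary opcodes of three D-form loads; all other loads are ignored. */
enum { OP_LWZ = 32, OP_LBZ = 34, OP_LHZ = 40 };

/*
 * If inst is one of the loads above, fill its destination register with
 * all ones (at the load's width), step past the instruction and return
 * true. Otherwise leave regs untouched and return false.
 */
static bool fill_load_target(struct fake_regs *regs, uint32_t inst)
{
	uint32_t opcode = inst >> 26;		/* top 6 bits */
	uint32_t rd = (inst >> 21) & 0x1f;	/* next 5 bits: target GPR */

	switch (opcode) {
	case OP_LWZ:
		regs->gpr[rd] = 0xffffffffUL;
		break;
	case OP_LHZ:
		regs->gpr[rd] = 0xffffUL;
		break;
	case OP_LBZ:
		regs->gpr[rd] = 0xffUL;
		break;
	default:
		return false;
	}
	regs->nip += 4;	/* instructions are 4 bytes */
	return true;
}
```

For example, 0x80640000 encodes "lwz r3,0(r4)", so the call sets gpr[3] to
0xffffffff and advances nip by 4. Note that this needs the instruction word,
which is where the problem described next comes from.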

On the other hand, the kernel should not access addresses that belong to an
unknown process. To fill the register we must first fetch the instruction.
If the instruction is not in the cache, we have to read from those unknown
addresses to get it. For system security I think this must be strictly
forbidden.
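
To make the risk concrete: a plain dereference of regs->nip assumes the
address is mapped and readable. A guarded fetch would have to verify that
first and fall back to just skipping the instruction when it cannot. The
sketch below models that in user space with a table of known readable ranges;
struct readable_range and fetch_inst() are hypothetical, and inside the kernel
one would presumably rely on a non-faulting accessor (for instance
probe_kernel_address(), if I remember the name correctly) instead of such a
table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one mapped, readable range of memory. */
struct readable_range {
	uintptr_t start;	/* first address covered by the range */
	size_t len;		/* length in bytes */
	const uint8_t *backing;	/* where the bytes actually live */
};

/*
 * Copy the 4-byte instruction at nip into *inst, but only if nip is
 * word aligned and the whole word lies inside a known readable range.
 * Returns false instead of faulting when the address cannot be read.
 */
static bool fetch_inst(const struct readable_range *ranges, size_t n,
		       uintptr_t nip, uint32_t *inst)
{
	if (nip & 3)
		return false;

	for (size_t i = 0; i < n; i++) {
		const struct readable_range *r = &ranges[i];

		if (nip >= r->start && r->len >= sizeof(*inst) &&
		    nip - r->start <= r->len - sizeof(*inst)) {
			memcpy(inst, r->backing + (nip - r->start),
			       sizeof(*inst));
			return true;
		}
	}
	return false;	/* unmapped, swapped out, guest address, ... */
}
```

When fetch_inst() returns false, the handler has no instruction to decode, so
the only remaining choice is to skip it without touching any register.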

Here are the ideas from Scott:
"
> +	if (is_in_pci_mem_space(addr)) {
> +		inst = *(unsigned int *)regs->nip;

Be careful about taking a fault here.  A simple TLB miss should be safe
given that we shouldn't be accessing PCIe in the middle of exception
code, but what if the mapping has gone away (e.g. a userspace driver had
its code munmap()ed or swapped out)?  What if permissions allow execute
but not read (not sure if Linux will allow this, but the hardware does)?

What if it happened in a KVM guest?  You can't access guest addresses
directly.
"

Although I think filling the register has some advantages, it should be
forbidden for security reasons.

-Hongtao.


