[PATCH] PPC64: draft version of EEH code.

Paul Mackerras paulus at samba.org
Thu Feb 3 08:10:45 EST 2005


Linas Vepstas writes:

> On Wed, Feb 02, 2005 at 03:17:48PM +1100, Benjamin Herrenschmidt was heard to remark:
> > 
> > cpu 0x0: Vector: 700 (Program Check) at [c0000000332c38b0]
> >     pc: 00000000077d9374
> >     lr: 000000000000dafc
> >     sp: c0000000332c3b30
> >    msr: 81002
> >   current = 0xc000000034b5b030
> >   paca    = 0xc000000000573000
> >     pid   = 11206, comm = errinjct
> 
> MSR shows it died in 32-bit real mode: i.e. in the firmware.
> 
> I've seen this sporadically, I've assumed it was the off-kilter firmware
> on the box I have access to.  But since you're seeing it...
> 
> John Rose, the maintainer of librtas (the user space lib that does
> the memory mapping), tells me that even if corrupt values are passed to
> RTAS, the firmware should not crash.

When Ben and I dug into it a little using xmon, it turned out that
77d9374 (the pc value) was the RTAS entry point.  We dumped out memory
at that address and it was all zeroes, hence the 700 exception.  Ben
then straced the errinjct program and found that it was reading the
proc file to get the RMO buffer base and getting back 77c9000.  It then
mmapped /dev/mem at offset 77d9000 for 4096 bytes, which happens to be
the first page of the RTAS private data area.  So it looks very much
like a bug in errinjct is causing it to overwrite the first page of
RTAS's data area.  Hence the desire to see the source for errinjct.

Paul.
