[Cbe-oss-dev] Context switching while page fault handling and warning

Benjamin Herrenschmidt benh at kernel.crashing.org
Tue Jun 20 14:35:01 EST 2006


That looks strangely similar to a bug I'm chasing down with Jeremy at
the moment, where a PPE-initiated DMA causes a SIGBUS due to a page
fault. We tracked it down to the MFC trying to DMA from a completely
bogus virtual address. It's not quite clear yet how that happens,
especially since there should have been no context save/restore in our
case (nothing else is running and we use one SPU) except for the initial
context initialisation.

I tend to suspect that the context restore code for the MFC DMA queue is
bogus in some subtle way that I haven't totally figured out yet, and
that this also affects the initial restore used to set up a fresh SPU
thread. Working on it... I also noticed that, depending on the weather,
we tend to get both a stop_and_signal and a stop from the actual hash
fault at the same time; I'm not sure exactly what's up there yet. Our
test program also occasionally hangs with our bottom half never being
called (probably because of the stop & signal, but I don't know yet).

There is definitely something going on ...

Ben.




