[Cbe-oss-dev] [PATCH] spufs: fix SPU runqueue corruption
Akinobu Mita
mita at fixstars.com
Thu May 10 11:27:39 EST 2007
On Wed, May 09, 2007 at 04:23:53PM +0200, Arnd Bergmann wrote:
> On Wednesday 09 May 2007, Arnd Bergmann wrote:
> > > It is possible that two processes simultaneously try to
> > > make one SPU context runnable. Both processes then try to add
> > > the context to the runqueue and corrupt the runqueue list.
> > >
> > > For example, one process is blocked in run_spu() waiting for an idle SPU,
> > > while another process is reading its psmap.
> > >
> > > This patch prevents an SPU context that is already on the runqueue
> > > from being queued again. It also checks, after spu_prio_wait() returns,
> > > whether the context has already become runnable or is being destroyed.
> > >
> > > I'm not sure this patch is the right fix, but it does prevent
> > > the Oops caused by runqueue corruption.
> > >
> > > This is reproducible with direct_ps on a PS3.
> > > direct_ps is a sample program in the Cell SDK
> > > (/opt/ibm/cell-sdk/prototype/src/tests/direct_problem_state/direct_ps.c)
> > >
> > > Signed-off-by: Akinobu Mita <mita at fixstars.com>
> >
> > Looks good to me. I'll add this to my patch series and forward
> > it upstream. Thanks!
>
> Actually, I just remembered that Luke Browning already provided a better
> fix for this bug, and we have merged it in 2.6.22-rc, see:
> http://patchwork.ozlabs.org/cbe-oss-dev/patch?id=10598
I made that patch on top of this fix, and I'm still seeing the Oops with direct_ps.
Could you try running direct_ps from the Cell SDK on your latest tree, with an
argument greater than the total number of SPUs?