[Cbe-oss-dev] [PATCH 5/6] spufs: fix dma restart

Fri Feb 15 06:21:50 EST 2008

On Wed, 2008-02-13 at 05:49 +0100, Arnd Bergmann wrote:

> The spu_deactivate() here seems a little drastic and should not really
> be necessary but rather seems to hide the problem behind a race between
> the context getting scheduled back in and user space fixing up the
> page.

Sort of.  I was concerned that, since the context was still loaded, the
restart_dma() would take effect immediately, leading to a race between
the signal handler and an exception.

> In case of nonschedulable contexts, it actually looks unconditionally
> broken!

Good point.

> 
> I'm not entirely sure what you are trying to fix, as I thought that
> the original patch was working, although I never fully understood
> why a ctx->ops->restart_dma(ctx); after triggering the signal would
> be the right thing to do...

Yes, Andre's patch works, but it works differently than we expected.

Here's some output from debug statements that I added to Andre's patch:

spu_run init: ctx=c00000003e528000, status=0, dsisr=0, class0=0, runcntl=0, npc=28 state=1
spu_run before: ctx=c00000003e528000, status=21000002, dsisr=40000000, class0=0, runcntl=1
spu_run after: ctx=c00000003e528000, status=21000002, dsisr=40000000, class0=0, runcntl=1, ret=0, npc=3fa94
fault before: ctx=c00000003e528000, status=21000002, dsisr=0, class0=0, state=0, runcntl=1 npc=3fa94

The line above is printed immediately before the restart dma.  The "fault after" line below is printed immediately after it.

Note that in the "fault before" and "fault after" lines ctx->state is runnable (state=0) and all other variables are the same.

fault after: ctx=c00000003e528000, status=21000002, dsisr=0, class0=0, state=0, runcntl=1 npc=3fa94

The signal gets delivered here.

spu_run init: ctx=c00000003e528000, status=21000002, dsisr=40000000, class0=0, runcntl=1, npc=3fa98 state=0

Note that the line above is printed before we set the runnable status bit. The exception has been
regenerated (dsisr=40000000 again)! As I thought, the restart_dma() takes effect immediately, while
the spu is stopped, but it doesn't matter: the signal handler has already run, so we can now handle
the fault properly.

The npc=3fa94 corresponds to a printf executed in a loop.

spu_run before: ctx=c00000003e528000, status=9, dsisr=40000000, class0=0, runcntl=1
spu_run after: ctx=c00000003e528000, status=9, dsisr=40000000, class0=0, runcntl=1, ret=0, npc=0
spu_run before: ctx=c00000003e528000, status=1, dsisr=0, class0=0, runcntl=1
spu_run after: ctx=c00000003e528000, status=21000002, dsisr=0, class0=0, runcntl=1, ret=0, npc=3fa94
spu_run init: ctx=c00000003e528000, status=21000002, dsisr=0, class0=0, runcntl=1, npc=3fa98 state=0
spu_run before: ctx=c00000003e528000, status=21000002, dsisr=0, class0=0, runcntl=1
spu_run after: ctx=c00000003e528000, status=21000002, dsisr=0, class0=0, runcntl=1, ret=0, npc=3fa94
spu_run init: ctx=c00000003e528000, status=21000002, dsisr=0, class0=0, runcntl=1, npc=3fa98 state=0
spu_run before: ctx=c00000003e528000, status=21000002, dsisr=0, class0=0, runcntl=1
spu_run after: ctx=c00000003e528000, status=21000002, dsisr=0, class0=0, runcntl=1, ret=0, npc=3fa94
spu_run init: ctx=c00000003e528000, status=21000002, dsisr=0, class0=0, runcntl=1, npc=3fa98 state=0
spu_run before: ctx=c00000003e528000, status=21000002, dsisr=0, class0=0, runcntl=1
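
For reading the trace, the status and dsisr values can be decoded with a few
masks.  What follows is only a minimal stand-alone sketch: the bit values are
recalled from the kernel's asm/spu.h and should be checked against the tree,
the 14-bit stop code width is an assumption, and the helper names
spu_stop_code() and spu_print_decoded() are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed values, recalled from the kernel's asm/spu.h; verify before use. */
#define SPU_STATUS_RUNNING              0x1u
#define SPU_STATUS_STOPPED_BY_STOP      0x2u
#define SPU_STATUS_STOPPED_BY_HALT      0x4u
#define SPU_STATUS_WAITING_FOR_CHANNEL  0x8u
#define SPU_STOP_STATUS_SHIFT           16
#define MFC_DSISR_PTE_NOT_FOUND         (1u << 30)

/* Hypothetical helper: stop code held in the upper half of the status word
 * (assumed to be 14 bits wide). */
static inline unsigned int spu_stop_code(uint32_t status)
{
	return (status >> SPU_STOP_STATUS_SHIFT) & 0x3fffu;
}

/* Hypothetical helper: print one status/dsisr pair from the trace in words. */
static inline void spu_print_decoded(uint32_t status, uint32_t dsisr)
{
	printf("status=%08x:%s%s%s%s stop_code=0x%04x dsisr=%08x:%s\n",
	       (unsigned int)status,
	       (status & SPU_STATUS_RUNNING) ? " running" : "",
	       (status & SPU_STATUS_STOPPED_BY_STOP) ? " stopped-by-stop" : "",
	       (status & SPU_STATUS_STOPPED_BY_HALT) ? " stopped-by-halt" : "",
	       (status & SPU_STATUS_WAITING_FOR_CHANNEL) ? " waiting-for-channel" : "",
	       spu_stop_code(status),
	       (unsigned int)dsisr,
	       (dsisr & MFC_DSISR_PTE_NOT_FOUND) ? " pte-not-found" : " none");
}
```

Under those assumptions, status=21000002 decodes as stopped by a stop
instruction with stop code 0x2100, status=9 as running and waiting for a
channel, and dsisr=40000000 as a PTE-not-found fault, i.e. the page the
signal handler is expected to fix up.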

So we don't need my patch, which is a good thing, since it breaks the nosched option.

Luke