[Cbe-oss-dev] [PATCH 5/6] spufs: fix dma restart
Luke Browning
lukebr at linux.vnet.ibm.com
Fri Feb 15 06:21:50 EST 2008
On Wed, 2008-02-13 at 05:49 +0100, Arnd Bergmann wrote:
> The spu_deactivate() here seems a little drastic and should not really
> be necessary but rather seems to hide the problem behind a race between
> the context getting scheduled back in and user space fixing up the
> page.
Sort of. I was concerned that the restart_dma() would take effect
immediately, since the context was loaded, leading to a race condition
between the signal handler and the exception.
> In case of nonschedulable contexts, it actually looks unconditionally
> broken!
Good point.
>
> I'm not entirely sure what you are trying to fix, as I thought that
> the original patch was working, although I never fully understood
> why a ctx->ops->restart_dma(ctx); after triggering the signal would
> be the right thing to do...
Yes, Andre's patch works but it is working differently than we expected.
Here's some debug output that I added to Andre's patch:
spu_run init: ctx=c00000003e528000, status=0, dsisr=0, class0=0, runcntl=0, npc=28 state=1
spu_run before: ctx=c00000003e528000, status=21000002, dsisr=40000000, class0=0, runcntl=1
spu_run after: ctx=c00000003e528000, status=21000002, dsisr=40000000, class0=0, runcntl=1, ret=0, npc=3fa94
fault before: ctx=c00000003e528000, status=21000002, dsisr=0, class0=0, state=0, runcntl=1 npc=3fa94
The line above is printed immediately before the restart dma. The line below is printed immediately after the restart dma.
Note that in the before and after lines ctx->state is runnable and all other variables are unchanged.
fault after: ctx=c00000003e528000, status=21000002, dsisr=0, class0=0, state=0, runcntl=1 npc=3fa94
The signal gets delivered here.
spu_run init: ctx=c00000003e528000, status=21000002, dsisr=40000000, class0=0, runcntl=1, npc=3fa98 state=0
Note that the line above is printed before we set the runnable status bit. The exception has been regenerated! The
restart_dma() takes effect immediately while the spu is stopped, as I thought, but it doesn't matter
because we can now handle the fault properly, since the signal handler has run.
The npc=3fa94 corresponds to a printf executed in a loop.
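The field comparisons described above can also be checked mechanically. This is only a minimal sketch, assuming the key=value layout of the debug lines quoted in this mail; the helper names parse_fields and changed_fields are hypothetical and not part of any patch.

```python
import re

def parse_fields(line):
    """Return the key=value pairs of one debug line as a dict of strings."""
    # Fields are separated inconsistently (", " or just a space), so match
    # each key=value token directly instead of splitting on a delimiter.
    return dict(re.findall(r"(\w+)=(\w+)", line))

def changed_fields(a, b):
    """Map each field present in both lines to (old, new) where it differs."""
    fa, fb = parse_fields(a), parse_fields(b)
    return {k: (fa[k], fb[k]) for k in fa if k in fb and fa[k] != fb[k]}

# Lines copied verbatim from the debug output above.
fault_before = ("fault before: ctx=c00000003e528000, status=21000002, dsisr=0, "
                "class0=0, state=0, runcntl=1 npc=3fa94")
fault_after = ("fault after: ctx=c00000003e528000, status=21000002, dsisr=0, "
               "class0=0, state=0, runcntl=1 npc=3fa94")
run_init = ("spu_run init: ctx=c00000003e528000, status=21000002, dsisr=40000000, "
            "class0=0, runcntl=1, npc=3fa98 state=0")

# Nothing changes across the restart dma itself.
print(changed_fields(fault_before, fault_after))
# By the next spu_run entry, dsisr is set again and npc has moved.
print(changed_fields(fault_after, run_init))
```

The first comparison yields no differences, while the second shows dsisr going from 0 back to 40000000 and npc moving from 3fa94 to 3fa98, which matches the observation that the exception was regenerated by the time spu_run was re-entered.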
spu_run before: ctx=c00000003e528000, status=9, dsisr=40000000, class0=0, runcntl=1
spu_run after: ctx=c00000003e528000, status=9, dsisr=40000000, class0=0, runcntl=1, ret=0, npc=0
spu_run before: ctx=c00000003e528000, status=1, dsisr=0, class0=0, runcntl=1
spu_run after: ctx=c00000003e528000, status=21000002, dsisr=0, class0=0, runcntl=1, ret=0, npc=3fa94
spu_run init: ctx=c00000003e528000, status=21000002, dsisr=0, class0=0, runcntl=1, npc=3fa98 state=0
spu_run before: ctx=c00000003e528000, status=21000002, dsisr=0, class0=0, runcntl=1
spu_run after: ctx=c00000003e528000, status=21000002, dsisr=0, class0=0, runcntl=1, ret=0, npc=3fa94
spu_run init: ctx=c00000003e528000, status=21000002, dsisr=0, class0=0, runcntl=1, npc=3fa98 state=0
spu_run before: ctx=c00000003e528000, status=21000002, dsisr=0, class0=0, runcntl=1
spu_run after: ctx=c00000003e528000, status=21000002, dsisr=0, class0=0, runcntl=1, ret=0, npc=3fa94
spu_run init: ctx=c00000003e528000, status=21000002, dsisr=0, class0=0, runcntl=1, npc=3fa98 state=0
spu_run before: ctx=c00000003e528000, status=21000002, dsisr=0, class0=0, runcntl=1
So we don't need my patch, which is a good thing, as it breaks the nosched option.
Luke