[Cbe-oss-dev] another context switch question
Arnd Bergmann
arnd at arndb.de
Wed Sep 12 06:22:51 EST 2007
On Tuesday 11 September 2007, Luke Browning wrote:
> long spufs_run_spu(struct file *file, struct spu_context *ctx,
> u32 *npc, u32 *event)
> {
>
> <...>
>
> do {
> ret = spufs_wait(ctx->stop_wq, spu_stopped(ctx, &status));
> spu = ctx->spu;
>
> spu will be NULL after a context switch. I suspect that this is not expected
> and it should be initialized before the spufs_wait(). I believe the intent is
> to identify the spu where the context last ran.
Right, this is a bug, though it's very hard to hit in practice. I think the
patch below is the right solution for this.
Bob, can you try out that patch and make sure that oprofile still behaves
the right way when it's applied?
> if (unlikely(ret))
> break;
>
> spuctx_switch_state(ctx, SPUCTX_UTIL_SYSTEM);
>
> if (unlikely(test_bit(SPU_SCHED_NOTIFY_ACTIVE,
> &ctx->sched_flags))) {
> clear_bit(SPU_SCHED_NOTIFY_ACTIVE, &ctx->sched_flags);
> if (!(status & SPU_STATUS_STOPPED_BY_STOP)) {
> spu_switch_notify(spu, ctx);
> continue;
> }
> }
>
> More importantly, the context lock is held over spu_switch_notify() meaning
> that the context cannot be saved or restored. I don't think the scheduler
> should be dependent on another module. We should be able to context switch
> out the context during these calls.
>
> Note that the callout occurs for every stop and context switch and that the
> callout occurs before spe exceptions are handled. I assume that the handler
> is not allowed to read / write the spe register state. The context may or
> may not be runnable as you can have an event and a context switch occur at
> the same time, so the handler doesn't know how to access the data. And what
> would it do with the data anyway? So, I think we should change the code so
> that the callout is only made for context switches. In that case, I would
> move it after processing of the class 0 and 1 interrupts.
Note the test for SPU_SCHED_NOTIFY_ACTIVE in there. The only time we get into
this path is when a new consumer registers for schedule notifier events,
e.g. when you load the oprofile kernel module. This is such a rare condition
that I feel we should not optimize for that, but instead try to keep this
code path as readable as possible.
Arnd <><
---
Subject: spufs: fix potential NULL pointer dereference in spufs_run_spu
We must never use the ctx->spu member before checking for
ctx->state == SPU_STATE_RUNNABLE, because it will be NULL
for saved states. In the rare case that a context has been
saved at the exact same time that we call the
spu_switch_event_register() function, we should also not
continue the loop in spufs_run_spu.
Signed-off-by: Arnd Bergmann <arnd.bergmann at de.ibm.com>
--- a/arch/powerpc/platforms/cell/spufs/run.c
+++ b/arch/powerpc/platforms/cell/spufs/run.c
@@ -296,7 +296,6 @@ static inline int spu_process_events(struct spu_context *ctx)
long spufs_run_spu(struct spu_context *ctx, u32 *npc, u32 *event)
{
int ret;
- struct spu *spu;
u32 status;
if (mutex_lock_interruptible(&ctx->run_mutex))
@@ -336,11 +335,11 @@ long spufs_run_spu(struct spu_context *ctx, u32 *npc, u32 *event)
ret = spufs_wait(ctx->stop_wq, spu_stopped(ctx, &status));
if (unlikely(ret))
break;
- spu = ctx->spu;
if (unlikely(test_and_clear_bit(SPU_SCHED_NOTIFY_ACTIVE,
&ctx->sched_flags))) {
- if (!(status & SPU_STATUS_STOPPED_BY_STOP)) {
- spu_switch_notify(spu, ctx);
+ if (!(status & SPU_STATUS_STOPPED_BY_STOP) &&
+ (ctx->state == SPU_STATE_RUNNABLE)) {
+ spu_switch_notify(ctx->spu, ctx);
continue;
}
}
More information about the cbe-oss-dev
mailing list