[Cbe-oss-dev] another context switch question

Bob Nelson rrnelson at linux.vnet.ibm.com
Wed Oct 10 06:42:29 EST 2007


On Tuesday 11 September 2007 03:22:51 pm Arnd Bergmann wrote:
> On Tuesday 11 September 2007, Luke Browning wrote:
> > long spufs_run_spu(struct file *file, struct spu_context *ctx,
> >                    u32 *npc, u32 *event)
> > {
> > 
> > 	<...>
> > 
> >         do {
> >                 ret = spufs_wait(ctx->stop_wq, spu_stopped(ctx, &status));
> >                 spu = ctx->spu;
> > 
> > spu will be NULL after a context switch.  I suspect that this is not expected
> > and it should be initialized before the spufs_wait().  I believe the intent is
> > to identify the spu where the context last ran.  
> 
> Right, this is a bug, though it's very hard to hit in practice. I think the
> patch below is the right solution for this.
> 
> Bob, can you try out that patch and make sure that oprofile still behaves
> the right way when it's applied?

OProfile seems to run fine with this patch installed on the 2.6.23-rc9 kernel.  I ran through the normal test bucket and tried a few different things and everything appears normal.

> 
> >                 if (unlikely(ret))
> >                         break;
> > 
> >                 spuctx_switch_state(ctx, SPUCTX_UTIL_SYSTEM);
> > 
> >                 if (unlikely(test_bit(SPU_SCHED_NOTIFY_ACTIVE,
> >                                       &ctx->sched_flags))) {
> >                         clear_bit(SPU_SCHED_NOTIFY_ACTIVE, &ctx->sched_flags);
> >                         if (!(status & SPU_STATUS_STOPPED_BY_STOP)) {
> >                                 spu_switch_notify(spu, ctx);
> >                                 continue;
> >                         }
> >                 }
> > 
> > More importantly, the context lock is held over spu_switch_notify() meaning
> > that the context cannot be saved or restored.  I don't think the scheduler 
> > should be dependent on another module.  We should be able to context switch
> > out the context during these calls. 
> > 
> > Note that the callout occurs for every stop and context switch and that the 
> > callout occurs before spe exceptions are handled.  I assume that the handler
> > is not allowed to read / write the spe register state.  The context may or 
> > may not be runnable as you can have an event and a context switch occur at 
> > the same time, so the handler doesn't know how to access the data.  And what 
> > would it do with the data anyway?  So, I think we should change the code so
> > that the callout is only made for context switches.  In that case, I would 
> > move it after processing of the class 0 and 1 interrupts.
> 
> Note the test for SPU_SCHED_NOTIFY_ACTIVE in there. The only time we get into
> this path is when a new consumer registers for schedule notifier events,
> e.g. when you load the oprofile kernel module. This is such a rare condition
> that I feel we should not optimize for that, but instead try to keep this
> code path as readable as possible.
> 
> 	Arnd <><
> ---
> Subject: spufs: fix potential NULL pointer dereference in spufs_run_spu
> 
> We must never use the ctx->spu member before checking for
> ctx->state == SPU_STATE_RUNNABLE, because it will be NULL
> for saved states. In the rare case that a context has been
> saved at the exact same time that we call the
> spu_switch_event_register() function, we should also not
> continue the loop in spufs_run_spu.
> 
> Signed-off-by: Arnd Bergmann <arnd.bergmann at de.ibm.com>
> 
> --- a/arch/powerpc/platforms/cell/spufs/run.c
> +++ b/arch/powerpc/platforms/cell/spufs/run.c
> @@ -296,7 +296,6 @@ static inline int spu_process_events(struct spu_context *ctx)
>  long spufs_run_spu(struct spu_context *ctx, u32 *npc, u32 *event)
>  {
>  	int ret;
> -	struct spu *spu;
>  	u32 status;
> 
>  	if (mutex_lock_interruptible(&ctx->run_mutex))
> @@ -336,11 +335,11 @@ long spufs_run_spu(struct spu_context *ctx, u32 *npc, u32 *event)
>  		ret = spufs_wait(ctx->stop_wq, spu_stopped(ctx, &status));
>  		if (unlikely(ret))
>  			break;
> -		spu = ctx->spu;
>  		if (unlikely(test_and_clear_bit(SPU_SCHED_NOTIFY_ACTIVE,
>  						&ctx->sched_flags))) {
> -			if (!(status & SPU_STATUS_STOPPED_BY_STOP)) {
> -				spu_switch_notify(spu, ctx);
> +			if (!(status & SPU_STATUS_STOPPED_BY_STOP) &&
> +			    (ctx->state == SPU_STATE_RUNNABLE)) {
> +				spu_switch_notify(ctx->spu, ctx);
>  				continue;
>  			}
>  		}
> _______________________________________________
> cbe-oss-dev mailing list
> cbe-oss-dev at ozlabs.org
> https://ozlabs.org/mailman/listinfo/cbe-oss-dev
> 





More information about the cbe-oss-dev mailing list