[PATCH][RFC] preempt_count corruption across H_CEDE call with CONFIG_PREEMPT on pseries

Thu Sep 2 14:06:07 EST 2010

On Thu, 2010-09-02 at 11:02 +1000, Michael Neuling wrote:

> We need to call smp_startup_cpu on boot when we the cpus are still in
> FW.  smp_startup_cpu does this for us on boot.
> 
> I'm wondering if we just need to move the test down a bit to make sure
> the preempt_count is set.  I've not been following this thread, but
> maybe this might work?

Egad no! Setting the preempt_count to zero _is_ the bug. I think Darren
even said that adding the exit prevented the bug (although now he's
hitting a hard lockup someplace else). The original code he was using
did not have the condition to return for kexec. It was just a
coincidence that this code helped in bringing a CPU back online.

> 
> Untested patch below...
> 
> Mikey
> 
> diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
> index 0317cce..3afaba4 100644
> --- a/arch/powerpc/platforms/pseries/smp.c
> +++ b/arch/powerpc/platforms/pseries/smp.c
> @@ -104,18 +104,18 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu)
>  
>  	pcpu = get_hard_smp_processor_id(lcpu);
>  
> -	/* Check to see if the CPU out of FW already for kexec */
> -	if (smp_query_cpu_stopped(pcpu) == QCSS_NOT_STOPPED){
> -		cpumask_set_cpu(lcpu, of_spin_mask);
> -		return 1;
> -	}
> -
>  	/* Fixup atomic count: it exited inside IRQ handler. */
>  	task_thread_info(paca[lcpu].__current)->preempt_count	= 0;

We DON'T want to do the above. It's nasty! This is one CPU's task
touching an intimate part of another CPU's task. It's equivalent of me
putting my hand down you wife's blouse. It's offensive, and rude.

OK, if the CPU was never online, then you can do what you want. But what
we see is that this fails on CPU hotplug.  You stop a CPU, and it goes
into this cede_processor() call. When you wake it up, suddenly the task
on that woken CPU has its preempt count fscked up.  This was really
really hard to debug. We thought it was stack corruption or something.
But it ended up being that this code has one CPU touching the breasts of
another CPU. This code is a pervert!

What the trace clearly showed, was that we take down a CPU, and in doing
so, the code on that CPU set the preempt count to 1, and it expected to
have it as 1 when it returned. But the code that kicked started this CPU
back to life (bring the CPU back online), set the preempt count on the
task of that CPU to 0, and screwed everything up.

-- Steve

>  
>  	if (get_cpu_current_state(lcpu) == CPU_STATE_INACTIVE)
>  		goto out;
>  
> +	/* Check to see if the CPU out of FW already for kexec */
> +	if (smp_query_cpu_stopped(pcpu) == QCSS_NOT_STOPPED){
> +		cpumask_set_cpu(lcpu, of_spin_mask);
> +		return 1;
> +	}
> +
>  	/* 
>  	 * If the RTAS start-cpu token does not exist then presume the
>  	 * cpu is already spinning.
>