[RFC] powerpc/pseries: Increase busy loop in pseries_cpu_die

Wed Feb 8 13:59:07 AEDT 2017

Thiago Jung Bauermann <bauerman at linux.vnet.ibm.com> writes:

> Am Dienstag, 7. Februar 2017, 08:26:45 BRST schrieb Balbir Singh:
>> On Mon, Feb 06, 2017 at 04:58:16PM -0200, Thiago Jung Bauermann wrote:
>> > [  447.714064] Querying DEAD? cpu 134 (134) shows 2
>> > cpu 0x86: Vector: 300 (Data Access) at [c000000007b0fd40]
>> > 
>> >     pc: 000000001ec3072c
>> >     lr: 000000001ec2fee0
>> >     sp: 1faf6bd0
>> >    
>> >    msr: 8000000102801000
>> >    dar: 212d6c1a2a20c
>> 
>> This looks like we accessed a bad address, but why?
>
> Am Dienstag, 7. Februar 2017, 13:10:22 BRST schrieb Michael Ellerman:
>> We shouldn't be crashing.
>> 
>> So we need to fix that.
>> 
>> We may also need to increase the timeout, though it's pretty gross TBH.
>> 
>> But step one is make sure we don't crash.
>
> I didn't analyze exactly what is causing the CPU to crash because the root 
> cause is the inconsistency between what the kernel thinks the CPU state is and 
> reality. But if we have to be able to handle that inconsistency I will keep 
> digging and try to fix that.

Please do.

It's obviously not a good situation, but it shouldn't crash the kernel.

I realise that may be easier said than done, but that should be the
goal. I see smp_ops->cpu_die() can't fail (returns void), which may be
part of the problem.

>> > --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> > +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> > @@ -206,7 +206,7 @@ static void pseries_cpu_die(unsigned int cpu)
>> > 
>> >  		}
>> >  	
>> >  	} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
>> > 
>> > -		for (tries = 0; tries < 25; tries++) {
>> > +		for (tries = 0; tries < 5000; tries++) {
>> 
>> This fixes some of the asymmetry between handling of CPU_STATE_INACTIVE
>> and CPU_STATE_OFFLINE, but I think we can probably move the cpu_relax()
>> to msleep(1).
>
> I didn't change it to msleep() because I thought it would introduce a 
> regression. commit b906cfa397fd ("powerpc/pseries: Fix cpu hotplug") changed a 
> msleep(200) that was there to a cpu_relax() with this explanation:
>
>     Currently, pseries_cpu_die() calls msleep() while polling RTAS for 
>     the status of the dying cpu.
>
>     However, if the cpu that is going down also happens to be the one 
>     doing the tick then we're hosed as the tick_do_timer_cpu 'baton' is
>     only passed later on in tick_shutdown() when _cpu_down() does the 
>     CPU_DEAD notification.  Therefore jiffies won't be updated anymore.
>
>     This replaces that msleep() with a cpu_relax() to make sure we're not 
>     going to schedule at that point.
>
>     With this patch my test box survives a 100k iterations hotplug stress
>     test on _all_ cpus, whereas without it, it quickly dies after ~50 
>     iterations.
>
> I can try to add it back and see what happens. Perhaps that situation won't 
> happen anymore with today's kernel.

The relative timing of the events may have changed. But the fundamental
problem still exists AFAIK.

So we'd need a pretty solid guarantee that the problem can't happy
anymore before I'd change it to msleep().

However it could/should at least use udelay(), which AFAIK is not
reliant on jiffies. That way the loop will actually take a set amount of
wall time, rather than just cycles.

cheers