[PATCH v2] powerpc/pseries: Only wait for dying CPU after call to rtas_stop_self()

Tue Mar 12 06:29:41 AEDT 2019

Hello Gautham,

Thanks for your review.

Gautham R Shenoy <ego at linux.vnet.ibm.com> writes:

> Hello Thiago,
>
> On Fri, Feb 22, 2019 at 07:57:52PM -0300, Thiago Jung Bauermann wrote:
>> I see two cases that can be causing this race:
>>
>> 1. It's possible that CPU 134 was inactive at the time it was unplugged. In
>>    that case, dlpar_offline_cpu() calls H_PROD on that CPU and immediately
>>    calls pseries_cpu_die(). Meanwhile, the prodded CPU activates and starts
>>    the process of stopping itself. It's possible that the busy loop is not
>>    long enough to allow for the CPU to wake up and complete the stopping
>>    process.
>
> The problem is a bit more severe since, after printing "Querying
> DEAD?" for CPU X, this CPU can prod another offline CPU Y on the same
> core which, on waking up, will call rtas_stop_self. Thus we can have two
> concurrent calls to rtas-stop-self, which is prohibited by the PAPR.

Indeed, very good point. I added this information to the patch
description.

>> 2. If CPU 134 was online at the time it was unplugged, it would have gone
>>    through the new CPU hotplug state machine in kernel/cpu.c that was
>>    introduced in v4.6 to get itself stopped. It's possible that the busy
>>    loop in pseries_cpu_die() was long enough for the older hotplug code but
>>    not for the new hotplug state machine.
>
> I haven't been able to observe the "Querying DEAD?" messages for the
> online CPU which was being offlined and dlpar'ed out.

Ah, thanks for pointing this out. That was a scenario I thought could
happen when I was investigating this issue but I never confirmed whether
it could really happen. I removed it from the patch description.

>> I don't know if this race condition has any ill effects, but we can make
>> the race a lot more even if we only start querying whether the CPU is
>> stopped once the stopping CPU is close to calling rtas_stop_self().
>>
>> Since pseries_mach_cpu_die() sets the CPU current state to offline almost
>> immediately before calling rtas_stop_self(), we use that as a signal that
>> it is either already stopped or very close to that point, and we can start
>> the busy loop.
>>
>> As suggested by Michael Ellerman, this patch also changes the busy loop to
>> wait for a fixed amount of wall time.
>>
>> Signed-off-by: Thiago Jung Bauermann <bauerman at linux.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 10 +++++++++-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> I tried to estimate good amounts for the timeout and loop delays, but
>> I'm not sure how reasonable my numbers are. The busy loops will wait for
>> 100 µs between each try, and spin_event_timeout() will time out after
>> 100 ms. I'll be happy to change these values if you have better
>> suggestions.
>
> Based on the measurements that I did on a POWER9 system, in successful
> cases of smp_query_cpu_stopped(cpu) returning affirmative, the maximum
> time spent inside the loop was 10ms.

That's very good to know. I added this information to the patch
description.

I also added you in an Analyzed-by tag. I hope that's fine with you.

>> Gautham was able to test this patch and it solved the race condition.
>>
>> v1 was a cruder patch which just increased the number of loops:
>> https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-February/153734.html
>>
>> v1 also mentioned a kernel crash but Gautham narrowed it down to a bug
>> in RTAS, which is in the process of being fixed.
>>
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> index 97feb6e79f1a..424146cc752e 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> @@ -214,13 +214,21 @@ static void pseries_cpu_die(unsigned int cpu)
>>  			msleep(1);
>>  		}
>>  	} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
>> +		/*
>> +		 * If the current state is not offline yet, it means that the
>> +		 * dying CPU (which is in pseries_mach_cpu_die) didn't have a
>> +		 * chance to call rtas_stop_self yet and therefore it's too
>> +		 * early to query if the CPU is stopped.
>> +		 */
>> +		spin_event_timeout(get_cpu_current_state(cpu) == CPU_STATE_OFFLINE,
>> +				   100000, 100);
>>
>>  		for (tries = 0; tries < 25; tries++) {
>
> Can we bump up the tries to 100, so that we wait for 10ms before
> printing the warning message?

Good idea. I increased the loop to 200 iterations so that it can take up
to 20 ms, just to be sure.

>>  			cpu_status = smp_query_cpu_stopped(pcpu);
>>  			if (cpu_status == QCSS_STOPPED ||
>>  			    cpu_status == QCSS_HARDWARE_ERROR)
>>  				break;
>> -			cpu_relax();
>> +			udelay(100);
>>  		}
>>  	}
>>
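
For reference, the two-phase wait discussed above can be modeled in user
space to see how the number of tries and the per-try delay bound the total
wait. This is only a rough sketch and not the kernel code: struct sim_cpu,
sim_query_stopped() and sim_wait_for_stop() are hypothetical names, time is
simulated rather than measured, and the real spin_event_timeout() and
smp_query_cpu_stopped() are not called.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the QCSS_* query results. */
enum sim_status { SIM_NOT_STOPPED, SIM_STOPPED };

/* Simulated dying CPU: when (in microseconds) it reaches each step. */
struct sim_cpu {
	unsigned long offline_at_us;	/* marks its current state offline */
	unsigned long stopped_at_us;	/* is reported as stopped */
};

/* Assumed model of the query: stopped once the simulated time is reached. */
static enum sim_status sim_query_stopped(const struct sim_cpu *cpu,
					 unsigned long now_us)
{
	return now_us >= cpu->stopped_at_us ? SIM_STOPPED : SIM_NOT_STOPPED;
}

/*
 * Phase 1: wait (up to phase1_timeout_us) for the CPU to mark itself offline.
 * Phase 2: query up to 'tries' times, waiting delay_us after each failed try.
 * Returns true if the CPU was seen stopped; *elapsed_us gets the simulated
 * time at which the wait ended.
 */
static bool sim_wait_for_stop(const struct sim_cpu *cpu,
			      unsigned long phase1_timeout_us,
			      unsigned int tries, unsigned long delay_us,
			      unsigned long *elapsed_us)
{
	unsigned long now_us;
	unsigned int i;
	bool stopped = false;

	/* Phase 1 ends at the offline transition or at the timeout. */
	if (cpu->offline_at_us < phase1_timeout_us)
		now_us = cpu->offline_at_us;
	else
		now_us = phase1_timeout_us;

	/* Phase 2 runs even if phase 1 timed out, as in the quoted hunk. */
	for (i = 0; i < tries; i++) {
		if (sim_query_stopped(cpu, now_us) == SIM_STOPPED) {
			stopped = true;
			break;
		}
		now_us += delay_us;
	}

	*elapsed_us = now_us;
	return stopped;
}
```

With made-up numbers (offline after 500 µs, stopped after 10500 µs) and a
100 µs delay, 25 tries give up after 3000 µs of simulated time, while 200
tries see the CPU stopped at 10500 µs. The second phase adds at most
tries * delay_us on top of the first phase.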

--
Thiago Jung Bauermann
IBM Linux Technology Center