[PATCH] powerpc/pseries: Disable CPU hotplug across migrations
Nathan Fontenot
nfont at linux.vnet.ibm.com
Fri Sep 21 01:03:40 AEST 2018
On 09/18/2018 05:32 AM, Gautham R Shenoy wrote:
> Hi Nathan,
> On Tue, Sep 18, 2018 at 1:05 AM Nathan Fontenot
> <nfont at linux.vnet.ibm.com> wrote:
>>
>> When performing partition migrations all present CPUs must be online
>> as all present CPUs must make the H_JOIN call as part of the migration
>> process. Once all present CPUs make the H_JOIN call, one CPU is returned
>> to make the rtas call to perform the migration to the destination system.
>>
>> During testing of migration and changing the SMT state we have found
>> instances where CPUs are offlined, as part of the SMT state change,
>> before they make the H_JOIN call. This results in a hung system where
>> every CPU is either in H_JOIN or offline.
>>
>> To prevent this this patch disables CPU hotplug during the migration
>> process.
>>
>> Signed-off-by: Nathan Fontenot <nfont at linux.vnet.ibm.com>
>> ---
>> arch/powerpc/kernel/rtas.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> index 8afd146bc9c7..2c7ed31c736e 100644
>> --- a/arch/powerpc/kernel/rtas.c
>> +++ b/arch/powerpc/kernel/rtas.c
>> @@ -981,6 +981,7 @@ int rtas_ibm_suspend_me(u64 handle)
>> goto out;
>> }
>>
>> + cpu_hotplug_disable();
>
> So, some of the onlined CPUs ( via
> rtas_online_cpus_mask(offline_mask);) can go still offline,
> if the userspace issues an offline command, just before we execute
> cpu_hotplug_disable().
>
> So we are narrowing down the race, but it still exists. Am I missing something ?
You're correct, this narrows the window in which a CPU can go offline.
In testing with this patch we have not been able to re-create the failure but
there is still a small window.
-Nathan
>
>> stop_topology_update();
>>
>> /* Call function on all CPUs. One of us will make the
>> @@ -995,6 +996,7 @@ int rtas_ibm_suspend_me(u64 handle)
>> printk(KERN_ERR "Error doing global join\n");
>>
>> start_topology_update();
>> + cpu_hotplug_enable();
>>
>> /* Take down CPUs not online prior to suspend */
>> cpuret = rtas_offline_cpus_mask(offline_mask);
>>
>
>
More information about the Linuxppc-dev
mailing list