[PATCH] powerpc/pseries: Disable CPU hotplug across migrations

Michael Ellerman mpe at ellerman.id.au
Mon Sep 24 17:00:42 AEST 2018


Nathan Fontenot <nfont at linux.vnet.ibm.com> writes:
> On 09/18/2018 05:32 AM, Gautham R Shenoy wrote:
>> Hi Nathan,
>> On Tue, Sep 18, 2018 at 1:05 AM Nathan Fontenot
>> <nfont at linux.vnet.ibm.com> wrote:
>>>
>>> When performing partition migrations all present CPUs must be online
>>> as all present CPUs must make the H_JOIN call as part of the migration
>>> process. Once all present CPUs make the H_JOIN call, one CPU is returned
>>> to make the rtas call to perform the migration to the destination system.
>>>
>>> During testing of migration and changing the SMT state we have found
>>> instances where CPUs are offlined, as part of the SMT state change,
>>> before they make the H_JOIN call. This results in a hung system where
>>> every CPU is either in H_JOIN or offline.
>>>
>>> To prevent this, this patch disables CPU hotplug during the migration
>>> process.
>>>
>>> Signed-off-by: Nathan Fontenot <nfont at linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/kernel/rtas.c |    2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>>> index 8afd146bc9c7..2c7ed31c736e 100644
>>> --- a/arch/powerpc/kernel/rtas.c
>>> +++ b/arch/powerpc/kernel/rtas.c
>>> @@ -981,6 +981,7 @@ int rtas_ibm_suspend_me(u64 handle)
>>>                 goto out;
>>>         }
>>>
>>> +       cpu_hotplug_disable();
>> 
>> So, some of the onlined CPUs (via
>> rtas_online_cpus_mask(offline_mask);) can still go offline
>> if userspace issues an offline command just before we execute
>> cpu_hotplug_disable().
>> 
>> So we are narrowing down the race, but it still exists. Am I missing something?
>
> You're correct, this narrows the window in which a CPU can go offline.
>
> In testing with this patch we have not been able to re-create the failure but
> there is still a small window.

Well let's close it.

We just need to check that all present CPUs are online after we've
called cpu_hotplug_disable(), don't we?
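
Roughly, the ordering I have in mind is: disable hotplug first, then compare
present vs online, and back out if they differ. Below is a minimal userspace
model of that ordering, not kernel code and not a tested patch. Every name in
it (struct hotplug_model, the model_* helpers) is hypothetical and exists only
for illustration; in the kernel the corresponding pieces would presumably be
cpu_hotplug_disable()/cpu_hotplug_enable() and a comparison of
cpu_present_mask against cpu_online_mask.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model only: each bit stands for one CPU (up to 64).
 * All names here are hypothetical stand-ins, not kernel APIs.
 */
struct hotplug_model {
	uint64_t present;	/* models cpu_present_mask */
	uint64_t online;	/* models cpu_online_mask */
	int disable_depth;	/* > 0 means hotplug requests are refused */
};

/* Models cpu_hotplug_disable(): block further online/offline requests. */
static void model_hotplug_disable(struct hotplug_model *m)
{
	m->disable_depth++;
}

/* Models cpu_hotplug_enable(): undo one earlier disable. */
static void model_hotplug_enable(struct hotplug_model *m)
{
	if (m->disable_depth > 0)
		m->disable_depth--;
}

/* Models a userspace offline request; refused while hotplug is disabled. */
static int model_cpu_down(struct hotplug_model *m, unsigned int cpu)
{
	if (cpu >= 64 || m->disable_depth > 0)
		return -1;
	m->online &= ~(UINT64_C(1) << cpu);
	return 0;
}

/* True when no present CPU is missing from the online set. */
static bool model_all_present_online(const struct hotplug_model *m)
{
	return (m->present & ~m->online) == 0;
}

/*
 * Disable hotplug first, then check. Because the check happens after the
 * disable, no CPU can go offline between the check and the join step.
 * Returns 0 with hotplug left disabled when it is safe to proceed, or -1
 * with hotplug re-enabled so the caller can online CPUs and retry.
 */
static int model_begin_migration(struct hotplug_model *m)
{
	model_hotplug_disable(m);
	if (!model_all_present_online(m)) {
		model_hotplug_enable(m);
		return -1;
	}
	return 0;
}
```

The point of the ordering is that the check is only meaningful once offline
requests can no longer succeed. Whether the failure path should retry or
return an error to the caller is left open here.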

cheers

