[PATCH] powerpc/pseries: Disable CPU hotplug across migrations
Michael Ellerman
mpe at ellerman.id.au
Mon Sep 24 17:00:42 AEST 2018
Nathan Fontenot <nfont at linux.vnet.ibm.com> writes:
> On 09/18/2018 05:32 AM, Gautham R Shenoy wrote:
>> Hi Nathan,
>> On Tue, Sep 18, 2018 at 1:05 AM Nathan Fontenot
>> <nfont at linux.vnet.ibm.com> wrote:
>>>
>>> When performing partition migrations all present CPUs must be online
>>> as all present CPUs must make the H_JOIN call as part of the migration
>>> process. Once all present CPUs make the H_JOIN call, one CPU is returned
>>> to make the rtas call to perform the migration to the destination system.
>>>
>>> During testing of migration and changing the SMT state we have found
>>> instances where CPUs are offlined, as part of the SMT state change,
>>> before they make the H_JOIN call. This results in a hung system where
>>> every CPU is either in H_JOIN or offline.
>>>
>>> To prevent this, this patch disables CPU hotplug during the migration
>>> process.
>>>
>>> Signed-off-by: Nathan Fontenot <nfont at linux.vnet.ibm.com>
>>> ---
>>> arch/powerpc/kernel/rtas.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>>> index 8afd146bc9c7..2c7ed31c736e 100644
>>> --- a/arch/powerpc/kernel/rtas.c
>>> +++ b/arch/powerpc/kernel/rtas.c
>>> @@ -981,6 +981,7 @@ int rtas_ibm_suspend_me(u64 handle)
>>> goto out;
>>> }
>>>
>>> + cpu_hotplug_disable();
>>
>> So, some of the onlined CPUs (via
>> rtas_online_cpus_mask(offline_mask);) can still go offline
>> if userspace issues an offline command just before we execute
>> cpu_hotplug_disable().
>>
>> So we are narrowing down the race, but it still exists. Am I missing something?
>
> You're correct, this narrows the window in which a CPU can go offline.
>
> In testing with this patch we have not been able to re-create the failure but
> there is still a small window.
Well, let's close it.

We just need to check that all present CPUs are online after we've
called cpu_hotplug_disable(), don't we?
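
As a rough illustration of that check, here is a userspace model, not the
actual patch. Every name in it (struct cpu_state, the model_* helpers, the
-1 return value) is hypothetical and exists only for this sketch. CPUs are
modelled as bits in a 64-bit mask. In the kernel the comparison would
presumably be something along the lines of
cpumask_equal(cpu_present_mask, cpu_online_mask), done after
cpu_hotplug_disable(), but that is an assumption about how a fix might look.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only: bit N set means CPU N (N < 64) is present/online. */
struct cpu_state {
	uint64_t present_mask;
	uint64_t online_mask;
	int hotplug_disabled;	/* models the cpu_hotplug_disable() depth */
};

static void model_hotplug_disable(struct cpu_state *s)
{
	s->hotplug_disabled++;
}

static void model_hotplug_enable(struct cpu_state *s)
{
	s->hotplug_disabled--;
}

/* Models a userspace offline request; refused while hotplug is disabled. */
static bool model_cpu_offline(struct cpu_state *s, unsigned int cpu)
{
	if (s->hotplug_disabled > 0)
		return false;
	s->online_mask &= ~(UINT64_C(1) << cpu);
	return true;
}

/* True when no present CPU is missing from the online mask. */
static bool all_present_online(const struct cpu_state *s)
{
	return (s->present_mask & ~s->online_mask) == 0;
}

/*
 * Disable hotplug first, then verify. Once hotplug is disabled no CPU can
 * go offline any more, so a successful check stays valid for the join.
 * Returns 0 if it is safe to proceed, -1 (hypothetical) if a CPU slipped
 * offline before hotplug was disabled and the caller must retry or abort.
 */
static int model_suspend_prepare(struct cpu_state *s)
{
	model_hotplug_disable(s);
	if (!all_present_online(s)) {
		model_hotplug_enable(s);
		return -1;
	}
	return 0;
}
```

The ordering is the point: checking before disabling hotplug would leave the
same window open, whereas checking afterwards means the result cannot change
until hotplug is enabled again.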
cheers