[PATCH] powerpc/pseries: Disable CPU hotplug across migrations

Michael Ellerman mpe at ellerman.id.au
Tue Sep 25 10:42:05 AEST 2018


Gautham R Shenoy <ego at linux.vnet.ibm.com> writes:
> On Mon, Sep 24, 2018 at 05:00:42PM +1000, Michael Ellerman wrote:
>> Nathan Fontenot <nfont at linux.vnet.ibm.com> writes:
>> > On 09/18/2018 05:32 AM, Gautham R Shenoy wrote:
>> >> On Tue, Sep 18, 2018 at 1:05 AM Nathan Fontenot
>> >>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> >>> index 8afd146bc9c7..2c7ed31c736e 100644
>> >>> --- a/arch/powerpc/kernel/rtas.c
>> >>> +++ b/arch/powerpc/kernel/rtas.c
>> >>> @@ -981,6 +981,7 @@ int rtas_ibm_suspend_me(u64 handle)
>> >>>                 goto out;
>> >>>         }
>> >>>
>> >>> +       cpu_hotplug_disable();
>> >> 
>> >> So some of the CPUs onlined via
>> >> rtas_online_cpus_mask(offline_mask) can still go offline if
>> >> userspace issues an offline command just before we execute
>> >> cpu_hotplug_disable().
>> >> 
>> >> So we are narrowing the race window, but it still exists. Am I missing something?
>> >
>> > You're correct; this narrows the window in which a CPU can go offline.
>> >
>> > In testing with this patch we have not been able to re-create the failure,
>> > but there is still a small window.
>> 
>> Well let's close it.
>> 
>> We just need to check that all present CPUs are online after we've
>> called cpu_hotplug_disable(), don't we?
>
> Yes. However, we cannot use the cpu_up() API to bring the offline CPUs
> online, since it will return -EBUSY if CPU hotplug has been
> disabled.

I'm not suggesting we try to bring them online after we've disabled CPU
hotplug. If we detect that race, we can just fail the migration.

Can't we do:
 - save mask of offline CPUs
 - bring all offline CPUs online
 - disable CPU hotplug
 - check if any CPUs are offline
   - if so, we've raced with an offline
   - bail out of the migration with an error
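A minimal userspace model of the steps above, as a sketch only: present[], online[], hotplug_disabled, model_reset(), model_set_online(), model_suspend_me() and racing_offline_cpu1() are all hypothetical stand-ins for cpu_present_mask, cpu_online_mask, cpu_hotplug_disable() and the rtas_ibm_suspend_me() path, not kernel APIs. The race() hook simulates userspace offlining a CPU between the online and disable steps, and returning -EBUSY when the race is detected is an assumed choice of error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 8

/*
 * Hypothetical userspace model, NOT kernel APIs: present[]/online[] stand
 * in for cpu_present_mask/cpu_online_mask, hotplug_disabled for the state
 * set by cpu_hotplug_disable().
 */
static bool present[MODEL_NR_CPUS];
static bool online[MODEL_NR_CPUS];
static bool hotplug_disabled;

/* Set up nr_present CPUs; CPUs whose bit is set in offline_bits start offline. */
static void model_reset(int nr_present, unsigned int offline_bits)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		present[cpu] = cpu < nr_present;
		online[cpu] = present[cpu] && !(offline_bits & (1u << cpu));
	}
	hotplug_disabled = false;
}

/* Like cpu_up()/cpu_down(): refuse with -EBUSY while hotplug is disabled. */
static int model_set_online(int cpu, bool up)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS && present[cpu]);
	if (hotplug_disabled)
		return -EBUSY;
	online[cpu] = up;
	return 0;
}

/* Example race hook: userspace offlines CPU 1 inside the window. */
static void racing_offline_cpu1(void)
{
	(void)model_set_online(1, false);
}

static int model_suspend_me(void (*race)(void))
{
	bool offline_mask[MODEL_NR_CPUS];
	int ret = 0;

	/* 1. Save the mask of present-but-offline CPUs. */
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		offline_mask[cpu] = present[cpu] && !online[cpu];

	/* 2. Bring them all online (hotplug is still enabled, so this succeeds). */
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (offline_mask[cpu])
			(void)model_set_online(cpu, true);

	/* The window: a concurrent offline request may land here. */
	if (race != NULL)
		race();

	/* 3. Disable CPU hotplug. */
	hotplug_disabled = true;

	/* 4. Any present CPU offline now means we raced: bail out (assumed -EBUSY). */
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (present[cpu] && !online[cpu]) {
			ret = -EBUSY;
			break;
		}
	}

	/* ... the actual suspend/migration would run here when ret == 0 ... */

	/* Re-enable hotplug and return the saved CPUs to their offline state. */
	hotplug_disabled = false;
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (offline_mask[cpu])
			(void)model_set_online(cpu, false);

	return ret;
}
```

After model_reset(4, 1u << 2) (four present CPUs, CPU 2 offline), model_suspend_me(NULL) returns 0 and leaves CPU 2 offline again afterwards. After a fresh model_reset(4, 1u << 2), model_suspend_me(racing_offline_cpu1) returns -EBUSY, because the check made after disabling hotplug sees CPU 1 offline.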


Instead of bailing out, we could go back to the start and retry some
number of times, but that's probably overkill anyway.
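If the retry variant were ever wanted, it could wrap a single migration attempt in a bounded loop. This is another sketch under stated assumptions: MIGRATE_MAX_RETRIES, migrate_with_retries(), flaky_attempt() and attempts_left_to_race are hypothetical, and only -EBUSY (assumed here to mean "raced with an offline") triggers a retry.

```c
#include <errno.h>

/* Hypothetical retry bound; not taken from any real kernel code. */
#define MIGRATE_MAX_RETRIES 3

/*
 * Run attempt() up to MIGRATE_MAX_RETRIES times. Retry only on -EBUSY
 * (assumed to mean "raced with a CPU offline"); any other result,
 * success or a different error, is returned immediately.
 */
static int migrate_with_retries(int (*attempt)(void))
{
	int ret = -EBUSY;

	for (int i = 0; i < MIGRATE_MAX_RETRIES; i++) {
		ret = attempt();
		if (ret != -EBUSY)
			return ret;
	}
	return ret; /* still racing after every retry */
}

/* Hypothetical attempt that races a configurable number of times first. */
static int attempts_left_to_race;

static int flaky_attempt(void)
{
	if (attempts_left_to_race > 0) {
		attempts_left_to_race--;
		return -EBUSY;
	}
	return 0;
}
```

Keeping the loop bounded means a userspace agent that keeps offlining CPUs can only delay the migration failure, not hang it, which is part of why a plain bail-out is probably good enough.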

What am I missing?

cheers


More information about the Linuxppc-dev mailing list