[PATCH] powerpc/pseries: Disable CPU hotplug across migrations
Gautham R Shenoy
ego.lkml at gmail.com
Tue Sep 18 20:32:45 AEST 2018
Hi Nathan,
On Tue, Sep 18, 2018 at 1:05 AM Nathan Fontenot
<nfont at linux.vnet.ibm.com> wrote:
>
> When performing partition migrations all present CPUs must be online
> as all present CPUs must make the H_JOIN call as part of the migration
> process. Once all present CPUs make the H_JOIN call, one CPU is returned
> to make the rtas call to perform the migration to the destination system.
>
> During testing of migration and changing the SMT state we have found
> instances where CPUs are offlined, as part of the SMT state change,
> before they make the H_JOIN call. This results in a hung system where
> every CPU is either in H_JOIN or offline.
>
> To prevent this, this patch disables CPU hotplug during the migration
> process.
>
> Signed-off-by: Nathan Fontenot <nfont at linux.vnet.ibm.com>
> ---
> arch/powerpc/kernel/rtas.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 8afd146bc9c7..2c7ed31c736e 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -981,6 +981,7 @@ int rtas_ibm_suspend_me(u64 handle)
> goto out;
> }
>
> + cpu_hotplug_disable();

So some of the CPUs onlined via rtas_online_cpus_mask(offline_mask)
can still go offline if userspace issues an offline command just
before we execute cpu_hotplug_disable().

So we are narrowing the race window, but the race still exists.
Am I missing something?

> stop_topology_update();
>
> /* Call function on all CPUs. One of us will make the
> @@ -995,6 +996,7 @@ int rtas_ibm_suspend_me(u64 handle)
> printk(KERN_ERR "Error doing global join\n");
>
> start_topology_update();
> + cpu_hotplug_enable();
>
> /* Take down CPUs not online prior to suspend */
> cpuret = rtas_offline_cpus_mask(offline_mask);
>
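
To make the window concrete, here is a rough user-space model in C.
It is only a sketch: every model_* name, the cpu_online[] array and
MODEL_NR_CPUS = 4 are made up for illustration and are not kernel
APIs. The only kernel behaviour it assumes is that an offline request
is refused while hotplug is disabled.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4          /* hypothetical partition size */

/* Toy state; stand-ins for the real kernel data, not the real thing. */
static bool cpu_online[MODEL_NR_CPUS];
static int hotplug_disabled;     /* models a disable depth counter */

static void model_hotplug_disable(void) { hotplug_disabled++; }
static void model_hotplug_enable(void)  { hotplug_disabled--; }

/* Models a userspace offline request: refused while hotplug is disabled. */
static int model_cpu_down(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    if (hotplug_disabled)
        return -1;
    cpu_online[cpu] = false;
    return 0;
}

/* Models the "bring every present CPU online" step before the join. */
static void model_online_all(void)
{
    for (int i = 0; i < MODEL_NR_CPUS; i++)
        cpu_online[i] = true;
}

static int model_count_online(void)
{
    int n = 0;

    for (int i = 0; i < MODEL_NR_CPUS; i++)
        if (cpu_online[i])
            n++;
    return n;
}

/*
 * Walks the ordering used in the patch and returns how many CPUs are
 * online when the join step would start. If offline_in_window is true,
 * an offline request lands after the CPUs are onlined but before
 * hotplug is disabled.
 */
int model_migration(bool offline_in_window)
{
    int joined;

    hotplug_disabled = 0;
    model_online_all();             /* CPUs brought online */
    if (offline_in_window)
        model_cpu_down(1);          /* userspace wins the race */
    model_hotplug_disable();        /* where the patch disables hotplug */
    (void)model_cpu_down(2);        /* a later request is refused */
    joined = model_count_online();  /* CPUs that could make the join call */
    model_hotplug_enable();
    return joined;
}
```

In this model, model_migration(false) returns 4, so every CPU reaches
the join step, while model_migration(true) returns 3: one CPU is
already offline by the time hotplug is disabled, which is the case the
patch as posted does not seem to cover.
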
--
Thanks and Regards
gautham.