[PATCH] powerpc/pseries: Disable CPU hotplug across migrations

Gautham R Shenoy ego at linux.vnet.ibm.com
Tue Sep 25 16:19:21 AEST 2018


On Tue, Sep 25, 2018 at 10:42:05AM +1000, Michael Ellerman wrote:
[..snip..]
> I'm not suggesting we try to bring them online after we've disabled CPU
> hotplug, if we detect that race we can just fail the migration.
> 
> Can't we do:
>  - save mask of offline CPUs
>  - bring all offline CPUs online
>  - disable CPU hotplug
>  - check if any CPUs are offline
>    - if so, we've raced with an offline
>    - bail out of the migration with an error
> 
> 
> Instead of bailing out we could go back to the start and try again for
> some number of retries, but that's probably overkill anyway.
> 
> What am I missing?

I guess that will work. The race is unlikely anyway, so I doubt
CPU-Hotplug can DDOS the partition migration.

Does the following implementation of the same look ok ? (Build tested) 

------------------------------------ X8------------------------------------- 
>From acb9eb9f8bb14cf3121aeb0589255cbc31292be7 Mon Sep 17 00:00:00 2001
From: "Gautham R. Shenoy" <ego at linux.vnet.ibm.com>
Date: Tue, 25 Sep 2018 11:01:18 +0530
Subject: [PATCH] powerpc/rtas: Fix a potential race between CPU-Offline & Migration

commit 85a88cabad57 ("powerpc/pseries: Disable CPU hotplug across
migrations") disables any CPU-hotplug operations when Live Partition
Migration is in progress. However, there is a minor race-window
between the time all the CPUs are onlined by rtas_ibm_suspend_me() and
the CPU-Hotplugs are disabled via cpu_hotplug_disable() when some CPUs
could be offlined by the userspace, thus nullifying the assumption
that all the CPUs are online at this point.

This patch fixes this by checking if all the present CPUs are brought
online after disabling CPU-Hotplug. Otherwise, it retries to bring the
CPUs online again for a finite number of times failing which
rtas_ibm_suspend_me() returns -EBUSY.

Suggested-by: Michael Ellerman <mpe at ellerman.id.au>
Signed-off-by: Gautham R. Shenoy <ego at linux.vnet.ibm.com>
---
 arch/powerpc/kernel/rtas.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 2c7ed31..e6a6425 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -934,6 +934,8 @@ int rtas_offline_cpus_mask(cpumask_var_t cpus)
 }
 EXPORT_SYMBOL(rtas_offline_cpus_mask);
 
+#define MAX_SUSPEND_HOTPLUG_RETRIES    5
+
 int rtas_ibm_suspend_me(u64 handle)
 {
 	long state;
@@ -943,6 +945,7 @@ int rtas_ibm_suspend_me(u64 handle)
 	DECLARE_COMPLETION_ONSTACK(done);
 	cpumask_var_t offline_mask;
 	int cpuret;
+	int retries = MAX_SUSPEND_HOTPLUG_RETRIES;
 
 	if (!rtas_service_present("ibm,suspend-me"))
 		return -ENOSYS;
@@ -972,6 +975,7 @@ int rtas_ibm_suspend_me(u64 handle)
 	data.token = rtas_token("ibm,suspend-me");
 	data.complete = &done;
 
+again:
 	/* All present CPUs must be online */
 	cpumask_andnot(offline_mask, cpu_present_mask, cpu_online_mask);
 	cpuret = rtas_online_cpus_mask(offline_mask);
@@ -982,6 +986,19 @@ int rtas_ibm_suspend_me(u64 handle)
 	}
 
 	cpu_hotplug_disable();
+
+	/* Check if we raced with a CPU-Offline Operation */
+	if (unlikely(!cpumask_equal(cpu_present_mask, cpu_online_mask))) {
+		cpu_hotplug_enable();
+		if (retries-- > 0)
+			goto again;
+
+		pr_err("%s: Too many concurrent CPU-Offline operation in progress\n",
+		       __func__);
+		atomic_set(&data.error, -EBUSY);
+		goto out;
+	}
+
 	stop_topology_update();
 
 	/* Call function on all CPUs.  One of us will make the
-- 
1.9.4



More information about the Linuxppc-dev mailing list