[PATCH 1/1] hotplug-cpu.c: show 'last online CPU' error in dlpar_cpu_remove()

Michael Ellerman mpe at ellerman.id.au
Fri Mar 19 22:26:59 AEDT 2021


Daniel Henrique Barboza <danielhb413 at gmail.com> writes:
> Ping
>
> On 3/5/21 2:38 PM, Daniel Henrique Barboza wrote:
>> Of all the reasons that dlpar_cpu_remove() can fail, the 'last online
>> CPU' is one that can be caused directly by the user offlining CPUs
>> in a partition/virtual machine that has hotplugged CPUs. Trying to
>> reclaim a hotplugged CPU can fail if the CPU is now the last online in
>> the system. This is easily reproduced using QEMU [1].

Sorry, I saw this earlier and never got around to replying.

I'm wondering if we neet to catch it earlier, ie. in
dlpar_offline_cpu().

By the time we return to dlpar_cpu_remove() we've dropped the
cpu_add_remove_lock (cpu_maps_update_done), so num_online_cpus() could
change out from under us, meaning the num_online_cpus() == 1 check might
trigger incorrectly (or vice versa).

Something like the patch below (completely untested :D)

And writing that patch makes me wonder, is == 1 the right check?

In most cases we'll remove all but one thread of the core, but we'll
fail on the last thread. Leaving that core effectively stuck in SMT1. Is
that useful behaviour? Should we instead check at the start that we can
remove all threads of the core without going to zero online CPUs?

cheers


diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 12cbffd3c2e3..498c22331ac8 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -271,6 +271,12 @@ static int dlpar_offline_cpu(struct device_node *dn)
 			if (!cpu_online(cpu))
 				break;
 
+			if (num_online_cpus() == 1) {
+				pr_warn("Unable to remove last online CPU %pOFn\n", dn);
+				rc = EBUSY;
+				goto out_unlock;
+			}
+
 			cpu_maps_update_done();
 			rc = device_offline(get_cpu_device(cpu));
 			if (rc)
@@ -283,6 +289,7 @@ static int dlpar_offline_cpu(struct device_node *dn)
 				thread);
 		}
 	}
+out_unlock:
 	cpu_maps_update_done();
 
 out:


More information about the Linuxppc-dev mailing list