[PATCH 1/1] hotplug-cpu.c: show 'last online CPU' error in dlpar_cpu_remove()
Daniel Henrique Barboza
danielhb413 at gmail.com
Tue Mar 23 10:30:36 AEDT 2021
On 3/19/21 8:26 AM, Michael Ellerman wrote:
> Daniel Henrique Barboza <danielhb413 at gmail.com> writes:
>> Ping
>>
>> On 3/5/21 2:38 PM, Daniel Henrique Barboza wrote:
>>> Of all the reasons that dlpar_cpu_remove() can fail, the 'last online
>>> CPU' is one that can be caused directly by the user offlining CPUs
>>> in a partition/virtual machine that has hotplugged CPUs. Trying to
>>> reclaim a hotplugged CPU can fail if the CPU is now the last online in
>>> the system. This is easily reproduced using QEMU [1].
>
> Sorry, I saw this earlier and never got around to replying.
No problem. Thanks for the review!
>
> I'm wondering if we neet to catch it earlier, ie. in
> dlpar_offline_cpu().
>
> By the time we return to dlpar_cpu_remove() we've dropped the
> cpu_add_remove_lock (cpu_maps_update_done), so num_online_cpus() could
> change out from under us, meaning the num_online_cpus() == 1 check might
> trigger incorrectly (or vice versa).
>
> Something like the patch below (completely untested :D)
Makes sense. I'll try it out to see if it works.
>
> And writing that patch makes me wonder, is == 1 the right check?
>
> In most cases we'll remove all but one thread of the core, but we'll
> fail on the last thread. Leaving that core effectively stuck in SMT1. Is
> that useful behaviour? Should we instead check at the start that we can
> remove all threads of the core without going to zero online CPUs?
I think it's ok to allow SMT1 cores, speaking from QEMU perspective.
QEMU does not have a "core hotunplug" operation where the whole core is
hotunplugged at once. The CPU hotplug/unplug operations are done as single
CPU thread add/removal. If the user wants to run 4 cores, all of them with
SMT1, QEMU will allow it.
Libvirt does not operate with the core granularity either - you can specify
the amount of vcpus the guest should run with, and Libvirt will send
hotplug/unplug requests to QEMU to match the desired value. It doesn't
bother with how many threads of a core were offlined or not.
Thanks,
DHB
>
> cheers
>
>
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index 12cbffd3c2e3..498c22331ac8 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -271,6 +271,12 @@ static int dlpar_offline_cpu(struct device_node *dn)
> if (!cpu_online(cpu))
> break;
>
> + if (num_online_cpus() == 1) {
> + pr_warn("Unable to remove last online CPU %pOFn\n", dn);
> + rc = EBUSY;
> + goto out_unlock;
> + }
> +
> cpu_maps_update_done();
> rc = device_offline(get_cpu_device(cpu));
> if (rc)
> @@ -283,6 +289,7 @@ static int dlpar_offline_cpu(struct device_node *dn)
> thread);
> }
> }
> +out_unlock:
> cpu_maps_update_done();
>
> out:
>
More information about the Linuxppc-dev
mailing list