[PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death
Michael Ellerman
mpe at ellerman.id.au
Tue Aug 4 23:35:10 AEST 2020
Hi Mike,
There is a bit of history to this code, but not in a good way :)
Michael Roth <mdroth at linux.vnet.ibm.com> writes:
> For a power9 KVM guest with XIVE enabled, running a test loop
> where we hotplug 384 vcpus and then unplug them, the following traces
> can be seen (generally within a few loops) either from the unplugged
> vcpu:
>
> [ 1767.353447] cpu 65 (hwid 65) Ready to die...
> [ 1767.952096] Querying DEAD? cpu 66 (66) shows 2
> [ 1767.952311] list_del corruption. next->prev should be c00a000002470208, but was c00a000002470048
...
>
> At that point the worker thread assumes the unplugged CPU is in some
> unknown/dead state and proceeds with the cleanup, causing the race with
> the XIVE cleanup code executed by the unplugged CPU.
>
> Fix this by inserting an msleep() after each RTAS call to avoid
We previously had an msleep(), but it was removed:
b906cfa397fd ("powerpc/pseries: Fix cpu hotplug")
> pseries_cpu_die() returning prematurely, and double the number of
> attempts so we wait at least a total of 5 seconds. While this isn't an
> ideal solution, it is similar to how we dealt with a similar issue for
> cede_offline mode in the past (940ce422a3).
Thiago tried to fix this previously but there was a bit of discussion
that didn't quite resolve:
https://lore.kernel.org/linuxppc-dev/20190423223914.3882-1-bauerman@linux.ibm.com/
Spinning forever seems like a bad idea, but as has been demonstrated at
least twice now, continuing when we don't know the state of the other
CPU can lead to straight up crashes.
So I think I'm persuaded that it's preferable to have the kernel stuck
spinning rather than oopsing.
I'm 50/50 on whether we should have a cond_resched() in the loop. My
first instinct is no, if we're stuck here for 20s a stack trace would be
good. But then we will probably hit that on some big and/or heavily
loaded machine.
So possibly we should call cond_resched() but have some custom logic in
the loop to print a warning if we are stuck for more than some
sufficiently long amount of time.
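To make that last idea concrete, here is a rough userspace sketch of a loop that never gives up but complains periodically. It is not a patch against pseries_cpu_die(): the wait_ops hooks, the SKETCH_* status codes, the fake_cpu helpers and the warning interval are all hypothetical stand-ins (for smp_query_cpu_stopped(), a jiffies-style clock and cond_resched()) so that the logic can be compiled and exercised outside the kernel. It also ignores the QCSS_HARDWARE_ERROR exit that the real loop has.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical status codes for this sketch; not the kernel's QCSS_* values. */
enum { SKETCH_STOPPED = 0, SKETCH_NOT_STOPPED = 2 };

/* Hooks standing in for kernel facilities so the loop can run in userspace. */
struct wait_ops {
	int (*query)(void *ctx);       /* stand-in for smp_query_cpu_stopped() */
	uint64_t (*now_ms)(void *ctx); /* stand-in for a jiffies-based clock */
	void (*yield)(void *ctx);      /* stand-in for cond_resched() */
	void *ctx;
};

/*
 * Poll until the CPU reports stopped. Never gives up, but prints a warning
 * each time another warn_every_ms has elapsed without progress.
 * Returns the number of warnings printed.
 */
static unsigned int wait_for_cpu_stopped(const struct wait_ops *ops,
					 unsigned int cpu,
					 uint64_t warn_every_ms)
{
	assert(ops && ops->query && ops->now_ms && ops->yield);
	assert(warn_every_ms > 0);

	uint64_t start = ops->now_ms(ops->ctx);
	uint64_t next_warn = start + warn_every_ms;
	unsigned int warnings = 0;

	while (ops->query(ops->ctx) != SKETCH_STOPPED) {
		uint64_t now = ops->now_ms(ops->ctx);

		if (now >= next_warn) {
			fprintf(stderr, "cpu %u still not stopped after %llu ms\n",
				cpu, (unsigned long long)(now - start));
			warnings++;
			next_warn = now + warn_every_ms;
		}
		/* Let other work run instead of hogging the CPU. */
		ops->yield(ops->ctx);
	}
	return warnings;
}

/*
 * Simulated CPU for experimentation: reports "not stopped" for polls_left
 * queries, with a fake clock that advances 100 ms on every yield.
 */
struct fake_cpu {
	unsigned int polls_left;
	uint64_t clock_ms;
};

static int fake_query(void *ctx)
{
	struct fake_cpu *f = ctx;

	if (f->polls_left == 0)
		return SKETCH_STOPPED;
	f->polls_left--;
	return SKETCH_NOT_STOPPED;
}

static uint64_t fake_now(void *ctx)
{
	return ((struct fake_cpu *)ctx)->clock_ms;
}

static void fake_yield(void *ctx)
{
	((struct fake_cpu *)ctx)->clock_ms += 100;
}
```

Wiring a struct fake_cpu into a struct wait_ops and calling wait_for_cpu_stopped() with a warning interval of, say, 1000 ms shows the intended behaviour: the caller stays in the loop until the CPU is really stopped, and the log gets a line at each interval rather than a single stack trace. What the interval should be in the kernel is still an open question.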
> Fixes: eac1e731b59ee ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1856588
This is not public.
I tend to trim Bugzilla links from the change log, because I'm not
convinced they will last forever, but it is good to have them in the
mail archive.
cheers
> Cc: Michael Ellerman <mpe at ellerman.id.au>
> Cc: Cedric Le Goater <clg at kaod.org>
> Cc: Greg Kurz <groug at kaod.org>
> Cc: Nathan Lynch <nathanl at linux.ibm.com>
> Signed-off-by: Michael Roth <mdroth at linux.vnet.ibm.com>
> ---
> arch/powerpc/platforms/pseries/hotplug-cpu.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index c6e0d8abf75e..3cb172758052 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -111,13 +111,12 @@ static void pseries_cpu_die(unsigned int cpu)
>  	int cpu_status = 1;
>  	unsigned int pcpu = get_hard_smp_processor_id(cpu);
>
> -	for (tries = 0; tries < 25; tries++) {
> +	for (tries = 0; tries < 50; tries++) {
>  		cpu_status = smp_query_cpu_stopped(pcpu);
>  		if (cpu_status == QCSS_STOPPED ||
>  		    cpu_status == QCSS_HARDWARE_ERROR)
>  			break;
> -		cpu_relax();
> -
> +		msleep(100);
>  	}
>
>  	if (cpu_status != 0) {
> --
> 2.17.1