[PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death
Nathan Lynch
nathanl at linux.ibm.com
Fri Aug 7 17:05:09 AEST 2020
Hi everyone,
Michael Ellerman <mpe at ellerman.id.au> writes:
> Greg Kurz <groug at kaod.org> writes:
>> On Tue, 04 Aug 2020 23:35:10 +1000
>> Michael Ellerman <mpe at ellerman.id.au> wrote:
>>> Spinning forever seems like a bad idea, but as has been demonstrated at
>>> least twice now, continuing when we don't know the state of the other
>>> CPU can lead to straight up crashes.
>>>
>>> So I think I'm persuaded that it's preferable to have the kernel stuck
>>> spinning rather than oopsing.
>>>
>>
>> +1
>>
>>> I'm 50/50 on whether we should have a cond_resched() in the loop. My
>>> first instinct is no, if we're stuck here for 20s a stack trace would be
>>> good. But then we will probably hit that on some big and/or heavily
>>> loaded machine.
>>>
>>> So possibly we should call cond_resched() but have some custom logic in
>>> the loop to print a warning if we are stuck for more than some
>>> sufficiently long amount of time.
>>
>> How long should that be ?
>
> Yeah good question.
>
> I guess step one would be seeing how long it can take on the 384 vcpu
> machine. And we can probably test on some other big machines.
>
> Hopefully Nathan can give us some idea of how long he's seen it take on
> large systems? I know he was concerned about the 20s timeout of the
> softlockup detector.
Maybe I'm not quite clear what this is referring to, but I don't think
stop-self/query-stopped-state latency increases with processor count, at
least not on PowerVM. And IIRC I was concerned with the earlier patch's
potential for causing the softlockup watchdog to rightly complain by
polling the stopped state without ever scheduling away.
The fact that smp_query_cpu_stopped() kind of collapses the two distinct
results from the query-cpu-stopped-state RTAS call into one return value
may make it harder than necessary to reason about the questions around
cond_resched() and whether to warn.
Sorry to pull this stunt but I have had some code sitting in a neglected
branch that I think gets the logic around this right.
What we should have is a simple C wrapper for the RTAS call that reflects the
architected inputs and outputs:
================================================================
(-- rtas.c --)
/**
* rtas_query_cpu_stopped_state() - Call RTAS query-cpu-stopped-state.
* @hwcpu: Identifies the processor thread to be queried.
* @status: Pointer to status, valid only on success.
*
* Determine whether the given processor thread is in the stopped
* state. If successful and @status is non-NULL, the thread's status
* is stored to @status.
*
* Return:
* * 0 - Success
* * -1 - Hardware error
* * -2 - Busy, try again later
*/
int rtas_query_cpu_stopped_state(unsigned int hwcpu, unsigned int *status)
{
unsigned int cpu_status;
int token;
int fwrc;
token = rtas_token("query-cpu-stopped-state");
fwrc = rtas_call(token, 1, 2, &cpu_status, hwcpu);
if (fwrc != 0)
goto out;
if (status != NULL)
*status = cpu_status;
out:
return fwrc;
}
================================================================
And then a utility function that waits for the remote thread to enter
stopped state, with higher-level logic for rescheduling and warning. The
fact that smp_query_cpu_stopped() currently does not handle a -2/busy
status is a bug, fixed below by using rtas_busy_delay(). Note the
justification for the explicit cond_resched() in the outer loop:
================================================================
(-- rtas.h --)
/* query-cpu-stopped-state CPU_status */
#define RTAS_QCSS_STATUS_STOPPED 0
#define RTAS_QCSS_STATUS_IN_PROGRESS 1
#define RTAS_QCSS_STATUS_NOT_STOPPED 2
(-- pseries/hotplug-cpu.c --)
/**
* wait_for_cpu_stopped() - Wait for a cpu to enter RTAS stopped state.
*/
static void wait_for_cpu_stopped(unsigned int cpu)
{
unsigned int status;
unsigned int hwcpu;
hwcpu = get_hard_smp_processor_id(cpu);
do {
int fwrc;
/*
* rtas_busy_delay() will yield only if RTAS returns a
* busy status. Since query-cpu-stopped-state can
* yield RTAS_QCSS_STATUS_IN_PROGRESS or
* RTAS_QCSS_STATUS_NOT_STOPPED for an unbounded
* period before the target thread stops, we must take
* care to explicitly reschedule while polling.
*/
cond_resched();
do {
fwrc = rtas_query_cpu_stopped_state(hwcpu, &status);
} while (rtas_busy_delay(fwrc));
if (fwrc == 0)
continue;
pr_err_ratelimited("query-cpu-stopped-state for "
"thread 0x%x returned %d\n",
hwcpu, fwrc);
goto out;
} while (status == RTAS_QCSS_STATUS_NOT_STOPPED ||
status == RTAS_QCSS_STATUS_IN_PROGRESS);
if (status != RTAS_QCSS_STATUS_STOPPED) {
pr_err_ratelimited("query-cpu-stopped-state yielded unknown "
"status %d for thread 0x%x\n",
status, hwcpu);
}
out:
return;
}
[...]
static void pseries_cpu_die(unsigned int cpu)
{
wait_for_cpu_stopped(cpu);
paca_ptrs[cpu]->cpu_start = 0;
}
================================================================
wait_for_cpu_stopped() should be able to accommodate a time-based
warning if necessary, but speaking as a likely recipient of any bug
reports that would arise here, I'm not convinced of the need and I
don't know what a good value would be. It's relatively easy to sample
the stack of a task that's apparently failing to make progress, plus I
probably would use 'perf probe' or similar to report the inputs and
outputs for the RTAS call.
I'm happy to make this a proper submission after I can clean it up and
retest it, or Michael R. is welcome to appropriate it, assuming it's
acceptable.
More information about the Linuxppc-dev
mailing list