[PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death

Nathan Lynch nathanl at linux.ibm.com
Wed Aug 12 00:46:08 AEST 2020


Michael Ellerman <mpe at ellerman.id.au> writes:

> Michael Roth <mdroth at linux.vnet.ibm.com> writes:
>> Quoting Nathan Lynch (2020-08-07 02:05:09)
> ...
>>> wait_for_cpu_stopped() should be able to accommodate a time-based
>>> warning if necessary, but speaking as a likely recipient of any bug
>>> reports that would arise here, I'm not convinced of the need and I
>>> don't know what a good value would be. It's relatively easy to sample
>>> the stack of a task that's apparently failing to make progress, plus I
>>> probably would use 'perf probe' or similar to report the inputs and
>>> outputs for the RTAS call.
>>
>> I think if we make the timeout sufficiently high like 2 minutes or so
>> it wouldn't hurt and if we did see them it would probably point to an
>> actual bug. But I don't have a strong feeling either way.
>
> I think we should print a warning after 2 minutes.
>
> It's true that there are fairly easy mechanisms to work out where the
> thread is stuck, but customers are unlikely to use them. They're just
> going to report that it's stuck with no further info, and probably
> reboot the machine before we get a chance to get any further info.
>
> Whereas if the kernel prints a warning with a stack trace we at least
> have that to go on in an initial bug report.
>
>>> I'm happy to make this a proper submission after I can clean it up and
>>> retest it, or Michael R. is welcome to appropriate it, assuming it's
>>> acceptable.
>>> 
>>
>> I've given it a shot with this patch and it seems to be holding up in
>> testing. If we don't think the ~2 minutes warning message is needed I
>> can clean it up to post:
>>
>> https://github.com/mdroth/linux/commit/354b8c97bf0dc1146e36aa72273f5b33fe90d09e
>>
>> I'd likely break the refactoring patches out to a separate patch under
>> Nathan's name since it fixes a separate bug potentially.
>
> While I like Nathan's refactoring, we probably want to do the minimal
> fix first to ease backporting.
>
> Then do the refactoring on top of that.

Fair enough, thanks.
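
For reference, the time-based warning discussed above could look roughly
like the following. This is only a userspace sketch, not the patch linked
above and not the kernel's actual wait_for_cpu_stopped(): the query and
clock callbacks, the enum, the fake backend and the _sketch suffix are all
hypothetical stand-ins, and the 120-second threshold is just the "2 minutes"
figure from the thread. The loop keeps polling indefinitely and only emits
a one-time warning once the threshold has passed.

```c
#include <stdio.h>

/* Hypothetical stand-in for the state reported by the RTAS query. */
enum cpu_state { CPU_RUNNING, CPU_STOPPED };

typedef enum cpu_state (*query_fn)(unsigned int cpu);
typedef unsigned long (*clock_fn)(void);   /* monotonic time in seconds */

#define WARN_AFTER_SECS 120UL              /* the "2 minutes" from the thread */

/*
 * Poll until the CPU reports stopped. Never gives up; it only warns once
 * after WARN_AFTER_SECS so a bug report has something to go on.
 * Returns 1 if the warning was printed, 0 otherwise.
 */
static int wait_for_cpu_stopped_sketch(unsigned int cpu, query_fn query,
                                       clock_fn now)
{
	unsigned long start = now();
	int warned = 0;

	while (query(cpu) != CPU_STOPPED) {
		if (!warned && now() - start >= WARN_AFTER_SECS) {
			fprintf(stderr,
				"cpu %u: not stopped after %lu seconds, still waiting\n",
				cpu, WARN_AFTER_SECS);
			warned = 1;
		}
		/* Kernel code would yield or sleep here between polls. */
	}
	return warned;
}

/* Fake backend for experimenting: every poll advances time by one second. */
static unsigned long fake_now_secs;
static unsigned long fake_stop_at_secs;

static unsigned long fake_clock(void)
{
	return fake_now_secs;
}

static enum cpu_state fake_query(unsigned int cpu)
{
	(void)cpu;
	fake_now_secs++;
	return fake_now_secs >= fake_stop_at_secs ? CPU_STOPPED : CPU_RUNNING;
}
```

Passing the clock and query in as callbacks is purely to make the sketch
easy to exercise outside the kernel; in-kernel code would more likely
compare against jiffies and dump a stack trace along with the message.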

