[PATCH] powerpc/rtas: Fix hang in race against concurrent cpu offline

Juliet Kim julietk at linux.vnet.ibm.com
Sat Jun 29 06:03:59 AEST 2019


On 6/26/19 6:51 PM, Nathan Lynch wrote:
> Hi Juliet,
>
> Juliet Kim <julietk at linux.vnet.ibm.com> writes:
>> On 6/25/19 12:29 PM, Nathan Lynch wrote:
>>> Juliet Kim <julietk at linux.vnet.ibm.com> writes:
>>>> However, that fix failed to notify the hypervisor that the attempted LPM
>>>> had been abandoned, which results in a system hang.
>>> It is surprising to me that leaving a migration unterminated would cause
>>> Linux to hang. Can you explain more about how that happens?
>>>
>> PHYP will block further requests (next partition migration, DLPAR, etc.) while
>> it's in the suspending state. That would have a follow-on effect on the HMC and
>> potentially this and other partitions.
> I can believe that operations on _this LPAR_ would be blocked by the
> platform and/or management console while the migration remains
> unterminated, but the OS should not be able to perpetrate a denial of
> service on other partitions or the management console simply by botching
> the LPM protocol. If it can, that's not Linux's bug to fix.
>
>
>>>> Fix this by sending a signal to PHYP to cancel the migration, so that PHYP
>>>> can stop waiting, and clean up the migration.
>>> This is well-spotted and rtas_ibm_suspend_me() needs to signal
>>> cancellation in several error paths. But I don't agree that this is one
>>> of them: this race is going to be a temporary condition in any
>>> production setting, and retrying would allow the migration to
>>> succeed.
>> If LPM and CPU offline requests conflict with one another, it might be better
>> to let them fail and let the customer decide which he prefers.
> Hmm I don't think so. When (if ever) this happens in production it would
> be the result of an unlucky race with a power management daemon or
> similar, not a conscious decision of the administrator in the moment.
>
Assuming that a production race would only be against power management may
be reasonable.  But we have an actual failure case where the race was against
an explicit offline request, and that's a legitimate, supported thing for
a customer to do.

>> IBM i cancels migration if the other OS components/operations veto
>> migration. It’s consistent with other OS behavior for LPM.
> But this situation isn't really like that. If we were to have a real
> veto mechanism, it would only make sense to have it run as early as
> possible, before the platform has done a bunch of work. This benign,
> recoverable race is occurring right before we complete the migration,
> which at this point has been copying state to the destination for
> minutes or hours. It doesn't make sense to error out like this.

Let me clarify that the cancellation occurs in the phase that prepares
for migration. It would be even better if it ran before LPM is allowed to
start, but that is what a long-term solution might look like.

> As I mentioned earlier though, it does make sense to signal a
> cancellation for these less-recoverable error conditions in
> rtas_ibm_suspend_me():
>
> - rtas_online_cpus_mask() failure
> - alloc_cpumask_var() failure
> - the atomic_read(&data.error) != 0 case after returning from the IPI
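
For illustration only, here is a minimal user-space sketch of the split
described in the list above: the three less-recoverable conditions funnel
into one cancellation point, while the cpu offline race is reported as
retryable. This is a model under stated assumptions, not the kernel code.
struct suspend_attempt, try_suspend(), signal_cancel() and the result codes
are hypothetical names invented for this sketch, and treating the race as
retryable reflects the position quoted above, not the behavior of the
patch under discussion.

```c
#include <stdbool.h>

/* Hypothetical outcome codes for this model; not kernel or RTAS values. */
enum suspend_result {
	SUSPEND_OK = 0,
	SUSPEND_RETRY,		/* transient condition, caller may try again */
	SUSPEND_CANCELLED,	/* platform was told the migration is abandoned */
};

/* Flags standing in for the conditions named in the thread. */
struct suspend_attempt {
	bool alloc_failed;	/* models an alloc_cpumask_var() failure */
	bool online_failed;	/* models an rtas_online_cpus_mask() failure */
	bool offline_race;	/* models the race against a concurrent cpu offline */
	int ipi_error;		/* models atomic_read(&data.error) after the IPI */
};

/* Counts how many times the model signalled a cancellation. */
static int cancel_count;

/* Hypothetical stand-in for notifying the platform of the cancellation. */
static void signal_cancel(void)
{
	cancel_count++;
}

static enum suspend_result try_suspend(const struct suspend_attempt *a)
{
	if (a->alloc_failed)
		goto cancel;
	if (a->online_failed)
		goto cancel;

	/* Assumption: the race is transient, so no cancellation is signalled. */
	if (a->offline_race)
		return SUSPEND_RETRY;

	if (a->ipi_error != 0)
		goto cancel;

	return SUSPEND_OK;

cancel:
	/* Single exit point for every less-recoverable failure. */
	signal_cancel();
	return SUSPEND_CANCELLED;
}
```

Under the alternative argued for in this reply, the offline_race branch
would jump to the cancel label instead of returning SUSPEND_RETRY.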
