[PATCH] powerpc/rtas: Fix hang in race against concurrent cpu offline

Nathan Lynch nathanl at linux.ibm.com
Wed Jun 26 03:29:32 AEST 2019


Juliet Kim <julietk at linux.vnet.ibm.com> writes:
> The commit
> (“powerpc/rtas: Fix a potential race between CPU-Offline & Migration”)
> attempted to fix a hang in Live Partition Mobility (LPM) by abandoning
> the LPM attempt if a race between LPM and concurrent CPU offline was
> detected.
>
> However, that fix failed to notify the hypervisor that the LPM attempt
> had been abandoned, which results in a system hang.

It is surprising to me that leaving a migration unterminated would cause
Linux to hang. Can you explain more about how that happens?


> Fix this by sending a signal to PHYP to cancel the migration, so that
> PHYP can stop waiting and clean up the migration.

This is well-spotted and rtas_ibm_suspend_me() needs to signal
cancellation in several error paths. But I don't agree that this is one
of them: this race is going to be a temporary condition in any
production setting, and retrying would allow the migration to succeed.
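
To illustrate the retry argument, here is a minimal user-space sketch. It
is not the rtas_ibm_suspend_me() code. Every name in it (migration_sim,
try_suspend, cancel_migration, suspend_with_retry) is hypothetical, and
the choice of -EAGAIN for the transient race and of a bounded retry count
are assumptions made for the example. The idea is only that a transient
condition is retried, and cancellation is signalled once we actually give
up.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical simulation state; none of these names exist in the kernel. */
struct migration_sim {
	int offline_ops_pending; /* simulated concurrent CPU offline operations */
	bool cancelled;          /* set once cancellation has been signalled */
	int attempts;            /* number of suspend attempts made so far */
};

/*
 * One suspend attempt. While a CPU offline is still in flight the attempt
 * fails with -EAGAIN (assumed here to mean "transient, try again").
 */
static int try_suspend(struct migration_sim *m)
{
	m->attempts++;
	if (m->offline_ops_pending > 0) {
		m->offline_ops_pending--; /* the offline operation completes */
		return -EAGAIN;
	}
	return 0;
}

/* Stand-in for telling the hypervisor to stop waiting and clean up. */
static void cancel_migration(struct migration_sim *m)
{
	m->cancelled = true;
}

/*
 * Retry while the failure is transient, up to max_attempts. Cancel only
 * when giving up: either the retries are exhausted or the error is not
 * -EAGAIN.
 */
static int suspend_with_retry(struct migration_sim *m, int max_attempts)
{
	int rc = -EAGAIN;

	assert(max_attempts > 0);

	for (int i = 0; i < max_attempts; i++) {
		rc = try_suspend(m);
		if (rc != -EAGAIN)
			break;
	}

	if (rc != 0)
		cancel_migration(m);

	return rc;
}
```

With two offline operations pending and a budget of five attempts, the
third attempt succeeds and no cancellation is sent. If the race never
clears within the budget, the caller gets -EAGAIN and the cancellation is
signalled exactly once.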

