[PATCH] powerpc/rtas: retry when cpu offline races with suspend/migration

Michael Ellerman mpe at ellerman.id.au
Thu Jun 27 15:01:47 AEST 2019


Juliet Kim <julietk at linux.vnet.ibm.com> writes:
> On 6/25/19 1:51 PM, Nathan Lynch wrote:
>> Juliet Kim <julietk at linux.vnet.ibm.com> writes:
>>
>>> There's some concern this could retry forever, resulting in live lock.
>> First of all the system will make progress in other areas even if there
>> are repeated retries; we're not indefinitely holding locks or anything
>> like that.
>
> For instance, a system admin runs a script that picks and offlines CPUs in a
> loop to keep a certain rate of onlined CPUs for energy saving. If LPM keeps
> putting CPUs back online, that would never finish, and would keep generating
> new offline requests.
>
>> Second, Linux checks the H_VASI_STATE result on every retry. If the
>> platform wants to terminate the migration (say, if it imposes a
>> timeout), Linux will abandon it when H_VASI_STATE fails to return
>> H_VASI_SUSPENDING. And it seems incorrect to bail out before that
>> happens, absent hard errors on the Linux side such as allocation
>> failures.
> I confirmed with the PHYP and HMC folks that they wouldn't time out the LPM
> request including H_VASI_STATE, so if the LPM retries were unlucky enough to
> encounter repeated CPU offline attempts (maybe some customer code retrying
> that), then the retries could continue indefinitely, or until some manual
> intervention. And in the meantime, the LPM delay here would cause PHYP to
> block other operations.

That sounds like a PHYP bug to me.

cheers


More information about the Linuxppc-dev mailing list