[PATCH 13/29] powerpc/pseries/mobility: use stop_machine for join/suspend
Nathan Lynch
nathanl at linux.ibm.com
Sat Dec 5 03:01:52 AEDT 2020
Hi Michael,
Michael Ellerman <mpe at ellerman.id.au> writes:
> Nathan Lynch <nathanl at linux.ibm.com> writes:
>> The partition suspend sequence as specified in the platform
>> architecture requires that all active processor threads call
>> H_JOIN, which:
> ...
>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>> index 1b8ae221b98a..44ca7d4e143d 100644
>> --- a/arch/powerpc/platforms/pseries/mobility.c
>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>> @@ -412,6 +414,128 @@ static int wait_for_vasi_session_suspending(u64 handle)
> ...
>
>> +
>> +static int do_join(void *arg)
>> +{
>> + atomic_t *counter = arg;
>> + long hvrc;
>> + int ret;
>> +
>> + /* Must ensure MSR.EE off for H_JOIN. */
>> + hard_irq_disable();
>
> Didn't stop_machine() already do that for us?
>
> In the state machine in multi_cpu_stop().
Yes, but I didn't want to rely on something that seems like an
implementation detail of stop_machine(). I assumed it's benign and in
keeping with hard_irq_disable()'s intended semantics to make multiple
calls to it within a critical section.
I don't feel strongly about this though. If I remove this
hard_irq_disable() I'll modify the comment to indicate that this
function relies on stop_machine() to satisfy this condition for H_JOIN.
>> + hvrc = plpar_hcall_norets(H_JOIN);
>> +
>> + switch (hvrc) {
>> + case H_CONTINUE:
>> + /*
>> + * All other CPUs are offline or in H_JOIN. This CPU
>> + * attempts the suspend.
>> + */
>> + ret = do_suspend();
>> + break;
>> + case H_SUCCESS:
>> + /*
>> + * The suspend is complete and this cpu has received a
>> + * prod.
>> + */
>> + ret = 0;
>> + break;
>> + case H_BAD_MODE:
>> + case H_HARDWARE:
>> + default:
>> + ret = -EIO;
>> + pr_err_ratelimited("H_JOIN error %ld on CPU %i\n",
>> + hvrc, smp_processor_id());
>> + break;
>> + }
>> +
>> + if (atomic_inc_return(counter) == 1) {
>> + pr_info("CPU %u waking all threads\n", smp_processor_id());
>> + prod_others();
>> + }
>
> Do we even need the counter? IIUC only one CPU receives H_CONTINUE. So
> couldn't we just have that CPU do the prodding of others?
CPUs may exit H_JOIN due to system reset interrupt at any time, and
H_JOIN may return H_HARDWARE to a caller after other CPUs have entered
the join state successfully. In these cases the counter ensures exactly
one thread performs the prod sequence.
>
>> + /*
>> + * Execution may have been suspended for several seconds, so
>> + * reset the watchdog.
>> + */
>> + touch_nmi_watchdog();
>> + return ret;
>> +}
>> +
>> +static int pseries_migrate_partition(u64 handle)
>> +{
>> + atomic_t counter = ATOMIC_INIT(0);
>> + int ret;
>> +
>> + ret = wait_for_vasi_session_suspending(handle);
>> + if (ret)
>> + goto out;
>
> Direct return would be clearer IMHO.
OK, I can change this.
More information about the Linuxppc-dev
mailing list