[PATCH v2 4/4] powerpc/pseries/cpuhp: remove obsolete comment from pseries_cpu_die

Michael Ellerman mpe at ellerman.id.au
Mon Oct 11 23:34:28 AEDT 2021


Nathan Lynch <nathanl at linux.ibm.com> writes:
> Michael Ellerman <mpe at ellerman.id.au> writes:
>> Daniel Henrique Barboza <danielhb413 at gmail.com> writes:
>>> This is enough to say that we can't easily see the history behind this comment.
>>> I also believe that we're better of without it since it doesn't make sense
>>> with the current codebase.
>>
>> It was added by the original CPU hotplug commit for ppc64::
>>
>> https://github.com/mpe/linux-fullhistory/commit/0e9fd9441cd2113b67b14e739267c9e69761489b
>>
>>
>> The code was fairly similar:
>>
>> void __cpu_die(unsigned int cpu)
>> {
>> 	int tries;
>> 	int cpu_status;
>> 	unsigned int pcpu = get_hard_smp_processor_id(cpu);
>>
>> 	for (tries = 0; tries < 5; tries++) {
>> 		cpu_status = query_cpu_stopped(pcpu);
>>
>> 		if (cpu_status == 0)
>> 			break;
>> 		set_current_state(TASK_UNINTERRUPTIBLE);
>> 		schedule_timeout(HZ);
>> 	}
>> 	if (cpu_status != 0) {
>> 		printk("Querying DEAD? cpu %i (%i) shows %i\n",
>> 		       cpu, pcpu, cpu_status);
>> 	}
>>
>> 	/* Isolation and deallocation are definatly done by
>> 	 * drslot_chrp_cpu.  If they were not they would be
>> 	 * done here.  Change isolate state to Isolate and
>> 	 * change allocation-state to Unusable.
>> 	 */
>> 	paca[cpu].xProcStart = 0;
>>
>> 	/* So we can recognize if it fails to come up next time. */
>> 	cpu_callin_map[cpu] = 0;
>> }
>>
>>
>> drslot_chrp_cpu() still exists in drmgr:
>>
>>   https://github.com/ibm-power-utilities/powerpc-utils/blob/e798c4a09fbf0fa0f421e624cfa366a6c405c9fe/src/drmgr/drslot_chrp_cpu.c#L406
>>
>>
>> I agree the comment is no longer meaningful and can be removed.
>
> Thanks for providing this background.
>
>> It might be good to then add a comment explaining why we need to set
>> cpu_start = 0.
>
> Sure, I can take that as a follow-up. Or perhaps it should be moved to
> the online path.

Yeah possibly.

>> It's not immediately clear why we need to. When we bring a CPU back
>> online in smp_pSeries_kick_cpu() we ask RTAS to start it and then
>> immediately set cpu_start = 1, ie. there isn't a separate step that sets
>> cpu_start = 1 for hotplugged CPUs.
>
> Hmm I'm not following the distinction you seem to be drawing between
> bringing a CPU back online and a hotplugged CPU. kick_cpu is used in all
> cases AFAIK.

Yeah that wasn't very clear, reading it back I have half confused myself.

At boot we need the paca->cpu_start flag because some CPUs can be
spinning in generic_secondary_common_init, in head_64.S.

ie. they're not in RTAS, they're spinning in kernel code, and the only
thing that stops them coming "online" in the Linux sense is
paca->cpu_start.

You can see that in pseries/smp.c:

static inline int smp_startup_cpu(unsigned int lcpu)
{
	...
	if (cpumask_test_cpu(lcpu, of_spin_mask))
		/* Already started by OF and sitting in spin loop */
		return 1;


We also hit that case when kexec'ing, where all the secondaries come in
that way.


On the other hand when we offline a CPU, we set paca->cpu_start = 0, in
pseries_cpu_die(), and then we return the CPU to RTAS.

The only way it *should* come back online is via smp_pSeries_kick_cpu(),
which calls smp_startup_cpu() to bring the CPU out of RTAS, and then
smp_pSeries_kick_cpu() immediately sets cpu_start = 1.

So the sequence is:

	CPU goes offline from Linux POV
	paca->cpu_start = 0;
        CPU offline in RTAS
        ...
        CPU brought out of RTAS
	paca->cpu_start = 1;
	CPU comes back online from Linux POV


But I guess I kind of answered my own question above, where I said
*should*. Clearing paca->cpu_start when we offline the CPU gives us a
little bit of backup if the CPU comes out of RTAS unexpectedly. ie. it
would then spin on paca->cpu_start, rather than spontaneously coming
back into Linux when we weren't expecting it.

cheers


More information about the Linuxppc-dev mailing list