[PATCH v4] powerpc/pseries: Remove limit in wait for dying CPU

Nicholas Piggin npiggin at gmail.com
Thu May 2 09:12:34 AEST 2019


Nathan Lynch wrote on May 2, 2019 12:57 am:
> Hi Thiago,
> 
> Thiago Jung Bauermann <bauerman at linux.ibm.com> writes:
>> Nathan Lynch <nathanl at linux.ibm.com> writes:
>>> Thiago Jung Bauermann <bauerman at linux.ibm.com> writes:
>>>> +		while (true) {
>>>>  			cpu_status = smp_query_cpu_stopped(pcpu);
>>>>  			if (cpu_status == QCSS_STOPPED ||
>>>>  			    cpu_status == QCSS_HARDWARE_ERROR)
>>>>  				break;
>>>> -			cpu_relax();
>>>> +			udelay(100);
>>>>  		}
>>>>  	}
>>>
>>> I agree with looping indefinitely but doesn't it need a cond_resched()
>>> or similar check?
>>
>> If there's no kernel or hypervisor bug, it shouldn't take more than a
>> few tens of ms for this loop to complete (Gautham measured a maximum of
>> 10 ms on a POWER9 with an earlier version of this patch).
> 
> 10ms is twice the default scheduler quantum...
> 
> 
>> In case of bugs related to CPU hotplug (either in the kernel or the
>> hypervisor), I was hoping that the resulting lockup warnings would be a
>> good indicator that something is wrong. :-)
> 
> Not convinced we should assume something is wrong if it takes a few
> dozen ms to complete the operation.

Right, and if there is no kernel or hypervisor bug then it will stop
eventually :)

> AFAIK we don't have any guarantees
> about the maximum latency of stop-self, and it can be affected by other
> activity in the system, whether we're in shared processor mode, etc. Not
> to mention smp_query_cpu_stopped has to acquire the global RTAS lock and
> be serialized with other tasks calling into RTAS. So I am concerned
> about generating spurious warnings here.

Agreed.

> 
> If for whatever reason the operation is taking too long, drmgr or
> whichever application is initiating the change will appear to stop
> making progress. It's not too hard to find out what's going on with
> facilities like perf or /proc/pid/stack.
> 
> 
>> Though perhaps adding a cond_resched() every 10 ms or so, with a
>> WARN_ON() if it loops for more than 50 ms would be better.
> 
> A warning doesn't seem appropriate to me, and cond_resched should be
> invoked in each iteration. Or just msleep(1) in each iteration would be
> fine, I think.
> 
> But I'd like to bring in some more context -- here is the body of
> pseries_cpu_die:
> 
> static void pseries_cpu_die(unsigned int cpu)
> {
> 	int tries;
> 	int cpu_status = 1;
> 	unsigned int pcpu = get_hard_smp_processor_id(cpu);
> 
> 	if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
> 		cpu_status = 1;
> 		for (tries = 0; tries < 5000; tries++) {
> 			if (get_cpu_current_state(cpu) == CPU_STATE_INACTIVE) {
> 				cpu_status = 0;
> 				break;
> 			}
> 			msleep(1);
> 		}
> 	} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
> 
> 		for (tries = 0; tries < 25; tries++) {
> 			cpu_status = smp_query_cpu_stopped(pcpu);
> 			if (cpu_status == QCSS_STOPPED ||
> 			    cpu_status == QCSS_HARDWARE_ERROR)
> 				break;
> 			cpu_relax();
> 		}
> 	}
> }
> 
> This patch alters the behavior of the second loop (the CPU_STATE_OFFLINE
> branch). The CPU_STATE_INACTIVE branch is used when the offline behavior
> is to use H_CEDE instead of stop-self, correct?
> 
> And isn't entering H_CEDE expected to be quite a bit faster than
> stop-self? If so, why does that path get five whole seconds[*] while
> we're bikeshedding about tens of milliseconds for stop-self? :-)
> 
> [*] And should it be made to retry indefinitely as well?

I think so.

Thanks,
Nick
