[3/6] pseries: Update CPU hotplug error recovery

Nathan Fontenot nfont at linux.vnet.ibm.com
Wed Jul 22 05:14:38 AEST 2015


On 07/20/2015 11:46 PM, Michael Ellerman wrote:
> On Mon, 2015-22-06 at 20:59:20 UTC, Nathan Fontenot wrote:
>> Update the cpu dlpar add/remove paths to do better error recovery when 
>> a failure occurs during the add/remove operation. This includes adding
>> some pr_info and pr_debug statements.
> 
> So I'm happy with the idea there, but ..
> 
>> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> index f58d902..7890b2f 100644
>> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
>> @@ -18,6 +18,8 @@
>>   *      2 of the License, or (at your option) any later version.
>>   */
>>  
>> +#define pr_fmt(fmt)     "pseries-hotplug-cpu: " fmt
> 
> This is good.
> 
>>  #include <linux/kernel.h>
>>  #include <linux/interrupt.h>
>>  #include <linux/delay.h>
>> @@ -386,22 +388,35 @@ static ssize_t dlpar_cpu_add(struct device_node *parent, u32 drc_index)
>>  	struct device_node *dn;
>>  	int rc;
>>  
>> +	pr_info("Attempting to add CPU, drc index %x\n", drc_index);
>> +
>>  	rc = dlpar_acquire_drc(drc_index);
>>  	if (rc)
>>  		return -EINVAL;
>>  
>>  	dn = dlpar_configure_connector(cpu_to_be32(drc_index), parent);
>> -	if (!dn)
>> +	if (!dn) {
>> +		pr_debug("Failed call to configure-connector\n");
>> +		dlpar_release_drc(drc_index);
>>  		return -EINVAL;
>> +	}
>>  
>>  	rc = dlpar_attach_node(dn);
>>  	if (rc) {
>> +		pr_debug("Failed to attach node (%d)\n", rc);
>>  		dlpar_release_drc(drc_index);
>>  		dlpar_free_cc_nodes(dn);
>>  		return rc;
>>  	}
>>  
>>  	rc = dlpar_online_cpu(dn);
>> +	if (rc) {
>> +		pr_debug("Failed to online cpu (%d)\n", rc);
>> +		dlpar_detach_node(dn);
>> +		dlpar_release_drc(drc_index);
>> +	}
>> +
>> +	pr_info("Successfully added CPU, drc index %x\n", drc_index);
>>  	return rc;
> 
> 
> But this is the opposite of what we want.
> 
> By default this will print two info lines for each successful cpu hotplug, but
> when we get an error nothing will be printed - because pr_debug() is off by
> default. What's worse, if dlpar_online_cpu() fails, the pr_debug() will produce
> nothing but we will *still* print "Successfully added CPU".
> 
> So the pr_info()s should go entirely and the pr_debugs() should become
> pr_warns(). The warning messages should become more verbose so they stand on
> their own, ie. include the drc_index.
> 
> When everything goes perfectly there should be no output.
>

So... good idea, bad implementation :)

I have a feeling I may have messed this up somewhere else in the patch set too
so I'll take a look at all the pr_* calls.
 
> 
>> @@ -465,18 +480,29 @@ static ssize_t dlpar_cpu_remove(struct device_node *dn, u32 drc_index)
>>  {
>>  	int rc;
>>  
>> +	pr_info("Attemping to remove CPU, drc index %x\n", drc_index);
>> +
>>  	rc = dlpar_offline_cpu(dn);
>> -	if (rc)
>> +	if (rc) {
>> +		pr_debug("Failed top offline cpu (%d)\n", rc);
>                                    ^
> 				   should be to
> 
>>  		return -EINVAL;
>> +	}
>>  
>>  	rc = dlpar_release_drc(drc_index);
>> -	if (rc)
>> +	if (rc) {
>> +		pr_debug("Failed to release drc (%d)\n", rc);
>> +		dlpar_online_cpu(dn);
>>  		return rc;
>> +	}
>>  
>>  	rc = dlpar_detach_node(dn);
>> -	if (rc)
>> +	if (rc) {
>> +		pr_debug("Failed to detach cpu node (%d)\n", rc);
>>  		dlpar_acquire_drc(drc_index);
> 
> But that can fail?
> 
>> +		dlpar_online_cpu(dn);
> 
> And if it did this would presumably not be safe?

Ahh, good catch. I'll fix in the next version of patches.

Thanks for the review.

-Nathan



More information about the Linuxppc-dev mailing list