[PATCH 1/1] powerpc: Ignore IPIs to offline CPUs

Brian King brking at linux.vnet.ibm.com
Thu Apr 22 08:15:01 EST 2010


On 04/21/2010 04:03 PM, Michael Neuling wrote:
> In message <4BCF029B.1020805 at linux.vnet.ibm.com> you wrote:
>> On 04/21/2010 08:35 AM, Michael Ellerman wrote:
>>> On Tue, 2010-04-20 at 22:15 -0500, Brian King wrote:
>>>> On 04/20/2010 09:04 PM, Michael Neuling wrote:
>>>>> In message <201004210154.o3L1sXaR001791 at d01av04.pok.ibm.com> you wrote:
>>>>>>
>>>>>> Since there is nothing to stop an IPI from occurring to an
>>>>>> offline CPU, rather than printing a warning to the logs,
>>>>>> just ignore the IPI. This was seen while stress testing
>>>>>> SMT enable/disable.
>>>>>
>>>>> This seems like a recipe for disaster.  Do we at least need a
>>>>> WARN_ON_ONCE?
>>>>
>>>> Actually we are only seeing it once per offlining of a CPU,
>>>> and only once in a while.
>>>>  
>>>> My guess is that once the CPU is marked offline fewer IPIs
>>>> get sent to it since it's no longer in the online mask.
>>>
>>> Hmm, right. Once it's offline it shouldn't get _any_ IPIs, AFAICS.
>>>
>>>> Perhaps we should be disabling IPIs to offline CPUs instead?
>>>
>>> You mean not sending them? We do:
>>>
>>> void smp_xics_message_pass(int target, int msg)
>>> {
>>>         unsigned int i;
>>>
>>>         if (target < NR_CPUS) {
>>>                 smp_xics_do_message(target, msg);
>>>         } else {
>>>                 for_each_online_cpu(i) {
>>>                         if (target == MSG_ALL_BUT_SELF
>>>                             && i == smp_processor_id())
>>>                                 continue;
>>>                         smp_xics_do_message(i, msg);
>>>                 }
>>>         }
>>> }      
>>>
>>> So it does sound like the IPI was sent while the CPU was online (i.e.
>>> before pseries_cpu_disable()), but xics_migrate_irqs_away() has not
>>> caused the IPI to be cancelled.
>>>
>>> Problem is I don't think we can just ignore the IPI. The IPI might have
>>> been sent for a smp_call_function() which is waiting for the result, in
>>> which case if we ignore it the caller will block for ever.
>>>
>>> I don't see how to fix it :/
>>
>> Any objections to just removing the warning?
> 
> Well, someone could be waiting for the result, so it could be a real
> problem.
> 
> IMHO the warning should stay.
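The quoted discussion makes three points: broadcasts only reach CPUs that are online at send time, an IPI raised while the target was online can still be pending after it goes offline, and dropping such an IPI can leave a smp_call_function() caller waiting forever. They can be illustrated with a small userspace model. This is a hypothetical sketch, not kernel code: NR_CPUS, the MSG_ALL/MSG_ALL_BUT_SELF values, the online array and the pending/acked counters are assumptions made for illustration, and the function names only mirror the ones quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model constants; the kernel's real values differ. */
#define NR_CPUS          8
#define MSG_ALL          (NR_CPUS + 1)  /* broadcast to every online CPU */
#define MSG_ALL_BUT_SELF (NR_CPUS + 2)  /* broadcast, skipping the sender */

static bool online[NR_CPUS];  /* stand-in for the online CPU mask */
static int pending[NR_CPUS];  /* IPIs raised but not yet handled, per CPU */
static int acked;             /* completions a smp_call_function() waiter would see */

/* Model of smp_xics_do_message(): raise (queue) an IPI on one CPU. */
static void do_message(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	pending[cpu]++;
}

/* Model of smp_xics_message_pass(): a unicast goes straight to the target
 * with no online check; a broadcast walks only CPUs that are online now.
 * 'self' plays the role of smp_processor_id(). */
static void message_pass(int target, int self)
{
	if (target < NR_CPUS) {
		do_message(target);
		return;
	}
	for (int i = 0; i < NR_CPUS; i++) {
		if (!online[i])
			continue;
		if (target == MSG_ALL_BUT_SELF && i == self)
			continue;
		do_message(i);
	}
}

/* Model of the receiving CPU draining its IPIs. With drop_when_offline set
 * (the behaviour the patch proposes), an offline CPU discards each IPI
 * without running the requested function, so nothing is acknowledged. */
static void handle_ipis(int cpu, bool drop_when_offline)
{
	while (pending[cpu] > 0) {
		pending[cpu]--;
		if (drop_when_offline && !online[cpu])
			continue;
		acked++;  /* function ran and signalled completion to the caller */
	}
}
```

In this model, a MSG_ALL_BUT_SELF broadcast from CPU 0 never reaches the already-offline CPU 3, yet the two IPIs raised on CPU 5 while it was still online remain pending after it goes offline. Draining them with drop_when_offline set leaves acked at 0, so a caller spinning until its call is acknowledged would never return.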

Looking in arch/powerpc/kernel/smp.c, there are four possible IPIs:

void smp_message_recv(int msg)
{
	switch(msg) {
	case PPC_MSG_CALL_FUNCTION:
		generic_smp_call_function_interrupt();
		break;
	case PPC_MSG_RESCHEDULE:
		/* we notice need_resched on exit */
		break;
	case PPC_MSG_CALL_FUNC_SINGLE:
		generic_smp_call_function_single_interrupt();
		break;
	case PPC_MSG_DEBUGGER_BREAK:
		if (crash_ipi_function_ptr) {
			crash_ipi_function_ptr(get_irq_regs());
			break;
		}
#ifdef CONFIG_DEBUGGER
		debugger_ipi(get_irq_regs());
		break;
#endif /* CONFIG_DEBUGGER */
		/* FALLTHROUGH */
	/* ... remainder of the switch elided ... */
	}
}

Both generic_smp_call_function_interrupt() and generic_smp_call_function_single_interrupt()
have WARN_ON(!cpu_online(cpu)) in them. The debugger IPI appears to ignore the IPI
if the CPU is offline, which leaves the reschedule IPI. That is likely the one I am
seeing in testing, since I'm not hitting either of the other WARN_ON()s.
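The elimination above can be written down as a small lookup. This is a hypothetical userspace sketch: the enum names mirror the PPC_MSG_* constants in the excerpt but their values are arbitrary, and classify() only encodes the behaviour described in this message. The two call-function handlers WARN_ON an offline CPU, the debugger IPI is reportedly ignored, and the reschedule handler does no work of its own. None of this is derived from the kernel source.

```c
#include <assert.h>
#include <stddef.h>

/* Names mirror the PPC_MSG_* IPIs quoted above; values are arbitrary here. */
enum ppc_msg {
	MSG_CALL_FUNCTION,
	MSG_RESCHEDULE,
	MSG_CALL_FUNC_SINGLE,
	MSG_DEBUGGER_BREAK,
	MSG_COUNT
};

/* What happens when each IPI lands on an offline CPU, as described above. */
enum offline_outcome {
	OUTCOME_WARNS,    /* handler has WARN_ON(!cpu_online(cpu)) */
	OUTCOME_IGNORED,  /* handler reportedly ignores it when offline */
	OUTCOME_SILENT    /* handler does no work, so nothing is reported */
};

static enum offline_outcome classify(enum ppc_msg msg)
{
	assert(msg < MSG_COUNT);  /* only the four real messages are modelled */
	switch (msg) {
	case MSG_CALL_FUNCTION:
	case MSG_CALL_FUNC_SINGLE:
		return OUTCOME_WARNS;
	case MSG_DEBUGGER_BREAK:
		return OUTCOME_IGNORED;
	case MSG_RESCHEDULE:
	default:
		return OUTCOME_SILENT;
	}
}

/* Return the single message classified OUTCOME_SILENT, or MSG_COUNT if
 * there are zero or several such messages. */
static enum ppc_msg silent_candidate(void)
{
	enum ppc_msg found = MSG_COUNT;
	size_t n = 0;

	for (int m = 0; m < MSG_COUNT; m++) {
		if (classify((enum ppc_msg)m) == OUTCOME_SILENT) {
			found = (enum ppc_msg)m;
			n++;
		}
	}
	return n == 1 ? found : MSG_COUNT;
}
```

Here silent_candidate() returns MSG_RESCHEDULE, the same conclusion reached above: the reschedule case does nothing in the handler itself ("we notice need_resched on exit"), so it produces no warning of its own.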


-Brian

-- 
Brian King
Linux on Power Virtualization
IBM Linux Technology Center


More information about the Linuxppc-dev mailing list