[PATCH 1/1] powerpc: Ignore IPIs to offline CPUs

Brian King brking at linux.vnet.ibm.com
Thu Apr 22 09:33:47 EST 2010


On 04/21/2010 05:49 PM, Michael Neuling wrote:
> In message <4BCF78E5.9020502 at linux.vnet.ibm.com> you wrote:
>> On 04/21/2010 04:03 PM, Michael Neuling wrote:
>>> In message <4BCF029B.1020805 at linux.vnet.ibm.com> you wrote:
>>>> On 04/21/2010 08:35 AM, Michael Ellerman wrote:
>>>>> On Tue, 2010-04-20 at 22:15 -0500, Brian King wrote:
>>>>>> On 04/20/2010 09:04 PM, Michael Neuling wrote:
>>>>>>> In message <201004210154.o3L1sXaR001791 at d01av04.pok.ibm.com> you wrote:
>>>>>>>>
>>>>>>>> Since there is nothing to stop an IPI from occurring to an
>>>>>>>> offline CPU, rather than printing a warning to the logs,
>>>>>>>> just ignore the IPI. This was seen while stress testing
>>>>>>>> SMT enable/disable.
>>>>>>>
>>>>>>> This seems like a recipe for disaster.  Do we at least need a
>>>>>>> WARN_ON_ONCE?
>>>>>>
>>>>>> Actually we are only seeing it once per offlining of a CPU,
>>>>>> and only once in a while.
>>>>>>
>>>>>> My guess is that once the CPU is marked offline fewer IPIs
>>>>>> get sent to it since it's no longer in the online mask.
>>>>>
>>>>> Hmm, right. Once it's offline it shouldn't get _any_ IPIs, AFAICS.
>>>>>
>>>>>> Perhaps we should be disabling IPIs to offline CPUs instead?
>>>>>
>>>>> You mean not sending them? We do:
>>>>>
>>>>> void smp_xics_message_pass(int target, int msg)
>>>>> {
>>>>>         unsigned int i;
>>>>>
>>>>>         if (target < NR_CPUS) {
>>>>>                 smp_xics_do_message(target, msg);
>>>>>         } else {
>>>>>                 for_each_online_cpu(i) {
>>>>>                         if (target == MSG_ALL_BUT_SELF
>>>>>                             && i == smp_processor_id())
>>>>>                                 continue;
>>>>>                         smp_xics_do_message(i, msg);
>>>>>                 }
>>>>>         }
>>>>> }
>>>>>
>>>>> So it does sound like the IPI was sent while the cpu was online (i.e.
>>>>> before pseries_cpu_disable()), but xics_migrate_irqs_away() has not
>>>>> caused the IPI to be cancelled.
>>>>>
>>>>> Problem is I don't think we can just ignore the IPI. The IPI might have
>>>>> been sent for a smp_call_function() which is waiting for the result, in
>>>>> which case if we ignore it the caller will block for ever.
>>>>>
>>>>> I don't see how to fix it :/
>>>>
>>>> Any objections to just removing the warning?
>>>
>>> Well someone could be waiting for the result, so it could be a real
>>> problem.  
>>>
>>> IMHO the warning should stay.
>>
>> Looking in arch/powerpc/kernel/smp.c, there are four possible IPIs:
>>
>> void smp_message_recv(int msg)
>> {
>> 	switch(msg) {
>> 	case PPC_MSG_CALL_FUNCTION:
>> 		generic_smp_call_function_interrupt();
>> 		break;
>> 	case PPC_MSG_RESCHEDULE:
>> 		/* we notice need_resched on exit */
>> 		break;
>> 	case PPC_MSG_CALL_FUNC_SINGLE:
>> 		generic_smp_call_function_single_interrupt();
>> 		break;
>> 	case PPC_MSG_DEBUGGER_BREAK:
>> 		if (crash_ipi_function_ptr) {
>> 			crash_ipi_function_ptr(get_irq_regs());
>> 			break;
>> 		}
>> #ifdef CONFIG_DEBUGGER
>> 		debugger_ipi(get_irq_regs());
>> 		break;
>> #endif /* CONFIG_DEBUGGER */
>> 		/* FALLTHROUGH */
>>
>>
>> Both generic_smp_call_function_interrupt and
>> generic_smp_call_function_single_interrupt have
>> WARN_ON(!cpu_online(cpu)); in them. The debugger IPI appears to
>> ignore the IPI if the cpu is offline, which leaves the reschedule
>> IPI. This is likely the one I am seeing in test, since I'm not seeing
>> the other WARN_ON's.
> 
> I'm not sure what you are suggesting?
> 
> If the other methods produce the warning when a CPU is offline, surely
> we should keep the warning?  Maybe we need to add one to the debugger
> case too if we want to be consistent.  

I guess my point was that perhaps the warning in xics_ipi_dispatch is redundant.
I'm not sure whether there are issues with not having a warning in the reschedule path,
which is really the one I care about. Am I correct in assuming that for the reschedule
IPI we wouldn't have to worry about someone waiting on it? There shouldn't be
anything running on the cpu we are disabling, nor should we be scheduling anything on it...
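
To make the question concrete, here is a rough user-space sketch of the
distinction I have in mind. It is not kernel code: every toy_* name is
hypothetical, and it simply assumes that only the two call-function
messages can have a sender waiting on the target CPU, which is exactly the
assumption I am asking about.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical message types, loosely mirroring the PPC_MSG_* cases above. */
enum toy_msg {
	TOY_MSG_CALL_FUNCTION,
	TOY_MSG_RESCHEDULE,
	TOY_MSG_CALL_FUNC_SINGLE,
	TOY_MSG_DEBUGGER_BREAK,
};

/* What a dispatcher could do with an incoming IPI. */
enum toy_action {
	TOY_HANDLE,	/* target is online: process the message normally */
	TOY_IGNORE,	/* target is offline, nobody waits: drop silently */
	TOY_WARN,	/* target is offline, a sender may wait: warn */
};

/*
 * Assumption (not verified against the kernel): only call-function
 * messages can leave a sender blocked waiting for completion.
 */
bool toy_sender_may_wait(enum toy_msg msg)
{
	return msg == TOY_MSG_CALL_FUNCTION ||
	       msg == TOY_MSG_CALL_FUNC_SINGLE;
}

/* Decide how to treat an IPI given the target CPU's online state. */
enum toy_action toy_ipi_policy(bool cpu_is_online, enum toy_msg msg)
{
	assert(msg >= TOY_MSG_CALL_FUNCTION && msg <= TOY_MSG_DEBUGGER_BREAK);

	if (cpu_is_online)
		return TOY_HANDLE;

	return toy_sender_may_wait(msg) ? TOY_WARN : TOY_IGNORE;
}
```

Under that assumption, a reschedule IPI arriving at an offline cpu falls
into the TOY_IGNORE case, while the call-function cases keep their warning.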

Thanks,

Brian


-- 
Brian King
Linux on Power Virtualization
IBM Linux Technology Center

