[PATCH 1/1] powerpc: Ignore IPIs to offline CPUs

Michael Neuling mikey at neuling.org
Thu Apr 22 08:49:05 EST 2010


In message <4BCF78E5.9020502 at linux.vnet.ibm.com> you wrote:
> On 04/21/2010 04:03 PM, Michael Neuling wrote:
> > In message <4BCF029B.1020805 at linux.vnet.ibm.com> you wrote:
> >> On 04/21/2010 08:35 AM, Michael Ellerman wrote:
> >>> On Tue, 2010-04-20 at 22:15 -0500, Brian King wrote:
> >>>> On 04/20/2010 09:04 PM, Michael Neuling wrote:
> >>>>> In message <201004210154.o3L1sXaR001791 at d01av04.pok.ibm.com> you wrote:
> >>>>>>
> >>>>>> Since there is nothing to stop an IPI from occurring to an
> >>>>>> offline CPU, rather than printing a warning to the logs,
> >>>>>> just ignore the IPI. This was seen while stress testing
> >>>>>> SMT enable/disable.
> >>>>>
> >>>>> This seems like a recipe for disaster.  Do we at least need a
> >>>>> WARN_ON_ONCE?
> >>>>
> >>>> Actually we are only seeing it once per offlining of a CPU,
> >>>> and only once in a while.
> >>>>  
> >>>> My guess is that once the CPU is marked offline fewer IPIs
> >>>> get sent to it since it's no longer in the online mask.
> >>>
> >>> Hmm, right. Once it's offline it shouldn't get _any_ IPIs, AFAICS.
> >>>
> >>>> Perhaps we should be disabling IPIs to offline CPUs instead?
> >>>
> >>> You mean not sending them? We do:
> >>>
> >>> void smp_xics_message_pass(int target, int msg)
> >>> {
> >>>         unsigned int i;
> >>>
> >>>         if (target < NR_CPUS) {
> >>>                 smp_xics_do_message(target, msg);
> >>>         } else {
> >>>                 for_each_online_cpu(i) {
> >>>                         if (target == MSG_ALL_BUT_SELF
> >>>                             && i == smp_processor_id())
> >>>                                 continue;
> >>>                         smp_xics_do_message(i, msg);
> >>>                 }
> >>>         }
> >>> }
> >>>
> >>> So it does sound like the IPI was sent while the cpu was online (i.e.
> >>> before pseries_cpu_disable()), but xics_migrate_irqs_away() has not
> >>> caused the IPI to be cancelled.
> >>>
> >>> Problem is I don't think we can just ignore the IPI. The IPI might have
> >>> been sent for a smp_call_function() which is waiting for the result, in
> >>> which case if we ignore it the caller will block forever.
> >>>
> >>> I don't see how to fix it :/
> >>
> >> Any objections to just removing the warning?
> > 
> > Well someone could be waiting for the result, so it could be a real
> > problem.  
> > 
> > IMHO the warning should stay.
> 
> Looking in arch/powerpc/kernel/smp.c, there are four possible IPIs:
> 
> void smp_message_recv(int msg)
> {
> 	switch(msg) {
> 	case PPC_MSG_CALL_FUNCTION:
> 		generic_smp_call_function_interrupt();
> 		break;
> 	case PPC_MSG_RESCHEDULE:
> 		/* we notice need_resched on exit */
> 		break;
> 	case PPC_MSG_CALL_FUNC_SINGLE:
> 		generic_smp_call_function_single_interrupt();
> 		break;
> 	case PPC_MSG_DEBUGGER_BREAK:
> 		if (crash_ipi_function_ptr) {
> 			crash_ipi_function_ptr(get_irq_regs());
> 			break;
> 		}
> #ifdef CONFIG_DEBUGGER
> 		debugger_ipi(get_irq_regs());
> 		break;
> #endif /* CONFIG_DEBUGGER */
> 		/* FALLTHROUGH */
> 
> 
> Both generic_smp_call_function_interrupt and
> generic_smp_call_function_single_interrupt have
> WARN_ON(!cpu_online(cpu)); in them. The debugger IPI appears to
> ignore the IPI if the cpu is offline, which leaves the reschedule
> IPI. This is likely the one I am seeing in testing, since I'm not seeing
> the other WARN_ONs.

I'm not sure what you are suggesting.

If the other methods produce the warning when a CPU is offline, surely
we should keep the warning?  Maybe we need to add one to the debugger
case too if we want to be consistent.  
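
To make "consistent" concrete, here is a rough user-space sketch of the
idea, not kernel code and not a proposed patch. Every model_* name is
hypothetical and only stands in for the real cpu_online(), WARN_ON() and
smp_message_recv(). It assumes the check is done once in the dispatcher
for all four message types, and that the message is still processed
after the warning, since a caller may be waiting on it. Whether that is
the right policy is exactly what is being debated above.

```c
/*
 * User-space model only. The model_* names are made up for this
 * sketch; they mimic cpu_online(), WARN_ON() and smp_message_recv()
 * but are not the kernel implementations.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MODEL_NR_CPUS 4

enum model_msg {
	MODEL_MSG_CALL_FUNCTION,
	MODEL_MSG_RESCHEDULE,
	MODEL_MSG_CALL_FUNC_SINGLE,
	MODEL_MSG_DEBUGGER_BREAK,
};

static bool model_online[MODEL_NR_CPUS] = { true, true, true, true };
static int model_warnings;	/* warnings printed so far */
static int model_handled;	/* messages processed so far */

static bool model_cpu_online(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return model_online[cpu];
}

/* Stand-in for WARN_ON(): report the condition and return it. */
static bool model_warn_on(bool cond, const char *what, int cpu)
{
	if (cond) {
		model_warnings++;
		fprintf(stderr, "WARNING: %s IPI on offline cpu %d\n",
			what, cpu);
	}
	return cond;
}

/*
 * Apply the same offline check to every message type, then process
 * the message anyway (assumption: a sender may be waiting on it).
 * Returns true if the warning fired.
 */
static bool model_message_recv(int cpu, enum model_msg msg)
{
	static const char *const names[] = {
		"call-function", "reschedule",
		"call-func-single", "debugger",
	};
	bool warned = model_warn_on(!model_cpu_online(cpu), names[msg], cpu);

	model_handled++;
	return warned;
}
```

With cpu 2 marked offline in this model, a reschedule or debugger
message delivered to it warns in the same way a call-function message
would, instead of being silently dropped.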

Mikey

