[RFC] maxcpus boot option leads to dropped interrupts

Tue Oct 19 00:46:44 EST 2004

Hi-

Our test group has discovered that booting a 2.6 kernel on an SMP pSeries
LPAR with maxcpus=1 will either hang or take a very long time to boot,
with many lost-interrupt messages or SCSI timeouts, e.g.

Probing IDE interface ide2...
hde: IBM DROM00205, ATAPI CD/DVD-ROM drive
Using cfq io scheduler
ide2 at 0xfe400-0xfe407,0xfdc02 on irq 166
Probing IDE interface ide3...
Probing IDE interface ide3...
hde: ATAPI 24X DVD-ROM drive, 256kB Cache
Uniform CD-ROM driver Revision: 3.20
ide-cd: cmd 0x25 timed out
hde: lost interrupt
hde: lost interrupt

The problem goes away if CONFIG_IRQ_ALL_CPUS is not set.

I am about 85% sure that this is due to the OF "start-cpu" method
placing the primary threads of secondary cpus in the global interrupt
queue, or GIQ (see the comment in
arch/ppc64/kernel/smp.c::start_secondary).
With the maxcpus parameter, we never "boot" those cpus; they simply sit
in their spin loops waiting to be kicked.  However, from the platform's
point of view they are fair game to service device interrupts.

The RTAS "start-cpu" method apparently does not behave the same way -- I
can boot a single-CPU (with SMT) Power5 LPAR with maxcpus=1 and
interrupts are not lost, even though the secondary thread on the boot
cpu has been started by RTAS.  So this problem appears to be limited to
systems which have more than one cpu device node.

I've worked around the problem by modifying the xics code to use the
default interrupt server (the boot cpu) if cpu_online_map !=
cpu_present_map.  However, that's a nasty hack which will keep interrupts
from being distributed in the smt-enabled=off case, since the two maps
differ there as well.
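
To make the workaround concrete, here is a minimal stand-alone sketch of
the check.  It is not the actual xics patch: cpu_mask_t,
ALL_CPUS_SERVER and pick_irq_server() are hypothetical names invented
for illustration, and plain 64-bit bitmasks stand in for cpu_online_map
and cpu_present_map.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel cpumask: one bit per logical cpu. */
typedef uint64_t cpu_mask_t;

/* Hypothetical marker meaning "deliver through the global interrupt queue". */
#define ALL_CPUS_SERVER (-1)

/*
 * Choose the interrupt server for a device interrupt.
 * If any present cpu is not online, fall back to the boot cpu;
 * otherwise let the interrupt be distributed.
 */
static int pick_irq_server(cpu_mask_t online, cpu_mask_t present, int boot_cpu)
{
    if (online != present)
        return boot_cpu;
    return ALL_CPUS_SERVER;
}
```

With four present cpus and only cpu 0 online (the maxcpus=1 case) this
picks the boot cpu.  It also picks the boot cpu when cpus 0 and 2 are
online but 1 and 3 are merely present, which is why the check is too
blunt for smt-enabled=off.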

I'm not sure whether this happens on non-xics machines.

I'm looking for ideas on how to handle this.  Some options that occur to
me are:

o  Not booting secondary cpus from the OF client code (but the PPC-OF
binding document says we can't do this).  I believe I've tried this
before, and RTAS was unable to start the secondary cpus later.  So this
is probably not the way to go.

o  In smp_cpus_done(), "shoot down" any cpus which have not been kicked
out of their spin loops.  I've got a very rough version of this
working.  However, this method assumes that the RTAS "stop-cpu"
interface is available, which is a given on LPAR, but I'm not sure it's
a safe bet on other systems.

o  Directing interrupts to the boot cpu instead of using the GIQ when
the maxcpus option is detected.  This might be the easiest alternative;
however, this could have a performance impact.
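
For the second option, the logic I have in mind looks roughly like the
stand-alone sketch below.  This is not my working code: cpu_mask_t,
stop_cpu_fn, shoot_down_unkicked() and the stub are hypothetical names,
and the callback merely stands in for whatever would issue the RTAS
"stop-cpu" request.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel cpumask: one bit per logical cpu. */
typedef uint64_t cpu_mask_t;

/* Hypothetical callback standing in for a "stop-cpu" request; 0 means success. */
typedef int (*stop_cpu_fn)(int cpu);

/*
 * Stop every cpu that is present but was never brought online.
 * Returns the mask of cpus that were stopped successfully.
 */
static cpu_mask_t shoot_down_unkicked(cpu_mask_t present, cpu_mask_t online,
                                      stop_cpu_fn stop)
{
    cpu_mask_t pending = present & ~online;
    cpu_mask_t stopped = 0;

    for (int cpu = 0; cpu < 64; cpu++) {
        cpu_mask_t bit = (cpu_mask_t)1 << cpu;

        if (!(pending & bit))
            continue;
        if (stop(cpu) == 0)
            stopped |= bit;
    }
    return stopped;
}

/* Illustrative stub that always reports success. */
static int fake_stop_cpu(int cpu)
{
    (void)cpu;
    return 0;
}
```

A cpu whose stop request fails is left out of the returned mask, so the
caller could still fall back to one of the other options for it.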

Any other ideas?  Keep in mind that I would like to get the code to a
state which will allow us to hotplug-online cpus which were not started
at boot.

Nathan