Hard hang in hypervisor!?

Milton Miller miltonm at bga.com
Fri Oct 12 07:35:49 EST 2007


On Thu Oct 11 10:04:40 EST 2007, Paul Mackerras wrote:
> Linas Vepstas writes:
>> Err ..  it was cpu 0 that was spinlocked.  Are interrupts not
>> distributed?
> 
> We have some bogosities in the xics code that I noticed a couple of
> days ago.  Basically we only set the xics to distribute interrupts to
> all cpus if (a) the affinity mask is equal to CPU_MASK_ALL (which has
> ones in every bit position from 0 to NR_CPUS-1) and (b) all present
> cpus are online (cpu_online_map == cpu_present_map).  Otherwise we
> direct interrupts to the first cpu in the affinity map.  So you can
> easily have the affinity mask containing all the online cpus and still
> not get distributed interrupts.


The second condition was just added to try to fix some issues where a
vendor wants to always run the kdump kernel with maxcpus=1 on all
architectures, and the emulated xics on js20 was not working.
For a true xics, this should work because we (1) remove all but 1
cpu from the global server list and (2) raise the priority of the
cpu to disabled, so the hardware will deliver to another cpu in the
partition.

http://ozlabs.org/pipermail/linuxppc-dev/2006-December/028941.html
http://ozlabs.org/pipermail/linuxppc-dev/2007-January/029607.html
http://ozlabs.org/pipermail/linuxppc-dev/2007-March/032621.html

However, my experience the other day on a js21 was that firmware
delivered either to all cpus (if we bound to the global server) or
to the first online cpu in the partition, regardless of which cpu
we bound the interrupt to, so I don't know that the change will fix
the original problem.

It does mean that taking a cpu offline, but not dlpar removing it from
the kernel, will make it impossible to actually distribute interrupts
to all cpus.

I'd be happy to just remove the extra check and work with firmware to
properly distribute the interrupts.

milton
