Node 0 not necessary for powerpc?
nacc at linux.vnet.ibm.com
Thu May 22 05:57:43 EST 2014
On 21.05.2014 [14:58:12 -0400], Tejun Heo wrote:
> On Wed, May 21, 2014 at 09:16:27AM -0500, Christoph Lameter wrote:
> > On Mon, 19 May 2014, Nishanth Aravamudan wrote:
> > > I'm seeing a panic at boot with this change on an LPAR which actually
> > > has no Node 0. Here's what I think is happening:
> > >
> > > start_kernel
> > > ...
> > > -> setup_per_cpu_areas
> > > -> pcpu_embed_first_chunk
> > > -> pcpu_fc_alloc
> > > -> ___alloc_bootmem_node(NODE_DATA(cpu_to_node(cpu), ...
> > > -> smp_prepare_boot_cpu
> > > -> set_numa_node(boot_cpuid)
> > >
> > > So we panic on the NODE_DATA call. It seems that ia64, at least, uses
> > > pcpu_alloc_first_chunk rather than embed. x86 has some code to handle
> > > early calls of cpu_to_node (early_cpu_to_node) and sets the mapping for
> > > all CPUs in setup_per_cpu_areas().
> > Maybe we can switch ia64 to embed? Tejun: Why are there these
> > dependencies?
> > > Thoughts? Does that mean we need something similar to x86 for powerpc?
> I'm missing context to properly understand what's going on but the
> specific allocator in use shouldn't matter. e.g. x86 can use both
> embed and page allocators. If the problem is that the arch is
> accessing percpu memory before percpu allocator is initialized and the
> problem was masked before somehow, the right thing to do would be
> removing those premature percpu accesses. If early percpu variables
> are really necessary, doing similar early_percpu thing as in x86 would
> be necessary.
For context: I was looking at why N_ONLINE was statically setting Node 0
to be online, whether or not the topology is that way -- I've been
getting several bugs lately where Node 0 is online, but has no CPUs and
no memory on it, on powerpc.

On powerpc, setup_per_cpu_areas calls into ___alloc_bootmem_node using
NODE_DATA(cpu_to_node(cpu)).

Currently, cpu_to_node() in arch/powerpc/include/asm/topology.h does:

	/*
	 * During early boot, the numa-cpu lookup table might not have been
	 * setup for all CPUs yet. In such cases, default to node 0.
	 */
	return (nid < 0) ? 0 : nid;

And so early at boot, if node 0 is not present, we end up accessing an
uninitialized NODE_DATA(). So this seems buggy (I'll contact the powerpc
developers separately on that).
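
To make the failure mode concrete, here is a small userspace model of it.
This is only a sketch, not kernel code: the struct, the arrays and the
model_* helpers are hypothetical stand-ins, and the only line taken from
the source is the "default to node 0" fallback quoted above. It assumes
an LPAR where node 1 is the only node that has been set up.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS   4
#define MAX_NODES 4

/* Hypothetical stand-in for the kernel's per-node data structure. */
struct pglist_data_model {
	int node_id;
};

/* Assumed topology: no node 0, only node 1 has been set up. */
static struct pglist_data_model node1 = { .node_id = 1 };
static struct pglist_data_model *node_data[MAX_NODES] = {
	NULL, &node1, NULL, NULL
};

/* Model of the numa-cpu lookup table; -1 means "not set up yet". */
static int numa_cpu_lookup_table[NR_CPUS] = { -1, -1, -1, -1 };

/* Same fallback as quoted above: an unknown CPU is reported as node 0. */
static int model_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	int nid = numa_cpu_lookup_table[cpu];
	return (nid < 0) ? 0 : nid;
}

/*
 * Model of NODE_DATA(): returns NULL for a node that was never set up,
 * where the real macro would hand back uninitialized data.
 */
static struct pglist_data_model *model_node_data(int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	return node_data[nid];
}
```

In this model, model_node_data(model_cpu_to_node(0)) is NULL until the
lookup table entry for CPU 0 is filled in, which corresponds to the early
boot window described above.
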
I recently submitted patches to have powerpc turn on
USE_PERCPU_NUMA_NODEID and HAVE_MEMORYLESS_NODES. But then, cpu_to_node
will be accessing percpu data in setup_per_cpu_areas, which seems like a
no-no. And more specifically, since we haven't yet run
smp_prepare_boot_cpu() at this point, cpu_to_node has not yet been
initialized to provide a sane value.
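
One possible shape for a fix, loosely following the x86 early_cpu_to_node
idea mentioned earlier in the thread, is sketched below as a userspace
model. Everything here is an assumption for illustration: the table
contents, the percpu_ready flag and the sketch_* names are hypothetical,
and this is not what x86 or powerpc actually implement.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical static table, assumed to be filled from topology info early. */
static int early_node_map[NR_CPUS] = { 1, 1, 2, 2 };

/* Model of the per-cpu numa_node value; not usable until areas are set up. */
static int percpu_numa_node[NR_CPUS];
static bool percpu_ready = false;

/* Early lookup that touches only static data, never per-cpu storage. */
static int sketch_early_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return early_node_map[cpu];
}

/* Use the early table until the per-cpu copy has been populated. */
static int sketch_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!percpu_ready)
		return sketch_early_cpu_to_node(cpu);
	return percpu_numa_node[cpu];
}

/*
 * Model of setup_per_cpu_areas(): pick each CPU's node from the early
 * table (a real allocator would allocate on that node here), then seed
 * the per-cpu value for every CPU before switching lookups over to it.
 */
static void sketch_setup_per_cpu_areas(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		percpu_numa_node[cpu] = sketch_early_cpu_to_node(cpu);
	percpu_ready = true;
}
```

The point of the split is that nothing called before
sketch_setup_per_cpu_areas() completes reads per-cpu storage, and the
mapping is valid for all CPUs afterwards, without waiting for
smp_prepare_boot_cpu().
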