[PATCH 1/3] mm: return NUMA_NO_NODE in local_memory_node if zonelists are not setup

Nishanth Aravamudan nacc at linux.vnet.ibm.com
Sat Feb 22 10:56:16 EST 2014


On 21.02.2014 [14:42:03 -0800], Andrew Morton wrote:
> On Thu, 20 Feb 2014 10:28:47 -0800 Nishanth Aravamudan <nacc at linux.vnet.ibm.com> wrote:
> 
> > On 20.02.2014 [10:05:39 -0600], Christoph Lameter wrote:
> > > On Wed, 19 Feb 2014, Nishanth Aravamudan wrote:
> > > 
> > > > We can call local_memory_node() before the zonelists are setup. In that
> > > > case, first_zones_zonelist() will not set zone and the reference to
> > > > zone->node will Oops. Catch this case, and, since we presumably running
> > > > very early, just return that any node will do.
> > > 
> > > Really? Isnt there some way to avoid this call if zonelists are not setup
> > > yet?
> > 
> > How do I best determine if zonelists aren't setup yet?
> > 
> > The call-path in question (after my series is applied) is:
> > 
> > arch/powerpc/kernel/setup_64.c::setup_arch ->
> > 	arch/powerpc/mm/numa.c::do_init_bootmem() ->
> > 		cpu_numa_callback() ->
> > 			numa_setup_cpu() ->
> > 				map_cpu_to_node() ->
> > 					update_numa_cpu_node() ->
> > 						set_cpu_numa_mem()
> > 
> > and setup_arch() is called before build_all_zonelists(NULL, NULL) in
> > start_kernel(). This seemed like the most reasonable path, as it's used
> > on hotplug as well.
> > 
> 
> But the call to local_memory_node() you added was in start_secondary(),
> which isn't in that trace.

I added two calls to local_memory_node(), I *think* both are necessary,
but am willing to be corrected.

One is in map_cpu_to_node() and one is in start_secondary(). The
start_secondary() path is fine, AFAICT, as we are up & running at that
point. But in [the renamed function] update_numa_cpu_node() which is
used by hotplug, we get called from do_init_bootmem(), which is before
the zonelists are setup.

I think both calls are necessary because I believe the
arch_update_cpu_topology() is used for supporting firmware-driven
home-noding, which does not invoke start_secondary() again (the
processor is already running, we're just updating the topology in that
situation).

Then again, I could special-case the do_init_bootmem callpath, which is
only called at kernel init time?

> I do agree that calling local_memory_node() too early then trying to
> fudge around the consequences seems rather wrong.

If the answer is to simply not call local_memory_node() early, I'll
submit a patch to at least add a comment, as there's nothing in the code
itself to prevent this from happening and is guaranteed to oops.

Thanks,
Nish



More information about the Linuxppc-dev mailing list