[RFC PATCH] powerpc/numa: reset node_possible_map to only node_online_map

David Rientjes rientjes at google.com
Fri Mar 6 08:58:27 AEDT 2015


On Fri, 6 Mar 2015, Michael Ellerman wrote:

> > > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> > > index 0257a7d659ef..24de29b3651b 100644
> > > --- a/arch/powerpc/mm/numa.c
> > > +++ b/arch/powerpc/mm/numa.c
> > > @@ -958,9 +958,17 @@ void __init initmem_init(void)
> > >  
> > >  	memblock_dump_all();
> > >  
> > > +	/*
> > > +	 * zero out the possible nodes after we parse the device-tree,
> > > +	 * so that we lower the maximum NUMA node ID to what is actually
> > > +	 * present.
> > > +	 */
> > > +	nodes_clear(node_possible_map);
> > > +
> > >  	for_each_online_node(nid) {
> > >  		unsigned long start_pfn, end_pfn;
> > >  
> > > +		node_set(nid, node_possible_map);
> > >  		get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
> > >  		setup_node_data(nid, start_pfn, end_pfn);
> > >  		sparse_memory_present_with_active_regions(nid);
> > 
> > This seems a bit strange, node_possible_map is supposed to be a superset 
> > of node_online_map and this loop is iterating over node_online_map to set 
> > nodes in node_possible_map.
>  
> Yeah. Though at this point in boot I don't think it matters that the two maps
> are out-of-sync temporarily.
> 
> But it would be simpler to just set the possible map to be the online map. That
> would also maintain the invariant that the possible map is always a superset of
> the online map.
> 
> Or did I miss a detail there (sleep deprived parent mode).
> 

I think reset_numa_cpu_lookup_table(), which iterates over the possible 
map and would thus now visit only a subset of nodes, may be a concern.
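
To make the concern concrete, here is a small user-space model of the two 
maps as plain bitmasks.  It is only a sketch: nodemask_model_t, MAX_NODES 
and the model_*() helpers are hypothetical names for illustration, not the 
kernel's nodemask API.  It shows that resetting the possible map to the 
online map keeps the superset invariant, but that any loop over the 
possible map then visits fewer nodes than before.

```c
#include <stdbool.h>

#define MAX_NODES 16	/* hypothetical stand-in for MAX_NUMNODES */

/* Hypothetical model type: bit n set means node n is in the map. */
typedef unsigned int nodemask_model_t;

/* True if every node in 'online' is also in 'possible'. */
static bool model_is_superset(nodemask_model_t possible,
			      nodemask_model_t online)
{
	return (online & ~possible) == 0;
}

/* Models the proposed reset: the possible map becomes the online map. */
static nodemask_model_t model_reset_possible(nodemask_model_t online)
{
	return online;
}

/* Counts the nodes that a loop over 'mask' would visit. */
static int model_count_nodes(nodemask_model_t mask)
{
	int nid, count = 0;

	for (nid = 0; nid < MAX_NODES; nid++)
		if (mask & (1u << nid))
			count++;
	return count;
}
```

With all 16 model nodes possible and only nodes 0 and 1 online, a loop over 
the possible map drops from 16 iterations to 2 after the reset.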

I'm not sure why this is being proposed as a powerpc patch and not a patch 
for mem_cgroup_css_alloc().  In other words, why do we have to allocate 
for all possible nodes?  We should only be allocating for online nodes in 
N_MEMORY with mem hotplug disabled initially and then have a mem hotplug 
callback implemented to alloc_mem_cgroup_per_zone_info() for nodes that 
transition from memoryless -> memory.  The extra bonus is that 
alloc_mem_cgroup_per_zone_info() need never allocate remote memory and the 
TODO in that function can be removed.
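
The scheme described above can be sketched in user space roughly as 
follows.  This is a simplified model under stated assumptions, not memcg 
code: struct model_memcg, nodes_with_memory and all model_*() functions are 
hypothetical names, a bitmask stands in for N_MEMORY, calloc() stands in 
for the node-local allocation, and a real hotplug callback would have to 
walk every existing memcg rather than a single one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_NODES 8	/* hypothetical stand-in for MAX_NUMNODES */

/* Hypothetical stand-in for the per-node memcg data. */
struct per_node_info {
	int nid;
};

struct model_memcg {
	struct per_node_info *nodeinfo[MAX_NODES];
};

/* Bit n set means node n currently has memory (models N_MEMORY). */
static unsigned int nodes_with_memory;

static bool node_has_memory(int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	return (nodes_with_memory & (1u << nid)) != 0;
}

/* Allocate per-node data for one node; 0 on success, -1 on failure. */
static int model_alloc_node_info(struct model_memcg *memcg, int nid)
{
	struct per_node_info *info;

	assert(nid >= 0 && nid < MAX_NODES);
	if (memcg->nodeinfo[nid])
		return 0;	/* already allocated, nothing to do */

	info = calloc(1, sizeof(*info));
	if (!info)
		return -1;
	info->nid = nid;
	memcg->nodeinfo[nid] = info;
	return 0;
}

/* Release all per-node data and reset the slots to NULL. */
static void model_memcg_free(struct model_memcg *memcg)
{
	int nid;

	for (nid = 0; nid < MAX_NODES; nid++) {
		free(memcg->nodeinfo[nid]);
		memcg->nodeinfo[nid] = NULL;
	}
}

/* At creation time, allocate only for nodes that have memory. */
static int model_memcg_init(struct model_memcg *memcg)
{
	int nid;

	for (nid = 0; nid < MAX_NODES; nid++)
		memcg->nodeinfo[nid] = NULL;

	for (nid = 0; nid < MAX_NODES; nid++) {
		if (!node_has_memory(nid))
			continue;
		if (model_alloc_node_info(memcg, nid)) {
			model_memcg_free(memcg);
			return -1;
		}
	}
	return 0;
}

/* Models the hotplug callback: node goes memoryless -> memory. */
static int model_node_memory_online(struct model_memcg *memcg, int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	nodes_with_memory |= 1u << nid;
	return model_alloc_node_info(memcg, nid);
}

/* Number of nodes that currently have per-node data allocated. */
static int model_count_allocated(const struct model_memcg *memcg)
{
	int nid, count = 0;

	for (nid = 0; nid < MAX_NODES; nid++)
		if (memcg->nodeinfo[nid])
			count++;
	return count;
}
```

In this model, a memcg created while only nodes 0 and 2 have memory gets 
two allocations, and a third is made only when model_node_memory_online() 
reports that another node has gained memory.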

