[RFC PATCH] powerpc/numa: reset node_possible_map to only node_online_map
David Rientjes
rientjes at google.com
Fri Mar 6 08:58:27 AEDT 2015
On Fri, 6 Mar 2015, Michael Ellerman wrote:
> > > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> > > index 0257a7d659ef..24de29b3651b 100644
> > > --- a/arch/powerpc/mm/numa.c
> > > +++ b/arch/powerpc/mm/numa.c
> > > @@ -958,9 +958,17 @@ void __init initmem_init(void)
> > >
> > >  	memblock_dump_all();
> > >
> > > +	/*
> > > +	 * zero out the possible nodes after we parse the device-tree,
> > > +	 * so that we lower the maximum NUMA node ID to what is actually
> > > +	 * present.
> > > +	 */
> > > +	nodes_clear(node_possible_map);
> > > +
> > >  	for_each_online_node(nid) {
> > >  		unsigned long start_pfn, end_pfn;
> > >
> > > +		node_set(nid, node_possible_map);
> > >  		get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
> > >  		setup_node_data(nid, start_pfn, end_pfn);
> > >  		sparse_memory_present_with_active_regions(nid);
> >
> > This seems a bit strange, node_possible_map is supposed to be a superset
> > of node_online_map and this loop is iterating over node_online_map to set
> > nodes in node_possible_map.
>
> Yeah. Though at this point in boot I don't think it matters that the two maps
> are out-of-sync temporarily.
>
> But it would be simpler to just set the possible map to be the online map. That
> would also maintain the invariant that the possible map is always a superset of
> the online map.
>
> Or did I miss a detail there (sleep deprived parent mode).
>
I think reset_numa_cpu_lookup_table(), which iterates over the possible
map and thus now covers only a subset of nodes, may be a concern.
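
To make the shape of that concern concrete, here is a rough userspace
sketch, not kernel code. It assumes a plain 32-bit bitmask can stand in
for nodemask_t, and every model_* name is hypothetical and exists only
for illustration. It shows that clearing the possible map and re-adding
each online node gives the same result as assigning the online map
directly, and that any later walk over the possible map then visits only
the nodes that were online at that point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_NODES 16		/* assumed small limit, illustration only */

typedef uint32_t nodemask_model_t;	/* hypothetical stand-in for nodemask_t */

static inline bool model_node_isset(int nid, nodemask_model_t mask)
{
	return (mask >> nid) & 1u;
}

/* Mirrors the quoted patch: clear possible, then set each online node. */
static inline nodemask_model_t model_reset_possible(nodemask_model_t online)
{
	nodemask_model_t possible = 0;

	for (int nid = 0; nid < MODEL_MAX_NODES; nid++)
		if (model_node_isset(nid, online))
			possible |= (nodemask_model_t)1 << nid;
	return possible;
}

/* The simpler alternative from the thread: possible = online. */
static inline nodemask_model_t model_set_possible_to_online(nodemask_model_t online)
{
	return online;
}

/* Number of nodes a walk over the possible map would visit. */
static inline int model_count_visited(nodemask_model_t possible)
{
	int visited = 0;

	for (int nid = 0; nid < MODEL_MAX_NODES; nid++)
		if (model_node_isset(nid, possible))
			visited++;
	return visited;
}

/* Invariant discussed above: possible must be a superset of online. */
static inline void model_check_superset(nodemask_model_t possible,
					nodemask_model_t online)
{
	assert((online & ~possible) == 0);
}
```

With nodes 0, 1 and 4 online (mask 0x13), both helpers return 0x13, so a
walk over the possible map visits three nodes rather than all sixteen.
Whether that narrowing actually matters to any particular caller is
something the sketch does not settle.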

I'm not sure why this is being proposed as a powerpc patch and not a patch
for mem_cgroup_css_alloc(). In other words, why do we have to allocate
for all possible nodes? We should only be allocating for online nodes in
N_MEMORY with mem hotplug disabled initially and then have a mem hotplug
callback implemented to alloc_mem_cgroup_per_zone_info() for nodes that
transition from memoryless -> memory. The extra bonus is that
alloc_mem_cgroup_per_zone_info() need never allocate remote memory and the
TODO in that function can be removed.
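
As a rough illustration of that scheme, the following is a userspace
model only, not the memcg code. All model_* names, the fixed node limit,
and the use of calloc() are assumptions made for the sketch. Per-node
data is allocated up front only for nodes that have memory, and a
hypothetical hotplug hook allocates it later when a node goes from
memoryless to having memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_MAX_NODES 8	/* assumed small limit, illustration only */

struct model_per_node_info {
	int nid;
};

struct model_memcg {
	struct model_per_node_info *nodeinfo[MODEL_MAX_NODES];
};

/* Allocate per-node data for one node; no-op if it already exists. */
static inline int model_alloc_node_info(struct model_memcg *memcg, int nid)
{
	struct model_per_node_info *info;

	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	if (memcg->nodeinfo[nid])
		return 0;
	info = calloc(1, sizeof(*info));
	if (!info)
		return -1;
	info->nid = nid;
	memcg->nodeinfo[nid] = info;
	return 0;
}

static inline void model_memcg_free(struct model_memcg *memcg)
{
	for (int nid = 0; nid < MODEL_MAX_NODES; nid++) {
		free(memcg->nodeinfo[nid]);
		memcg->nodeinfo[nid] = NULL;
	}
}

/* Initial allocation: only nodes that currently have memory. */
static inline int model_memcg_init(struct model_memcg *memcg,
				   const bool has_memory[MODEL_MAX_NODES])
{
	for (int nid = 0; nid < MODEL_MAX_NODES; nid++)
		memcg->nodeinfo[nid] = NULL;

	for (int nid = 0; nid < MODEL_MAX_NODES; nid++) {
		if (has_memory[nid] && model_alloc_node_info(memcg, nid)) {
			model_memcg_free(memcg);
			return -1;
		}
	}
	return 0;
}

/* Hypothetical hotplug hook: node went from memoryless to having memory. */
static inline int model_node_gained_memory(struct model_memcg *memcg, int nid)
{
	return model_alloc_node_info(memcg, nid);
}
```

In this model a node with no memory never gets an entry until the hook
runs for it, which is the property that would let the real allocation
always be node-local.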