[RFC/PATCH] numa: distinguish associativity domain from node id

Thu Apr 7 10:15:19 EST 2005

Hi Nathan,

> The ppc64 numa code makes some possibly invalid assumptions about the
> numbering of "associativity domains" (which may be considered NUMA
> nodes).  As far as I've been able to determine from the architecture
> docs, there is no guarantee about the numbering of associativity
> domains, i.e. the values that are contained in ibm,associativity
> device node properties.  Yet we seem to assume that the numbering of
> the domains begins at zero and that the range is contiguous, and we
> use the domain number for a given resource as its logical node id.
> This strikes me as a problem waiting to happen, and in fact I've been
> seeing some problems in the lab with larger machines violating or at
> least straining these assumptions.

I'm reluctant to have a mapping between the Linux concept of a node and
the firmware concept if possible. It's nice to be able to jump on a
machine and determine if it is set up correctly by looking at sysfs and
/proc/device-tree.

> Consider one such case: the associativity domain for all memory in a
> partition is 0x1, but the processors are in shared mode (so no
> associativity info for them) -- all the memory is placed in node 1
> while all cpus are mapped to node 0.  But in this case, we should
> really have only one logical node, with all memory and cpus mapped to
> it.

Even in shared processor mode it makes sense to have separate memory
nodes so we can still do striping across memory controllers. For the
shared processor case where all our memory is in one node that isn't
zero, perhaps we could just stuff all the cpus in that node at boot.
When we support memory hotplug we then add new nodes as normal. New cpus
go into the node we chose at boot.
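
Roughly what I have in mind, as a standalone sketch rather than kernel
code. default_cpu_node() and node_has_memory[] are made-up names, and
falling back to node 0 when more than one node has memory is only an
assumption for illustration:

```c
#include <stdbool.h>

#define MAX_NUMNODES 16		/* the current limit */
#define NO_NODE (-1)

/*
 * Pick the node that all cpus go into when they carry no associativity
 * info (shared processor mode).  node_has_memory[] is a hypothetical
 * stand-in for what the boot code learned from the memory nodes.
 */
static int default_cpu_node(const bool node_has_memory[MAX_NUMNODES])
{
	int nid, found = NO_NODE;

	for (nid = 0; nid < MAX_NUMNODES; nid++) {
		if (!node_has_memory[nid])
			continue;
		if (found != NO_NODE)
			return 0;	/* several memory nodes: assumed fallback */
		found = nid;
	}

	/* Exactly one memory node: use it.  None at all: node 0. */
	return found == NO_NODE ? 0 : found;
}
```

With all memory in domain 0x1 this returns 1, so memory and cpus end up
in the same node without any renumbering.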

> Another case I've seen is that of a partition with all processors and
> memory having an associativity domain of 0x1.  We end up with
> everything in node 1 and an empty (yet online) node 0.

I saw some core changes go in recently that may allow us to have
discontiguous node numbers. I agree onlining all nodes from 0...max node
is pretty ugly, but perhaps that's fixable. Also, with hot memory unplug
we are going to end up with holes.
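
To illustrate onlining only the nodes that are actually populated,
here is a standalone sketch. nodes_to_online() is a made-up helper, not
an existing kernel interface, and the plain 32 bit mask is only there
to keep the example self-contained:

```c
#include <stdint.h>

#define MAX_NUMNODES 16		/* the current limit */

/*
 * Build a mask of the nodes that have at least one resource, given the
 * associativity domain of each resource.  Domains outside the
 * supported range are skipped here.
 */
static uint32_t nodes_to_online(const int *domains, int count)
{
	uint32_t mask = 0;
	int i;

	for (i = 0; i < count; i++) {
		if (domains[i] < 0 || domains[i] >= MAX_NUMNODES)
			continue;
		mask |= UINT32_C(1) << domains[i];
	}
	return mask;
}
```

For a partition where everything is in domain 0x1 the mask has only
bit 1 set, so node 0 would stay offline instead of being online and
empty.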

The main problem with not doing a mapping is if firmware decides to
exceed the maximum node number (we have it set to 16 at the moment).
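
In other words, without a mapping the lookup is just a range check. A
sketch, with domain_to_nid() as a made-up name:

```c
#define MAX_NUMNODES 16		/* the current limit */

/*
 * No mapping: the firmware domain number is used directly as the node
 * id.  Returns -1 when firmware hands us a domain past the limit,
 * which is the case that has no good answer without a mapping.
 */
static int domain_to_nid(unsigned int domain)
{
	if (domain >= MAX_NUMNODES)
		return -1;
	return (int)domain;
}
```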

Anton