[RFC/PATCH] numa: distinguish associativity domain from node id
Nathan Lynch
ntl at pobox.com
Thu Apr 7 11:37:05 EST 2005
On Thu, Apr 07, 2005 at 10:15:19AM +1000, Anton Blanchard wrote:
>
> > The ppc64 numa code makes some possibly invalid assumptions about the
> > numbering of "associativity domains" (which may be considered NUMA
> > nodes). As far as I've been able to determine from the architecture
> > docs, there is no guarantee about the numbering of associativity
> > domains, i.e. the values that are contained in ibm,associativity
> > device node properties. Yet we seem to assume that the numbering of
> > the domains begins at zero and that the range is contiguous, and we
> > use the domain number for a given resource as its logical node id.
> > This strikes me as a problem waiting to happen, and in fact I've been
> > seeing some problems in the lab with larger machines violating or at
> > least straining these assumptions.
>
> I'm reluctant to have a mapping between the Linux concept of a node and
> the firmware concept if we can avoid it. It's nice to be able to jump on a
> machine and determine if it is set up correctly by looking at sysfs and
> /proc/device-tree.
Ok... just throwing out an idea here. What if we could add an
attribute to the node sysdevs which would give us the firmware domain
number?
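Something like this is what I'm picturing (untested sketch; nid_to_domain[]
and the attribute name are placeholders for whatever the numa code ends up
keeping, and I haven't checked that this is exactly how drivers/base/node.c
wants the attribute wired up):

/*
 * Untested sketch -- expose the firmware associativity domain backing
 * each logical node as a node sysdev attribute.  nid_to_domain[] is a
 * placeholder for a table the ppc64 numa code would maintain.
 */
#include <linux/kernel.h>
#include <linux/node.h>
#include <linux/numa.h>
#include <linux/sysdev.h>

static int nid_to_domain[MAX_NUMNODES];	/* filled in at numa init */

static ssize_t node_read_firmware_domain(struct sys_device *dev, char *buf)
{
	return sprintf(buf, "0x%x\n", nid_to_domain[dev->id]);
}
static SYSDEV_ATTR(firmware_domain, 0444, node_read_firmware_domain, NULL);

/* call this after register_node() for each online node */
static int register_firmware_domain_attr(struct node *node)
{
	return sysdev_create_file(&node->sysdev, &attr_firmware_domain);
}

That way the firmware numbering would still be visible from sysfs even if
the logical node ids end up remapped.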
> > Consider one such case: the associativity domain for all memory in a
> > partition is 0x1, but the processors are in shared mode (so no
> > associativity info for them) -- all the memory is placed in node 1
> > while all cpus are mapped to node 0. But in this case, we should
> > really have only one logical node, with all memory and cpus mapped to
> > it.
>
> Even in shared processor mode it makes sense to have separate memory
> nodes so we can still do striping across memory controllers. For the
> shared processor case where all our memory is in one node that isn't
> zero, perhaps we could just stuff all the cpus in that node at boot.
> When we support memory hotplug we then add new nodes as normal. New cpus
> go into the node we chose at boot.
OK.
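Something like this is what I'd picture for the fallback (untested;
default_nid and cpu_domain_to_nid() are just placeholder names, not the
actual numa.c symbols):

/*
 * Untested sketch: pick a default node at boot from the memory
 * associativity info, and fall back to it for any cpu that has no
 * ibm,associativity property (shared processor mode), both at boot
 * and at cpu hotplug time.
 */
static int default_nid = -1;	/* set when we parse memory at boot */

static int cpu_domain_to_nid(int domain_found, int domain)
{
	if (domain_found)
		return domain;		/* dedicated cpu: use its own domain */
	if (default_nid >= 0)
		return default_nid;	/* shared cpu: node chosen at boot */
	return 0;
}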
>
> > Another case I've seen is that of a partition with all processors and
> > memory having an associativity domain of 0x1. We end up with
> > everything in node 1 and an empty (yet online) node 0.
>
> I saw some core changes go in recently that may allow us to have
> discontiguous node numbers. I agree onlining all nodes from 0...max node
> is pretty ugly, but perhaps that's fixable. Also, with hot memory unplug
> we are going to end up with holes.
The nodemap stuff, I assume. I'll look into whether we can get away
with discontiguous online node numbers.
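If that works out, node setup could look something like this (untested,
against my reading of the new nodemask interfaces; map_domain_to_nid() is
a placeholder name):

/*
 * Untested sketch: only online a node when we actually map a cpu or
 * memory region into it, rather than onlining 0..max domain found.
 * With no renumbering, the logical nid is just the firmware domain.
 */
#include <linux/kernel.h>
#include <linux/nodemask.h>

static int map_domain_to_nid(int domain)
{
	int nid = domain;

	BUG_ON(nid >= MAX_NUMNODES);
	if (!node_online(nid))
		node_set_online(nid);
	return nid;
}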
> The main problem with not doing a mapping is if firmware decides to
> exceed the maximum node number (we have it set to 16 at the moment).
We may need to bump that up. If I'm interpreting the
ibm,max-associativity-domains property correctly, it should be 32.
This is from a box which (I think) can have only two domains' worth of
cpus and memory. I guess all the extra is to account for I/O which
could be added to the system? Unlike ibm,lrdr-capacity, this property
doesn't seem to be affected by the partition profile settings.
# od -x /proc/device-tree/rtas/ibm,max-associativity-domains
0000000 0000 0005 0000 0001 0000 0001 0000 0020
0000020 0000 0020 0000 0040
             ^^^^
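For what it's worth, here's roughly how I'd expect us to read and
sanity-check that at boot (untested; the prom helper calls are from memory
and the layout interpretation is my guess from the dump above):

/*
 * Untested sketch: read ibm,max-associativity-domains from /rtas.
 * My guess at the layout: cell 0 is the number of associativity
 * levels, cells 1..n are the max domain count at each level.
 */
#include <asm/prom.h>

static int __init max_domains_at_level(int level)
{
	struct device_node *rtas;
	unsigned int *prop;
	int max = -1;

	rtas = of_find_node_by_path("/rtas");
	if (!rtas)
		return -1;

	prop = (unsigned int *)get_property(rtas,
			"ibm,max-associativity-domains", NULL);
	if (prop && level >= 1 && level <= prop[0])
		max = prop[level];

	of_node_put(rtas);
	return max;
}

If that 0x20 really is the node-level maximum, then the current limit of
16 won't be enough.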
Thanks,
Nathan