libnuma interleaving oddness
nacc at us.ibm.com
Wed Aug 30 09:02:00 EST 2006
While trying to add NUMA-awareness to libhugetlbfs' morecore
functionality (hugepage-backed malloc), I ran into an issue on a
ppc64-box with 8 memory nodes, running SLES10. I am using two functions
from libnuma: numa_available() and numa_interleave_memory(). When I ask
numa_interleave_memory() to interleave over all nodes (numa_all_nodes is
the nodemask from libnuma), it exhausts node 0, then moves to node 1,
then node 2, etc, until the allocations are satisfied. If I custom
generate a nodemask, such that bits 1 through 7 are set, but bit 0 is
not, then I get proper interleaving, where the first hugepage is on node
1, the second is on node 2, etc. Similarly, if I set bits 0 through 6 in
a custom nodemask, interleaving works across the requested 7 nodes. But
it has yet to work across all 8.
I don't know if this is a libnuma bug (I extracted out the code from
libnuma, it looked sane; and even reimplemented it in libhugetlbfs for
testing purposes, but got the same results) or a NUMA kernel bug (mbind
is some hairy code...) or a ppc64 bug or maybe not a bug at all.
Regardless, I'm getting somewhat inconsistent behavior. I can provide
more debugging output, or whatever is requested, but I wasn't sure what
to include. I'm hoping someone has heard of or seen something similar?
The test application I'm using makes some mallopt calls then justs
mallocs large chunks in a loop (4096 * 100 bytes). libhugetlbfs is
LD_PRELOAD'd so that we can override malloc.
Nishanth Aravamudan <nacc at us.ibm.com>
IBM Linux Technology Center
More information about the Linuxppc-dev