Fw: 2.6.16 crashes when running numastat on p575
pj at sgi.com
Tue Apr 4 10:25:33 EST 2006
> There may be some complicated interactions with cpusets.
Hopefully not, but it's possible. I did my best not to assume that
cpus or memory nodes stayed online, and to build in fallbacks if one
of them disappeared out from under me.
But the code has never been tested to these assumptions, nor even
carefully reviewed to them.
So something will likely break somewhere, though quite possibly not
in the cpuset code itself, but in the scheduler or allocator code
that depends on selecting an online cpu or memory for some purpose.
Someone from the hotplug side will have to look at this, with an
understanding of how the coming and going of cpus and memory nodes
can affect the scheduler and allocators.
Look at the callers of cpuset_cpus_allowed(), cpuset_mems_allowed(),
cpuset_update_task_memory_state(), and cpuset_zone_allowed() to see
the likely hot spots for interaction between cpusets and hotplug.
The callers of these routines are important hooks into the scheduler
and allocator, and the question needs to be asked what happens if
a cpu or memory node goes offline after one of these cpuset_*()
routines is called before the scheduler or allocator finishes doing
what it is doing with what it thought was a usable cpu or node.
Also, there is currently no hotplug aware code that I know of in
the kernel/cpuset.c code, so each of the references to the globals
cpu_online_map and node_online_map therein are suspect. Presumably as
anywhere else in the kernel these appear, or the for_each_online_*()
macros appear, the code needs to be examined for perhaps needing some
sort of guard.
So, in sum, the kernel needs to be inspected at the location of at
least each of the following calls or references, for possible problems
if the cpus or nodes online change unexpectedly:
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj at sgi.com> 1.925.600.0401
More information about the Linuxppc-dev