[RFC,1/2] powerpc/numa: fix cpu_to_node() usage during boot

Tejun Heo tj at kernel.org
Thu Jul 16 06:37:39 AEST 2015


Hello,

On Fri, Jul 10, 2015 at 09:15:47AM -0700, Nishanth Aravamudan wrote:
> On 08.07.2015 [16:16:23 -0700], Nishanth Aravamudan wrote:
> > On 08.07.2015 [14:00:56 +1000], Michael Ellerman wrote:
> > > On Thu, 2015-02-07 at 23:02:02 UTC, Nishanth Aravamudan wrote:
> > > > Much like on x86, now that powerpc is using USE_PERCPU_NUMA_NODE_ID, we
> > > > have an ordering issue during boot with early calls to cpu_to_node().
> > > 
> > > "now that .." implies we changed something and broke this. What commit was
> > > it that changed the behaviour?
> > 
> > Well, that's something I'm still trying to unearth. In the commits
> > before and after adding USE_PERCPU_NUMA_NODE_ID (8c272261194d
> > "powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), the dmesg reports:
> > 
> > pcpu-alloc: [0] 0 1 2 3 4 5 6 7
> 
> Ok, I did a bisection, and it seems like prior to commit
> 1a4d76076cda69b0abf15463a8cebc172406da25 ("percpu: implement
> asynchronous chunk population"), we emitted the above, e.g.:
> 
> pcpu-alloc: [0] 0 1 2 3 4 5 6 7
> 
> And after that commit, we emitted:
> 
> pcpu-alloc: [0] 0 1 2 3 [0] 4 5 6 7
> 
> I'm not exactly sure why that changed, but I'm still
> reading/understanding the commit. Tejun might be able to explain.
> 
> Tejun, for reference, I noticed on Power systems since the
> above-mentioned commit, pcpu-alloc is not reflecting the topology of the
> system correctly -- that is, the pcpu areas are all on node 0
> unconditionally (based upon pcpu-alloc's output). Prior to that, there
> was just one group, it seems like, which completely ignored the NUMA
> topology.
> 
> Is this just an ordering thing that changed with the introduction of the
> async code?

It's just that each unit grew and the percpu allocator decided to split
the CPUs across separate allocation units.  Before that commit, it
served all CPUs from a single alloc unit because they looked like they
belonged to the same NUMA node and were small enough to fit into one
alloc unit.  After it, the async population code added more reserve
space, so the allocator splits the CPUs into two alloc units while still
assigning both to the same group, because the NUMA info isn't available
yet at that point.
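
To make the two dmesg lines above concrete, here is a rough toy model of
the grouping, not the actual mm/percpu.c code.  The function name
format_pcpu_alloc and the parameter cpus_per_alloc are made up for
illustration; the only things taken from this thread are the
"pcpu-alloc:" output format and the fact that every CPU maps to group 0
when the NUMA info is missing.

```python
def format_pcpu_alloc(cpu_to_group, cpus_per_alloc):
    """Render a 'pcpu-alloc:'-style line for a simplified model.

    cpu_to_group   -- dict mapping CPU number to group (NUMA node) number
    cpus_per_alloc -- hypothetical knob: how many CPUs fit in one alloc unit
    """
    # Collect the CPUs of each group, keeping CPU numbers in order.
    groups = {}
    for cpu in sorted(cpu_to_group):
        groups.setdefault(cpu_to_group[cpu], []).append(cpu)

    parts = []
    for group in sorted(groups):
        cpus = groups[group]
        # Split the group's CPUs into alloc units of at most
        # cpus_per_alloc CPUs; each alloc unit is tagged with its group.
        for start in range(0, len(cpus), cpus_per_alloc):
            chunk = cpus[start:start + cpus_per_alloc]
            parts.append("[%d] %s" % (group, " ".join(str(c) for c in chunk)))
    return "pcpu-alloc: " + " ".join(parts)


# NUMA info not available yet: every CPU appears to be on node 0.
no_numa = {cpu: 0 for cpu in range(8)}

print(format_pcpu_alloc(no_numa, 8))  # all eight CPUs fit in one alloc unit
print(format_pcpu_alloc(no_numa, 4))  # larger units: only four CPUs fit
```

With all CPUs in group 0, the first call prints
"pcpu-alloc: [0] 0 1 2 3 4 5 6 7" and the second prints
"pcpu-alloc: [0] 0 1 2 3 [0] 4 5 6 7".  In this model the group tag only
changes once cpu_to_group itself differs between CPUs, which is the part
that depends on the NUMA info being set up early enough.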

Thanks.

-- 
tejun

