[PATCH] powerpc/numa: Fix percpu allocations to be NUMA aware
Balbir Singh
bsingharora at gmail.com
Tue Jun 6 21:42:30 AEST 2017
On Fri, Jun 2, 2017 at 3:14 PM, Michael Ellerman <mpe at ellerman.id.au> wrote:
> In commit 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we
> switched to the generic implementation of cpu_to_node(), which uses a percpu
> variable to hold the NUMA node for each CPU.
>
> Unfortunately we neglected to notice that we use cpu_to_node() in the allocation
> of our percpu areas, leading to a chicken and egg problem. In practice what
> happens is when we are setting up the percpu areas, cpu_to_node() reports that
> all CPUs are on node 0, so we allocate all percpu areas on node 0.
>
> This is visible in the dmesg output, with all pcpu allocs in group 0:
>
> pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
> pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
> pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
> pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
> pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39
> pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47
>
> To fix it we need an early_cpu_to_node() which can run before percpu is
> set up. We already have the numa_cpu_lookup_table we can use, so just plumb it
> in. With the patch, the dmesg output shows two groups, 0 and 1:
>
> pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
> pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
> pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
> pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31
> pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39
> pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47
>
> We can also check the data_offset in the paca of various CPUs. With the fix we
> see:
>
> CPU 0: data_offset = 0x0ffe8b0000
> CPU 24: data_offset = 0x1ffe5b0000
>
> And we can see from dmesg that CPU 24 has an allocation on node 1:
>
> node 0: [mem 0x0000000000000000-0x0000000fffffffff]
> node 1: [mem 0x0000001000000000-0x0000001fffffffff]
>
> Cc: stable at vger.kernel.org # v3.16+
> Fixes: 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID")
> Signed-off-by: Michael Ellerman <mpe at ellerman.id.au>
> ---
> arch/powerpc/include/asm/topology.h | 14 ++++++++++++++
> arch/powerpc/kernel/setup_64.c | 19 ++++++++++++++-----
> 2 files changed, 28 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
> index 8b3b46b7b0f2..8f3b2ec09b9e 100644
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -44,8 +44,22 @@ extern void __init dump_numa_cpu_topology(void);
> extern int sysfs_add_device_to_node(struct device *dev, int nid);
> extern void sysfs_remove_device_from_node(struct device *dev, int nid);
>
> +static inline int early_cpu_to_node(int cpu)
> +{
> +	int nid;
> +
> +	nid = numa_cpu_lookup_table[cpu];
> +
> +	/*
> +	 * Some functions, eg. node_distance() don't cope with -1, so instead
> +	 * fall back to node 0 if nid is unset (it should be, except bugs).
> +	 */
> +	return (nid < 0) ? 0 : nid;
> +}
> #else
Not sure if it's entirely related, but I had tried to set up the mapping
earlier in
https://patchwork.ozlabs.org/patch/683556/
Even with that, we would still have missed pcpu_fc_alloc().
Balbir
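
For readers following the thread, here is a minimal user-space sketch of the
two ideas in the quoted patch: a table-based node lookup that falls back to
node 0 when the entry is unset, and the check that a data_offset lands in the
expected node's memory range. Everything prefixed sketch_ or SKETCH_ is
hypothetical and only stands in for the kernel's numa_cpu_lookup_table and
node ranges; the CPU-to-node split (0-23 on node 0, 24-47 on node 1) and the
address ranges are copied from the dmesg output quoted above. This is not the
kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS  48	/* assumption: the 48 CPUs seen in the dmesg output */
#define SKETCH_NR_NODES 2	/* assumption: the two nodes seen in the dmesg output */

/* Hypothetical stand-in for numa_cpu_lookup_table; -1 means "not set yet". */
static int sketch_cpu_lookup_table[SKETCH_NR_CPUS];

/* Populate the table as the patched dmesg output suggests. */
static void sketch_init_table(void)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sketch_cpu_lookup_table[cpu] = (cpu < 24) ? 0 : 1;
}

/*
 * Same shape as the quoted early_cpu_to_node(): read the table directly
 * (no percpu data needed) and fall back to node 0 for an unset entry.
 */
static int sketch_early_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);

	int nid = sketch_cpu_lookup_table[cpu];

	return (nid < 0) ? 0 : nid;
}

/* Inclusive physical address range of one node, from the quoted dmesg. */
struct sketch_node_range {
	uint64_t start;
	uint64_t end;
};

static const struct sketch_node_range sketch_nodes[SKETCH_NR_NODES] = {
	{ 0x0000000000000000ULL, 0x0000000fffffffffULL },	/* node 0 */
	{ 0x0000001000000000ULL, 0x0000001fffffffffULL },	/* node 1 */
};

/* Return the node whose range contains addr, or -1 if none does. */
static int sketch_addr_to_node(uint64_t addr)
{
	for (int nid = 0; nid < SKETCH_NR_NODES; nid++) {
		if (addr >= sketch_nodes[nid].start &&
		    addr <= sketch_nodes[nid].end)
			return nid;
	}
	return -1;
}
```

To try it, call sketch_init_table() once, then compare
sketch_early_cpu_to_node(24) against
sketch_addr_to_node(0x1ffe5b0000ULL), the data_offset reported for CPU 24 in
the quoted message; both should name the same node if the allocation is node
local.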