[PATCH] powerpc/numa: Fix percpu allocations to be NUMA aware

Balbir Singh bsingharora at gmail.com
Tue Jun 6 21:42:30 AEST 2017


On Fri, Jun 2, 2017 at 3:14 PM, Michael Ellerman <mpe at ellerman.id.au> wrote:
> In commit 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we
> switched to the generic implementation of cpu_to_node(), which uses a percpu
> variable to hold the NUMA node for each CPU.
>
> Unfortunately we neglected to notice that we use cpu_to_node() in the allocation
> of our percpu areas, leading to a chicken and egg problem. In practice what
> happens is when we are setting up the percpu areas, cpu_to_node() reports that
> all CPUs are on node 0, so we allocate all percpu areas on node 0.
>
> This is visible in the dmesg output, as all pcpu allocs being in group 0:
>
>   pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
>   pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
>   pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
>   pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
>   pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39
>   pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47
>
> To fix it we need an early_cpu_to_node() which can run prior to percpu being
> setup. We already have the numa_cpu_lookup_table we can use, so just plumb it
> in. With the patch dmesg output shows two groups, 0 and 1:
>
>   pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
>   pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
>   pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
>   pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31
>   pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39
>   pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47
>
> We can also check the data_offset in the paca of various CPUs, with the fix we
> see:
>
>   CPU 0:  data_offset = 0x0ffe8b0000
>   CPU 24: data_offset = 0x1ffe5b0000
>
> And we can see from dmesg that CPU 24 has an allocation on node 1:
>
>   node   0: [mem 0x0000000000000000-0x0000000fffffffff]
>   node   1: [mem 0x0000001000000000-0x0000001fffffffff]
>
> Cc: stable at vger.kernel.org # v3.16+
> Fixes: 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID")
> Signed-off-by: Michael Ellerman <mpe at ellerman.id.au>
> ---
>  arch/powerpc/include/asm/topology.h | 14 ++++++++++++++
>  arch/powerpc/kernel/setup_64.c      | 19 ++++++++++++++-----
>  2 files changed, 28 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
> index 8b3b46b7b0f2..8f3b2ec09b9e 100644
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -44,8 +44,22 @@ extern void __init dump_numa_cpu_topology(void);
>  extern int sysfs_add_device_to_node(struct device *dev, int nid);
>  extern void sysfs_remove_device_from_node(struct device *dev, int nid);
>
> +static inline int early_cpu_to_node(int cpu)
> +{
> +       int nid;
> +
> +       nid = numa_cpu_lookup_table[cpu];
> +
> +       /*
> +        * Some functions, eg. node_distance() don't cope with -1, so instead
> +        * fall back to node 0 if nid is unset (it should be, except bugs).
> +        */
> +       return (nid < 0) ? 0 : nid;
> +}
>  #else

Not sure if its entirely related, but I had tried to do

https://patchwork.ozlabs.org/patch/683556/

to setup the mapping earlier, but we would have still missed the pcpu_fc_alloc.

Balbir


More information about the Linuxppc-dev mailing list