[PATCH v3] pseries/hotplug-memory: hot-add: skip redundant LMB lookup

Laurent Dufour ldufour at linux.vnet.ibm.com
Wed Sep 16 18:22:03 AEST 2020


Le 15/09/2020 à 21:46, Scott Cheloha a écrit :
> During memory hot-add, dlpar_add_lmb() calls memory_add_physaddr_to_nid()
> to determine which node id (nid) to use when later calling __add_memory().
> 
> This is wasteful.  On pseries, memory_add_physaddr_to_nid() finds an
> appropriate nid for a given address by looking up the LMB containing the
> address and then passing that LMB to of_drconf_to_nid_single() to get the
> nid.  In dlpar_add_lmb() we get this address from the LMB itself.
> 
> In short, we have a pointer to an LMB and then we are searching for
> that LMB *again* in order to find its nid.
> 
> If we call of_drconf_to_nid_single() directly from dlpar_add_lmb() we
> can skip the redundant lookup.  The only error handling we need to
> duplicate from memory_add_physaddr_to_nid() is the fallback to the
> default nid when drconf_to_nid_single() returns -1 (NUMA_NO_NODE) or
> an invalid nid.
> 
> Skipping the extra lookup makes hot-add operations faster, especially
> on machines with many LMBs.
> 
> Consider an LPAR with 126976 LMBs.  In one test, hot-adding 126000
> LMBs on an upatched kernel took ~3.5 hours while a patched kernel
> completed the same operation in ~2 hours:
> 
> Unpatched (12450 seconds):
> Sep  9 04:06:31 ltc-brazos1 drmgr[810169]: drmgr: -c mem -a -q 126000
> Sep  9 04:06:31 ltc-brazos1 kernel: pseries-hotplug-mem: Attempting to hot-add 126000 LMB(s)
> [...]
> Sep  9 07:34:01 ltc-brazos1 kernel: pseries-hotplug-mem: Memory at 20000000 (drc index 80000002) was hot-added
> 
> Patched (7065 seconds):
> Sep  8 21:49:57 ltc-brazos1 drmgr[877703]: drmgr: -c mem -a -q 126000
> Sep  8 21:49:57 ltc-brazos1 kernel: pseries-hotplug-mem: Attempting to hot-add 126000 LMB(s)
> [...]
> Sep  8 23:27:42 ltc-brazos1 kernel: pseries-hotplug-mem: Memory at 20000000 (drc index 80000002) was hot-added
> 
> It should be noted that the speedup grows more substantial when
> hot-adding LMBs at the end of the drconf range.  This is because we
> are skipping a linear LMB search.
> 
> To see the distinction, consider smaller hot-add test on the same
> LPAR.  A perf-stat run with 10 iterations showed that hot-adding 4096
> LMBs completed less than 1 second faster on a patched kernel:
> 
> Unpatched:
>   Performance counter stats for 'drmgr -c mem -a -q 4096' (10 runs):
> 
>          104,753.42 msec task-clock                #    0.992 CPUs utilized            ( +-  0.55% )
>               4,708      context-switches          #    0.045 K/sec                    ( +-  0.69% )
>               2,444      cpu-migrations            #    0.023 K/sec                    ( +-  1.25% )
>                 394      page-faults               #    0.004 K/sec                    ( +-  0.22% )
>     445,902,503,057      cycles                    #    4.257 GHz                      ( +-  0.55% )  (66.67%)
>       8,558,376,740      stalled-cycles-frontend   #    1.92% frontend cycles idle     ( +-  0.88% )  (49.99%)
>     300,346,181,651      stalled-cycles-backend    #   67.36% backend cycles idle      ( +-  0.76% )  (50.01%)
>     258,091,488,691      instructions              #    0.58  insn per cycle
>                                                    #    1.16  stalled cycles per insn  ( +-  0.22% )  (66.67%)
>      70,568,169,256      branches                  #  673.660 M/sec                    ( +-  0.17% )  (50.01%)
>       3,100,725,426      branch-misses             #    4.39% of all branches          ( +-  0.20% )  (49.99%)
> 
>             105.583 +- 0.589 seconds time elapsed  ( +-  0.56% )
> 
> Patched:
>   Performance counter stats for 'drmgr -c mem -a -q 4096' (10 runs):
> 
>          104,055.69 msec task-clock                #    0.993 CPUs utilized            ( +-  0.32% )
>               4,606      context-switches          #    0.044 K/sec                    ( +-  0.20% )
>               2,463      cpu-migrations            #    0.024 K/sec                    ( +-  0.93% )
>                 394      page-faults               #    0.004 K/sec                    ( +-  0.25% )
>     442,951,129,921      cycles                    #    4.257 GHz                      ( +-  0.32% )  (66.66%)
>       8,710,413,329      stalled-cycles-frontend   #    1.97% frontend cycles idle     ( +-  0.47% )  (50.06%)
>     299,656,905,836      stalled-cycles-backend    #   67.65% backend cycles idle      ( +-  0.39% )  (50.02%)
>     252,731,168,193      instructions              #    0.57  insn per cycle
>                                                    #    1.19  stalled cycles per insn  ( +-  0.20% )  (66.66%)
>      68,902,851,121      branches                  #  662.173 M/sec                    ( +-  0.13% )  (49.94%)
>       3,100,242,882      branch-misses             #    4.50% of all branches          ( +-  0.15% )  (49.98%)
> 
>             104.829 +- 0.325 seconds time elapsed  ( +-  0.31% )
> 
> This is consistent.  An add-by-count hot-add operation adds LMBs
> greedily, so LMBs near the start of the drconf range are considered
> first.  On an otherwise idle LPAR with so many LMBs we would expect to
> find the LMBs we need near the start of the drconf range, hence the
> smaller speedup.
> 
> Signed-off-by: Scott Cheloha <cheloha at linux.ibm.com>
> ---
> Changelog:
> 
> v1: https://lore.kernel.org/linuxppc-dev/20200910175637.2865160-1-cheloha@linux.ibm.com/
> 
> v2:
> - Move prototype for of_drconf_to_nid_single() to topology.h.
>    Requested by Michael Ellerman.
> 
> v3:
> - Send the right patch.  v2 is from the wrong branch, my mistake.
> 
>   arch/powerpc/include/asm/topology.h             | 3 +++
>   arch/powerpc/mm/numa.c                          | 2 +-
>   arch/powerpc/platforms/pseries/hotplug-memory.c | 6 ++++--
>   3 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
> index f0b6300e7dd3..d15d9999bad6 100644
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -86,6 +86,9 @@ static inline int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
> 
>   #endif /* CONFIG_NUMA */
> 
> +struct drmem_lmb;
> +extern int of_drconf_to_nid_single(struct drmem_lmb *);

Hi Scott,

checkpatch.pl reports:

CHECK: extern prototypes should be avoided in .h files
#106: FILE: arch/powerpc/include/asm/topology.h:90:
+extern int of_drconf_to_nid_single(struct drmem_lmb *);

WARNING: function definition argument 'struct drmem_lmb *' should also have an 
identifier name
#106: FILE: arch/powerpc/include/asm/topology.h:90:
+extern int of_drconf_to_nid_single(struct drmem_lmb *);

cheers,
Laurent.

> +
>   #if defined(CONFIG_NUMA) && defined(CONFIG_PPC_SPLPAR)
>   extern int find_and_online_cpu_nid(int cpu);
>   #else
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 1f61fa2148b5..63507b47164d 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -430,7 +430,7 @@ static int of_get_assoc_arrays(struct assoc_arrays *aa)
>    * This is like of_node_to_nid_single() for memory represented in the
>    * ibm,dynamic-reconfiguration-memory node.
>    */
> -static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
> +int of_drconf_to_nid_single(struct drmem_lmb *lmb)
>   {
>   	struct assoc_arrays aa = { .arrays = NULL };
>   	int default_nid = NUMA_NO_NODE;
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 0ea976d1cac4..9a533acf8ad0 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -611,8 +611,10 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
> 
>   	block_sz = memory_block_size_bytes();
> 
> -	/* Find the node id for this address. */
> -	nid = memory_add_physaddr_to_nid(lmb->base_addr);
> +	/* Find the node id for this LMB.  Fake one if necessary. */
> +	nid = of_drconf_to_nid_single(lmb);
> +	if (nid < 0 || !node_possible(nid))
> +		nid = first_online_node;
> 
>   	/* Add the memory */
>   	rc = __add_memory(nid, lmb->base_addr, block_sz);
> 



More information about the Linuxppc-dev mailing list