[PATCH 02/11] powerpc/smp: Merge Power9 topology with Power topology
Srikar Dronamraju
srikar at linux.vnet.ibm.com
Mon Jul 20 18:10:52 AEST 2020
* Gautham R Shenoy <ego at linux.vnet.ibm.com> [2020-07-17 11:14:36]:
> Hi Srikar,
>
> On Tue, Jul 14, 2020 at 10:06:15AM +0530, Srikar Dronamraju wrote:
> > A new sched_domain_topology_level was added just for Power9. However the
> > same can be achieved by merging powerpc_topology with power9_topology
> > and makes the code more simpler especially when adding a new sched
> > domain.
> >
> > Cc: linuxppc-dev <linuxppc-dev at lists.ozlabs.org>
> > Cc: Michael Ellerman <michaele at au1.ibm.com>
> > Cc: Nick Piggin <npiggin at au1.ibm.com>
> > Cc: Oliver OHalloran <oliveroh at au1.ibm.com>
> > Cc: Nathan Lynch <nathanl at linux.ibm.com>
> > Cc: Michael Neuling <mikey at linux.ibm.com>
> > Cc: Anton Blanchard <anton at au1.ibm.com>
> > Cc: Gautham R Shenoy <ego at linux.vnet.ibm.com>
> > Cc: Vaidyanathan Srinivasan <svaidy at linux.ibm.com>
> > Signed-off-by: Srikar Dronamraju <srikar at linux.vnet.ibm.com>
> > ---
> > arch/powerpc/kernel/smp.c | 33 ++++++++++-----------------------
> > 1 file changed, 10 insertions(+), 23 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> > index 680c0edcc59d..069ea4b21c6d 100644
> > --- a/arch/powerpc/kernel/smp.c
> > +++ b/arch/powerpc/kernel/smp.c
> > @@ -1315,7 +1315,7 @@ int setup_profiling_timer(unsigned int multiplier)
> > }
> >
> > #ifdef CONFIG_SCHED_SMT
> > -/* cpumask of CPUs with asymetric SMT dependancy */
> > +/* cpumask of CPUs with asymmetric SMT dependency */
> > static int powerpc_smt_flags(void)
> > {
> > int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
> > @@ -1328,14 +1328,6 @@ static int powerpc_smt_flags(void)
> > }
> > #endif
> >
> > -static struct sched_domain_topology_level powerpc_topology[] = {
> > -#ifdef CONFIG_SCHED_SMT
> > - { cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
> > -#endif
> > - { cpu_cpu_mask, SD_INIT_NAME(DIE) },
> > - { NULL, },
> > -};
> > -
> > /*
> > * P9 has a slightly odd architecture where pairs of cores share an L2 cache.
> > * This topology makes it *much* cheaper to migrate tasks between adjacent cores
> > @@ -1353,7 +1345,13 @@ static int powerpc_shared_cache_flags(void)
> > */
> > static const struct cpumask *shared_cache_mask(int cpu)
> > {
> > - return cpu_l2_cache_mask(cpu);
> > + if (shared_caches)
> > + return cpu_l2_cache_mask(cpu);
> > +
> > + if (has_big_cores)
> > + return cpu_smallcore_mask(cpu);
> > +
> > + return cpu_smt_mask(cpu);
> > }
> >
> > #ifdef CONFIG_SCHED_SMT
> > @@ -1363,7 +1361,7 @@ static const struct cpumask *smallcore_smt_mask(int cpu)
> > }
> > #endif
> >
> > -static struct sched_domain_topology_level power9_topology[] = {
> > +static struct sched_domain_topology_level powerpc_topology[] = {
>
>
> > #ifdef CONFIG_SCHED_SMT
> > { cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
> > #endif
> > @@ -1388,21 +1386,10 @@ void __init smp_cpus_done(unsigned int max_cpus)
> > #ifdef CONFIG_SCHED_SMT
> > if (has_big_cores) {
> > pr_info("Big cores detected but using small core scheduling\n");
> > - power9_topology[0].mask = smallcore_smt_mask;
> > powerpc_topology[0].mask = smallcore_smt_mask;
> > }
> > #endif
> > - /*
> > - * If any CPU detects that it's sharing a cache with another CPU then
> > - * use the deeper topology that is aware of this sharing.
> > - */
> > - if (shared_caches) {
> > - pr_info("Using shared cache scheduler topology\n");
> > - set_sched_topology(power9_topology);
> > - } else {
> > - pr_info("Using standard scheduler topology\n");
> > - set_sched_topology(powerpc_topology);
>
>
> Ok, so we will go with the three level topology by default (SMT,
> CACHE, DIE) and will rely on the sched-domain creation code to
> degenerate CACHE domain in case SMT and CACHE have the same set of
> CPUs (POWER8 for eg).
>
Right.
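
To illustrate the point above, here is a minimal userspace sketch (not kernel code) of the fallback order in the patched shared_cache_mask() and of when the CACHE level ends up spanning the same CPUs as SMT. Everything prefixed with toy_ is hypothetical, as are the thread counts used to describe a system. The real scheduler also compares domain flags before degenerating a level, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy cpumask: one bit per CPU, at most 64 CPUs. Not struct cpumask. */
typedef uint64_t toy_cpumask_t;

/* Hypothetical system description; the flag names mirror the patch. */
struct toy_system {
	bool shared_caches;                 /* cores share an L2 cache */
	bool has_big_cores;                 /* big core split into small cores */
	unsigned int threads_per_core;      /* assumed SMT width */
	unsigned int threads_per_l2;        /* assumed threads behind one L2 */
	unsigned int threads_per_smallcore; /* assumed small-core width */
};

/* Mask of the aligned group of group_size CPUs that contains cpu. */
static toy_cpumask_t toy_group_mask(unsigned int cpu, unsigned int group_size)
{
	unsigned int first = (cpu / group_size) * group_size;
	toy_cpumask_t bits = (group_size >= 64) ? ~0ULL
						 : ((1ULL << group_size) - 1);

	return bits << first;
}

/* SMT level: small-core mask on big-core systems, full core otherwise. */
static toy_cpumask_t toy_smt_mask(const struct toy_system *sys,
				  unsigned int cpu)
{
	if (sys->has_big_cores)
		return toy_group_mask(cpu, sys->threads_per_smallcore);

	return toy_group_mask(cpu, sys->threads_per_core);
}

/* CACHE level: same fallback order as the patched shared_cache_mask(). */
static toy_cpumask_t toy_shared_cache_mask(const struct toy_system *sys,
					   unsigned int cpu)
{
	if (sys->shared_caches)
		return toy_group_mask(cpu, sys->threads_per_l2);

	if (sys->has_big_cores)
		return toy_group_mask(cpu, sys->threads_per_smallcore);

	return toy_group_mask(cpu, sys->threads_per_core);
}

/* True when CACHE spans exactly the CPUs of SMT, so it adds no new level. */
static bool toy_cache_level_degenerates(const struct toy_system *sys,
					unsigned int cpu)
{
	return toy_smt_mask(sys, cpu) == toy_shared_cache_mask(sys, cpu);
}
```

With an assumed POWER8-like description (no shared caches, 8 threads per core), toy_cache_level_degenerates() returns true for every CPU. With an assumed POWER9-like description (shared caches, 4 threads per core, 8 threads per L2) it returns false, so the CACHE level is kept.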
> From a cleanup perspective this is better, since we won't have to
> worry about defining multiple topology structures, but from a
> performance point of view, wouldn't we now pay an extra penalty of
> degenerating the CACHE domains on POWER8 kind of systems, each time
> when a CPU comes online ?
>
So either we end up adding a topology definition for each of the new
topologies we support, or we have to take the extra penalty.
But going ahead
> Do we know how bad it is ? If the degeneration takes a few extra
> microseconds, that should be ok I suppose.
>
It certainly will add to the penalty; I haven't captured per-degeneration
statistics. However, I ran an experiment in which I ran ppc64_cpu --smt=8
followed by ppc64_cpu --smt=1 in a loop of 100 iterations, on a Power8
system with 256 CPUs and 8 nodes.
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 256
On-line CPU(s) list: 0-255
Thread(s) per core: 8
Core(s) per socket: 4
Socket(s): 8
NUMA node(s): 8
Model: 2.1 (pvr 004b 0201)
Model name: POWER8 (architected), altivec supported
Hypervisor vendor: pHyp
Virtualization type: para
L1d cache: 64K
L1i cache: 32K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-31
NUMA node1 CPU(s): 32-63
NUMA node2 CPU(s): 64-95
NUMA node3 CPU(s): 96-127
NUMA node4 CPU(s): 128-159
NUMA node5 CPU(s): 160-191
NUMA node6 CPU(s): 192-223
NUMA node7 CPU(s): 224-255
ppc64_cpu --smt=1
    N     Min     Max  Median       Avg      Stddev
x 100   38.17   53.78   46.81   46.6766   2.8421603
x 100   41.34   58.24   48.35   47.9649   3.6866087

ppc64_cpu --smt=8
    N     Min     Max  Median       Avg      Stddev
x 100   57.43   75.88   60.61   61.0246    2.418685
x 100   58.21   79.24   62.59   63.3326   3.4094558
But once we clean up, we could add ways to fix up topologies so that we
recover the overhead.
--
Thanks and Regards
Srikar Dronamraju