[PATCH 1/3] sched/topology: Allow archs to populate distance map

Srikar Dronamraju srikar at linux.vnet.ibm.com
Fri May 21 01:44:25 AEST 2021


Currently scheduler populates the distance map by looking at distance
of each node from all other nodes. This should work for most
architectures and platforms.

However there are some architectures like POWER that may not expose
the distance of nodes that are not yet onlined because those resources
are not yet allocated to the OS instance. Such architectures have
other means to provide valid distance data for the current platform.

For example distance info from numactl from a fully populated 8 node
system at boot may look like this.

node distances:
node   0   1   2   3   4   5   6   7
  0:  10  20  40  40  40  40  40  40
  1:  20  10  40  40  40  40  40  40
  2:  40  40  10  20  40  40  40  40
  3:  40  40  20  10  40  40  40  40
  4:  40  40  40  40  10  20  40  40
  5:  40  40  40  40  20  10  40  40
  6:  40  40  40  40  40  40  10  20
  7:  40  40  40  40  40  40  20  10

However the same system when only two nodes are online at boot, then the
numa topology will look like
node distances:
node   0   1
  0:  10  20
  1:  20  10

It may be implementation dependent on what node_distance(0,3) where
node 0 is online and node 3 is offline. In POWER case, it returns
LOCAL_DISTANCE(10). Here at boot the scheduler would assume that the max
distance between nodes is 20. However that would not be true.

When Nodes are onlined and CPUs from those nodes are hotplugged,
the max node distance would be 40.

To handle such scenarios, let scheduler allow architectures to populate
the distance map. Architectures that like to populate the distance map
can overload arch_populate_distance_map().

Cc: LKML <linux-kernel at vger.kernel.org>
Cc: linuxppc-dev at lists.ozlabs.org
Cc: Nathan Lynch <nathanl at linux.ibm.com>
Cc: Michael Ellerman <mpe at ellerman.id.au>
Cc: Ingo Molnar <mingo at kernel.org>
Cc: Peter Zijlstra <peterz at infradead.org>
Cc: Valentin Schneider <valentin.schneider at arm.com>
Cc: Scott Cheloha <cheloha at linux.ibm.com>
Cc: Gautham R Shenoy <ego at linux.vnet.ibm.com>
Cc: Dietmar Eggemann <dietmar.eggemann at arm.com>
Cc: Mel Gorman <mgorman at techsingularity.net>
Cc: Vincent Guittot <vincent.guittot at linaro.org>
Cc: Rik van Riel <riel at surriel.com>
Cc: Geetika Moolchandani <Geetika.Moolchandani1 at ibm.com>
Reported-by: Geetika Moolchandani <Geetika.Moolchandani1 at ibm.com>
Signed-off-by: Srikar Dronamraju <srikar at linux.vnet.ibm.com>
---
 kernel/sched/topology.c | 32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 053115b55f89..ccb9aff59add 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1630,6 +1630,26 @@ static void init_numa_topology_type(void)
 
 #define NR_DISTANCE_VALUES (1 << DISTANCE_BITS)
 
+#ifndef arch_populate_distance_map
+static int arch_populate_distance_map(unsigned long *distance_map)
+{
+	int i, j;
+
+	for (i = 0; i < nr_node_ids; i++) {
+		for (j = 0; j < nr_node_ids; j++) {
+			int distance = node_distance(i, j);
+
+			if (distance < LOCAL_DISTANCE || distance >= NR_DISTANCE_VALUES) {
+				sched_numa_warn("Invalid distance value range");
+				return -1;
+			}
+			bitmap_set(distance_map, distance, 1);
+		}
+	}
+	return 0;
+}
+#endif
+
 void sched_init_numa(void)
 {
 	struct sched_domain_topology_level *tl;
@@ -1646,18 +1666,10 @@ void sched_init_numa(void)
 		return;
 
 	bitmap_zero(distance_map, NR_DISTANCE_VALUES);
-	for (i = 0; i < nr_node_ids; i++) {
-		for (j = 0; j < nr_node_ids; j++) {
-			int distance = node_distance(i, j);
 
-			if (distance < LOCAL_DISTANCE || distance >= NR_DISTANCE_VALUES) {
-				sched_numa_warn("Invalid distance value range");
-				return;
-			}
+	if (arch_populate_distance_map(distance_map))
+		return;
 
-			bitmap_set(distance_map, distance, 1);
-		}
-	}
 	/*
 	 * We can now figure out how many unique distance values there are and
 	 * allocate memory accordingly.
-- 
2.27.0



More information about the Linuxppc-dev mailing list