[Skiboot] [PATCH] numa/associativity: Add a new level of NUMA for GPUs

Balbir Singh bsingharora at gmail.com
Thu Jul 6 11:57:54 AEST 2017


Today we have an issue where the NUMA nodes corresponding
to GPUs have the same affinity/distance as normal memory
nodes. Our reference-points property today supports two
levels: [0x4, 0x4] for normal systems and [0x4, 0x3] for
POWER8E systems. This patch adds a third level, making the
property [0x4, X, 0x2] (where X is the existing
model-specific second reference point), and uses the
node-id at all levels for the GPU.

Cc: Reza Arbab <arbab at linux.vnet.ibm.com>
Cc: Alistair Popple <alistair at popple.id.au>
Cc: Benjamin Herrenschmidt <benh at kernel.crashing.org>

Signed-off-by: Balbir Singh <bsingharora at gmail.com>
---

Tested on a system and ensured that the distances between
existing nodes are not impacted. GPU nodes have a distance
of 80 w.r.t. all other nodes. No changes are needed in the
Linux kernel.
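
For context (an illustration, not part of the patch): Linux derives
the distance between two nodes by starting at a local distance of 10
and doubling it for each reference-point level at which the nodes'
ibm,associativity entries differ. A simplified sketch of that logic,
paraphrasing __node_distance() in arch/powerpc/mm/numa.c:

	#include <stdint.h>

	/*
	 * assoc_a/assoc_b are ibm,associativity arrays (cell 0 is the
	 * level count); refs holds the cell indices named by the
	 * ibm,associativity-reference-points property.
	 */
	static int node_distance(const uint32_t *assoc_a,
				 const uint32_t *assoc_b,
				 const uint32_t *refs, int depth)
	{
		int i, distance = 10;	/* LOCAL_DISTANCE */

		for (i = 0; i < depth; i++) {
			if (assoc_a[refs[i]] == assoc_b[refs[i]])
				break;
			/* nodes diverge at this level, double the distance */
			distance *= 2;
		}
		return distance;
	}

With three reference points, a GPU node that differs at every one of
them ends up at 10 * 2 * 2 * 2 = 80, which is where the figure above
comes from.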

 core/affinity.c              | 14 +++++++++-----
 doc/device-tree/ibm,opal.rst |  2 +-
 hw/npu2.c                    |  3 ++-
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/core/affinity.c b/core/affinity.c
index 9f489d3..10d483d 100644
--- a/core/affinity.c
+++ b/core/affinity.c
@@ -72,10 +72,10 @@ void add_associativity_ref_point(void)
 	/*
 	 * Note about our use of reference points:
 	 *
-	 * Linux currently supports two levels of NUMA. We use the first
-	 * reference point for the node ID and the second reference point
-	 * for a second level of affinity. We always use the chip ID (4)
-	 * for the first reference point.
+	 * Linux currently supports up to three levels of NUMA. We use the
+	 * first reference point for the node ID and the second reference
+	 * point for a second level of affinity. We always use the chip ID
+	 * (4) for the first reference point.
 	 *
 	 * Choosing the second level of affinity is model specific
 	 * unfortunately. Current POWER8E models should use the DCM
@@ -83,12 +83,16 @@ void add_associativity_ref_point(void)
 	 *
 	 * If there is a way to obtain this information from the FSP
 	 * that would be ideal, but for now hardwire our POWER8E setting.
+	 *
+	 * For GPU nodes we add a third level of NUMA, such that the
+	 * distance of the GPU node from all other nodes is uniformly
+	 * the highest.
 	 */
 	if (PVR_TYPE(mfspr(SPR_PVR)) == PVR_TYPE_P8E)
 		ref2 = 0x3;
 
 	dt_add_property_cells(opal_node, "ibm,associativity-reference-points",
-			      0x4, ref2);
+			      0x4, ref2, 0x2);
 }
 
 void add_chip_dev_associativity(struct dt_node *dev)
diff --git a/doc/device-tree/ibm,opal.rst b/doc/device-tree/ibm,opal.rst
index 149050c..932f41d 100644
--- a/doc/device-tree/ibm,opal.rst
+++ b/doc/device-tree/ibm,opal.rst
@@ -25,7 +25,7 @@ Top level ibm,opal node
     * ibm,opal-v2 is *NOT* present on POWER9 and above.
     */
 
-		ibm,associativity-reference-points = <0x4 0x3>;
+		ibm,associativity-reference-points = <0x4 0x3 0x2>;
 		ibm,heartbeat-ms = <0x7d0>;
 
    /* how often any OPAL call needs to be made to avoid a watchdog timer on BMC
diff --git a/hw/npu2.c b/hw/npu2.c
index b81e49d..83451c3 100644
--- a/hw/npu2.c
+++ b/hw/npu2.c
@@ -521,7 +521,8 @@ static struct dt_node *npu2_create_memory_dn(uint64_t addr, uint64_t size)
 	dt_add_property_u64s(mem, "reg", addr, size);
 	dt_add_property_cells(mem, "ibm,chip-id", chip_id);
 	dt_add_property_u64s(mem, "linux,usable-memory", addr, 0);
-	dt_add_property_cells(mem, "ibm,associativity", 4, 0, 0, 0, chip_id--);
+	dt_add_property_cells(mem, "ibm,associativity", 4, chip_id, chip_id, chip_id, chip_id);
+	chip_id--;
 
 	assert(chip_id);
 	return mem;
-- 
2.9.4
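
For completeness, a worked example (again an illustration, not part of
the patch; the chip-id values are made up). It assumes regular memory
nodes keep the <4 0 0 0 chip_id> associativity that GPU memory nodes
carried before this patch, while GPU nodes now hold their (fake)
chip-id in every cell, and it uses the non-POWER8E reference points
<0x4 0x4 0x2>:

	#include <stdio.h>
	#include <stdint.h>

	/* ibm,associativity arrays: cell 0 is the level count (4) */
	static const uint32_t mem_chip0[5] = { 4, 0, 0, 0, 0 };
	static const uint32_t mem_chip8[5] = { 4, 0, 0, 0, 8 };
	static const uint32_t gpu_255[5]   = { 4, 255, 255, 255, 255 };

	/* cell indices named by ibm,associativity-reference-points */
	static const uint32_t refs[3] = { 4, 4, 2 };

	static int node_distance(const uint32_t *a, const uint32_t *b,
				 const uint32_t *r, int depth)
	{
		int i, distance = 10;	/* LOCAL_DISTANCE */

		for (i = 0; i < depth; i++) {
			if (a[r[i]] == b[r[i]])
				break;
			distance *= 2;	/* double per differing level */
		}
		return distance;
	}

	int main(void)
	{
		/* 40, as before the patch: both nodes hold 0 in cell 2,
		 * so the new third level compares equal and stops the
		 * doubling early */
		printf("mem0 <-> mem8: %d\n",
		       node_distance(mem_chip0, mem_chip8, refs, 3));
		/* 80: the GPU differs at all three reference points */
		printf("mem0 <-> gpu:  %d\n",
		       node_distance(mem_chip0, gpu_255, refs, 3));
		return 0;
	}

Existing node distances stay put because regular nodes hold 0 in cell
2, so the new third reference point never doubles their distance; only
the GPU nodes, which now place their chip-id in every cell, pick up
the extra level.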


