[Skiboot] [PATCH] numa/associativity: Add a new level of NUMA for GPUs
Alistair Popple
alistair at popple.id.au
Thu Jul 6 16:06:40 AEST 2017
I am not familiar with how changing the associativity properties achieves the
desired outcome, but the manipulation of the properties themselves looks correct,
and I think the idea is a reasonable one.
Reviewed-by: Alistair Popple <alistair at popple.id.au>
On Thu, 6 Jul 2017 11:57:54 AM Balbir Singh wrote:
> Today we have an issue where the NUMA nodes corresponding
> to GPUs have the same affinity/distance as normal memory
> nodes. Our reference points today support two levels:
> [0x4, 0x4] for normal systems and [0x4, 0x3] for Power8E
> systems. This patch adds a new level, [0x4, X, 0x2], and
> uses the node ID at all levels for the GPU.
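>
> As an illustration (a sketch with assumed example values, not
> real firmware output), on a non-Power8E system a GPU memory
> node with a synthetic node ID of 255 would end up with:
>
>   ibm,associativity-reference-points = <0x4 0x4 0x2>;
>   ibm,associativity = <4 255 255 255 255>;
>
> Since the GPU node carries its own node ID at every position,
> it differs from every other node at all three reference points.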
>
> Cc: Reza Arbab <arbab at linux.vnet.ibm.com>
> Cc: Alistair Popple <alistair at popple.id.au>
> Cc: Benjamin Herrenschmidt <benh at kernel.crashing.org>
>
> Signed-off-by: Balbir Singh <bsingharora at gmail.com>
> ---
>
> Tested on a system; ensured that existing node distances
> are not impacted. GPU nodes have a distance
> of 80 w.r.t. all other nodes. No changes are needed in
> the Linux kernel.
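>
> The figure of 80 follows from how the Linux kernel derives
> form 1 affinity distances: it starts at LOCAL_DISTANCE (10)
> and doubles the distance at each reference point where two
> nodes' associativity values differ, stopping at the first
> match. A GPU node mismatches at all three points, giving
> 10 * 2 * 2 * 2 = 80; ordinary nodes still share a common
> value at the new third reference point (index 2), so their
> pairwise distances are unchanged.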
>
> core/affinity.c | 14 +++++++++-----
> doc/device-tree/ibm,opal.rst | 2 +-
> hw/npu2.c | 3 ++-
> 3 files changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/core/affinity.c b/core/affinity.c
> index 9f489d3..10d483d 100644
> --- a/core/affinity.c
> +++ b/core/affinity.c
> @@ -72,10 +72,10 @@ void add_associativity_ref_point(void)
> /*
> * Note about our use of reference points:
> *
> - * Linux currently supports two levels of NUMA. We use the first
> - * reference point for the node ID and the second reference point
> - * for a second level of affinity. We always use the chip ID (4)
> - * for the first reference point.
> + * Linux currently supports up to three levels of NUMA. We use the
> + * first reference point for the node ID and the second reference
> + * point for a second level of affinity. We always use the chip ID
> + * (4) for the first reference point.
> *
> * Choosing the second level of affinity is model specific
> * unfortunately. Current POWER8E models should use the DCM
> @@ -83,12 +83,16 @@ void add_associativity_ref_point(void)
> *
> * If there is a way to obtain this information from the FSP
> * that would be ideal, but for now hardwire our POWER8E setting.
> + *
> + * For GPU nodes we add a third level of NUMA, such that the
> + * distance of the GPU node from all other nodes is uniformly
> + * the highest.
> */
> if (PVR_TYPE(mfspr(SPR_PVR)) == PVR_TYPE_P8E)
> ref2 = 0x3;
>
> dt_add_property_cells(opal_node, "ibm,associativity-reference-points",
> - 0x4, ref2);
> + 0x4, ref2, 0x2);
> }
>
> void add_chip_dev_associativity(struct dt_node *dev)
> diff --git a/doc/device-tree/ibm,opal.rst b/doc/device-tree/ibm,opal.rst
> index 149050c..932f41d 100644
> --- a/doc/device-tree/ibm,opal.rst
> +++ b/doc/device-tree/ibm,opal.rst
> @@ -25,7 +25,7 @@ Top level ibm,opal node
> * ibm,opal-v2 is *NOT* present on POWER9 and above.
> */
>
> - ibm,associativity-reference-points = <0x4 0x3>;
> + ibm,associativity-reference-points = <0x4 0x3 0x2>;
> ibm,heartbeat-ms = <0x7d0>;
>
> /* how often any OPAL call needs to be made to avoid a watchdog timer on BMC
> diff --git a/hw/npu2.c b/hw/npu2.c
> index b81e49d..83451c3 100644
> --- a/hw/npu2.c
> +++ b/hw/npu2.c
> @@ -521,7 +521,8 @@ static struct dt_node *npu2_create_memory_dn(uint64_t addr, uint64_t size)
> dt_add_property_u64s(mem, "reg", addr, size);
> dt_add_property_cells(mem, "ibm,chip-id", chip_id);
> dt_add_property_u64s(mem, "linux,usable-memory", addr, 0);
> - dt_add_property_cells(mem, "ibm,associativity", 4, 0, 0, 0, chip_id--);
> + dt_add_property_cells(mem, "ibm,associativity", 4, chip_id, chip_id, chip_id, chip_id);
> + chip_id--;
>
> assert(chip_id);
> return mem;
>
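> For reference, the node produced by npu2_create_memory_dn()
> would then look roughly like this (the address, size, and the
> synthetic chip ID 0xff are illustrative placeholders, not
> values from real hardware):
>
>   memory@100000000000 {
>           reg = <0x1000 0x0 0x80 0x0>;
>           ibm,chip-id = <0xff>;
>           linux,usable-memory = <0x1000 0x0 0x0 0x0>;
>           ibm,associativity = <4 0xff 0xff 0xff 0xff>;
>   };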