[Skiboot] [PATCH] numa/associativity: Add a new level of NUMA for GPUs

Alistair Popple alistair at popple.id.au
Thu Jul 6 16:06:40 AEST 2017


I am not familiar with how changing the associativity properties
achieves the desired outcome, but the manipulation of the properties
themselves looks correct and the idea seems reasonable to me.
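
That said, for anyone else reading along, here is how I believe Linux
turns these properties into distances: the powerpc NUMA code looks up
each node's ibm,associativity cells at the indices named by
ibm,associativity-reference-points and, starting from the local
distance of 10, doubles the distance at every level where two nodes
differ, stopping at the first level they share. Below is a compilable
sketch of that logic -- my reconstruction, not the actual kernel code,
and the example associativity layouts are assumptions inferred from
this patch:

/*
 * Sketch of how Linux's powerpc NUMA code derives node distances
 * from ibm,associativity.  Reconstruction from memory, not kernel
 * code; the associativity layouts below are assumptions.
 */
#include <stdio.h>

#define LOCAL_DISTANCE 10

/* ibm,associativity-reference-points after this patch (non-P8E):
 * <0x4 0x4 0x2>.  Each entry is an index into the raw
 * ibm,associativity cell array below. */
static const int ref_points[] = { 4, 4, 2 };
#define REF_DEPTH (int)(sizeof(ref_points) / sizeof(ref_points[0]))

/* Raw ibm,associativity cells: <4, a, b, c, d>.  Normal memory keeps
 * zeroes in the upper levels; GPU memory now repeats its (unique)
 * chip id at every level. */
static const int normal_chip0[] = { 4, 0, 0, 0, 0 };
static const int normal_chip8[] = { 4, 0, 0, 0, 8 };
static const int gpu_chip255[]  = { 4, 255, 255, 255, 255 };

static int node_distance(const int *a, const int *b)
{
	int i, distance = LOCAL_DISTANCE;

	/* Double the distance for every reference point at which the
	 * two nodes live in different domains; stop at the first
	 * level they share. */
	for (i = 0; i < REF_DEPTH; i++) {
		if (a[ref_points[i]] == b[ref_points[i]])
			break;
		distance *= 2;
	}
	return distance;
}

int main(void)
{
	printf("normal <-> normal: %d\n",
	       node_distance(normal_chip0, normal_chip8));	/* 40 */
	printf("normal <-> GPU:    %d\n",
	       node_distance(normal_chip0, gpu_chip255));	/* 80 */
	return 0;
}

If that reading is right, two distinct normal nodes already mismatch
at both of the old reference points (distance 40), so a third level
that they share leaves them untouched, while a GPU node, unique at
every level, lands at 10 * 2^3 = 80 from everything else -- matching
the numbers reported below.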

Reviewed-by: Alistair Popple <alistair at popple.id.au>

On Thu, 6 Jul 2017 11:57:54 AM Balbir Singh wrote:
> Today we have an issue where the NUMA nodes corresponding
> to GPUs have the same affinity/distance as normal memory
> nodes. Our reference-points property today supports two
> levels: [0x4, 0x4] for normal systems and [0x4, 0x3] for
> Power8E systems. This patch adds a new level, [0x4, X, 0x2],
> and uses the node-id at all levels for the GPU.
> 
> Cc: Reza Arbab <arbab at linux.vnet.ibm.com>
> Cc: Alistair Popple <alistair at popple.id.au>
> Cc: Benjamin Herrenschmidt <benh at kernel.crashing.org>
> 
> Signed-off-by: Balbir Singh <bsingharora at gmail.com>
> ---
> 
> Tested on a system; verified that the distances between
> existing nodes are not impacted. GPU nodes have a distance
> of 80 w.r.t. all other nodes. No changes are needed in
> the Linux kernel.
> 
>  core/affinity.c              | 14 +++++++++-----
>  doc/device-tree/ibm,opal.rst |  2 +-
>  hw/npu2.c                    |  3 ++-
>  3 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/core/affinity.c b/core/affinity.c
> index 9f489d3..10d483d 100644
> --- a/core/affinity.c
> +++ b/core/affinity.c
> @@ -72,10 +72,10 @@ void add_associativity_ref_point(void)
>  	/*
>  	 * Note about our use of reference points:
>  	 *
> -	 * Linux currently supports two levels of NUMA. We use the first
> -	 * reference point for the node ID and the second reference point
> -	 * for a second level of affinity. We always use the chip ID (4)
> -	 * for the first reference point.
> +	 * Linux currently supports up to three levels of NUMA. We use the
> +	 * first reference point for the node ID and the second reference
> +	 * point for a second level of affinity. We always use the chip ID
> +	 * (4) for the first reference point.
>  	 *
>  	 * Choosing the second level of affinity is model specific
>  	 * unfortunately. Current POWER8E models should use the DCM
> @@ -83,12 +83,16 @@ void add_associativity_ref_point(void)
>  	 *
>  	 * If there is a way to obtain this information from the FSP
>  	 * that would be ideal, but for now hardwire our POWER8E setting.
> +	 *
> +	 * For GPU nodes we add a third level of NUMA, such that the
> +	 * distance of the GPU node from all other nodes is uniformly
> +	 * the highest.
>  	 */
>  	if (PVR_TYPE(mfspr(SPR_PVR)) == PVR_TYPE_P8E)
>  		ref2 = 0x3;
>  
>  	dt_add_property_cells(opal_node, "ibm,associativity-reference-points",
> -			      0x4, ref2);
> +			      0x4, ref2, 0x2);
>  }
>  
>  void add_chip_dev_associativity(struct dt_node *dev)
> diff --git a/doc/device-tree/ibm,opal.rst b/doc/device-tree/ibm,opal.rst
> index 149050c..932f41d 100644
> --- a/doc/device-tree/ibm,opal.rst
> +++ b/doc/device-tree/ibm,opal.rst
> @@ -25,7 +25,7 @@ Top level ibm,opal node
>      * ibm,opal-v2 is *NOT* present on POWER9 and above.
>      */
>  
> -		ibm,associativity-reference-points = <0x4 0x3>;
> +		ibm,associativity-reference-points = <0x4 0x3 0x2>;
>  		ibm,heartbeat-ms = <0x7d0>;
>  
>     /* how often any OPAL call needs to be made to avoid a watchdog timer on BMC
> diff --git a/hw/npu2.c b/hw/npu2.c
> index b81e49d..83451c3 100644
> --- a/hw/npu2.c
> +++ b/hw/npu2.c
> @@ -521,7 +521,8 @@ static struct dt_node *npu2_create_memory_dn(uint64_t addr, uint64_t size)
>  	dt_add_property_u64s(mem, "reg", addr, size);
>  	dt_add_property_cells(mem, "ibm,chip-id", chip_id);
>  	dt_add_property_u64s(mem, "linux,usable-memory", addr, 0);
> -	dt_add_property_cells(mem, "ibm,associativity", 4, 0, 0, 0, chip_id--);
> +	dt_add_property_cells(mem, "ibm,associativity", 4, chip_id, chip_id, chip_id, chip_id);
> +	chip_id--;
>  
>  	assert(chip_id);
>  	return mem;
> 
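One small observation on the npu2.c hunk, unrelated to the NUMA logic
itself: moving the chip_id-- out of the dt_add_property_cells() call
is necessary, not just cosmetic. Reading chip_id in several arguments
while also modifying it with chip_id-- in the same argument list would
be undefined behaviour in C, because argument evaluations are
unsequenced. A minimal illustration, with a hypothetical helper just
for demonstration:

#include <stdio.h>

static void use4(int a, int b, int c, int d)
{
	printf("%d %d %d %d\n", a, b, c, d);
}

int main(void)
{
	int chip_id = 255;

	/* use4(chip_id, chip_id, chip_id, chip_id--);
	 * ...would be undefined behaviour: chip_id is both read and
	 * modified between two unsequenced argument evaluations. */

	use4(chip_id, chip_id, chip_id, chip_id);	/* well-defined */
	chip_id--;					/* decrement afterwards */
	return 0;
}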


