[Skiboot] [PATCH 1/2] sensors: occ: Fix the GPU detection code

Vaidyanathan Srinivasan svaidy at linux.ibm.com
Fri Apr 24 19:33:27 AEST 2020


* Gautham R Shenoy <ego at linux.vnet.ibm.com> [2020-04-24 12:11:14]:

> From: "Gautham R. Shenoy" <ego at linux.vnet.ibm.com>
> 
> commit bebe096ee242 ("sensors: occ: Skip GPU sensors for non-gpu
> systems") assumes that the presence of an "ibm,power9-npu" compatible
> node indicates the presence of GPUs. However, this is incorrect,
> because OpenCAPI is also supported via the NPU. Thus ZZ systems, which
> have OpenCAPI connectors but no GPUs, will have "ibm,power9-npu"
> compatible nodes. This results in OPAL creating device-tree entries
> for the GPU sensors on ZZ systems, which don't even have GPUs.
> 
> This patch fixes the GPU detection code in occ-sensors by first
> checking for an "ibm,ioda2-npu2-phb" compatible node, which indicates
> the presence of NVLink. Only if such a node exists do we query the
> OCC to confirm that a GPU is actually present. Otherwise, we skip
> the GPU sensors.
> 
> Thanks to Frederic Barrat <fbarrat at linux.ibm.com> for suggesting
> "ibm,ioda2-npu2-phb" for detecting the presence of nvlink GPUs.
> 
> Fixes: commit bebe096ee242 ("sensors: occ: Skip GPU sensors for non-gpu
>         systems")
> Reported-by: Pavaman Subramaniyam <pavsubra at in.ibm.com>
> Tested-by: Pavaman Subramaniyam <pavsubra at in.ibm.com>
> Signed-off-by: Gautham R. Shenoy <ego at linux.vnet.ibm.com>

Reviewed-by: Vaidyanathan Srinivasan <svaidy at linux.ibm.com>

> ---
>  hw/occ-sensor.c | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/occ-sensor.c b/hw/occ-sensor.c
> index 524d00f..a5d0974 100644
> --- a/hw/occ-sensor.c
> +++ b/hw/occ-sensor.c
> @@ -521,8 +521,26 @@ bool occ_sensors_init(void)
>  	dt_add_property_cells(sg, "#address-cells", 1);
>  	dt_add_property_cells(sg, "#size-cells", 0);
> 
> -	if (dt_find_compatible_node(dt_root, NULL, "ibm,power9-npu"))
> -		has_gpu = true;
> +	/*
> +	 * On POWER9, ibm,ioda2-npu2-phb indicates the presence of a
> +	 * GPU NVlink.
> +	 */
> +	if (dt_find_compatible_node(dt_root, NULL, "ibm,ioda2-npu2-phb")) {
> +
> +		for_each_chip(chip) {
> +			int max_gpus_per_chip = 2, i;
                                                ^3  The maximum number of GPUs per chip is 3 :)

> +
> +			for(i = 0; i < max_gpus_per_chip; i++) {
                                        max_gpus_per_chip-1;  GPU indices are 0, 1, 2


> +				has_gpu = occ_get_gpu_presence(chip, i);
> +
> +				if (has_gpu)
> +					break;
> +			}
> +
> +			if (has_gpu)
> +				break;
> +		}
> +	}
> 
>  	for_each_chip(chip) {
>  		struct occ_sensor_data_header *hb;

Thanks for the fix.  Minor nits above.
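
For illustration only, here is a minimal standalone sketch of the
detection order described in the commit message, with the per-chip
limit set to 3 as suggested in the review. It is not the skiboot code:
struct mock_chip, mock_gpu_presence() and detect_gpu() are hypothetical
stand-ins, and the has_nvlink_phb flag stands in for the
dt_find_compatible_node() check on "ibm,ioda2-npu2-phb".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption taken from the review: up to 3 GPUs per chip, indices 0..2. */
#define MAX_GPUS_PER_CHIP 3

/* Hypothetical stand-in for the per-chip GPU presence reported by the OCC. */
struct mock_chip {
	bool gpu_present[MAX_GPUS_PER_CHIP];
};

/* Hypothetical stand-in for occ_get_gpu_presence(chip, i). */
static bool mock_gpu_presence(const struct mock_chip *chip, int gpu)
{
	assert(chip != NULL);
	assert(gpu >= 0 && gpu < MAX_GPUS_PER_CHIP);
	return chip->gpu_present[gpu];
}

/*
 * Report a GPU only if the NVLink PHB node exists AND at least one
 * chip reports a GPU as present. Without the node, no chip is queried.
 */
static bool detect_gpu(bool has_nvlink_phb,
		       const struct mock_chip *chips, size_t nr_chips)
{
	size_t c;
	int i;

	if (!has_nvlink_phb)
		return false;

	for (c = 0; c < nr_chips; c++) {
		for (i = 0; i < MAX_GPUS_PER_CHIP; i++) {
			if (mock_gpu_presence(&chips[c], i))
				return true;
		}
	}

	return false;
}
```

Returning as soon as a GPU is found replaces the two "if (has_gpu)
break;" checks in the patch; the result is the same, and a GPU at
index 2 is no longer missed.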

--Vaidy


