[Skiboot] [PATCH 1/2] sensors: occ: Fix the GPU detection code
Vaidyanathan Srinivasan
svaidy at linux.ibm.com
Fri Apr 24 19:33:27 AEST 2020
* Gautham R Shenoy <ego at linux.vnet.ibm.com> [2020-04-24 12:11:14]:
> From: "Gautham R. Shenoy" <ego at linux.vnet.ibm.com>
>
> commit bebe096ee242 ("sensors: occ: Skip GPU sensors for non-gpu
> systems") assumes that presence of "ibm,power9-npu" compatible node
> indicates the presence of GPUs. However, this is incorrect, as
> OpenCAPI is also supported via the NPU. Thus ZZ systems, which have
> OpenCAPI connectors but no GPUs, will have "ibm,power9-npu" compatible
> nodes.
> This results in OPAL creating device-tree entries for the GPU sensors
> on ZZ systems which don't even have GPUs.
>
> This patch fixes the GPU detection code in occ-sensors by first
> checking for an "ibm,ioda2-npu2-phb" compatible node, which indicates
> the presence of NVLink. Only if such a node exists do we ask the OCC
> whether GPUs are present on the system. Otherwise, we skip the GPU
> sensors.
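To make the two-stage check concrete, here is a minimal standalone C sketch. It does not use skiboot's device-tree API: `struct sys_model`, `has_compat()`, `old_has_gpu()` and `new_has_gpu()` are hypothetical, and the `occ_reports_gpu` flag merely stands in for what `occ_get_gpu_presence()` would report. The two sample systems assume, as the commit message describes, that a ZZ system exposes an "ibm,power9-npu" node but no "ibm,ioda2-npu2-phb" node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical system model (not skiboot's data structures): the
 * device-tree compatible strings present, plus whether the OCC
 * reports any GPU (a stand-in for occ_get_gpu_presence()). */
struct sys_model {
	const char *compat[4];	/* NULL-terminated list */
	bool occ_reports_gpu;
};

/* Return true if the model contains the given compatible string. */
static bool has_compat(const struct sys_model *s, const char *c)
{
	assert(s != NULL);
	for (size_t i = 0; i < 4 && s->compat[i]; i++)
		if (strcmp(s->compat[i], c) == 0)
			return true;
	return false;
}

/* Old check: any NPU node implies GPUs, which is wrong for
 * OpenCAPI-only systems. */
static bool old_has_gpu(const struct sys_model *s)
{
	return has_compat(s, "ibm,power9-npu");
}

/* New check: require an NVLink PHB first, then let the OCC confirm. */
static bool new_has_gpu(const struct sys_model *s)
{
	return has_compat(s, "ibm,ioda2-npu2-phb") && s->occ_reports_gpu;
}

/* ZZ: OpenCAPI via the NPU, no NVLink PHB, no GPUs. */
static const struct sys_model zz_system = {
	{ "ibm,power9-npu", NULL }, false
};

/* NVLink system with GPUs attached. */
static const struct sys_model nvlink_system = {
	{ "ibm,power9-npu", "ibm,ioda2-npu2-phb", NULL }, true
};
```

With these models, `old_has_gpu()` returns true for both systems, while `new_has_gpu()` returns true only for `nvlink_system`, so the ZZ system no longer gets GPU sensor nodes.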
>
> Thanks to Frederic Barrat <fbarrat at linux.ibm.com> for suggesting
> "ibm,ioda2-npu2-phb" for detecting the presence of nvlink GPUs.
>
> Fixes: commit bebe096ee242 ("sensors: occ: Skip GPU sensors for non-gpu
> systems")
> Reported-by: Pavaman Subramaniyam <pavsubra at in.ibm.com>
> Tested-by: Pavaman Subramaniyam <pavsubra at in.ibm.com>
> Signed-off-by: Gautham R. Shenoy <ego at linux.vnet.ibm.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy at linux.ibm.com>
> ---
> hw/occ-sensor.c | 22 ++++++++++++++++++++--
> 1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/hw/occ-sensor.c b/hw/occ-sensor.c
> index 524d00f..a5d0974 100644
> --- a/hw/occ-sensor.c
> +++ b/hw/occ-sensor.c
> @@ -521,8 +521,26 @@ bool occ_sensors_init(void)
> 	dt_add_property_cells(sg, "#address-cells", 1);
> 	dt_add_property_cells(sg, "#size-cells", 0);
>
> -	if (dt_find_compatible_node(dt_root, NULL, "ibm,power9-npu"))
> -		has_gpu = true;
> +	/*
> +	 * On POWER9, ibm,ioda2-npu2-phb indicates the presence of a
> +	 * GPU NVlink.
> +	 */
> +	if (dt_find_compatible_node(dt_root, NULL, "ibm,ioda2-npu2-phb")) {
> +
> +		for_each_chip(chip) {
> +			int max_gpus_per_chip = 2, i;
^ 3: the maximum number of GPUs per chip is 3 :)
> +
> +			for(i = 0; i < max_gpus_per_chip; i++) {
i runs up to max_gpus_per_chip - 1; the GPU indices are 0, 1, 2.
> +				has_gpu = occ_get_gpu_presence(chip, i);
> +
> +				if (has_gpu)
> +					break;
> +			}
> +
> +			if (has_gpu)
> +				break;
> +		}
> +	}
>
> 	for_each_chip(chip) {
> 		struct occ_sensor_data_header *hb;
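The nits above concern the loop bound. As a minimal sketch (hypothetical names, not skiboot code), the scan below has the same shape as the patch's nested loop, with `gpu_present()` standing in for `occ_get_gpu_presence()` and a per-chip bitmask as assumed input. It assumes, per the review, at most 3 GPUs per chip at indices 0, 1 and 2; a chip populated only at index 2 is an artificial case chosen to isolate the effect of the bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per the review: up to 3 GPUs per chip, at indices 0, 1 and 2. */
#define MAX_GPUS_PER_CHIP 3

/* Hypothetical stand-in for occ_get_gpu_presence(chip, i): bit i of
 * the chip's mask is set when GPU i is present. */
static bool gpu_present(unsigned int chip_mask, int i)
{
	assert(i >= 0 && i < MAX_GPUS_PER_CHIP);
	return (chip_mask >> i) & 1u;
}

/* Same shape as the patch: scan every chip and every GPU index below
 * max_gpus, stopping at the first GPU found. */
static bool any_gpu(const unsigned int *chip_masks, size_t nr_chips,
		    int max_gpus)
{
	bool has_gpu = false;

	for (size_t c = 0; c < nr_chips && !has_gpu; c++)
		for (int i = 0; i < max_gpus && !has_gpu; i++)
			has_gpu = gpu_present(chip_masks[c], i);

	return has_gpu;
}
```

For example, with two chips each reporting only GPU index 2 (mask `0x4`), a bound of 2 returns false while a bound of `MAX_GPUS_PER_CHIP` returns true. Keeping the condition as `i < max_gpus_per_chip` with the bound set to 3 covers indices 0 through 2.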
Thanks for the fix. Minor nits above.
--Vaidy