[Skiboot] [PATCH v2 1/6] occ: Wait if OCC GPU presence status not immediately available

Frederic Barrat fbarrat at linux.ibm.com
Wed Aug 29 22:44:41 AEST 2018



On 27/08/2018 at 10:55, Andrew Donnellan wrote:
> It takes a few seconds for the OCC to set everything up in order to read
> GPU presence. At present, we try to kick off OCC initialisation as early as
> possible to maximise the time it has to read GPU presence.
> 
> Unfortunately sometimes that's not enough, so add a loop in
> occ_get_gpu_presence() so that on the first time we try to get GPU presence
> we keep trying for up to 2 seconds. Experimentally this seems to be
> adequate.
> 
> Fixes: 9b394a32c8ea ("occ: Add support for GPU presence detection")
> Signed-off-by: Andrew Donnellan <andrew.donnellan at au1.ibm.com>
> ---
>   hw/occ.c | 18 +++++++++++++++---
>   1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/occ.c b/hw/occ.c
> index a55bf8ed4f54..9fcac3f9581c 100644
> --- a/hw/occ.c
> +++ b/hw/occ.c
> @@ -1238,14 +1238,26 @@ exit:
>   bool occ_get_gpu_presence(struct proc_chip *chip, int gpu_num)
>   {
>   	struct occ_dynamic_data *ddata;
> +	static int max_retries = 20;
> +	static bool found = false;
> 
>   	assert(gpu_num <= 2);
> 
>   	ddata = get_occ_dynamic_data(chip);
> -
> -	if (ddata->major_version != 0 || ddata->minor_version < 1) {
> +	while (!found && max_retries) {
> +		if (ddata->major_version == 0 && ddata->minor_version >= 1) {
> +			found = true;
> +			break;
> +		}
>   		prlog(PR_INFO, "OCC: OCC not reporting GPU slot presence, "
> -		      "assuming device is present\n");
> +		      "waiting\n");

Do we really want to print the same message up to 20 times?
Other than that:
Reviewed-by: Frederic Barrat <fbarrat at linux.vnet.ibm.com>


> +		time_wait_ms(100);
> +		max_retries--;
> +		ddata = get_occ_dynamic_data(chip);
> +	}
> +
> +	if (!found) {
> +		prlog(PR_INFO, "OCC: No GPU slot presence, assuming GPU present\n");
>   		return true;
>   	}
> 
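For illustration only, here is a minimal sketch of how the retry loop could log the "waiting" message once per call instead of on every iteration. It is not part of the posted patch. The `_stub` functions, `log_info()`, the two-field `struct occ_dynamic_data` and the counters are hypothetical stand-ins so the sketch compiles outside skiboot; the real code would keep using `get_occ_dynamic_data()`, `prlog()` and `time_wait_ms()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, reduced stand-in for skiboot's struct occ_dynamic_data. */
struct occ_dynamic_data {
	int major_version;
	int minor_version;
};

static struct occ_dynamic_data fake_ddata; /* simulated OCC data, zeroed */
static int polls_until_ready = 3;          /* simulated OCC start-up delay */
static int wait_calls;                     /* number of simulated waits */
static int log_lines;                      /* number of messages logged */

/* Stand-in for get_occ_dynamic_data(): becomes "ready" on the third poll. */
static struct occ_dynamic_data *get_occ_dynamic_data_stub(void)
{
	if (polls_until_ready > 0 && --polls_until_ready == 0)
		fake_ddata.minor_version = 1;
	return &fake_ddata;
}

/* Stand-in for time_wait_ms(): only counts calls, does not sleep. */
static void time_wait_ms_stub(int ms)
{
	(void)ms;
	wait_calls++;
}

/* Stand-in for prlog(PR_INFO, ...). */
static void log_info(const char *msg)
{
	log_lines++;
	fputs(msg, stderr);
}

/* Same retry structure as the patch, but the message is printed once. */
static bool occ_gpu_presence_available(void)
{
	static int max_retries = 20;
	static bool found = false;
	bool warned = false;
	struct occ_dynamic_data *ddata = get_occ_dynamic_data_stub();

	while (!found && max_retries) {
		if (ddata->major_version == 0 && ddata->minor_version >= 1) {
			found = true;
			break;
		}
		if (!warned) {
			log_info("OCC: OCC not reporting GPU slot presence, waiting\n");
			warned = true;
		}
		time_wait_ms_stub(100);
		max_retries--;
		ddata = get_occ_dynamic_data_stub();
	}

	return found;
}
```

With the simulated delay of three polls, the first call waits twice and logs a single line; later calls return immediately because `found` is static.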


