[PATCH v3 3/3] powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails

Aneesh Kumar K.V aneesh.kumar at linux.ibm.com
Thu Jun 27 02:58:43 AEST 2019


Vaibhav Jain <vaibhav at linux.ibm.com> writes:

> In some cases initial bind of scm memory for an lpar can fail if
> previously it wasn't released using a scm-unbind hcall. This situation
> can arise due to panic of the previous kernel or forced lpar
> fadump. In such cases the H_SCM_BIND_MEM return a H_OVERLAP error.
>
> To mitigate such cases the patch updates papr_scm_probe() to force a
> call to drc_pmem_unbind() in case the initial bind of scm memory fails
> with EBUSY error. In case scm-bind operation again fails after the
> forced scm-unbind then we follow the existing error path. We also
> update drc_pmem_bind() to handle the H_OVERLAP error returned by phyp
> and indicate it as a EBUSY error back to the caller.
>
> Suggested-by: "Oliver O'Halloran" <oohall at gmail.com>
> Signed-off-by: Vaibhav Jain <vaibhav at linux.ibm.com>
> Reviewed-by: Oliver O'Halloran <oohall at gmail.com>
> ---
> Change-log:
> v3:
> * Minor update to a code comment. [Oliver]
>
> v2:
> * Moved the retry code from drc_pmem_bind() to papr_scm_probe()
>   [Oliver]
> * Changed the type of variable 'rc' in drc_pmem_bind() to
>   int64_t. [Oliver]
> ---
>  arch/powerpc/platforms/pseries/papr_scm.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index c01a03fd3ee7..7c5e10c063a0 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -43,8 +43,9 @@ struct papr_scm_priv {
>  static int drc_pmem_bind(struct papr_scm_priv *p)
>  {
>  	unsigned long ret[PLPAR_HCALL_BUFSIZE];
> -	uint64_t rc, token;
>  	uint64_t saved = 0;
> +	uint64_t token;
> +	int64_t rc;
>  
>  	/*
>  	 * When the hypervisor cannot map all the requested memory in a single
> @@ -64,6 +65,10 @@ static int drc_pmem_bind(struct papr_scm_priv *p)
>  	} while (rc == H_BUSY);
>  
>  	if (rc) {
> +		/* H_OVERLAP needs a separate error path */
> +		if (rc == H_OVERLAP)
> +			return -EBUSY;
> +
>  		dev_err(&p->pdev->dev, "bind err: %lld\n", rc);
>  		return -ENXIO;
>  	}
> @@ -331,6 +336,14 @@ static int papr_scm_probe(struct platform_device *pdev)
>  
>  	/* request the hypervisor to bind this region to somewhere in memory */
>  	rc = drc_pmem_bind(p);
> +
> +	/* If phyp says drc memory still bound then force unbound and retry */
> +	if (rc == -EBUSY) {
> +		dev_warn(&pdev->dev, "Retrying bind after unbinding\n");
> +		drc_pmem_unbind(p);


This should only be caused by kexec right? And considering kernel nor
hypervisor won't change device binding details, can you check switching
this to H_SCM_QUERY_BLOCK_MEM_BINDING?  Will that result in faster boot? 



> +		rc = drc_pmem_bind(p);
> +	}
> +
>  	if (rc)
>  		goto err;
>  

I am also not sure about the module reference count here. Should we
increment the module reference count after a bind so that we can track
failures in ubind and fail the module unload?

-aneesh



More information about the Linuxppc-dev mailing list