[PATCH v3 3/3] powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails
Aneesh Kumar K.V
aneesh.kumar at linux.ibm.com
Thu Jun 27 02:58:43 AEST 2019
Vaibhav Jain <vaibhav at linux.ibm.com> writes:
> In some cases initial bind of scm memory for an lpar can fail if
> previously it wasn't released using a scm-unbind hcall. This situation
> can arise due to panic of the previous kernel or forced lpar
> fadump. In such cases the H_SCM_BIND_MEM hcall returns an H_OVERLAP error.
>
> To mitigate such cases the patch updates papr_scm_probe() to force a
> call to drc_pmem_unbind() in case the initial bind of scm memory fails
> with EBUSY error. In case scm-bind operation again fails after the
> forced scm-unbind then we follow the existing error path. We also
> update drc_pmem_bind() to handle the H_OVERLAP error returned by phyp
> and indicate it as an EBUSY error back to the caller.
>
> Suggested-by: "Oliver O'Halloran" <oohall at gmail.com>
> Signed-off-by: Vaibhav Jain <vaibhav at linux.ibm.com>
> Reviewed-by: Oliver O'Halloran <oohall at gmail.com>
> ---
> Change-log:
> v3:
> * Minor update to a code comment. [Oliver]
>
> v2:
> * Moved the retry code from drc_pmem_bind() to papr_scm_probe()
> [Oliver]
> * Changed the type of variable 'rc' in drc_pmem_bind() to
> int64_t. [Oliver]
> ---
> arch/powerpc/platforms/pseries/papr_scm.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index c01a03fd3ee7..7c5e10c063a0 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -43,8 +43,9 @@ struct papr_scm_priv {
> static int drc_pmem_bind(struct papr_scm_priv *p)
> {
> unsigned long ret[PLPAR_HCALL_BUFSIZE];
> - uint64_t rc, token;
> uint64_t saved = 0;
> + uint64_t token;
> + int64_t rc;
>
> /*
> * When the hypervisor cannot map all the requested memory in a single
> @@ -64,6 +65,10 @@ static int drc_pmem_bind(struct papr_scm_priv *p)
> } while (rc == H_BUSY);
>
> if (rc) {
> + /* H_OVERLAP needs a separate error path */
> + if (rc == H_OVERLAP)
> + return -EBUSY;
> +
> dev_err(&p->pdev->dev, "bind err: %lld\n", rc);
> return -ENXIO;
> }
> @@ -331,6 +336,14 @@ static int papr_scm_probe(struct platform_device *pdev)
>
> /* request the hypervisor to bind this region to somewhere in memory */
> rc = drc_pmem_bind(p);
> +
> + /* If phyp says drc memory still bound then force unbound and retry */
> + if (rc == -EBUSY) {
> + dev_warn(&pdev->dev, "Retrying bind after unbinding\n");
> + drc_pmem_unbind(p);
This should only be caused by kexec, right? And considering that neither
the kernel nor the hypervisor will change the device binding details, can
you check whether switching this to H_SCM_QUERY_BLOCK_MEM_BINDING works?
Would that result in a faster boot?
> + rc = drc_pmem_bind(p);
> + }
> +
> if (rc)
> goto err;
>
I am also not sure about the module reference count here. Should we
increment the module reference count after a successful bind, so that we
can track failures in unbind and fail the module unload?
-aneesh
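
For readers following the thread, the retry flow described in the commit
message can be sketched as a small userspace simulation. This is only an
illustration under stated assumptions: every sim_* name and both SIM_H_*
values are hypothetical stand-ins, not the kernel's or the hypervisor's real
definitions, and the real drc_pmem_bind() also loops while the hcall returns
H_BUSY, which is left out here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical return codes; the values are arbitrary placeholders. */
#define SIM_H_SUCCESS 0
#define SIM_H_OVERLAP (-1)

/* Hypothetical model of one SCM region as seen by the hypervisor. */
struct sim_region {
	bool bound;        /* true if a previous kernel left it bound */
	int unbind_calls;  /* counts forced unbinds, for inspection */
};

/* Simulated bind hcall: fails with an overlap if the region is still bound. */
static int64_t sim_hcall_bind(struct sim_region *r)
{
	if (r->bound)
		return SIM_H_OVERLAP;
	r->bound = true;
	return SIM_H_SUCCESS;
}

/* Simulated unbind hcall: always succeeds in this sketch. */
static void sim_hcall_unbind(struct sim_region *r)
{
	r->bound = false;
	r->unbind_calls++;
}

/* Mirrors the patched bind helper: an overlap is reported as -EBUSY. */
static int sim_bind(struct sim_region *r)
{
	int64_t rc = sim_hcall_bind(r);

	if (rc == SIM_H_OVERLAP)
		return -EBUSY;
	if (rc)
		return -ENXIO;
	return 0;
}

/* Mirrors the patched probe path: on -EBUSY, unbind once and retry once. */
static int sim_probe(struct sim_region *r)
{
	int rc = sim_bind(r);

	if (rc == -EBUSY) {
		sim_hcall_unbind(r);
		rc = sim_bind(r);
	}
	return rc;
}
```

In this model, calling sim_probe() on a region that starts out bound returns
0 after exactly one forced unbind, while a region that starts out unbound
binds on the first attempt with no unbind at all.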