[PATCH v2 1/2] powerpc/eeh: fix pseries_eeh_configure_bridge()

Nathan Lynch nathanl at linux.ibm.com
Wed Apr 22 09:33:36 AEST 2020


Sam Bobroff <sbobroff at linux.ibm.com> writes:
> If a device is hot unplgged during EEH recovery, it's possible for the
> RTAS call to ibm,configure-pe in pseries_eeh_configure() to return
> parameter error (-3), however negative return values are not checked
> for and this leads to an infinite loop.
>
> Fix this by correctly bailing out on negative values.
>
> Signed-off-by: Sam Bobroff <sbobroff at linux.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/eeh_pseries.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
> index 893ba3f562c4..c4ef03bec0de 100644
> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
> @@ -605,7 +605,7 @@ static int pseries_eeh_configure_bridge(struct eeh_pe *pe)
>  				config_addr, BUID_HI(pe->phb->buid),
>  				BUID_LO(pe->phb->buid));
>  
> -		if (!ret)
> +		if (ret <= 0)
>  			return ret;

Note that this returns the firmware error value (e.g. -3 parameter
error) without converting it to a Linux errno. Nothing checks the error
value of this function as best I can tell, but -EINVAL would be better
than an implicit -ESRCH here.

And while this will behave correctly, the pr_warn() at the end of
pseries_eeh_configure_bridge() hints that someone had the intention
that this code should log a message on such an error:

static int pseries_eeh_configure_bridge(struct eeh_pe *pe)
{
	int config_addr;
	int ret;
	/* Waiting 0.2s maximum before skipping configuration */
	int max_wait = 200;

	/* Figure out the PE address */
	config_addr = pe->config_addr;
	if (pe->addr)
		config_addr = pe->addr;

	while (max_wait > 0) {
		ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
				config_addr, BUID_HI(pe->phb->buid),
				BUID_LO(pe->phb->buid));

		if (!ret)
			return ret;

		/*
		 * If RTAS returns a delay value that's above 100ms, cut it
		 * down to 100ms in case firmware made a mistake.  For more
		 * on how these delay values work see rtas_busy_delay_time
		 */
		if (ret > RTAS_EXTENDED_DELAY_MIN+2 &&
		    ret <= RTAS_EXTENDED_DELAY_MAX)
			ret = RTAS_EXTENDED_DELAY_MIN+2;

		max_wait -= rtas_busy_delay_time(ret);

		if (max_wait < 0)
			break;

		rtas_busy_delay(ret);
	}

	pr_warn("%s: Unable to configure bridge PHB#%x-PE#%x (%d)\n",
		__func__, pe->phb->global_number, pe->addr, ret);
	return ret;
}

So perhaps the error path should be made to break out of the loop
instead of returning. Or is the parameter error result simply
uninteresting in this scenario?


More information about the Linuxppc-dev mailing list