[PATCH v2 1/2] powerpc/eeh: fix pseries_eeh_configure_bridge()
Sam Bobroff
sbobroff at linux.ibm.com
Wed Apr 22 13:30:24 AEST 2020
On Tue, Apr 21, 2020 at 06:33:36PM -0500, Nathan Lynch wrote:
> Sam Bobroff <sbobroff at linux.ibm.com> writes:
> > If a device is hot unplugged during EEH recovery, it's possible for the
> > RTAS call to ibm,configure-pe in pseries_eeh_configure_bridge() to
> > return parameter error (-3). However, negative return values are not
> > checked for, and this leads to an infinite loop.
> >
> > Fix this by correctly bailing out on negative values.
> >
> > Signed-off-by: Sam Bobroff <sbobroff at linux.ibm.com>
> > ---
> > arch/powerpc/platforms/pseries/eeh_pseries.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
> > index 893ba3f562c4..c4ef03bec0de 100644
> > --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
> > +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
> > @@ -605,7 +605,7 @@ static int pseries_eeh_configure_bridge(struct eeh_pe *pe)
> >  				config_addr, BUID_HI(pe->phb->buid),
> >  				BUID_LO(pe->phb->buid));
> >  
> > -		if (!ret)
> > +		if (ret <= 0)
> >  			return ret;
>
> Note that this returns the firmware error value (e.g. -3 parameter
> error) without converting it to a Linux errno. Nothing checks the error
> value of this function as best I can tell, but -EINVAL would be better
> than an implicit -ESRCH here.

Right, it's never used but I agree. I'll change it for v3.

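To make the conversion concrete, here is a minimal userspace sketch of mapping a firmware status to a Linux errno before returning it. The helper name fw_status_to_errno() and the FW_PARAM_ERROR macro are hypothetical, made up for illustration. Only the -3 to -EINVAL mapping comes from the discussion above; collapsing every other negative status to -EIO is an assumption, and any status that means "retry" is assumed to have been handled by the caller beforehand.

```c
#include <errno.h>

/* Parameter error status, as described in this thread. */
#define FW_PARAM_ERROR (-3)

/*
 * Hypothetical helper (not an existing kernel function): translate a
 * firmware status into a Linux errno. Non-negative values (success or
 * a delay hint) are passed through unchanged for the caller to handle.
 */
static int fw_status_to_errno(int status)
{
	if (status >= 0)
		return status;

	switch (status) {
	case FW_PARAM_ERROR:
		return -EINVAL;
	default:
		/* Assumption: treat any other firmware error as an I/O error. */
		return -EIO;
	}
}
```

With something like this, a caller that gets -3 from firmware would see -EINVAL rather than a value that happens to alias -ESRCH.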
> And while this will behave correctly, the pr_warn() at the end of
> pseries_eeh_configure_bridge() hints that someone had the intention
> that this code should log a message on such an error:
>
> static int pseries_eeh_configure_bridge(struct eeh_pe *pe)
> {
> 	int config_addr;
> 	int ret;
> 	/* Waiting 0.2s maximum before skipping configuration */
> 	int max_wait = 200;
>
> 	/* Figure out the PE address */
> 	config_addr = pe->config_addr;
> 	if (pe->addr)
> 		config_addr = pe->addr;
>
> 	while (max_wait > 0) {
> 		ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
> 				config_addr, BUID_HI(pe->phb->buid),
> 				BUID_LO(pe->phb->buid));
>
> 		if (!ret)
> 			return ret;
>
> 		/*
> 		 * If RTAS returns a delay value that's above 100ms, cut it
> 		 * down to 100ms in case firmware made a mistake. For more
> 		 * on how these delay values work see rtas_busy_delay_time
> 		 */
> 		if (ret > RTAS_EXTENDED_DELAY_MIN+2 &&
> 		    ret <= RTAS_EXTENDED_DELAY_MAX)
> 			ret = RTAS_EXTENDED_DELAY_MIN+2;
>
> 		max_wait -= rtas_busy_delay_time(ret);
>
> 		if (max_wait < 0)
> 			break;
>
> 		rtas_busy_delay(ret);
> 	}
>
> 	pr_warn("%s: Unable to configure bridge PHB#%x-PE#%x (%d)\n",
> 		__func__, pe->phb->global_number, pe->addr, ret);
> 	return ret;
> }
>
> So perhaps the error path should be made to break out of the loop
> instead of returning. Or is the parameter error result simply
> uninteresting in this scenario?

Sounds reasonable to me, and given that the only way I know to trigger
the error path (see the commit message) is not going to be common, I
think a message is a good idea. (And, as one of the people likely to
debug a future issue here, I'll probably appreciate it.)

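As a rough illustration of the "break out of the loop instead of returning" shape, here is a small userspace sketch. It is not the kernel code and not necessarily what the next revision will look like. Everything in it is hypothetical: configure_with_retry(), the fw_call_fn callback type, and the three fake firmware functions stand in for the real RTAS call, and a positive status is simply treated as "wait this many milliseconds", which is a simplification of the real delay encoding. The errno choices other than -EINVAL for the parameter error are also assumptions.

```c
#include <errno.h>
#include <stdio.h>

/* Parameter error status, as described in this thread. */
#define FW_PARAM_ERROR (-3)

/* Hypothetical stand-in for the firmware call. */
typedef int (*fw_call_fn)(void *ctx);

/*
 * Retry until success, a firmware error, or the wait budget runs out.
 * Errors break out of the loop so they reach the warning below
 * instead of returning silently.
 */
static int configure_with_retry(fw_call_fn call, void *ctx, int max_wait_ms)
{
	int ret = 0;

	while (max_wait_ms > 0) {
		ret = call(ctx);

		if (ret == 0)
			return 0;	/* success: nothing to log */

		if (ret < 0)
			break;		/* firmware error: log, do not spin */

		/* Simplification: positive status is a delay in ms. */
		max_wait_ms -= ret;
		if (max_wait_ms < 0)
			break;

		/* Real code would sleep for the requested delay here. */
	}

	fprintf(stderr, "%s: unable to configure (%d)\n", __func__, ret);

	if (ret == FW_PARAM_ERROR)
		return -EINVAL;
	if (ret < 0)
		return -EIO;		/* assumption: other firmware errors */
	return -ETIMEDOUT;		/* assumption: wait budget exhausted */
}

/* Fake firmware: always reports a parameter error (device unplugged). */
static int fw_param_error(void *ctx)
{
	int *calls = ctx;

	(*calls)++;
	return FW_PARAM_ERROR;
}

/* Fake firmware: asks for two 50 ms delays, then succeeds. */
static int fw_busy_then_ok(void *ctx)
{
	int *calls = ctx;

	return (*calls)++ < 2 ? 50 : 0;
}

/* Fake firmware: always asks for a 100 ms delay. */
static int fw_always_busy(void *ctx)
{
	int *calls = ctx;

	(*calls)++;
	return 100;
}
```

Calling configure_with_retry(fw_param_error, &calls, 200) makes a single firmware call, prints the warning, and returns -EINVAL, so the unplug case is both bounded and visible in the log.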
Cheers,
Sam.